Evidence Taxonomy
Purpose
This document defines the structured evidence categories for the Evidence Layer. Every piece of evidence is stored, tagged, and traceable to its source. Agents consume only this layer; the Evidence → Inference → Judgment separation is enforced.
Categories
| Category | Description | Typical sources |
|---|---|---|
| product_surface | Headlines, key points, primary product message | Page crawl (homepage, product, docs) |
| documentation | Features, use cases, benefits, technical details | Docs/product pages |
| pricing_model | Plans, pricing model, raw pricing text | Pricing page, enrichment |
| social_proof | Customer logos, case studies, social metrics | Page content, enrichment |
| brand_signals | Metadata (title, description), team, mission, org schema, UI capture refs | All pages, enrichment, UI capture |
| technical_signals | Technical details, integration mentions | Product/docs pages |
| distribution_hooks | CTAs, integration mentions, distribution channels, traffic | Pages, enrichment |
| competitive_intel | Competitors, positioning, differentiation, lessons | Enrichment (search APIs, company DB, competitor crawl) |
Tagged Evidence Item
Each item in the Evidence Layer has:
- category: One of the categories above.
- sourceUrl: Page or origin URL.
- sourceType: One of "page" | "enrichment" | "ui_capture" | "api".
- snippet: Optional summary for prompts.
- extractedAt: ISO timestamp.
- extractor: Intern/extractor name (e.g. product_crawl, enrichment).
- payload: Opaque per-category data (e.g. plans, headlines).
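
The fields listed above could be expressed as a TypeScript type along the following lines. This is a minimal sketch, not the project's actual definition: the field names and allowed values come from this document, while the concrete types (for example, `payload` as `unknown`), the `EVIDENCE_CATEGORIES` constant, and the `isEvidenceCategory` helper are assumptions made for illustration.

```typescript
// Category names taken from the Categories table.
const EVIDENCE_CATEGORIES = [
  "product_surface",
  "documentation",
  "pricing_model",
  "social_proof",
  "brand_signals",
  "technical_signals",
  "distribution_hooks",
  "competitive_intel",
] as const;

type EvidenceCategory = (typeof EVIDENCE_CATEGORIES)[number];
type SourceType = "page" | "enrichment" | "ui_capture" | "api";

// Assumed shape: field names follow the list above, types are a guess.
interface EvidenceItem {
  category: EvidenceCategory;
  sourceUrl: string;
  sourceType: SourceType;
  snippet?: string; // optional summary for prompts
  extractedAt: string; // ISO timestamp
  extractor: string; // e.g. "product_crawl", "enrichment"
  payload: unknown; // opaque per-category data
}

// Hypothetical helper: narrows an arbitrary string to a known category.
function isEvidenceCategory(value: string): value is EvidenceCategory {
  return (EVIDENCE_CATEGORIES as readonly string[]).indexOf(value) !== -1;
}

// Illustrative item; the URL and payload contents are made up.
const exampleItem: EvidenceItem = {
  category: "pricing_model",
  sourceUrl: "https://example.com/pricing",
  sourceType: "page",
  snippet: "Three plans: Free, Pro, Enterprise",
  extractedAt: new Date(0).toISOString(),
  extractor: "product_crawl",
  payload: { plans: ["Free", "Pro", "Enterprise"] },
};
```

Keeping `payload` opaque at this level means each category can define its own payload shape without changing the shared item type.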
Storage and Traceability
- Raw crawl remains in company_crawls (discovered_pages, page_evidence, ui_captures, enriched_data).
- Evidence Layer is built by normalizeFromCrawl() in core/evidence/normalize.ts and is the single structured view passed to orchestration. Optionally, a snapshot can be stored for regression (not required in v1).
- Every claim in reports should be supportable by EvidenceItem (sourceUrl, sourceType, snippet) in the reasoning layer; see reasoning-standards.md.
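
As an illustration of the traceability requirement, a report check might flag claims that cite no evidence. The sketch below is hypothetical: the `Claim` and `EvidenceRef` shapes and the `findUnsupportedClaims` function are not defined in this document, and only the `sourceUrl`, `sourceType`, and `snippet` fields come from the text above.

```typescript
// Assumed reference shape: only these three fields are named in the text.
interface EvidenceRef {
  sourceUrl: string;
  sourceType: "page" | "enrichment" | "ui_capture" | "api";
  snippet?: string;
}

// Hypothetical claim shape for a report.
interface Claim {
  text: string;
  evidence: EvidenceRef[];
}

// Returns the claims that cite no evidence with a non-empty sourceUrl.
function findUnsupportedClaims(claims: Claim[]): Claim[] {
  return claims.filter(
    (claim) => !claim.evidence.some((ref) => ref.sourceUrl.trim().length > 0)
  );
}

// Illustrative data; the claims and URL are made up.
const reportClaims: Claim[] = [
  {
    text: "The product offers a free plan.",
    evidence: [
      {
        sourceUrl: "https://example.com/pricing",
        sourceType: "page",
        snippet: "Free plan: $0/month",
      },
    ],
  },
  { text: "The product leads its market.", evidence: [] },
];

const unsupported = findUnsupportedClaims(reportClaims);
```

Here `unsupported` contains only the second claim, since it carries no evidence reference.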
Mapping from Crawl to Categories
- Page evidence → product_surface, documentation, pricing_model, social_proof, brand_signals, technical_signals, distribution_hooks (by page type and content).
- Enriched data → brand_signals (brand validation), distribution_hooks (distribution channels, traffic), social_proof (social metrics), competitive_intel (competitors).
- UI captures → brand_signals (reference only; actual HTML/screenshots passed separately to CDO).
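
The mapping above can be captured as a lookup table from crawl source to the categories it may feed. This is a sketch under stated assumptions, not the contents of normalizeFromCrawl(): the `CrawlSource` names reuse the company_crawls field names, and `CATEGORIES_BY_SOURCE` and `canProduce` are hypothetical. The actual assignment for page evidence also depends on page type and content, which this table does not model.

```typescript
type EvidenceCategory =
  | "product_surface"
  | "documentation"
  | "pricing_model"
  | "social_proof"
  | "brand_signals"
  | "technical_signals"
  | "distribution_hooks"
  | "competitive_intel";

// Assumed source names, borrowed from the company_crawls fields.
type CrawlSource = "page_evidence" | "enriched_data" | "ui_captures";

// Mirrors the three mapping bullets above.
const CATEGORIES_BY_SOURCE: Record<CrawlSource, readonly EvidenceCategory[]> = {
  page_evidence: [
    "product_surface",
    "documentation",
    "pricing_model",
    "social_proof",
    "brand_signals",
    "technical_signals",
    "distribution_hooks",
  ],
  enriched_data: [
    "brand_signals",
    "distribution_hooks",
    "social_proof",
    "competitive_intel",
  ],
  ui_captures: ["brand_signals"],
};

// Hypothetical guard: may this crawl source emit items in this category?
function canProduce(source: CrawlSource, category: EvidenceCategory): boolean {
  return CATEGORIES_BY_SOURCE[source].indexOf(category) !== -1;
}
```

A normalizer could use such a guard to reject items tagged with a category their source is not expected to produce, for example a UI capture tagged as pricing_model.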