For engineers and contributors. User-facing documentation lives at /docs.

Evidence Taxonomy

Purpose

This document defines the structured evidence categories for the Evidence Layer. Every piece of evidence is stored, tagged, and traceable to its source. Agents consume only this layer, and the Evidence → Inference → Judgment separation is enforced.

Categories

Category | Description | Typical sources
product_surface | Headlines, key points, primary product message | Page crawl (homepage, product, docs)
documentation | Features, use cases, benefits, technical details | Docs/product pages
pricing_model | Plans, pricing model, raw pricing text | Pricing page, enrichment
social_proof | Customer logos, case studies, social metrics | Page content, enrichment
brand_signals | Metadata (title, description), team, mission, org schema, UI capture refs | All pages, enrichment, UI capture
technical_signals | Technical details, integration mentions | Product/docs pages
distribution_hooks | CTAs, integration mentions, distribution channels, traffic | Pages, enrichment
competitive_intel | Competitors, positioning, differentiation, lessons | Enrichment (search APIs, company DB, competitor crawl)

Tagged Evidence Item

Each item in the Evidence Layer has:

  • category: One of the categories above.
  • sourceUrl: Page or origin URL.
  • sourceType: "page" | "enrichment" | "ui_capture" | "api".
  • snippet: Optional summary for prompts.
  • extractedAt: ISO timestamp.
  • extractor: Intern/extractor name (e.g. product_crawl, enrichment).
  • payload: Opaque per-category data (e.g. plans, headlines).
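
The fields above could be expressed as a TypeScript type along the following lines. This is a sketch under assumptions, not the definition in the codebase: the field names and the sourceType values come from this document, while EVIDENCE_CATEGORIES, EvidenceCategory, SourceType, isEvidenceCategory, and the example values (including the URL) are hypothetical.

```typescript
// Sketch only. Category names are taken from the Categories table;
// the constant and type names are assumptions made for illustration.
const EVIDENCE_CATEGORIES = [
  "product_surface",
  "documentation",
  "pricing_model",
  "social_proof",
  "brand_signals",
  "technical_signals",
  "distribution_hooks",
  "competitive_intel",
] as const;

type EvidenceCategory = (typeof EVIDENCE_CATEGORIES)[number];
type SourceType = "page" | "enrichment" | "ui_capture" | "api";

interface EvidenceItem {
  category: EvidenceCategory;
  sourceUrl: string;      // page or origin URL
  sourceType: SourceType;
  snippet?: string;       // optional summary for prompts
  extractedAt: string;    // ISO timestamp
  extractor: string;      // e.g. "product_crawl", "enrichment"
  payload: unknown;       // opaque per-category data
}

// Runtime guard for category strings that arrive from untyped input.
function isEvidenceCategory(value: string): value is EvidenceCategory {
  return EVIDENCE_CATEGORIES.some((c) => c === value);
}

// Made-up example item; the URL and plan names are placeholders.
const example: EvidenceItem = {
  category: "pricing_model",
  sourceUrl: "https://example.com/pricing",
  sourceType: "page",
  snippet: "Three plans: Free, Pro, Enterprise.",
  extractedAt: new Date(0).toISOString(),
  extractor: "product_crawl",
  payload: { plans: ["Free", "Pro", "Enterprise"] },
};
```

Typing payload as unknown keeps it opaque, as described above: consumers have to narrow it per category before reading any field.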

Storage and Traceability

  • Raw crawl data remains in company_crawls (discovered_pages, page_evidence, ui_captures, enriched_data).
  • The Evidence Layer is built by normalizeFromCrawl() in core/evidence/normalize.ts and is the single structured view passed to orchestration. A snapshot can optionally be stored for regression checks (not required in v1).
  • Every claim in reports should be supportable by an EvidenceItem (sourceUrl, sourceType, snippet) in the reasoning layer; see reasoning-standards.md.
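
One way to check the traceability rule in the last bullet is to flag report claims that cite no evidence with a usable source. The sketch below is an assumption for illustration only: the Claim shape, the EvidenceRef name, and findUnsupportedClaims are hypothetical, and reasoning-standards.md may define the real contract differently.

```typescript
type SourceType = "page" | "enrichment" | "ui_capture" | "api";

// The three traceability fields named above.
interface EvidenceRef {
  sourceUrl: string;
  sourceType: SourceType;
  snippet?: string;
}

// Hypothetical claim shape: a statement plus the evidence it cites.
interface Claim {
  text: string;
  evidence: EvidenceRef[];
}

// Returns the claims that have no evidence with a non-empty sourceUrl.
function findUnsupportedClaims(claims: Claim[]): Claim[] {
  return claims.filter(
    (claim) => !claim.evidence.some((e) => e.sourceUrl.trim() !== "")
  );
}

// Made-up claims; the URL is a placeholder.
const claims: Claim[] = [
  {
    text: "Pricing is tiered.",
    evidence: [
      { sourceUrl: "https://example.com/pricing", sourceType: "page" },
    ],
  },
  { text: "Traffic is growing.", evidence: [] },
];

const unsupported = findUnsupportedClaims(claims);
```

Here unsupported contains only the second claim, because it cites no evidence at all.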

Mapping from Crawl to Categories

  • Page evidence → product_surface, documentation, pricing_model, social_proof, brand_signals, technical_signals, distribution_hooks (by page type and content).
  • Enriched data → brand_signals (brand validation), distribution_hooks (distribution channels, traffic), social_proof (social metrics), competitive_intel (competitors).
  • UI captures → brand_signals (reference only; the actual HTML/screenshots are passed separately to CDO).
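
The enriched-data mapping above can be sketched as a lookup table from input field to category. This is not the logic of normalizeFromCrawl(); it is a minimal illustration, and the EnrichedDataSketch field names, TaggedItem, ENRICHMENT_FIELD_TO_CATEGORY, mapEnrichedData, and the sample values are all assumptions.

```typescript
type SourceType = "page" | "enrichment" | "ui_capture" | "api";

interface TaggedItem {
  category: string;
  sourceUrl: string;
  sourceType: SourceType;
  extractedAt: string;
  extractor: string;
  payload: unknown;
}

// Hypothetical shape for enriched data; the real enriched_data may differ.
interface EnrichedDataSketch {
  sourceUrl: string;
  brandValidation?: unknown;
  distributionChannels?: unknown;
  traffic?: unknown;
  socialMetrics?: unknown;
  competitors?: unknown;
}

// Field-to-category mapping, following the "Enriched data" bullet above.
const ENRICHMENT_FIELD_TO_CATEGORY = {
  brandValidation: "brand_signals",
  distributionChannels: "distribution_hooks",
  traffic: "distribution_hooks",
  socialMetrics: "social_proof",
  competitors: "competitive_intel",
} as const;

type EnrichmentField = keyof typeof ENRICHMENT_FIELD_TO_CATEGORY;

// Emits one tagged item per enriched field that is present.
function mapEnrichedData(
  data: EnrichedDataSketch,
  extractedAt: string
): TaggedItem[] {
  const items: TaggedItem[] = [];
  const fields = Object.keys(ENRICHMENT_FIELD_TO_CATEGORY) as EnrichmentField[];
  for (const field of fields) {
    const value = data[field];
    if (value === undefined) continue; // field absent: nothing to tag
    items.push({
      category: ENRICHMENT_FIELD_TO_CATEGORY[field],
      sourceUrl: data.sourceUrl,
      sourceType: "enrichment",
      extractedAt,
      extractor: "enrichment",
      payload: { [field]: value },
    });
  }
  return items;
}

// Made-up input; values are placeholders, not real data.
const mapped = mapEnrichedData(
  { sourceUrl: "https://example.com", traffic: "unknown", competitors: ["a"] },
  "2024-01-01T00:00:00.000Z"
);
```

Keeping the mapping in a single table makes it easy to audit against this section when a category is added or an enrichment field is renamed.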