Reasoning Standards

Purpose

Enforce a strict separation between Evidence, Inference, and Judgment. Reasoning must not be a black box: every claim states its type and its supporting evidence, and every rule that drives an output is documented.

Evidence → Inference → Judgment

Evidence: Tagged facts from the Evidence Layer (and from the raw crawl where needed for backward compatibility). Stored and traceable, with no interpretation applied.
Inference: AI or deterministic logic that produces claims from evidence. Each claim has a type and optional support.
Judgment: Final outputs (e.g. stage, recommendation, next actions) that are derived from claims and documented rules.

Agents must not skip evidence or introduce judgments without traceable support.
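
The three layers can be sketched as two functions with a narrow interface between them. This is a minimal illustration, not the project's implementation: the simplified shapes, the function names infer and judge, the "pricing" tag, and the rule TOY-RULE-1 are all invented for this example.

```typescript
// Simplified, hypothetical shapes; the real types live in lib/schema.ts and lib/types.ts.
interface Evidence {
  id: string;
  tag: string;
  sourceUrl: string;
  text: string;
}

interface Claim {
  claim: string;
  type: "observed" | "inferred" | "estimated";
  supportIds: string[]; // ids of the evidence this claim rests on
}

interface Judgment {
  recommendation: string;
  rule: string; // the documented rule that produced the recommendation
  claimsUsed: string[];
}

// Inference: turns evidence into claims and records which evidence backs each claim.
function infer(evidence: Evidence[]): Claim[] {
  return evidence
    .filter((e) => e.tag === "pricing")
    .map((e): Claim => ({
      claim: `Pricing evidence found: ${e.text}`,
      type: "observed",
      supportIds: [e.id],
    }));
}

// Judgment: sees only claims, ignores unsupported ones, and names the rule it applied.
function judge(claims: Claim[]): Judgment {
  const supported = claims.filter((c) => c.supportIds.length > 0);
  return {
    recommendation: supported.length > 0 ? "review-pricing" : "collect-more-evidence",
    rule: "TOY-RULE-1: any supported pricing claim triggers a pricing review",
    claimsUsed: supported.map((c) => c.claim),
  };
}
```

Because judge never receives raw evidence, a judgment cannot be produced without passing through claims, and each claim carries the ids needed to trace it back to its source.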

Claim Types

Used in Report 1 and the reasoning layer (lib/schema.ts, lib/types.ts):

observed: Directly supported by page/enrichment evidence (e.g. “Pricing page lists three plans”).
inferred: Inferred from evidence with stated reasoning (e.g. “Likely B2B given job titles on site”).
estimated: Quantitative or qualitative estimate with confidence (e.g. “DRL ~2 based on channel mix”).

Each claim includes:

claim: String description.
type: observed | inferred | estimated.
confidence: high | medium | low.
support: Array of EvidenceItem (sourceUrl, sourceType, snippet, extractedAt, extractor).

EvidenceItem

sourceUrl: Where the evidence came from.
sourceType: "page" | "search" | "api" | "ai" | "enrichment".
snippet: Optional quote or summary.
extractedAt, extractor: When the evidence was extracted and by which extractor; kept for audit.
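
The claim and evidence shapes listed above could be written in TypeScript roughly as follows. Field names come from this document; which fields are required, the ISO 8601 string for extractedAt, the isClaimType helper, and the example values are assumptions, and the actual definitions in lib/schema.ts and lib/types.ts may differ.

```typescript
type ClaimType = "observed" | "inferred" | "estimated";
type Confidence = "high" | "medium" | "low";
type SourceType = "page" | "search" | "api" | "ai" | "enrichment";

interface EvidenceItem {
  sourceUrl: string;
  sourceType: SourceType;
  snippet?: string;    // optional quote or summary
  extractedAt: string; // assumed to be an ISO 8601 timestamp
  extractor: string;   // assumed to be the name of the extractor that produced the item
}

interface Claim {
  claim: string;
  type: ClaimType;
  confidence: Confidence;
  support: EvidenceItem[];
}

const CLAIM_TYPES: readonly ClaimType[] = ["observed", "inferred", "estimated"];

// Hypothetical runtime guard for untyped input such as parsed model output.
function isClaimType(value: string): value is ClaimType {
  return CLAIM_TYPES.some((t) => t === value);
}

// Placeholder data for illustration only.
const exampleClaim: Claim = {
  claim: "Pricing page lists three plans",
  type: "observed",
  confidence: "high",
  support: [
    {
      sourceUrl: "https://example.com/pricing",
      sourceType: "page",
      snippet: "Starter, Team, Enterprise",
      extractedAt: "2024-01-01T00:00:00Z",
      extractor: "pricing-extractor",
    },
  ],
};
```

Closed string unions let the compiler reject an unknown claim type or source type at build time, while the guard covers values that only arrive at runtime.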

Rules

No undocumented logic. If a rule drives stage, recommendation, or DRL, it is documented (here or in code comments/specs).
No black-box reasoning. Component analyzers and synthesis consume evidence and produce claims with support; judgments (e.g. discovery verdict) are derived from documented rules.
Conflict resolution: Stage and recommendation are owned by CMO (Report 1). Other agents must not overwrite them.
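
The ownership rule can be enforced at the point where agent output is merged. The sketch below is one way to do it, under stated assumptions: the Diagnosis shape, the applyPatch function, and the "cmo" agent identifier are hypothetical and are not taken from the codebase.

```typescript
// Hypothetical merged-output shape; only stage and recommendation come from this document.
interface Diagnosis {
  stage: string;
  recommendation: string;
  notes: string[];
}

const OWNER = "cmo"; // assumed identifier for the CMO (Report 1) agent

// Merges an agent's patch into the current diagnosis.
// Only the owner may change stage or recommendation; for any other agent
// those two fields are restored from the current value after the merge.
function applyPatch(current: Diagnosis, patch: Partial<Diagnosis>, agent: string): Diagnosis {
  const merged: Diagnosis = { ...current, ...patch };
  if (agent === OWNER) {
    return merged;
  }
  return { ...merged, stage: current.stage, recommendation: current.recommendation };
}
```

Dropping the owned fields silently keeps the pipeline running; an implementation that prefers loud failures could throw instead when a non-owner patch touches them.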

Confidence and evidence coverage (final output)

The canonical final output (FinalDiagnosisV1) includes a confidence score (0–100) and an evidence coverage summary per taxonomy category. Uncertainty must be visible; overconfidence undermines trust.

Confidence is derived from: (1) evidence count and diversity (Evidence Layer), (2) signal clarity from component analyses, (3) verdict confidence (High/Medium/Low). The score is implemented in lib/pipeline/to-final-diagnosis-v1.ts and is capped to avoid overconfidence.
Evidence coverage summarizes evidence strength for each category (product_surface, pricing_model, distribution_hooks, social_proof, technical_signals) on a three-level scale such as strong/moderate/weak, clear/unclear/missing, or present/weak/absent. It is derived from Evidence Layer counts and Report 1 component analyses.
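
To make the capping idea concrete, here is a minimal sketch of a capped confidence score and a count-based coverage label. It is not the logic in lib/pipeline/to-final-diagnosis-v1.ts: the weights, the cap of 90, the coverage thresholds, and all names below are invented for illustration.

```typescript
type Verdict = "High" | "Medium" | "Low";
type Coverage = "strong" | "moderate" | "weak";

interface ConfidenceInputs {
  evidenceCount: number;       // number of evidence items
  distinctSourceTypes: number; // diversity of sources
  signalClarity: number;       // 0..1, from component analyses
  verdict: Verdict;
}

// All constants below are hypothetical.
const VERDICT_POINTS: Record<Verdict, number> = { High: 30, Medium: 20, Low: 10 };
const CONFIDENCE_CAP = 90;

function confidenceScore(i: ConfidenceInputs): number {
  const evidencePts = Math.min(i.evidenceCount, 20) * 1.5;           // up to 30
  const diversityPts = Math.min(i.distinctSourceTypes, 5) * 2;       // up to 10
  const clarityPts = Math.max(0, Math.min(i.signalClarity, 1)) * 30; // up to 30
  const raw = evidencePts + diversityPts + clarityPts + VERDICT_POINTS[i.verdict];
  // Clamp to [0, CONFIDENCE_CAP] so that even perfect inputs never report 100.
  return Math.round(Math.max(0, Math.min(raw, CONFIDENCE_CAP)));
}

// Maps an evidence count for one category to a coverage label.
function coverageFromCount(count: number): Coverage {
  if (count >= 5) return "strong";
  if (count >= 2) return "moderate";
  return "weak";
}
```

With these made-up weights, the best possible inputs sum to 100 before the cap and are reported as 90, which keeps some uncertainty visible in the final output.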