Background: End-to-End Flow

Purpose

This document explains in detail what runs in the background when a user or API triggers a PMF analysis: entry points, step order, data flow, and persistence. It is the single reference for the full sequence from URL to final report/diagnosis.

Both the raw data and the reports are produced by agents. Intern agents (Discovery, Product, UI Capture, Competitor) produce the raw data; report agents (CMO, BD, COO, CDO) produce Reports 1–3 and the CDO design audit. See Agents overview.

See also: System Architecture for the high-level agent view and dependency map.

Entry Points

The pipeline can be triggered in three ways:

Entry	Trigger	What runs	Persistence
Web (UI)	User enters URL in the app	1. `POST /api/roles/crawl` (SSE) runs `runSharedCrawl` → persist to `company_crawls`. 2. Client then calls `POST /api/roles/cmo` with `crawlId` or `projectId` → load crawl from DB → `runReportFromCrawl` (no discovery or crawl).	Crawl in `company_crawls`; report snapshot when CMO completes.
Public API	`POST /api/v1/analyze` with API key	`runCmoPipeline`: discovery → crawl → enrichment → report in one shot (no pre-saved crawl).	Result saved to `analysis_runs` and report snapshot.
Orchestrator	Programmatic call to `runDiagnosis`	Optional crawl (or load by `crawlId`/`projectId`) → Evidence Layer → Report 1 → optional Report 2 (BD), Report 3 (COO), CDO → `toFinalDiagnosisV1`.	Caller controls; can persist via Supabase.

Web path: app/api/roles/crawl/route.ts, app/api/roles/cmo/route.ts. Crawl uses lib/pipeline/run-shared-crawl.ts (runSharedCrawl); CMO uses lib/pipeline/run-report-from-crawl.ts (runReportFromCrawl).
API path: app/api/v1/analyze/route.ts → lib/pipeline/run-cmo-pipeline.ts (runCmoPipeline).
Orchestrator: lib/pipeline/run-diagnosis.ts (runDiagnosis) — single entry when a caller wants the full diagnosis (Report 1 + optional 2, 3, CDO) in one go.

End-to-End Diagram

Crawl Pipeline (runSharedCrawl)

Code: lib/pipeline/run-shared-crawl.ts.

The crawl pipeline runs the Discovery intern agent first, then the Product and UI Capture intern agents in parallel, followed by an optional pricing pass and the optional Competitor (enrichment) intern agent. See Interns for what each agent does.

Step order (summary): Discovery → Product + UI capture (parallel) → merge from UI HTML → optional pricing pass → optional Competitor enrichment → persist to company_crawls. Key modules: core/crawlers/, lib/crawler/pricing-pass.ts, lib/persistence/company-crawls.ts.
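
The step order above can be sketched as follows. This is a minimal illustration, not the code in lib/pipeline/run-shared-crawl.ts: the `CrawlSteps` interface, its member names, the string-based page and capture types, and the `runCrawlSketch` function are all hypothetical and exist only to show the ordering, the parallel stage, and the optional steps.

```typescript
// Hypothetical step functions, injected so the ordering is easy to see and test.
interface CrawlSteps {
  discover(url: string): Promise<string[]>;
  crawlProduct(urls: string[]): Promise<string[]>;
  captureUi(urls: string[]): Promise<string[]>;
  mergeFromUiHtml(pages: string[], captures: string[]): string[];
  pricingPass?(pages: string[]): Promise<string[]>; // optional step
  enrich?(pages: string[]): Promise<Record<string, unknown>>; // optional step
  persist(result: CrawlResult): Promise<void>;
}

// Simplified result shape (assumption); the real crawl record is richer.
interface CrawlResult {
  url: string;
  pages: string[];
  uiCaptures: string[];
  enrichedData: Record<string, unknown> | null;
}

async function runCrawlSketch(url: string, steps: CrawlSteps): Promise<CrawlResult> {
  // 1. Discovery runs first and yields the URLs to visit.
  const urls = await steps.discover(url);

  // 2. Product crawl and UI capture run in parallel.
  const [productPages, uiCaptures] = await Promise.all([
    steps.crawlProduct(urls),
    steps.captureUi(urls),
  ]);

  // 3. Merge information recovered from the captured UI HTML.
  const merged = steps.mergeFromUiHtml(productPages, uiCaptures);

  // 4. Optional pricing pass, then optional competitor enrichment.
  const pages = steps.pricingPass ? await steps.pricingPass(merged) : merged;
  const enrichedData = steps.enrich ? await steps.enrich(pages) : null;

  // 5. Persist the crawl (company_crawls in the real pipeline).
  const result: CrawlResult = { url, pages, uiCaptures, enrichedData };
  await steps.persist(result);
  return result;
}
```

Injecting the steps keeps the sketch independent of the real crawler modules; when an optional step is not supplied, the previous stage's output passes through unchanged and `enrichedData` stays `null`.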

Reuse, freshness, politeness, enterprise: See CRAWL_SYSTEM.md.

Evidence Layer (in Detail)

The Evidence Layer is the single structured evidence input for all agents. It is built from crawl outputs only; no agent re-scrapes.

Code: core/evidence/normalize.ts. Types: core/evidence/types.ts.

Inputs

NormalizeFromCrawlInput:

analyzedUrl: string
pageEvidence: PageEvidence[] (from product crawl)
enrichedData: EnrichedData | null (from competitor enrichment)
uiCaptures?: CaptureResult[] (from UI capture)

Output

EvidenceLayer:

version: "1"
analyzedUrl: string
builtAt: string (ISO timestamp)
byCategory: EvidenceByCategory — one array per category (see evidence-taxonomy.md)
items: TaggedEvidenceItem[] — flat list of all items

Mapping (crawl → categories)

From page evidence (each page can contribute to multiple categories):

product_surface — headlines, keyPoints, textPreview from page content.
documentation — productDetails (features, useCases, benefits, technicalDetails).
pricing_model — pricing.plans, pricingModel, rawText.
social_proof — customerLogos, or pageType case-studies with textPreview.
brand_signals — metadata (title, description, ogSiteName), teamInfo (teamMembers, mission, companyStory, foundingYear).
technical_signals — productDetails.technicalDetails, integrationMentions.
distribution_hooks — integrationMentions, keyPoints that look like CTAs (e.g. sign up, book a demo).

From enriched data:

brand_signals — brandValidation.
distribution_hooks — competitorProfiles (count, names), distributionChannels, trafficData.
social_proof — socialMetrics.

From UI captures:

brand_signals — one item per capture (reference only: hasScreenshot, hasHtml); actual HTML/screenshots are passed separately to CDO.
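
As an illustration of the page-evidence mapping above, the sketch below tags a single page into several categories. The `PageEvidenceSketch` shape is an assumption that covers only a few of the fields named above, and `tagPage` and the CTA pattern are hypothetical; the real PageEvidence type and the logic in core/evidence/normalize.ts may differ.

```typescript
// Assumed, heavily reduced page shape; the real PageEvidence has more fields.
interface PageEvidenceSketch {
  url: string;
  headlines?: string[];
  keyPoints?: string[];
  pricing?: { plans: string[] };
  customerLogos?: string[];
  integrationMentions?: string[];
}

interface TaggedItemSketch {
  category: string;
  sourceUrl: string;
  sourceType: "page";
  extractedAt: string;
  extractor: "product_crawl";
  payload: unknown;
}

// Hypothetical heuristic for "keyPoints that look like CTAs".
const CTA_PATTERN = /sign up|book a demo/i;

function tagPage(page: PageEvidenceSketch, extractedAt: string): TaggedItemSketch[] {
  const items: TaggedItemSketch[] = [];
  const add = (category: string, payload: unknown): void => {
    items.push({
      category,
      sourceUrl: page.url,
      sourceType: "page",
      extractedAt,
      extractor: "product_crawl",
      payload,
    });
  };

  const headlines = page.headlines ?? [];
  const keyPoints = page.keyPoints ?? [];
  const plans = page.pricing?.plans ?? [];
  const logos = page.customerLogos ?? [];
  const integrations = page.integrationMentions ?? [];

  if (headlines.length > 0 || keyPoints.length > 0) add("product_surface", { headlines, keyPoints });
  if (plans.length > 0) add("pricing_model", { plans });
  if (logos.length > 0) add("social_proof", { customerLogos: logos });
  if (integrations.length > 0) {
    // Integration mentions feed two categories, as listed in the mapping.
    add("technical_signals", { integrationMentions: integrations });
    add("distribution_hooks", { integrationMentions: integrations });
  }
  const ctas = keyPoints.filter((point) => CTA_PATTERN.test(point));
  if (ctas.length > 0) add("distribution_hooks", { ctas });

  return items;
}
```

A single page can therefore yield several items, each carrying the page URL so that it stays traceable.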

TaggedEvidenceItem

Each item has:

category — one of the taxonomy categories.
sourceUrl — page or origin URL.
sourceType — "page" | "enrichment" | "ui_capture" | "api".
extractedAt — ISO timestamp.
extractor — e.g. "product_crawl", "enrichment", "ui_capture".
payload — opaque per-category data (e.g. plans, headlines, brandValidation).

Rules: One crawl produces one Evidence Layer. Agents consume only this layer. Every item is traceable (sourceUrl, sourceType, extractor).
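
The item fields and rules above can be expressed as types plus a small traceability check. This is a sketch under stated assumptions: the category list is inferred from the mapping section (the authoritative list is in evidence-taxonomy.md), and `buildLayerSketch` and `isTraceable` are hypothetical helpers, not functions from core/evidence/.

```typescript
type SourceType = "page" | "enrichment" | "ui_capture" | "api";

interface TaggedEvidenceItem {
  category: string;
  sourceUrl: string;
  sourceType: SourceType;
  extractedAt: string; // ISO timestamp
  extractor: string;
  payload: unknown; // opaque per-category data
}

interface EvidenceLayer {
  version: "1";
  analyzedUrl: string;
  builtAt: string; // ISO timestamp
  byCategory: Record<string, TaggedEvidenceItem[]>;
  items: TaggedEvidenceItem[];
}

// Categories inferred from the mapping section (assumption).
const CATEGORIES = [
  "product_surface", "documentation", "pricing_model", "social_proof",
  "brand_signals", "technical_signals", "distribution_hooks",
];

const SOURCE_TYPES: SourceType[] = ["page", "enrichment", "ui_capture", "api"];

// An item is traceable when its provenance fields are all populated.
function isTraceable(item: TaggedEvidenceItem): boolean {
  return (
    item.sourceUrl.length > 0 &&
    item.extractor.length > 0 &&
    SOURCE_TYPES.indexOf(item.sourceType) !== -1
  );
}

function buildLayerSketch(
  analyzedUrl: string,
  items: TaggedEvidenceItem[],
  builtAt: string = new Date().toISOString()
): EvidenceLayer {
  // Seed one array per known category so empty categories are still present.
  const byCategory: Record<string, TaggedEvidenceItem[]> = {};
  for (const category of CATEGORIES) byCategory[category] = [];
  for (const item of items) {
    const bucket = byCategory[item.category] ?? [];
    bucket.push(item);
    byCategory[item.category] = bucket;
  }
  return { version: "1", analyzedUrl, builtAt, byCategory, items };
}
```

Keeping both `items` and `byCategory` lets consumers either scan everything or read one category directly, while `isTraceable` makes the traceability rule checkable.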

Report 1 Pipeline (CMO Path)

Report 1 is the canonical CMO output: discovery verdict (stage, recommendation), component analyses, DRL, and assembled report.

When using runReportFromCrawl (UI path: pre-crawled data)

Code: lib/pipeline/run-report-from-crawl.ts.

1. Build the Evidence Layer from the stored crawl: normalizeFromCrawl({ analyzedUrl, pageEvidence, enrichedData, uiCaptures: [] }).
2. Geographic (lib/pipeline/steps/geographic.ts). Detect regional focus and market scope from page evidence.
3. Category (lib/pipeline/steps/category.ts). Extract market category from evidence (AI).
4. Component analysis (lib/pipeline/steps/component-analysis.ts). Run Brand, Product, Pricing, Market, Distribution (and optional Visual Brand) analyzers over page evidence and enriched data.
5. Synthesis (lib/pipeline/steps/synthesis.ts). Synthesize component analyses into discovery verdict (stage, recommendation, confidence).
6. Assemble (lib/pipeline/steps/assemble-report.ts). Build Report 1 (discoveryVerdict, component analyses, DRL, execution log, etc.).
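
The report steps above run strictly in sequence, with each step able to read what earlier steps produced. A minimal sketch of that pattern follows; the step names mirror the file names under lib/pipeline/steps/, but `ReportContext`, `StepFn`, and `runReportSketch` are hypothetical and the real step signatures are not shown in this document.

```typescript
// Step names taken from the step file names; everything else is assumed.
const STEP_ORDER = [
  "geographic",
  "category",
  "component-analysis",
  "synthesis",
  "assemble-report",
] as const;

type StepName = (typeof STEP_ORDER)[number];

interface ReportContext {
  evidence: unknown; // the Evidence Layer built from the stored crawl
  outputs: Partial<Record<StepName, unknown>>; // results of completed steps
  executionLog: StepName[]; // order in which steps finished
}

type StepFn = (ctx: ReportContext) => Promise<unknown>;

async function runReportSketch(
  evidence: unknown,
  steps: Record<StepName, StepFn>
): Promise<ReportContext> {
  const ctx: ReportContext = { evidence, outputs: {}, executionLog: [] };
  for (const name of STEP_ORDER) {
    // Each step sees the outputs of every step that ran before it.
    ctx.outputs[name] = await steps[name](ctx);
    ctx.executionLog.push(name);
  }
  return ctx;
}
```

Because the order lives in one constant, a step such as synthesis can rely on the component analyses already being present in `ctx.outputs`.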

Stage and recommendation are owned by CMO (Report 1). No other agent overwrites them. For CMO pipeline steps and component analyzers in detail, see Report agents.

When using runCmoPipeline (API path: no pre-saved crawl)

Code: lib/pipeline/run-cmo-pipeline.ts.

This path runs the same logical Report 1 steps (geographic → category → component → synthesis → assemble), preceded by three additional steps:

1. Discovery step (lib/pipeline/steps/discovery.ts).
2. Crawl step (lib/pipeline/steps/crawl.ts).
3. Enrichment step (lib/pipeline/steps/enrichment.ts).

The API path therefore performs discovery, crawl, and enrichment inline, then runs the same report steps. There is no step that persists the Evidence Layer; it is built in memory and used only to produce the report.

Report 2 (BD) and Report 3 (COO)

Report 2 (BD): Input is Report 1 only. Output: distribution relevance, expert quotes, cases, and “companies like you”. It runs only when runDiagnosis is called with includeBd: true. For technical behavior, see Report agents.
Report 3 (COO): Input is Report 1 plus Report 2. Output: what to change, strategies, small wins, next actions, and one thing this week. It runs only when runDiagnosis is called with includeCoo: true and Report 2 is present. For technical behavior, see Report agents.

Neither BD nor COO re-crawls; both are prompt-only agents that operate on Report 1 (plus Report 2 in the case of COO).
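
The gating rules above can be sketched as a small planning function. The option names `includeBd` and `includeCoo` come from the text; the `planReports` helper and the report labels are hypothetical, and the sketch assumes that Report 2 is present only when BD runs in the same call. The CDO audit is left out because its option name is not given here.

```typescript
interface DiagnosisOptionsSketch {
  includeBd?: boolean;
  includeCoo?: boolean;
}

type ReportLabel = "report1_cmo" | "report2_bd" | "report3_coo";

function planReports(options: DiagnosisOptionsSketch): ReportLabel[] {
  // Report 1 (CMO) always runs; it owns stage and recommendation.
  const plan: ReportLabel[] = ["report1_cmo"];

  const runBd = options.includeBd === true;
  if (runBd) plan.push("report2_bd");

  // COO takes Report 1 and Report 2 as input, so it needs BD to have run.
  if (options.includeCoo === true && runBd) plan.push("report3_coo");

  return plan;
}
```

For example, `planReports({ includeCoo: true })` yields only Report 1, because without BD there is no Report 2 for COO to consume.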

Final Output

toFinalDiagnosisV1 (lib/pipeline/to-final-diagnosis-v1.ts): Aggregates the Evidence Layer and Report 1 (plus optional Report 2, Report 3, and CDO) into the canonical FinalDiagnosisV1 shape, which is the single contract for the API, export, and the UI single-diagnosis view. Schema: lib/schemas/final-diagnosis-v1.ts. For the CDO design audit, see Report agents.
generatePublicSummary — lib/public-summary.ts. Produces a compressed, tweet-ready summary from FinalDiagnosisV1 (PMF stage, confidence, primary risk, one fix this week, short explanation). Transformation only; no separate engine.
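
To make the "transformation only" idea concrete, here is a minimal sketch of a summary function. The input field names, the 0 to 1 confidence scale, the sentence template, and the 280-character limit are all assumptions for illustration; the actual FinalDiagnosisV1 schema is defined in lib/schemas/final-diagnosis-v1.ts and the real generatePublicSummary may format its output differently.

```typescript
// Assumed subset of the diagnosis; field names are hypothetical.
interface SummaryInputSketch {
  stage: string;
  confidence: number; // assumed to be in the range 0..1
  primaryRisk: string;
  oneFixThisWeek: string;
  explanation: string;
}

function summarizeSketch(d: SummaryInputSketch, maxLength: number = 280): string {
  const percent = Math.round(d.confidence * 100);
  const text =
    `PMF stage: ${d.stage} (${percent}% confidence). ` +
    `Primary risk: ${d.primaryRisk}. ` +
    `Fix this week: ${d.oneFixThisWeek}. ` +
    d.explanation;

  // Pure string transformation: truncate with an ellipsis if too long.
  if (text.length <= maxLength) return text;
  return text.slice(0, maxLength - 3) + "...";
}
```

The function reads fields that already exist on the diagnosis and performs no new analysis, which is what keeps it a transformation rather than a separate engine.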

Related Documents

System Architecture: high-level flow and dependency map
Evidence Taxonomy: category definitions and storage
Agent Contracts: per-agent I/O and versioning
CRAWL_SYSTEM.md: crawl reuse, freshness, politeness, enterprise