A tour of every piece: pipeline variants, evidence classification, phase-gating, cross-session continuity, and the output artifacts you end up with.
Every pipeline starts with architecture. From there, you choose how far to go. The default is a 7-phase full analysis with a split defect scan. Scale back when you only need orientation.
7 phases
Complete analysis with a two-pass defect scan. An early mechanical sweep catches surface-level issues; a later semantic pass re-examines defects with the full context of the contracts and protocols phases, before reimplementation planning begins.
The default pipeline. Best when you need the deepest defect analysis grounded in full behavioral understanding.
Phases form a DAG: contracts and protocols can run in parallel once architecture is done, porting waits for both, and the reimplementation spec comes last.
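The dependency ordering can be sketched with Python's standard-library graphlib. The phase identifiers below are hypothetical. The placement of the two defect passes is an assumption drawn from the pipeline description, not CodeCartographer's actual configuration: the mechanical sweep runs after architecture, and the semantic pass runs after contracts and protocols. Phases that land in the same wave have no dependencies on each other and could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical phase IDs; each phase maps to the set of phases it waits for.
PHASE_DEPS = {
    "architecture": set(),
    "defects_mechanical": {"architecture"},          # early surface-level sweep
    "contracts": {"architecture"},
    "protocols": {"architecture"},
    "porting": {"contracts", "protocols"},           # waits for both
    "defects_semantic": {"contracts", "protocols"},  # needs full behavioral context
    "reimplementation": {"porting", "defects_semantic"},
}


def execution_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group phases into waves; every phase in a wave has all its prerequisites done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # also raises CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave finished before asking for more
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(PHASE_DEPS), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

With these assumed edges, contracts, protocols, and the mechanical sweep share the second wave, and reimplementation lands alone in the final wave.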
An LLM can sound certain about things it inferred. CodeCartographer requires every finding to carry an evidence tag so you know what was observed, what was deduced, and what remains an open question.
Directly visible in source code or documentation. Not inferred.
Deduced from patterns and structure. High confidence but not directly stated.
Behavior or assumption that may not survive a rewrite or language change.
Could not determine from available sources. Needs human input or deeper analysis.
If an LLM cannot classify a finding with one of these four tags, the finding is not specific enough to be useful. Vague assertions get rejected by the validation protocol.
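The following is a minimal sketch of how a validator might enforce this rule. It assumes each finding is a Markdown bullet carrying exactly one bracketed tag. The tag names OBSERVED, INFERRED, FRAGILE, and UNKNOWN are hypothetical labels for the four levels described above. CodeCartographer's actual tag names and finding syntax may differ.

```python
import re
from enum import Enum


class Evidence(Enum):
    # Hypothetical tag names; each value paraphrases one of the four levels above.
    OBSERVED = "directly visible in source code or documentation"
    INFERRED = "deduced from patterns and structure"
    FRAGILE = "may not survive a rewrite or language change"
    UNKNOWN = "needs human input or deeper analysis"


# Assumed finding format: "- [INFERRED] Retries are capped at three attempts."
TAG_PATTERN = re.compile(r"\[([A-Z_]+)\]")


def classify(finding: str) -> Evidence:
    """Return the finding's single evidence tag; reject untagged or ambiguous findings."""
    tags = TAG_PATTERN.findall(finding)
    if len(tags) != 1:
        raise ValueError(f"expected exactly one evidence tag, found {len(tags)}: {finding!r}")
    try:
        return Evidence[tags[0]]  # look up the enum member by name
    except KeyError:
        raise ValueError(f"unrecognized evidence tag [{tags[0]}]: {finding!r}") from None
```

A finding such as "- The module seems fragile." raises `ValueError` here. This mirrors the rule that vague assertions are rejected rather than silently accepted.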
Each phase produces structured output against a template. The validation protocol then checks the completion criteria: all required sections are present, every finding carries an evidence tag, and open questions are logged in status.yaml.
templates/
Structured Markdown templates enforce consistent sections across projects and sessions. Every artifact has the same shape regardless of which LLM produced it.
VALIDATE.md
Run after every phase. Checks that outputs match templates, evidence tags are applied, and partial results are logged properly before allowing the status to advance.
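The sketch below shows one way such a gate could work. The section names, the `- [TAG] text` bullet format, the tag vocabulary, and the status dictionary layout are all hypothetical stand-ins. They are not the contents of the real templates/ directory or VALIDATE.md. What matters is the ordering: validation runs first, and a phase is marked complete only if validation reports no problems.

```python
import re
from dataclasses import dataclass, field

# Hypothetical template requirements and tag vocabulary.
REQUIRED_SECTIONS = ("Summary", "Findings", "Open Questions")
EVIDENCE_TAGS = {"OBSERVED", "INFERRED", "FRAGILE", "UNKNOWN"}
HEADING = re.compile(r"^##[ \t]+(.+?)[ \t]*$", re.MULTILINE)
TAGGED_BULLET = re.compile(r"^- \[([A-Z_]+)\]\s+(.+)$")


@dataclass
class Report:
    problems: list[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.problems


def validate_artifact(markdown: str, logged_questions: list[str]) -> Report:
    """Check one phase artifact against its template and the open questions in status."""
    report = Report()
    sections = set(HEADING.findall(markdown))
    for name in REQUIRED_SECTIONS:
        if name not in sections:
            report.problems.append(f"missing section: {name}")
    for line in markdown.splitlines():
        if not line.startswith("- "):
            continue  # in this sketch only bullets count as findings
        match = TAGGED_BULLET.match(line)
        if match is None or match.group(1) not in EVIDENCE_TAGS:
            report.problems.append(f"finding lacks a valid evidence tag: {line}")
        elif match.group(1) == "UNKNOWN" and match.group(2) not in logged_questions:
            report.problems.append(f"open question not logged in status: {match.group(2)}")
    return report


def advance_phase(status: dict, phase: str, report: Report) -> None:
    """Mark a phase complete only if validation passed; otherwise leave status untouched."""
    if not report.passed:
        raise ValueError(f"{phase} failed validation: {report.problems}")
    status["phases"][phase] = "complete"
```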
Large codebases need multiple sessions, and status.yaml is the single source of truth. A new session reads the guide, checks the status file to see what is complete, and starts the next phase automatically. There is no need to re-explain what happened before.
status.yaml
Mutable per-project state. Tracks phase completion, current phase, open questions, and the active pipeline. The only file that changes during a run.
THREAD_LOG.md
Cross-session summary log. Each session appends what it completed and what needs attention, giving the next session context beyond the machine-readable status file.
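Here is a minimal sketch of the resume step. It assumes status.yaml has already been parsed into a dictionary, for example with a YAML library, and that per-phase state is recorded under a `phases` key. The schema, the phase identifiers, and the THREAD_LOG.md entry layout are hypothetical. For simplicity, the sketch walks phases in a fixed order rather than the full dependency graph.

```python
from datetime import date
from typing import Optional

# Hypothetical phase order; a real run would derive this from the active pipeline.
PHASES = ["architecture", "defects_mechanical", "contracts", "protocols",
          "porting", "defects_semantic", "reimplementation"]


def next_phase(status: dict) -> Optional[str]:
    """Return the first phase not yet marked complete, or None when the run is finished."""
    states = status.get("phases", {})
    for phase in PHASES:
        if states.get(phase) != "complete":
            return phase
    return None


def thread_log_entry(session_date: date, completed: list[str], attention: list[str]) -> str:
    """Format one THREAD_LOG.md entry: what this session finished and what needs follow-up."""
    lines = [f"## Session {session_date.isoformat()}", "", "Completed:"]
    lines += [f"- {item}" for item in completed] or ["- (nothing)"]
    lines += ["", "Needs attention:"]
    lines += [f"- {item}" for item in attention] or ["- (nothing)"]
    return "\n".join(lines) + "\n"


# Example state, as if loaded from a hypothetical status.yaml.
status = {
    "active_pipeline": "full",
    "current_phase": "contracts",
    "phases": {"architecture": "complete", "defects_mechanical": "complete"},
    "open_questions": ["Is the retry limit configurable?"],
}
```

With this example state, `next_phase(status)` returns `"contracts"`, the first phase without a `complete` marker.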
In Pi-mono, phase sub-agents persist transcripts alongside the orchestrator session. /resume, /tree, and /export browse them as first-class sessions with lineage back to the orchestrator.
Each artifact targets a different audience: engineers, reviewers, maintainers, and the next LLM session.
Layers, public surfaces, runtime lifecycle, dependency direction, porting priorities.
Multi-pass scan: logic, error handling, concurrency, security, API drift, config risks.
User-visible behavior, defaults, side effects, error modes, black-box acceptance checks.
Events, state machines, persistence notes, compatibility hazards, internal message flow.
Synthesis layer ranking what matters, what is risky, what needs special treatment in a rewrite.
Language-agnostic build plan with modules, acceptance scenarios, and known unknowns.
Read the installation docs for drop-in setup, Pi-mono commands, or MCP server configuration.