Implementation Plan: DRG Phase Zero

Branch: main | Date: 2026-04-13 | Spec: spec.md
Input: Feature specification from kitty-specs/drg-phase-zero-01KP2YCE/spec.md
Mission ID: 01KP2YCESBSG61KQH5PQZ9662H

Summary

Build the Doctrine Reference Graph (DRG) as a YAML-based graph model that makes implicit doctrine references explicit and queryable. Audit the call sites of the two context builders and confirm the canonical builder as the parity oracle; rerouting callers is deferred to Phase 1. Implement build_context_v2() by composing DRG query primitives. Prove artifact-reachability parity with an invariant regression test, and enforce minimum-effective-dose surface calibration.

Technical Context

Language/Version: Python 3.11+
Primary Dependencies: Pydantic (model/validation), ruamel.yaml (YAML I/O), typer/rich (CLI if needed)
Storage: Filesystem only (graph.yaml in src/doctrine/, test fixtures in tests/)
Testing: pytest, mypy --strict, 90%+ coverage for new code
Target Platform: CLI tool (cross-platform, no OS-specific code)
Project Type: Single project (monorepo for spec-kitty)
Performance Goals: Graph load < 500ms, context query < 200ms, full test matrix < 60s
Constraints: No cross-repo changes, inline references must remain in place (C-001)
Scale/Scope: ~80 nodes, ~200 edges (current shipped doctrine), 4 actions x ~10 profiles

Charter Check

GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.

Gate	Status	Notes
DIRECTIVE_003 (Decision Documentation)	Pass	DRG schema is an ADR-worthy decision; graph.yaml IS the documentation
DIRECTIVE_010 (Specification Fidelity)	Pass	Invariant test (FR-007) proves fidelity to spec
90%+ test coverage	Committed	NFR-004
mypy --strict	Committed	NFR-005
Integration tests for CLI commands	N/A	No new CLI commands in this mission

Project Structure

Documentation (this feature)

kitty-specs/drg-phase-zero-01KP2YCE/
├── spec.md              # Mission specification
├── plan.md              # This file
├── data-model.md        # DRG schema entities
├── checklists/
│   └── requirements.md  # Spec quality checklist
└── tasks/               # Work packages (created by /spec-kitty.tasks)

Source Code (new and modified)

src/doctrine/
├── graph.yaml                    # [NEW] The DRG data file (generated by migration)
└── drg/                          # [NEW] DRG infrastructure package
    ├── __init__.py               # Public API: DRGGraph, load_graph, validate_graph, query
    ├── models.py                 # Pydantic models: DRGNode, DRGEdge, DRGGraph
    ├── loader.py                 # load_graph(path) -> DRGGraph, merge_layers(shipped, project)
    ├── validator.py              # validate_graph(): dangling refs, cycles, malformed URNs
    ├── query.py                  # walk_edges(), resolve_context(): edge traversal primitives
    └── migration/
        ├── __init__.py
        ├── extractor.py          # walk_artifacts() -> list[DRGEdge]
        ├── calibrator.py         # apply_surface_calibration() -> adjusted scope edges
        └── id_normalizer.py      # normalize directive IDs (DIRECTIVE_NNN <-> NNN-slug)

src/charter/
└── context.py                    # [MODIFIED] Add build_context_v2() composing DRG primitives

src/specify_cli/
├── next/prompt_builder.py        # [AUDITED] Import unchanged in Phase 0; reroute to src/charter/context.py is Phase 1
└── cli/commands/agent/workflow.py # [AUDITED] Import unchanged in Phase 0; reroute to src/charter/context.py is Phase 1

tests/doctrine/drg/               # [NEW] DRG unit tests
├── __init__.py
├── conftest.py                   # Shared fixtures: sample graph.yaml, malformed graphs
├── test_models.py                # Pydantic model validation
├── test_loader.py                # Graph loading and layer merging
├── test_validator.py             # Dangling refs, cycles, URN format
├── test_query.py                 # Edge traversal, depth limiting
└── migration/
    ├── __init__.py
    ├── test_extractor.py         # Extraction from shipped artifacts
    ├── test_calibrator.py        # Surface calibration adjustments
    └── test_id_normalizer.py     # ID format normalization

tests/charter/
├── test_context_parity.py        # [NEW] Invariant regression test (FR-007)
├── test_surface_calibration.py   # [NEW] Calibration inequality test (FR-008)
└── fixtures/
    └── accepted_differences.yaml # [NEW] Exception ledger (empty by default)

Architecture

Layer Separation (Guardrail 1)

The DRG package (src/doctrine/drg/) is strictly doctrine-graph infrastructure. It owns:

Schema/model: Pydantic models for nodes, edges, and the graph document
Loading/validation: Parse graph.yaml, validate integrity, merge shipped + project layers
Query primitives: Walk edges by relation type, depth-limited traversal, transitive closure

It does NOT own:

Charter-specific assembly policy (profile x action x depth expansion rules)
Action-scoped rendering (directive/tactic line formatting, reference filtering)
Governance resolution or project selection intersection

Charter-specific assembly lives in src/charter/context.py. build_context_v2() composes DRG primitives:

build_context_v2(profile, action, depth)
    │
    ├── drg.loader.load_graph()           # Load and merge graph layers
    ├── drg.query.walk_edges(             # Walk scope edges to depth 1
    │       start="action:{mission}/{action}",
    │       relations=["scope"],
    │       max_depth=1)
    │                                       # (the applies relation is in the v1 schema but not populated in Phase 0)
    ├── drg.query.walk_edges(             # Walk requires transitively
    │       start=<resolved nodes>,
    │       relations=["requires"],
    │       max_depth=None)               # Transitive closure
    ├── drg.query.walk_edges(             # Walk suggests to user depth
    │       start=<resolved nodes>,
    │       relations=["suggests"],
    │       max_depth=depth)
    ├── drg.query.walk_edges(             # Collect vocabulary edges
    │       start=<resolved nodes>,
    │       relations=["vocabulary"])
    │
    └── _materialize_artifacts(nodes)     # Charter-layer: load, format, render

This keeps Phase 1 replacement local: build_charter_context() gets deleted, build_context_v2() becomes build_context(), all in src/charter/context.py. The DRG package is untouched.
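
As an illustration of the kind of primitive build_context_v2() composes, here is a minimal sketch of a depth-limited edge walk. It assumes edges are plain (source, relation, target) tuples and that max_depth=None means transitive closure. The signature, the tuple representation, and the tactic and styleguide URNs are hypothetical; they are not taken from src/doctrine/drg/query.py.

```python
from collections import deque
from typing import Iterable, Optional

# Hypothetical edge representation: (source URN, relation, target URN).
Edge = tuple[str, str, str]


def walk_edges(
    edges: Iterable[Edge],
    start: Iterable[str],
    relations: Iterable[str],
    max_depth: Optional[int] = None,
) -> set[str]:
    """Return URNs reachable from `start` via the given relations.

    max_depth=None means transitive closure. Start nodes are never
    included in the result.
    """
    allowed = set(relations)
    adjacency: dict[str, list[str]] = {}
    for source, relation, target in edges:
        if relation in allowed:
            adjacency.setdefault(source, []).append(target)

    start_nodes = set(start)
    seen = set(start_nodes)
    reached: set[str] = set()
    queue = deque((node, 0) for node in start_nodes)
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # depth budget exhausted for this branch
        for target in adjacency.get(node, []):
            if target not in seen:
                seen.add(target)
                reached.add(target)
                queue.append((target, depth + 1))
    return reached


# Illustrative graph; the tactic and styleguide names are made up.
EDGES = [
    ("action:software-dev/implement", "scope", "directive:DIRECTIVE_024"),
    ("directive:DIRECTIVE_024", "requires", "tactic:example-a"),
    ("tactic:example-a", "requires", "tactic:example-b"),
    ("tactic:example-b", "suggests", "styleguide:example-c"),
]
scoped = walk_edges(EDGES, ["action:software-dev/implement"], ["scope"], max_depth=1)
required = walk_edges(EDGES, scoped, ["requires"])  # transitive closure
```

The same function serves the scope, requires, suggests, and vocabulary walks in the diagram; only the relation list and the depth budget change, which is what keeps assembly policy out of the DRG package.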

Accepted-Differences Ledger (Guardrail 2)

tests/charter/fixtures/accepted_differences.yaml is an exception ledger, not a design escape hatch.

Rules:

1. Empty by default. If the migration and calibration are correct, this file stays empty.

2. Every entry must include: the exact (profile, action, depth) case, the legacy artifact set, the DRG artifact set, a concrete reason why the difference is intentional, and a follow-up issue number if the difference should be resolved later.

3. No "expected drift" entries. If a difference exists because the DRG path is correct and the legacy path was wrong-sized, the entry must say so with evidence. "We expect some drift" is not an acceptable reason.

4. Threshold gate: If more than 10% of the test matrix has accepted differences, Phase 0 is not done. The migration extractor or calibrator must be fixed.

Schema:

# tests/charter/fixtures/accepted_differences.yaml
schema_version: "1.0"
entries: []
# Each entry:
# - profile: "implementer"
#   action: "implement"
#   depth: 2
#   legacy_artifacts: ["DIRECTIVE_024", "DIRECTIVE_025", ...]
#   drg_artifacts: ["DIRECTIVE_024", "DIRECTIVE_025", "DIRECTIVE_030", ...]
#   reason: "Legacy path missed DIRECTIVE_030 because action index slug '030-...' failed normalization"
#   follow_up_issue: null  # or "#NNN" if this should be fixed
#   accepted_by: "robert"
#   accepted_at: "2026-04-15"
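
The mechanical parts of the ledger rules (required fields and the 10% threshold gate) could be checked along these lines. This is a sketch only: the function name, the choice of which fields count as required, and the integer-percent parameter are assumptions, and rule 3 (no "expected drift" reasons) still needs human review.

```python
# Fields that rule 2 requires on every entry; follow_up_issue may be null,
# so it is deliberately not listed here (an assumption of this sketch).
REQUIRED_FIELDS = (
    "profile", "action", "depth",
    "legacy_artifacts", "drg_artifacts", "reason",
)


def ledger_problems(
    entries: list[dict], matrix_size: int, max_percent: int = 10
) -> list[str]:
    """Return human-readable rule violations; an empty list means OK."""
    problems: list[str] = []
    for index, entry in enumerate(entries):
        missing = [name for name in REQUIRED_FIELDS if name not in entry]
        if missing:
            problems.append(f"entry {index}: missing {', '.join(missing)}")
        elif not str(entry["reason"] or "").strip():
            problems.append(f"entry {index}: reason must not be empty")
    # Threshold gate: strictly more than max_percent of the matrix fails.
    # Integer arithmetic avoids floating-point edge cases at the boundary.
    if len(entries) * 100 > max_percent * matrix_size:
        problems.append(
            f"{len(entries)} of {matrix_size} cases have accepted "
            f"differences (limit {max_percent}%)"
        )
    return problems


# Example entry modeled on the commented schema above.
entry = {
    "profile": "implementer",
    "action": "implement",
    "depth": 2,
    "legacy_artifacts": ["DIRECTIVE_024"],
    "drg_artifacts": ["DIRECTIVE_024", "DIRECTIVE_030"],
    "reason": "Legacy path missed DIRECTIVE_030",
    "follow_up_issue": None,
}
```

With the degenerate 12-case matrix, one entry stays under the gate (1 of 12 is about 8%), while a second entry trips it.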

graph.yaml Lifecycle

graph.yaml has two distinct lifecycle phases:

1. Phase 0 (bootstrap): The migration extractor generates graph.yaml from inline references + calibration adjustments. The migration is idempotent: same inputs produce the same output. During Phase 0, the migration is the source of truth; graph.yaml is a derived artifact.

2. Post-Phase 0 (authoritative): Once Phase 1 deletes inline references, graph.yaml becomes the authoritative source for governance wiring. Calibration edits and new edges go directly into graph.yaml. The migration is no longer re-run (its inputs no longer exist).

This means: during Phase 0, calibration failures are fixed by adjusting the calibrator inputs (action index files) and regenerating. Post-Phase 0, calibration failures are fixed by editing graph.yaml directly. The spec's "DRG is the only knob" principle holds in both phases -- only the editing mechanism changes.

Migration Extraction Strategy

The extractor walks three artifact categories and emits edges into graph.yaml:

1. Artifact-to-artifact edges (from inline reference fields):

Source artifact	Field	Edge relation	Target kind
Directive	`tactic_refs`	`requires`	tactic
Directive	`references[type=directive]`	`requires`	directive
Directive	`references[type=tactic]`	`suggests`	tactic
Directive	`references[type=styleguide]`	`suggests`	styleguide
Tactic	`references[type=tactic]`	`suggests`	tactic
Tactic	`references[type=styleguide]`	`suggests`	styleguide
Paradigm	`tactic_refs`	`requires`	tactic
Paradigm	`directive_refs`	`requires`	directive
Paradigm	`opposed_by`	`replaces`	paradigm/tactic

Edge metadata: edges carry a when field (copied from tactic references) and a reason field (copied from opposed_by).
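
The mapping table above can be read as a lookup from (source kind, reference type) to edge relation. The sketch below illustrates that reading. The artifact shape is an assumption: it supposes each artifact is a dict with an id, optional tactic_refs and directive_refs lists of IDs, and a references list of {"type", "id"} dicts. The real YAML layout may differ, opposed_by and edge metadata are omitted, and the function name and the tactic, styleguide, and toolguide IDs are hypothetical.

```python
# (source kind, reference type) -> relation, following the table above.
REFERENCE_RELATIONS = {
    ("directive", "directive"): "requires",
    ("directive", "tactic"): "suggests",
    ("directive", "styleguide"): "suggests",
    ("tactic", "tactic"): "suggests",
    ("tactic", "styleguide"): "suggests",
}


def extract_edges(kind: str, artifact: dict) -> list[tuple[str, str, str]]:
    """Emit (source URN, relation, target URN) edges for one artifact."""
    source = f"{kind}:{artifact['id']}"
    edges: list[tuple[str, str, str]] = []
    # tactic_refs on directives and paradigms become requires edges.
    if kind in ("directive", "paradigm"):
        for tactic_id in artifact.get("tactic_refs", []):
            edges.append((source, "requires", f"tactic:{tactic_id}"))
    # directive_refs appear only on paradigms in the table.
    if kind == "paradigm":
        for directive_id in artifact.get("directive_refs", []):
            edges.append((source, "requires", f"directive:{directive_id}"))
    # Typed references map through the lookup; unlisted pairs are skipped.
    for ref in artifact.get("references", []):
        relation = REFERENCE_RELATIONS.get((kind, ref["type"]))
        if relation is not None:
            edges.append((source, relation, f"{ref['type']}:{ref['id']}"))
    return edges


# Hypothetical directive used only to exercise the mapping.
sample = {
    "id": "DIRECTIVE_024",
    "tactic_refs": ["example-tactic"],
    "references": [
        {"type": "directive", "id": "DIRECTIVE_025"},
        {"type": "styleguide", "id": "example-style"},
        {"type": "toolguide", "id": "example-tool"},
    ],
}
sample_edges = extract_edges("directive", sample)
```

Keeping the mapping in a single table makes the WP02 edge-count acceptance check easy to reason about: every inline field in the table contributes at least one edge per referenced ID.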

2. Action-to-artifact edges (from action index files):

Source	Field	Edge relation	Target kind
Action node	`directives`	`scope`	directive
Action node	`tactics`	`scope`	tactic
Action node	`styleguides`	`scope`	styleguide
Action node	`toolguides`	`scope`	toolguide
Action node	`procedures`	`scope`	procedure

Action nodes use URN format action:{mission}/{action} (e.g., action:software-dev/specify).

3. ID normalization: The extractor must normalize directive IDs between the two formats:

Action indices use slug format: 024-locality-of-change
Directive YAMLs define: DIRECTIVE_024
DRG canonical form: directive:DIRECTIVE_024 (URN uses the YAML-defined ID)
The existing _normalize_directive_id() in src/charter/context.py already handles this; the extractor reuses the same logic via src/doctrine/drg/migration/id_normalizer.py.
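
For illustration, the two input formats can be collapsed to the canonical URN as sketched below. This is not the existing _normalize_directive_id() implementation; the function name, the regular expressions, and the three-digit assumption (read off the DIRECTIVE_NNN pattern) are hypothetical.

```python
import re

# Slug form used by action indices, e.g. "024-locality-of-change".
_SLUG = re.compile(r"(\d{3})-[a-z0-9]+(?:-[a-z0-9]+)*")
# ID form defined in directive YAMLs, e.g. "DIRECTIVE_024".
_CANONICAL = re.compile(r"DIRECTIVE_(\d{3})")


def to_directive_urn(raw: str) -> str:
    """Normalize either directive ID format to directive:DIRECTIVE_NNN."""
    value = raw.strip()
    match = _CANONICAL.fullmatch(value) or _SLUG.fullmatch(value)
    if match is None:
        raise ValueError(f"unrecognized directive ID: {raw!r}")
    return f"directive:DIRECTIVE_{match.group(1)}"
```

Raising on unrecognized input, rather than passing it through, means a slug that fails normalization surfaces during migration instead of silently producing a dangling edge.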

Surface Calibration

Current measured action surfaces (from shipped action indices):

Action	Directives	Tactics	Toolguides	Total refs
specify	2	1	0	3
plan	2	2	0	4
implement	6	6	1	13
review	2	3	0	5
tasks	(no index)	(no index)	(no index)	0

Required calibration inequalities:

|context(specify)| < |context(plan)| < |context(implement)|    -- currently: 3 < 4 < 13 ✓
|context(tasks)|   < |context(implement)|                      -- currently: 0 < 13 ✓ (vacuously)
|context(review)|  ≈ |context(implement)|                      -- currently: 5 vs 13 ✗

Calibration actions needed:

1. tasks action: Create src/doctrine/missions/software-dev/actions/tasks/index.yaml with appropriate scope edges. tasks should be lighter than implement but heavier than plan (it needs planning context plus some implementation awareness). Estimated: ~6-8 total refs.

2. review action: The ≈ relation means review should see roughly the same governance surface as implement, because reviewers need the same context as implementers to judge correctness. The current review surface (5) needs additional scope edges to approach implement (13). The calibrator adds scope edges for the directives and tactics that review should share with implement.

3. Surface measurement: Surface size is measured as the count of distinct artifacts reachable from the action node via scope edges (depth 1), plus transitive requires closure. Token estimates are a secondary metric derived from materializing the artifacts.
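
The surface measurement in item 3 can be sketched as a small set computation: collect the depth-1 scope targets, then add everything reachable through requires edges. The function name, the tuple edge representation, and the sample graph are assumptions made for illustration.

```python
def surface_size(edges: list[tuple[str, str, str]], action_urn: str) -> int:
    """Count distinct artifacts in scope (depth 1) plus requires closure."""
    scoped = {
        target
        for source, relation, target in edges
        if source == action_urn and relation == "scope"
    }
    requires: dict[str, set[str]] = {}
    for source, relation, target in edges:
        if relation == "requires":
            requires.setdefault(source, set()).add(target)

    surface = set(scoped)
    stack = list(scoped)
    while stack:  # transitive closure over requires edges only
        node = stack.pop()
        for target in requires.get(node, ()):
            if target not in surface:
                surface.add(target)
                stack.append(target)
    return len(surface)


# Made-up sample graph; only the directive IDs echo names from this plan.
SAMPLE = [
    ("action:software-dev/review", "scope", "directive:DIRECTIVE_024"),
    ("action:software-dev/review", "scope", "directive:DIRECTIVE_025"),
    ("directive:DIRECTIVE_024", "requires", "tactic:example-a"),
    ("directive:DIRECTIVE_025", "requires", "tactic:example-a"),
    ("tactic:example-a", "requires", "tactic:example-b"),
    ("directive:DIRECTIVE_025", "suggests", "styleguide:example-c"),
]
review_surface = surface_size(SAMPLE, "action:software-dev/review")
```

In the sample, the shared tactic is counted once and the suggests edge does not contribute, so the surface is two directives plus two tactics.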

Call-Site Audit (NOT Reroute)

Phase 0 does NOT reroute any production call sites. The two implementations have different behavior:

Feature	`src/charter/context.py` (canonical)	`src/specify_cli/charter/context.py` (legacy)
`depth` parameter	Yes (1, 2, 3)	No
Action doctrine injection	Yes (directives, tactics, guidelines)	No
Reference filtering by action	Yes	Limited
`CharterContextResult.depth` field	Yes	No

Switching callers would change live prompt behavior. The reroute is Phase 1 work.

Phase 0 call-site status (for documentation, not action):

Caller	Current import	Phase 1 action
`src/specify_cli/next/prompt_builder.py:13`	`specify_cli.charter.context` (legacy)	Reroute to `charter.context`
`src/specify_cli/cli/commands/agent/workflow.py:20`	`specify_cli.charter.context` (legacy)	Reroute to `charter.context`
`src/specify_cli/cli/commands/charter.py:13`	`charter.context` (canonical)	No change

Verification: Compare the output of both implementations for every (action, depth) input and record the result in the WP00 delta document. Differences in rendered output that follow from the table above (depth handling, action doctrine injection, reference filtering) are expected. Any other divergence, in particular in which artifacts are resolved, must be explained before proceeding -- otherwise the invariant test could be comparing against the wrong oracle.

Invariant Test Design

The invariant test runs a matrix of (profile, action, depth) combinations:

Profiles: All shipped profiles from src/doctrine/agent_profiles/shipped/ (10 profiles). If profiles don't influence context assembly today (they may not -- the current build_charter_context doesn't take a profile parameter), the test degenerates to action-only and documents this as a known gap for Phase 4 to fill.
Actions: specify, plan, implement, review (4 actions with indices). tasks is tested only for DRG output (no legacy baseline exists).
Depths: 1, 2, 3 (matching the depth semantics in src/charter/context.py).

Comparison method: For each combination, both paths resolve a set of artifact URNs. Reachability parity means the URN sets are equal. If sets differ, the test checks the accepted-differences ledger. Unregistered differences fail the test. Note: this tests artifact reachability, not rendered text. The legacy path may render differently (it lacks action-doctrine sections, guidelines, etc.) -- that is expected and irrelevant to Phase 0's scope. Phase 1 will test rendered-text parity when it reroutes callers.

Matrix size: Up to 10 profiles x 4 actions x 3 depths = 120 combinations. With profile degeneration, this reduces to 4 x 3 = 12. Either size is well within the 60s CI budget (NFR-003).
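
The comparison method can be sketched as a loop over the matrix that compares URN sets and consults the ledger. The resolver callables stand in for the legacy and DRG paths; their signatures, the function name, and the representation of the ledger as a set of (profile, action, depth) tuples are assumptions.

```python
from itertools import product
from typing import Callable, Iterable

Case = tuple[str, str, int]  # (profile, action, depth)
Resolver = Callable[[str, str, int], set[str]]


def parity_failures(
    profiles: Iterable[str],
    actions: Iterable[str],
    depths: Iterable[int],
    legacy: Resolver,
    drg: Resolver,
    accepted: set[Case],
) -> list[str]:
    """Return one message per unregistered reachability difference."""
    failures: list[str] = []
    for case in product(profiles, actions, depths):
        legacy_set, drg_set = legacy(*case), drg(*case)
        if legacy_set == drg_set or case in accepted:
            continue
        failures.append(
            f"{case}: only legacy={sorted(legacy_set - drg_set)}, "
            f"only drg={sorted(drg_set - legacy_set)}"
        )
    return failures


# Stub resolvers used only to exercise the comparison.
def stub_legacy(profile: str, action: str, depth: int) -> set[str]:
    return {"directive:DIRECTIVE_024"}


def stub_drg(profile: str, action: str, depth: int) -> set[str]:
    extra = {"directive:DIRECTIVE_030"} if (action, depth) == ("implement", 2) else set()
    return {"directive:DIRECTIVE_024"} | extra


failures = parity_failures(
    ["implementer"], ["specify", "implement"], [1, 2, 3],
    stub_legacy, stub_drg, accepted=set(),
)
```

Reporting the two set differences separately, rather than a bare inequality, gives the itemized output needed to decide whether a difference belongs in the ledger or points at an extractor bug.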

Existing Code Preserved

Module	What happens	Why
`src/doctrine/service.py`	Untouched	DRG is an additional index, not a replacement (C-002)
`src/doctrine/directives/repository.py` (and siblings)	Untouched	Repositories continue to load artifacts; DRG adds graph edges
`src/doctrine/curation/`	Untouched	Phase 1 excises this (C-003)
`src/doctrine/*/_proposed/`	Untouched	Phase 1 deletes these (C-003)
`src/specify_cli/glossary/`	Untouched	Vocabulary edges reference scopes but don't alter internals (C-005)
`src/specify_cli/charter/context.py`	Untouched	Legacy compatibility surface; callers NOT rerouted in Phase 0
`src/specify_cli/next/prompt_builder.py`	Untouched	Import path NOT changed in Phase 0; reroute is Phase 1
`src/specify_cli/cli/commands/agent/workflow.py`	Untouched	Import path NOT changed in Phase 0; reroute is Phase 1
`src/charter/context.py`	Extended	`build_context_v2()` added; existing `build_charter_context()` preserved
All inline reference fields in YAMLs	Preserved	Phase 1 removes them after parity confirmed (C-001)

Work Package Dependency Graph

WP00 (call-site audit) ────────────────────────────────┐
                                                        │
WP01 (DRG schema + model) ─┐                           │
                            │                           │
                            ├── WP02 (migration +       │
                            │        calibration)       │
                            │       │                   │
                            └───────┴── WP03 (context_v2)
                                        │               │
                                        ├───────────────┤
                                        │               │
                                        ├── WP04 (invariant test)
                                        │
                                        └── WP05 (calibration test)

Critical path: WP01 -> WP02 -> WP03 -> WP04
Parallel opportunity: WP00 runs in parallel with WP01-WP03 (it produces documentation, not code)

WP00: Call-Site Audit and Oracle Confirmation (FR-001)

Goal: Document the behavioral delta between the two build_charter_context() implementations and confirm the canonical path (src/charter/context.py) is the correct parity oracle for WP04. No production code is changed.

Produces:

A behavioral delta document listing exactly what each implementation renders for each (action, depth)
Confirmation that the canonical path's artifact resolution is the correct oracle
Documentation of what the Phase 1 reroute will change in live prompt behavior

Acceptance:

Delta document exists and covers all 4 bootstrap actions at depths 1, 2, 3
Canonical path confirmed as oracle (no unexpected artifact resolution behavior)
Phase 1 reroute scope documented with expected behavior changes

Risk: If the canonical path has a bug that resolves wrong artifacts, the invariant test will use a faulty oracle. The audit must verify artifact resolution correctness, not just accept the canonical path uncritically.

WP01: DRG Schema and Pydantic Model (FR-002, FR-003) -- #470

Goal: Define the DRG schema, implement Pydantic models, and validation logic.

New files:

src/doctrine/drg/__init__.py
src/doctrine/drg/models.py
src/doctrine/drg/loader.py
src/doctrine/drg/validator.py
tests/doctrine/drg/test_models.py
tests/doctrine/drg/test_loader.py
tests/doctrine/drg/test_validator.py
tests/doctrine/drg/conftest.py (fixtures)

Acceptance:

Pydantic model loads a fixture graph and rejects malformed shapes
Validator catches: dangling references, unknown relation types, malformed URNs, cycles in requires
Layer merge (shipped + project) works correctly
90%+ coverage, mypy --strict clean
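
A rough sketch of the four validator checks follows, written against plain sets and tuples rather than the Pydantic models so that it stays dependency-free. The URN pattern and the list of known relations are assumptions inferred from the examples in this plan, not the authoritative schema.

```python
import re

# Assumed URN shape: lowercase kind, a colon, then an identifier.
URN = re.compile(r"[a-z_]+:[A-Za-z0-9_./-]+")
# Relations mentioned in this plan; the real schema may differ.
RELATIONS = {"scope", "requires", "suggests", "vocabulary", "replaces", "applies"}


def validate_graph(nodes: set[str], edges: list[tuple[str, str, str]]) -> list[str]:
    """Return a list of integrity errors; an empty list means valid."""
    errors = [f"malformed URN: {urn}" for urn in sorted(nodes) if not URN.fullmatch(urn)]
    for source, relation, target in edges:
        if relation not in RELATIONS:
            errors.append(f"unknown relation: {relation}")
        for endpoint in (source, target):
            if endpoint not in nodes:
                errors.append(f"dangling reference: {endpoint}")

    # Cycle detection over requires edges: depth-first search with
    # "in progress" (1) and "done" (2) marks.
    requires: dict[str, list[str]] = {}
    for source, relation, target in edges:
        if relation == "requires":
            requires.setdefault(source, []).append(target)
    state: dict[str, int] = {}

    def has_cycle(node: str) -> bool:
        if state.get(node) == 1:
            return True  # reached a node that is still on the DFS path
        if state.get(node) == 2:
            return False
        state[node] = 1
        found = any(has_cycle(nxt) for nxt in requires.get(node, []))
        state[node] = 2
        return found

    if any(has_cycle(node) for node in list(requires)):
        errors.append("cycle in requires edges")
    return errors
```

Returning a list of errors instead of raising on the first one lets a fixture with several defects be reported in a single run.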

WP02: Migration Extractor and Surface Calibration (FR-004, FR-005) -- #473

Goal: Walk shipped artifacts, extract inline references, emit graph.yaml, and apply per-action surface calibration.

New files:

src/doctrine/drg/migration/__init__.py
src/doctrine/drg/migration/extractor.py
src/doctrine/drg/migration/calibrator.py
src/doctrine/drg/migration/id_normalizer.py
src/doctrine/graph.yaml (generated output)
src/doctrine/missions/software-dev/actions/tasks/index.yaml (new action index)
tests/doctrine/drg/migration/test_extractor.py
tests/doctrine/drg/migration/test_calibrator.py
tests/doctrine/drg/migration/test_id_normalizer.py

Depends on: WP01 (models and validation)

Acceptance:

graph.yaml validates against DRG model with zero errors
Edge count >= sum of all inline reference fields across shipped artifacts
Calibration inequalities hold: specify < plan < implement, tasks < implement, review ≈ implement
Migration is idempotent: running it twice produces the same graph
No inline reference fields are modified (C-001)

WP03: build_context_v2 (FR-006, FR-009) -- #471

Goal: Implement build_context_v2(profile, action, depth) in src/charter/context.py using DRG query primitives.

New files:

src/doctrine/drg/query.py
tests/doctrine/drg/test_query.py

Modified files:

src/charter/context.py (add build_context_v2 function)

Depends on: WP01 (models), WP02 (populated graph.yaml)

Acceptance:

Function exists with correct signature
Unit tests against fixture graphs return deterministic results
No per-action filtering logic in the function body (FR-009); context size is determined entirely by graph topology
Composes DRG primitives from src/doctrine/drg/query.py; does not embed graph traversal logic

WP04: Invariant Regression Test (FR-007, FR-010) -- #472

Goal: Compare the artifact reachability of build_context_v2 against the canonical build_charter_context() for all (profile, action, depth) combinations. This tests that the DRG resolves the same governance artifacts, not that it renders identical text.

New files:

tests/charter/test_context_parity.py
tests/charter/fixtures/accepted_differences.yaml

Depends on: WP00 (oracle confirmed), WP03 (build_context_v2 exists)

Acceptance:

Test runs in CI on PRs touching src/doctrine/, src/charter/, or graph.yaml
Either passes (artifact-set identity) or produces an itemized report of reachability differences
Accepted-differences ledger follows the exception rules (Guardrail 2)
No more than 10% of matrix entries in the ledger; otherwise the mission pauses
Full matrix completes in < 60s (NFR-003)

WP05: Surface Calibration Test (FR-008, FR-009, FR-010) -- #474

Goal: Assert minimum-effective-dose inequalities for every shipped action.

New files:

tests/charter/test_surface_calibration.py

Depends on: WP03 (build_context_v2 to measure surfaces)

Acceptance:

Test asserts: |specify| < |plan| < |implement|, |tasks| < |implement|, |review| ≈ |implement|
≈ defined as: review surface is at least 80% of implement surface (configurable threshold)
Violations produce a clear error message naming the violating action and the expected vs actual sizes
Fix for violations: adjust scope edges in graph.yaml, never add filtering logic
Runs in CI alongside WP04
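
The inequalities could be asserted roughly as follows. The sketch reads the 80% threshold as a lower bound on |review| / |implement|, which is an interpretation rather than something the spec states outright; the function name and message format are likewise hypothetical. The sample sizes are the currently measured surfaces from the Surface Calibration table.

```python
def calibration_violations(
    sizes: dict[str, int], approx_ratio: float = 0.8
) -> list[str]:
    """Return one message per violated calibration inequality."""
    violations: list[str] = []
    strict_pairs = (("specify", "plan"), ("plan", "implement"), ("tasks", "implement"))
    for smaller, larger in strict_pairs:
        if not sizes[smaller] < sizes[larger]:
            violations.append(
                f"|{smaller}|={sizes[smaller]} must be < |{larger}|={sizes[larger]}"
            )
    # Assumed reading of the approximate relation: review must reach
    # at least approx_ratio of the implement surface.
    if sizes["review"] < approx_ratio * sizes["implement"]:
        violations.append(
            f"|review|={sizes['review']} must be >= "
            f"{approx_ratio:.0%} of |implement|={sizes['implement']}"
        )
    return violations


# Currently measured surfaces (total refs per action).
measured = {"specify": 3, "plan": 4, "implement": 13, "review": 5, "tasks": 0}
current_violations = calibration_violations(measured)
```

Against the measured surfaces only the review relation fails, which matches the calibration work planned for WP02.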

Risks and Mitigations

Risk	Mitigation	WP affected
Canonical path resolves wrong artifacts (faulty oracle)	WP00 audit verifies canonical path's artifact resolution is correct, not just accepted uncritically	WP00
Directive ID normalization misses edge cases	Reuse existing `_normalize_directive_id()` logic; extraction test asserts edge count >= inline field count	WP02
`tasks` action index doesn't exist; creating it is a calibration judgment call	Start with scope edges borrowed from `plan` + light `implement` subset; calibration test enforces inequality	WP02
Invariant test matrix is too coarse (profiles don't affect context today)	Document profile dimension as degenerate; Phase 4 profile executor will make it meaningful	WP04
`graph.yaml` lifecycle confusion (generated vs hand-edited)	Lifecycle is explicit: generated during Phase 0, authoritative post-Phase 0. See graph.yaml Lifecycle section.	WP02

Rollback Plan

1. WP00 rollback: Delete audit document. No production code was changed.

2. WP01-WP05 rollback: Delete src/doctrine/drg/, src/doctrine/graph.yaml, remove build_context_v2 from src/charter/context.py, delete new test files. Inline references remain in place (C-001), so the legacy path continues to work unchanged. No production call sites were modified.

3. Trigger: If invariant test reveals > 10% artifact-reachability divergence and root cause is unclear, pause the mission and escalate.