Implementation Plan: DRG Phase Zero
Branch: main | Date: 2026-04-13 | Spec: spec.md
Input: Feature specification from kitty-specs/drg-phase-zero-01KP2YCE/spec.md
Mission ID: 01KP2YCESBSG61KQH5PQZ9662H
Summary
Build the Doctrine Reference Graph (DRG) as a YAML-based graph model that makes implicit doctrine references explicit and queryable. Audit the call sites of the two context builders and confirm the canonical builder as the parity oracle; rerouting callers is deferred to Phase 1. Implement build_context_v2() by composing DRG query primitives. Prove artifact-reachability parity via an invariant regression test and enforce minimum-effective-dose surface calibration.
Technical Context
- Language/Version: Python 3.11+
- Primary Dependencies: Pydantic (model/validation), ruamel.yaml (YAML I/O), typer/rich (CLI if needed)
- Storage: Filesystem only (graph.yaml in src/doctrine/, test fixtures in tests/)
- Testing: pytest, mypy --strict, 90%+ coverage for new code
- Target Platform: CLI tool (cross-platform, no OS-specific code)
- Project Type: Single project (monorepo for spec-kitty)
- Performance Goals: Graph load < 500ms, context query < 200ms, full test matrix < 60s
- Constraints: No cross-repo changes, inline references must remain in place (C-001)
- Scale/Scope: ~80 nodes, ~200 edges (current shipped doctrine), 4 actions x ~10 profiles
Charter Check
GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.
| Gate | Status | Notes |
|---|---|---|
| DIRECTIVE_003 (Decision Documentation) | Pass | DRG schema is an ADR-worthy decision; graph.yaml IS the documentation |
| DIRECTIVE_010 (Specification Fidelity) | Pass | Invariant test (FR-007) proves fidelity to spec |
| 90%+ test coverage | Committed | NFR-004 |
| mypy --strict | Committed | NFR-005 |
| Integration tests for CLI commands | N/A | No new CLI commands in this mission |
Project Structure
Documentation (this feature)
kitty-specs/drg-phase-zero-01KP2YCE/
├── spec.md # Mission specification
├── plan.md # This file
├── data-model.md # DRG schema entities
├── checklists/
│ └── requirements.md # Spec quality checklist
└── tasks/ # Work packages (created by /spec-kitty.tasks)
Source Code (new and modified)
src/doctrine/
├── graph.yaml # [NEW] The DRG data file (generated by migration)
└── drg/ # [NEW] DRG infrastructure package
    ├── __init__.py # Public API: DRGGraph, load_graph, validate_graph, query
    ├── models.py # Pydantic models: DRGNode, DRGEdge, DRGGraph
    ├── loader.py # load_graph(path) -> DRGGraph, merge_layers(shipped, project)
    ├── validator.py # validate_graph(): dangling refs, cycles, malformed URNs
    ├── query.py # walk_edges(), resolve_context(): edge traversal primitives
    └── migration/
        ├── __init__.py
        ├── extractor.py # walk_artifacts() -> list[DRGEdge]
        ├── calibrator.py # apply_surface_calibration() -> adjusted scope edges
        └── id_normalizer.py # normalize directive IDs (DIRECTIVE_NNN <-> NNN-slug)
src/charter/
└── context.py # [MODIFIED] Add build_context_v2() composing DRG primitives
src/specify_cli/
├── next/prompt_builder.py # [AUDITED, UNCHANGED] Import reroute to src/charter/context.py is deferred to Phase 1
└── cli/commands/agent/workflow.py # [AUDITED, UNCHANGED] Import reroute to src/charter/context.py is deferred to Phase 1
tests/doctrine/drg/ # [NEW] DRG unit tests
├── __init__.py
├── conftest.py # Shared fixtures: sample graph.yaml, malformed graphs
├── test_models.py # Pydantic model validation
├── test_loader.py # Graph loading and layer merging
├── test_validator.py # Dangling refs, cycles, URN format
├── test_query.py # Edge traversal, depth limiting
└── migration/
    ├── __init__.py
    ├── test_extractor.py # Extraction from shipped artifacts
    ├── test_calibrator.py # Surface calibration adjustments
    └── test_id_normalizer.py # ID format normalization
tests/charter/
├── test_context_parity.py # [NEW] Invariant regression test (FR-007)
├── test_surface_calibration.py # [NEW] Calibration inequality test (FR-008)
└── fixtures/
    └── accepted_differences.yaml # [NEW] Exception ledger (empty by default)
Architecture
Layer Separation (Guardrail 1)
The DRG package (src/doctrine/drg/) is strictly doctrine-graph infrastructure. It owns:
- Schema/model: Pydantic models for nodes, edges, and the graph document
- Loading/validation: Parse graph.yaml, validate integrity, merge shipped + project layers
- Query primitives: Walk edges by relation type, depth-limited traversal, transitive closure
It does NOT own:
- Charter-specific assembly policy (profile x action x depth expansion rules)
- Action-scoped rendering (directive/tactic line formatting, reference filtering)
- Governance resolution or project selection intersection
Charter-specific assembly lives in src/charter/context.py. build_context_v2() composes DRG primitives:
build_context_v2(profile, action, depth)
│
├── drg.loader.load_graph() # Load and merge graph layers
├── drg.query.walk_edges( # Walk scope edges to depth 1
│ start="action:{mission}/{action}",
│ relations=["scope"],
│ max_depth=1)
│ # (the applies relation exists in the v1 schema but is not populated in Phase 0)
├── drg.query.walk_edges( # Walk requires transitively
│ start=<resolved nodes>,
│ relations=["requires"],
│ max_depth=None) # Transitive closure
├── drg.query.walk_edges( # Walk suggests to user depth
│ start=<resolved nodes>,
│ relations=["suggests"],
│ max_depth=depth)
├── drg.query.walk_edges( # Collect vocabulary edges
│ start=<resolved nodes>,
│ relations=["vocabulary"])
│
└── _materialize_artifacts(nodes) # Charter-layer: load, format, render
This keeps Phase 1 replacement local: build_charter_context() gets deleted, build_context_v2() becomes build_context(), all in src/charter/context.py. The DRG package is untouched.
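Every call in the composition above goes through the same traversal primitive. As a rough illustration only, here is a minimal sketch of what a depth-limited walk_edges() could look like. The Edge tuple, the function signature, and the tactic/styleguide URNs are assumptions made for this example, not the shipped API; the plan calls for Pydantic models, and a NamedTuple is used here only to keep the sketch dependency-free.

```python
from collections import deque
from typing import Iterable, NamedTuple, Optional


class Edge(NamedTuple):
    """Hypothetical stand-in for DRGEdge: one directed, typed edge."""
    source: str
    relation: str
    target: str


def walk_edges(
    edges: Iterable[Edge],
    start: Iterable[str],
    relations: Iterable[str],
    max_depth: Optional[int] = None,
) -> set[str]:
    """Return nodes reachable from `start` via `relations` (start nodes excluded).

    max_depth=None walks the transitive closure; max_depth=1 returns
    direct neighbours only.
    """
    wanted = set(relations)
    adjacency: dict[str, list[str]] = {}
    for edge in edges:
        if edge.relation in wanted:
            adjacency.setdefault(edge.source, []).append(edge.target)

    start_nodes = set(start)
    seen: set[str] = set()
    queue = deque((node, 0) for node in start_nodes)
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # depth budget exhausted on this branch
        for target in adjacency.get(node, []):
            if target not in seen and target not in start_nodes:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen


# Illustrative graph; the tactic and styleguide URNs are made up.
EXAMPLE_EDGES = [
    Edge("action:software-dev/implement", "scope", "directive:DIRECTIVE_024"),
    Edge("directive:DIRECTIVE_024", "requires", "tactic:a"),
    Edge("tactic:a", "requires", "tactic:b"),
    Edge("tactic:b", "suggests", "styleguide:s"),
]
scoped = walk_edges(EXAMPLE_EDGES, ["action:software-dev/implement"], ["scope"], max_depth=1)
required = walk_edges(EXAMPLE_EDGES, scoped, ["requires"])  # transitive closure
```

Because the depth limit and the relation filter are both parameters, build_context_v2() can express each step of the diagram as one call, without embedding any traversal logic of its own.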
Accepted-Differences Ledger (Guardrail 2)
tests/charter/fixtures/accepted_differences.yaml is an exception ledger, not a design escape hatch.
Rules:
1. Empty by default. If the migration and calibration are correct, this file stays empty.
2. Every entry must include: the exact (profile, action, depth) case, the legacy artifact set, the DRG artifact set, a concrete reason why the difference is intentional, and a follow-up issue number if the difference should be resolved later.
3. No "expected drift" entries. If a difference exists because the DRG path is correct and the legacy path was wrong-sized, the entry must say so with evidence. "We expect some drift" is not an acceptable reason.
4. Threshold gate: If more than 10% of the test matrix has accepted differences, Phase 0 is not done. The migration extractor or calibrator must be fixed.
Schema:
# tests/charter/fixtures/accepted_differences.yaml
schema_version: "1.0"
entries: []
# Each entry:
# - profile: "implementer"
# action: "implement"
# depth: 2
# legacy_artifacts: ["DIRECTIVE_024", "DIRECTIVE_025", ...]
# drg_artifacts: ["DIRECTIVE_024", "DIRECTIVE_025", "DIRECTIVE_030", ...]
# reason: "Legacy path missed DIRECTIVE_030 because action index slug '030-...' failed normalization"
# follow_up_issue: null # or "#NNN" if this should be fixed
# accepted_by: "robert"
# accepted_at: "2026-04-15"
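The ledger rules can be enforced mechanically. The sketch below is one possible shape for that check, assuming the entries have already been parsed from YAML into plain dicts; check_ledger and its constants are hypothetical names, and the required-field list is read off rule 2 and the schema comment above.

```python
# Fields that rule 2 makes mandatory for every entry (follow_up_issue may be null).
REQUIRED_FIELDS = (
    "profile", "action", "depth",
    "legacy_artifacts", "drg_artifacts", "reason",
)
MAX_ACCEPTED_FRACTION = 0.10  # rule 4: above this, Phase 0 is not done


def check_ledger(entries: list[dict], matrix_size: int) -> list[str]:
    """Return human-readable problems; an empty list means the ledger is acceptable."""
    problems: list[str] = []
    for index, entry in enumerate(entries):
        missing = [
            field for field in REQUIRED_FIELDS
            if field not in entry or entry[field] in (None, "")
        ]
        if missing:
            problems.append(f"entry {index}: missing {', '.join(missing)}")
    if matrix_size > 0 and len(entries) / matrix_size > MAX_ACCEPTED_FRACTION:
        problems.append(
            f"{len(entries)} of {matrix_size} cases have accepted differences (limit 10%)"
        )
    return problems
```

Rule 3 (no "expected drift" reasons) is a judgment call that a reviewer still has to make; this check only guarantees that a reason was written down.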
graph.yaml Lifecycle
graph.yaml has two distinct lifecycle phases:
1. Phase 0 (bootstrap): The migration extractor generates graph.yaml from inline references + calibration adjustments. The migration is idempotent: same inputs produce the same output. During Phase 0, the migration is the source of truth; graph.yaml is a derived artifact.
2. Post-Phase 0 (authoritative): Once Phase 1 deletes inline references, graph.yaml becomes the authoritative source for governance wiring. Calibration edits and new edges go directly into graph.yaml. The migration is no longer re-run (its inputs no longer exist).
This means: during Phase 0, calibration failures are fixed by adjusting the calibrator inputs (action index files) and regenerating. Post-Phase 0, calibration failures are fixed by editing graph.yaml directly. The spec's "DRG is the only knob" principle holds in both phases -- only the editing mechanism changes.
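Idempotency during Phase 0 depends on the generator emitting nodes and edges in a stable order. A minimal sketch of that idea follows, using the standard-library json module as a stand-in for ruamel.yaml so the example stays dependency-free; the nodes/edges document layout and the render_graph name are assumptions for illustration, not the DRG schema.

```python
import json


def render_graph(nodes: set[str], edges: set[tuple[str, str, str]]) -> str:
    """Serialize deterministically: sorted nodes, sorted edges, sorted keys."""
    document = {
        "nodes": sorted(nodes),
        "edges": [
            {"source": source, "relation": relation, "target": target}
            for source, relation, target in sorted(edges)
        ],
    }
    return json.dumps(document, indent=2, sort_keys=True)


# The same inputs in a different insertion order yield byte-identical output.
first = render_graph({"n2", "n1"}, {("n1", "requires", "n2"), ("n1", "scope", "n2")})
second = render_graph({"n1", "n2"}, {("n1", "scope", "n2"), ("n1", "requires", "n2")})
```

With stable ordering, "running the migration twice produces the same graph" reduces to a plain string comparison of the two outputs.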
Migration Extraction Strategy
The extractor walks three artifact categories and emits edges into graph.yaml:
1. Artifact-to-artifact edges (from inline reference fields):
| Source artifact | Field | Edge relation | Target kind |
|---|---|---|---|
| Directive | tactic_refs | requires | tactic |
| Directive | references[type=directive] | requires | directive |
| Directive | references[type=tactic] | suggests | tactic |
| Directive | references[type=styleguide] | suggests | styleguide |
| Tactic | references[type=tactic] | suggests | tactic |
| Tactic | references[type=styleguide] | suggests | styleguide |
| Paradigm | tactic_refs | requires | tactic |
| Paradigm | directive_refs | requires | directive |
| Paradigm | opposed_by | replaces | paradigm/tactic |
Edge metadata: when field (from tactic references), reason field (from opposed_by).
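To make the directive rows of the table concrete, here is a hedged sketch of how they could translate into edges. It assumes a parsed directive is a dict with an id key, a tactic_refs list of IDs, and a references list of {type, id} dicts; those key names and the tactic:/styleguide: URN prefixes are assumptions, since the plan only fixes the directive: and action: URN forms.

```python
# references[type=...] -> edge relation, per the directive rows of the table above.
DIRECTIVE_REFERENCE_RELATIONS = {
    "directive": "requires",
    "tactic": "suggests",
    "styleguide": "suggests",
}


def extract_directive_edges(directive: dict) -> list[tuple[str, str, str]]:
    """Emit (source, relation, target) triples for one parsed directive."""
    source = f"directive:{directive['id']}"
    edges: list[tuple[str, str, str]] = []
    for tactic_id in directive.get("tactic_refs", []):
        edges.append((source, "requires", f"tactic:{tactic_id}"))
    for reference in directive.get("references", []):
        relation = DIRECTIVE_REFERENCE_RELATIONS.get(reference["type"])
        if relation is None:
            continue  # reference types outside the table are skipped in this sketch
        edges.append((source, relation, f"{reference['type']}:{reference['id']}"))
    return edges
```

Keeping the field-to-relation mapping in a lookup table rather than in branching code means the extractor can be checked against the plan row by row.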
2. Action-to-artifact edges (from action index files):
| Source | Field | Edge relation | Target kind |
|---|---|---|---|
| Action node | directives | scope | directive |
| Action node | tactics | scope | tactic |
| Action node | styleguides | scope | styleguide |
| Action node | toolguides | scope | toolguide |
| Action node | procedures | scope | procedure |
Action nodes use URN format action:{mission}/{action} (e.g., action:software-dev/specify).
3. ID normalization: The extractor must normalize directive IDs between the two formats:
- Action indices use slug format: 024-locality-of-change
- Directive YAMLs define: DIRECTIVE_024
- DRG canonical form: directive:DIRECTIVE_024 (URN uses the YAML-defined ID)
- The existing _normalize_directive_id() in src/charter/context.py already handles this; the extractor reuses the same logic via src/doctrine/drg/migration/id_normalizer.py.
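The three ID forms above can be reconciled with a small pure function. This is an illustrative sketch, not the existing _normalize_directive_id() implementation; it assumes directive numbers are always three digits and slugs are lowercase, and to_directive_urn is a hypothetical name.

```python
import re

_SLUG = re.compile(r"^(\d{3})-[a-z0-9-]+$")       # e.g. 024-locality-of-change
_CANONICAL = re.compile(r"^DIRECTIVE_(\d{3})$")   # e.g. DIRECTIVE_024


def to_directive_urn(raw: str) -> str:
    """Map either ID form to the DRG canonical URN, e.g. directive:DIRECTIVE_024."""
    match = _CANONICAL.match(raw) or _SLUG.match(raw)
    if match is None:
        # Failing loudly keeps a malformed ID from silently dropping an edge.
        raise ValueError(f"unrecognized directive ID: {raw!r}")
    return f"directive:DIRECTIVE_{match.group(1)}"
```

Raising on unknown input, rather than returning the string unchanged, is a deliberate choice in this sketch: a missed normalization would otherwise surface later as a dangling reference.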
Surface Calibration
Current measured action surfaces (from shipped action indices):
| Action | Directives | Tactics | Toolguides | Total refs |
|---|---|---|---|---|
| specify | 2 | 1 | 0 | 3 |
| plan | 2 | 2 | 0 | 4 |
| implement | 6 | 6 | 1 | 13 |
| review | 2 | 3 | 0 | 5 |
| tasks | (no index) | (no index) | (no index) | 0 |
Required calibration inequalities:
|context(specify)| < |context(plan)| < |context(implement)| -- currently: 3 < 4 < 13 ✓
|context(tasks)| < |context(implement)| -- currently: 0 < 13 ✓ (vacuously)
|context(review)| ≈ |context(implement)| -- currently: 5 vs 13 ✗
Calibration actions needed:
1. tasks action: Create src/doctrine/missions/software-dev/actions/tasks/index.yaml with appropriate scope edges. tasks should be lighter than implement but heavier than plan (it needs planning context plus some implementation awareness). Estimated: ~6-8 total refs.
2. review action: The ≈ relation means review should see roughly the same governance surface as implement, because reviewers need the same context as implementers to judge correctness. The current review surface (5) needs additional scope edges to approach implement (13). The calibrator adds scope edges for the directives and tactics that review should share with implement.
3. Surface measurement: Surface size is measured as the count of distinct artifacts reachable from the action node via scope edges (depth 1), plus transitive requires closure. Token estimates are a secondary metric derived from materializing the artifacts.
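The surface measurement in item 3 can be sketched directly from its definition: collect the depth-1 scope targets of the action node, then add everything reachable through requires edges. The function name, the triple representation of edges, and the example URNs are assumptions for illustration.

```python
def surface(edges: list[tuple[str, str, str]], action_urn: str) -> set[str]:
    """Distinct artifacts reachable via scope (depth 1) plus transitive requires."""
    scoped = {target for source, relation, target in edges
              if source == action_urn and relation == "scope"}

    requires: dict[str, set[str]] = {}
    for source, relation, target in edges:
        if relation == "requires":
            requires.setdefault(source, set()).add(target)

    reachable = set(scoped)
    frontier = list(scoped)
    while frontier:
        node = frontier.pop()
        for target in requires.get(node, ()):
            if target not in reachable:
                reachable.add(target)
                frontier.append(target)
    return reachable


# Hypothetical miniature graph: two scoped directives, a requires chain, one suggests edge.
EXAMPLE = [
    ("action:software-dev/review", "scope", "directive:d1"),
    ("action:software-dev/review", "scope", "directive:d2"),
    ("directive:d1", "requires", "tactic:t1"),
    ("tactic:t1", "requires", "tactic:t2"),
    ("directive:d2", "suggests", "styleguide:s1"),
]
review_surface = surface(EXAMPLE, "action:software-dev/review")
```

Note that suggests edges do not count toward the surface under this definition, so the styleguide in the example is excluded and the surface size is 4.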
Call-Site Audit (NOT Reroute)
Phase 0 does NOT reroute any production call sites. The two implementations have different behavior:
| Feature | src/charter/context.py (canonical) | src/specify_cli/charter/context.py (legacy) |
|---|---|---|
| depth parameter | Yes (1, 2, 3) | No |
| Action doctrine injection | Yes (directives, tactics, guidelines) | No |
| Reference filtering by action | Yes | Limited |
| CharterContextResult.depth field | Yes | No |
Switching callers would change live prompt behavior. The reroute is Phase 1 work.
Phase 0 call-site status (for documentation, not action):
| Caller | Current import | Phase 1 action |
|---|---|---|
| src/specify_cli/next/prompt_builder.py:13 | specify_cli.charter.context (legacy) | Reroute to charter.context |
| src/specify_cli/cli/commands/agent/workflow.py:20 | specify_cli.charter.context (legacy) | Reroute to charter.context |
| src/specify_cli/cli/commands/charter.py:13 | charter.context (canonical) | No change |
Verification: Before/after output comparison for each call site. Both implementations should produce identical output for the same (action, depth) inputs. If they don't, that divergence must be resolved before proceeding -- it means the canonical path and the legacy path have drifted, and the invariant test would be comparing against the wrong oracle.
Invariant Test Design
The invariant test runs a matrix of (profile, action, depth) combinations:
- Profiles: All shipped profiles from src/doctrine/agent_profiles/shipped/ (10 profiles). If profiles don't influence context assembly today (they may not -- the current build_charter_context doesn't take a profile parameter), the test degenerates to action-only and documents this as a known gap for Phase 4 to fill.
- Actions: specify, plan, implement, review (4 actions with indices). tasks is tested only for DRG output (no legacy baseline exists).
- Depths: 1, 2, 3 (matching the depth semantics in src/charter/context.py).
Comparison method: For each combination, both paths resolve a set of artifact URNs. Reachability parity means the URN sets are equal. If sets differ, the test checks the accepted-differences ledger. Unregistered differences fail the test. Note: this tests artifact reachability, not rendered text. The legacy path may render differently (it lacks action-doctrine sections, guidelines, etc.) -- that is expected and irrelevant to Phase 0's scope. Phase 1 will test rendered-text parity when it reroutes callers.
Matrix size: Up to 10 profiles x 4 actions x 3 depths = 120 combinations. With profile degeneration, this reduces to 4 x 3 = 12. Either size is well within the 60s CI budget (NFR-003).
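The matrix construction and the per-case comparison can be sketched as two small helpers. The names are hypothetical, and one design assumption is added here: a ledger entry only excuses a difference when its recorded legacy and DRG sets match what the test actually observed, so a stale entry cannot mask new drift.

```python
from itertools import product
from typing import Optional

ACTIONS = ("specify", "plan", "implement", "review")
DEPTHS = (1, 2, 3)

Case = tuple[Optional[str], str, int]
Ledger = dict[Case, tuple[frozenset[str], frozenset[str]]]


def build_matrix(profiles: list[str]) -> list[Case]:
    """Full profile x action x depth matrix; degenerates to action x depth without profiles."""
    profile_axis: tuple[Optional[str], ...] = tuple(profiles) or (None,)
    return list(product(profile_axis, ACTIONS, DEPTHS))


def classify(case: Case, legacy: set[str], drg: set[str], ledger: Ledger) -> str:
    """Return 'match', 'accepted', or 'unregistered' for one matrix case."""
    if legacy == drg:
        return "match"
    if ledger.get(case) == (frozenset(legacy), frozenset(drg)):
        return "accepted"
    return "unregistered"  # the invariant test fails on these
```

In a pytest suite the list returned by build_matrix() would typically feed a parametrized test, with any "unregistered" result reported as an itemized set difference.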
Existing Code Preserved
| Module | What happens | Why |
|---|---|---|
| src/doctrine/service.py | Untouched | DRG is an additional index, not a replacement (C-002) |
| src/doctrine/directives/repository.py (and siblings) | Untouched | Repositories continue to load artifacts; DRG adds graph edges |
| src/doctrine/curation/ | Untouched | Phase 1 excises this (C-003) |
| src/doctrine/*/_proposed/ | Untouched | Phase 1 deletes these (C-003) |
| src/specify_cli/glossary/ | Untouched | Vocabulary edges reference scopes but don't alter internals (C-005) |
| src/specify_cli/charter/context.py | Untouched | Legacy compatibility surface; callers NOT rerouted in Phase 0 |
| src/specify_cli/next/prompt_builder.py | Untouched | Import path NOT changed in Phase 0; reroute is Phase 1 |
| src/specify_cli/cli/commands/agent/workflow.py | Untouched | Import path NOT changed in Phase 0; reroute is Phase 1 |
| src/charter/context.py | Extended | build_context_v2() added; existing build_charter_context() preserved |
| All inline reference fields in YAMLs | Preserved | Phase 1 removes them after parity confirmed (C-001) |
Work Package Dependency Graph
WP00 (call-site audit) ────────────────────────────────┐
│
WP01 (DRG schema + model) ─┐ │
│ │
├── WP02 (migration + │
│ calibration) │
│ │ │
└───────┴── WP03 (context_v2)
│ │
├───────────────┤
│ │
├── WP04 (invariant test)
│
└── WP05 (calibration test)
Critical path: WP01 -> WP02 -> WP03 -> WP04
Parallel opportunity: WP00 runs in parallel with WP01-WP03 (it produces documentation, not code)
WP00: Call-Site Audit and Oracle Confirmation (FR-001)
Goal: Document the behavioral delta between the two build_charter_context() implementations and confirm the canonical path (src/charter/context.py) is the correct parity oracle for WP04. No production code is changed.
Produces:
- A behavioral delta document listing exactly what each implementation renders for each (action, depth)
- Confirmation that the canonical path's artifact resolution is the correct oracle
- Documentation of what the Phase 1 reroute will change in live prompt behavior
Acceptance:
- Delta document exists and covers all 4 bootstrap actions at depths 1, 2, 3
- Canonical path confirmed as oracle (no unexpected artifact resolution behavior)
- Phase 1 reroute scope documented with expected behavior changes
Risk: If the canonical path has a bug that resolves wrong artifacts, the invariant test will use a faulty oracle. The audit must verify artifact resolution correctness, not just accept the canonical path uncritically.
WP01: DRG Schema and Pydantic Model (FR-002, FR-003) -- #470
Goal: Define the DRG schema, implement Pydantic models, and validation logic.
New files:
- src/doctrine/drg/__init__.py
- src/doctrine/drg/models.py
- src/doctrine/drg/loader.py
- src/doctrine/drg/validator.py
- tests/doctrine/drg/test_models.py
- tests/doctrine/drg/test_loader.py
- tests/doctrine/drg/test_validator.py
- tests/doctrine/drg/conftest.py (fixtures)
Acceptance:
- Pydantic model loads a fixture graph and rejects malformed shapes
- Validator catches: dangling references, unknown relation types, malformed URNs, cycles in requires
- Layer merge (shipped + project) works correctly
- 90%+ coverage, mypy --strict clean
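Two of the validator checks named in the acceptance list, dangling references and cycles in requires, can be illustrated with plain graph code. This is a sketch under assumptions: edges are (source, relation, target) triples, nodes is the set of declared URNs, and both function names are hypothetical rather than part of validator.py.

```python
from typing import Optional

Triple = tuple[str, str, str]


def find_dangling(nodes: set[str], edges: list[Triple]) -> list[Triple]:
    """Edges whose source or target is not a declared node."""
    return [edge for edge in edges if edge[0] not in nodes or edge[2] not in nodes]


def find_requires_cycle(edges: list[Triple]) -> Optional[list[str]]:
    """Return one cycle among requires edges as a node path, or None if acyclic."""
    graph: dict[str, list[str]] = {}
    for source, relation, target in edges:
        if relation == "requires":
            graph.setdefault(source, []).append(target)

    in_progress: set[str] = set()  # nodes on the current DFS path
    finished: set[str] = set()     # nodes fully explored, known cycle-free

    def visit(node: str, path: list[str]) -> Optional[list[str]]:
        in_progress.add(node)
        path.append(node)
        for nxt in graph.get(node, []):
            if nxt in in_progress:
                return path[path.index(nxt):] + [nxt]
            if nxt not in finished:
                found = visit(nxt, path)
                if found:
                    return found
        path.pop()
        in_progress.discard(node)
        finished.add(node)
        return None

    for node in list(graph):
        if node not in finished:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None
```

Only requires edges are checked for cycles here, matching the acceptance criterion; mutual suggests edges are not treated as an error. The recursive walk is adequate at the stated scale of roughly 80 nodes.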
WP02: Migration Extractor and Surface Calibration (FR-004, FR-005) -- #473
Goal: Walk shipped artifacts, extract inline references, emit graph.yaml, and apply per-action surface calibration.
New files:
- src/doctrine/drg/migration/__init__.py
- src/doctrine/drg/migration/extractor.py
- src/doctrine/drg/migration/calibrator.py
- src/doctrine/drg/migration/id_normalizer.py
- src/doctrine/graph.yaml (generated output)
- src/doctrine/missions/software-dev/actions/tasks/index.yaml (new action index)
- tests/doctrine/drg/migration/test_extractor.py
- tests/doctrine/drg/migration/test_calibrator.py
- tests/doctrine/drg/migration/test_id_normalizer.py
Depends on: WP01 (models and validation)
Acceptance:
- graph.yaml validates against DRG model with zero errors
- Edge count >= sum of all inline reference fields across shipped artifacts
- Calibration inequalities hold: specify < plan < implement, tasks < implement, review ≈ implement
- Migration is idempotent: running it twice produces the same graph
- No inline reference fields are modified (C-001)
WP03: build_context_v2 (FR-006, FR-009) -- #471
Goal: Implement build_context_v2(profile, action, depth) in src/charter/context.py using DRG query primitives.
New files:
- src/doctrine/drg/query.py
- tests/doctrine/drg/test_query.py
Modified files:
- src/charter/context.py (add build_context_v2 function)
Depends on: WP01 (models), WP02 (populated graph.yaml)
Acceptance:
- Function exists with correct signature
- Unit tests against fixture graphs return deterministic results
- No per-action filtering logic in the function body (FR-009); context size is determined entirely by graph topology
- Composes DRG primitives from src/doctrine/drg/query.py; does not embed graph traversal logic
WP04: Invariant Regression Test (FR-007, FR-010) -- #472
Goal: Compare the artifact reachability of build_context_v2 against the canonical build_charter_context() for all (profile, action, depth) combinations. This tests that the DRG resolves the same governance artifacts, not that it renders identical text.
New files:
- tests/charter/test_context_parity.py
- tests/charter/fixtures/accepted_differences.yaml
Depends on: WP00 (oracle confirmed), WP03 (build_context_v2 exists)
Acceptance:
- Test runs in CI on PRs touching src/doctrine/, src/charter/, or graph.yaml
- Either passes (artifact-set identity) or produces an itemized reachability-differences report
- Accepted-differences ledger follows the exception rules (Guardrail 2)
- < 10% of matrix entries in the ledger; otherwise mission pauses
- Full matrix completes in < 60s (NFR-003)
WP05: Surface Calibration Test (FR-008, FR-009, FR-010) -- #474
Goal: Assert minimum-effective-dose inequalities for every shipped action.
New files:
- tests/charter/test_surface_calibration.py
Depends on: WP03 (build_context_v2 to measure surfaces)
Acceptance:
- Test asserts: |specify| < |plan| < |implement|, |tasks| < |implement|, |review| ≈ |implement|
- ≈ defined as: review surface is within 80% of implement surface (configurable threshold)
- Violations produce a clear error message naming the violating action and the expected vs actual sizes
- Fix for violations: adjust scope edges in graph.yaml, never add filtering logic
- Runs in CI alongside WP04
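A minimal sketch of the inequality check follows, given already-measured surface sizes per action. It reads "within 80%" as "review is at least 80% of implement", which is an interpretation rather than something the spec text settles; calibration_violations and REVIEW_RATIO are hypothetical names.

```python
REVIEW_RATIO = 0.80  # configurable threshold for the ≈ relation (interpretation: lower bound)

STRICT_ORDER = (("specify", "plan"), ("plan", "implement"), ("tasks", "implement"))


def calibration_violations(sizes: dict[str, int], review_ratio: float = REVIEW_RATIO) -> list[str]:
    """Return one message per violated inequality, naming the action and both sizes."""
    violations: list[str] = []
    for smaller, larger in STRICT_ORDER:
        if not sizes[smaller] < sizes[larger]:
            violations.append(
                f"{smaller} ({sizes[smaller]}) must be smaller than {larger} ({sizes[larger]})"
            )
    if sizes["review"] < review_ratio * sizes["implement"]:
        violations.append(
            f"review ({sizes['review']}) must be at least {review_ratio:.0%} "
            f"of implement ({sizes['implement']})"
        )
    return violations


# Sizes from the "Current measured action surfaces" table (tasks has no index yet).
CURRENT = {"specify": 3, "plan": 4, "implement": 13, "review": 5, "tasks": 0}
current_violations = calibration_violations(CURRENT)
```

Run against the currently measured surfaces, the only violation reported is the review surface, which agrees with the ✗ recorded in the Surface Calibration section.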
Risks and Mitigations
| Risk | Mitigation | WP affected |
|---|---|---|
| Canonical path resolves wrong artifacts (faulty oracle) | WP00 audit verifies canonical path's artifact resolution is correct, not just accepted uncritically | WP00 |
| Directive ID normalization misses edge cases | Reuse existing _normalize_directive_id() logic; extraction test asserts edge count >= inline field count | WP02 |
| tasks action index doesn't exist; creating it is a calibration judgment call | Start with scope edges borrowed from plan + light implement subset; calibration test enforces inequality | WP02 |
| Invariant test matrix is too coarse (profiles don't affect context today) | Document profile dimension as degenerate; Phase 4 profile executor will make it meaningful | WP04 |
| graph.yaml lifecycle confusion (generated vs hand-edited) | Lifecycle is explicit: generated during Phase 0, authoritative post-Phase 0. See graph.yaml Lifecycle section. | WP02 |
Rollback Plan
1. WP00 rollback: Delete audit document. No production code was changed.
2. WP01-WP05 rollback: Delete src/doctrine/drg/, src/doctrine/graph.yaml, remove build_context_v2 from src/charter/context.py, delete new test files. Inline references remain in place (C-001), so the legacy path continues to work unchanged. No production call sites were modified.
3. Trigger: If invariant test reveals > 10% artifact-reachability divergence and root cause is unclear, pause the mission and escalate.