Implementation Plan: Pre-Doctrine Test Stabilization

Branch: feat/pre-doctrine-stabilization-remediation | Date: 2026-05-27 | Spec: spec.md | Mission ID: 01KSMG8Y0V5V2ZM4YEQMKFWN2K | Input: kitty-specs/pre-doctrine-test-stabilization-01KSMG8Y/spec.md


Summary

Fix all 10 DIR-013 sub-issues (#1301–#1310) from the 01KSF9HJ triage; the issues were filed, but their fixes were never merged into upstream/main. Four confirmed bugs are directly verifiable in current main (TOML escape, README Governance, wp_files.py frontmatter lane, doctrine CLI group). Six clusters require targeted per-surface investigation and fix. All tests written during this mission carry CI-quality marks (fast + domain mark). Final deliverable: ≤75-failure baseline committed to docs/01KSMG8Y-closeout/baseline.md and all ten GitHub issues resolved or re-deferred.


Technical Context

  • Language/Version: Python 3.11+
  • Primary Dependencies: pytest 9.x, ruff, mypy --strict (for surfaces explicitly in scope), typer, spec-kitty CLI
  • Storage: Filesystem only — YAML, JSON, JSONL, Markdown (no database mutations)
  • Testing: pytest with pytestmark module-level marks; CI splits on fast, integration, e2e, architectural, contract, doctrine; new tests require [pytest.mark.fast] plus one domain mark
  • Target Platform: Linux / macOS / Windows (cross-platform CLI tool)
  • Project Type: Single Python project (src/specify_cli, src/charter, src/doctrine, src/kernel)
  • Performance Goals: Full-suite run time must not regress by > 10% vs. pre-mission baseline (~924 s)
  • Constraints: All fixes are behaviour-preserving; no feature extension; DIR-013 applies throughout; Phase-2 frontmatter-lane reads are illegal; meta.json writes must route through feature_metadata/mission_metadata.py


Charter Check

Mode: compact. Applicable directives from project charter:

| Directive | Impact on this mission |
|-----------|------------------------|
| DIR-005 | Tests added for every changed behaviour — verified |
| DIR-006 | mypy --strict must pass for surfaces touched (executor.py, WP scope) |
| DIR-013 | New pre-existing failures encountered during implementation → file a GitHub issue before treating as baseline |

No charter violations anticipated. All fixes are additive corrections to existing behaviour. The doctrine-CLI removal (FR-004) was originally committed to by mission 01KP54J6 and is being completed here — not a new architectural decision.


Project Structure

Planning artifacts

kitty-specs/pre-doctrine-test-stabilization-01KSMG8Y/
├── spec.md              ← committed
├── plan.md              ← this file
├── research.md          ← Phase 0 output
├── tasks.md             ← /spec-kitty.tasks output (not yet)
├── checklists/
│   └── requirements.md  ← committed
└── tasks/               ← /spec-kitty.tasks output (not yet)

Source touchpoints (by wave)

Wave A — quick fixes
  src/specify_cli/missions/software-dev/command-templates/implement.md   ← FR-001
  tests/specify_cli/regression/_twelve_agent_baseline/                   ← FR-001 snapshot refresh
  README.md                                                               ← FR-002
  src/specify_cli/audit/classifiers/wp_files.py                          ← FR-003
  src/specify_cli/cli/commands/__init__.py                                ← FR-004

Wave B — structural fixes
  src/doctrine/glossary/                                                   ← FR-005
  src/doctrine/tactics/built-in/five-paradigm-parallel-debugging.tactic.yaml  ← FR-005
  src/specify_cli/status/emit.py (or mission_creation.py)                ← FR-006 (SpecifyStarted)
  src/specify_cli/git/ (atomic commit flow)                               ← FR-006
  src/specify_cli/tasks/move_task.py                                      ← FR-006
  src/specify_cli/charter*/                                               ← FR-007
  tests/integration/                                                      ← FR-007
  src/specify_cli/next/decision.py (or runtime_bridge.py)                ← FR-008
  src/specify_cli/cli/commands/next_cmd.py                               ← FR-008

Wave C — shared-package / architectural
  src/specify_cli/sync/restart.py                                         ← FR-009 (allowlist)
  tests/sync/                                                             ← FR-009
  tests/contract/                                                         ← FR-009
  src/charter/synthesizer/                                                ← FR-010
  src/specify_cli/mission_step_contracts/executor.py                      ← FR-011 (mypy)
  src/specify_cli/cli/commands/invocations_cmd.py (or similar)            ← FR-011 (JSON noise)
  src/specify_cli/auth/                                                   ← FR-011 (exit code)

Wave D — closeout
  tests/ (mark audit — all touched modules)                              ← FR-012
  docs/01KSMG8Y-closeout/                                                ← FR-013

Work Package Structure

Overview

| WP | Wave | FRs | Title | Lane | Parallel group |
|----|------|-----|-------|------|----------------|
| WP01 | A | FR-001 | TOML escape fix + snapshot refresh | lane-a | 0 |
| WP02 | A | FR-002, FR-003, FR-004 | README Governance + chokepoint guards | lane-b | 0 |
| WP03 | B | FR-005 | Doctrine / glossary anchor + tactic repair | lane-c | 0 |
| WP04 | B | FR-006 | Status / lifecycle event drift | lane-d | 0 |
| WP05 | B | FR-007 | Charter integration suite regressions | lane-e | 0 |
| WP06 | B | FR-008 | next CLI exit-code regressions | lane-f | 0 |
| WP07 | C | FR-009 | Shared-package events drift residual | lane-g | 1 |
| WP08 | C | FR-010 | Charter synthesizer determinism | lane-h | 1 |
| WP09 | C | FR-011 | Misc debt — auth / invocation / mypy / mission switching | lane-i | 1 |
| WP10 | D | FR-012 | CI test-mark audit + guard test | lane-j | 2 |
| WP11 | D | FR-013 | Full-suite re-baseline + issue closeout | lane-planning | 3 |

Dependency graph:

WP01 ──┐
WP02 ──┤
WP03 ──┤
WP04 ──┼──► WP10 (mark audit) ──► WP11 (closeout)
WP05 ──┤
WP06 ──┘
WP07 ──┐
WP08 ──┤
WP09 ──┘

WP01–WP06 (waves A + B) can run in parallel; WP07–WP09 (wave C) can run in parallel; WP10 depends on WP01–WP09 being merged; WP11 depends on WP10.
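
The ordering constraints above can be checked mechanically. The following is a minimal sketch using the standard-library graphlib module; the DEPENDENCIES mapping is transcribed from the sentence above, and the helper name merge_order is hypothetical rather than part of spec-kitty.

```python
from graphlib import TopologicalSorter

WAVES_A_B = [f"WP{n:02d}" for n in range(1, 7)]   # WP01..WP06
WAVE_C = [f"WP{n:02d}" for n in range(7, 10)]     # WP07..WP09

# Map each work package to the set of packages it must wait for.
DEPENDENCIES: dict[str, set[str]] = {wp: set() for wp in WAVES_A_B + WAVE_C}
DEPENDENCIES["WP10"] = set(WAVES_A_B + WAVE_C)    # mark audit waits for WP01..WP09
DEPENDENCIES["WP11"] = {"WP10"}                   # closeout waits for the mark audit


def merge_order() -> list[str]:
    """Return one valid merge order; graphlib raises CycleError on a cycle."""
    return list(TopologicalSorter(DEPENDENCIES).static_order())
```

Any order the sorter returns places WP10 after WP01–WP09 and WP11 last; the relative order inside waves A, B, and C is unconstrained.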

WP01 — TOML escape fix + snapshot refresh (FR-001 / #1302)

Write scope:

  • src/specify_cli/missions/software-dev/command-templates/implement.md — replace rg '\.py$' at line 168 with grep -E '[.]py$' (character-class form avoids any backslash in the rendered TOML string; grep -E is universally available without requiring rg). Do NOT use grep -E '\.py$' — that form still contains a backslash and will fail the same TOML parse check.
  • tests/specify_cli/regression/_twelve_agent_baseline/ — regenerate all affected agent snapshots via PYTEST_UPDATE_SNAPSHOTS=1 pytest tests/specify_cli/regression/ -v

Pre-condition check: Before regenerating, run ls tests/specify_cli/regression/_twelve_agent_baseline/implement/ to confirm whether the baseline covers 12 or 13 agents. The test file is test_twelve_agent_parity.py but CLAUDE.md documents 13 slash-command agents. Name the exact count in the WP handoff note so the reviewer can confirm the diff scope.

Acceptance: test_toml_command_output_is_parseable[implement-gemini] and [implement-qwen] pass; snapshot diff contains only the rg → grep substitution; all other snapshot assertions still pass.

Test marks required: pytest.mark.unit + pytest.mark.fast (parity test already has pytestmark = [pytest.mark.unit])

WP02 — README Governance + chokepoint guards (FR-002, FR-003, FR-004 / #1308, #1309, #1310-partial)

Write scope:

  • README.md — add ## Governance layer section. The section must satisfy all six assertions in tests/specify_cli/docs/test_readme_governance.py:
      1. Heading ## Governance layer present
      2. Link to docs/trail-model.md
      3. Link to docs/host-surface-parity.md
      4. Substrings spec-kitty advise, spec-kitty ask, spec-kitty do all appear within the section
      5. Every relative .md link in .agents/skills/spec-kitty.advise/SKILL.md resolves to an existing file (read this file first; fix any broken links if found)
      6. Every relative .md link in src/doctrine/skills/spec-kitty-runtime-next/SKILL.md resolves to an existing file (read this file first; fix any broken links if found)
  • src/specify_cli/audit/classifiers/wp_files.py:92 — replace the direct frontmatter lane read with specify_cli.status.lane_reader.get_wp_lane(), but wrapped in a guard. The required pattern:

    ```python
    from specify_cli.status.lane_reader import get_wp_lane, has_event_log
    # ...
    if has_event_log(mission_dir):
        lane = get_wp_lane(mission_dir, wp_path.stem)
    else:
        lane = None  # pre-3.0 / unfinalized mission — skip terminal-lane evidence check
    ```

    The classify_wp_files() function has a "never raises" contract; do NOT call get_wp_lane() without the guard — it will raise CanonicalStatusNotFoundError on any mission without status.events.jsonl. Update imports accordingly.
  • src/specify_cli/cli/commands/__init__.py:40,78 — remove doctrine import and add_typer registration. The charter group (registered separately) must remain untouched. Verify that no test imports doctrine_module via register_commands and then checks the root app's command list before deleting the lines.
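
Assertions 5 and 6 for the README item require every relative .md link in the two SKILL.md files to resolve. Before editing, a quick local check along the following lines may help locate broken links. It is a sketch only: the regular expression and the helper name are assumptions, and the actual test may resolve links differently.

```python
import re
from pathlib import Path

# Inline markdown links of the form [text](target); reference-style links are ignored.
_MD_LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def broken_relative_md_links(markdown_file: Path) -> list[str]:
    """Return relative .md link targets that do not exist next to markdown_file."""
    broken: list[str] = []
    text = markdown_file.read_text(encoding="utf-8")
    for target in _MD_LINK.findall(text):
        # Skip URLs, pure anchors, mail links, and absolute paths.
        if "://" in target or target.startswith(("#", "mailto:", "/")):
            continue
        path_part = target.split("#", 1)[0]  # drop any in-page anchor
        if not path_part.endswith(".md"):
            continue
        if not (markdown_file.parent / path_part).exists():
            broken.append(target)
    return broken
```

Running it against .agents/skills/spec-kitty.advise/SKILL.md and src/doctrine/skills/spec-kitty-runtime-next/SKILL.md gives a candidate list of links to fix; an empty list suggests, but does not guarantee, that the corresponding assertion will pass.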

Acceptance:

  • All six test_readme_governance assertions pass
  • test_lane_regression_guard[src/specify_cli/audit/classifiers/wp_files.py] passes
  • New test confirms classify_wp_files(path_without_event_log) does not raise (verifies the guard)
  • All three test_doctrine_cli_removed assertions pass

Test marks required: new guard-fallback test must carry [pytest.mark.unit, pytest.mark.fast]

WP03 — Doctrine / glossary anchor + tactic repair (FR-005 / #1304)

Write scope:

  • src/doctrine/glossary/ — add anchors doctrine-pack and platform-darwin--platform-linux in the appropriate context YAML files
  • src/doctrine/tactics/built-in/five-paradigm-parallel-debugging.tactic.yaml — fix schema violations and unresolved refs

Investigation required: run pytest tests/doctrine/test_glossary_link_integrity.py tests/doctrine/test_tactic_compliance.py -v to pinpoint which context files are missing the anchors and which refs are unresolved.

Acceptance: All four failing tests in test_glossary_link_integrity and test_tactic_compliance pass.

Test marks required: existing tests already tagged with pytest.mark.doctrine

WP04 — Status / lifecycle event drift (FR-006 / #1306)

Write scope (four independent fixes):

  1. SpecifyStarted event not emitted at mission-create (#1067 regression) — surface: src/specify_cli/core/mission_creation.py or src/specify_cli/status/emit.py
  2. Atomic commit flow leaves status artifacts dirty after move_task — surface: src/specify_cli/git/ (atomic commit helpers)
  3. Wrong commit message bubbled to lane branch — surface: src/specify_cli/tasks/move_task.py (exclusive ownership: WP05 must not touch move_task.py; if WP05 item 5 is rooted in the same function, fold it into WP04 rather than editing move_task.py from two parallel lanes)
  4. implement does not block on alloc failure — surface: src/specify_cli/cli/commands/implement.py or related

Investigation required: read each failing test to identify the exact call path before editing production code.

Acceptance: test_atomic_status_commits_unit, test_mission_creation_specify_started (×2), test_move_task_git_validation_unit (the commit-message variant), and test_status_emit_on_alloc_failure all pass.

Test marks required: new tests (if any) must carry [pytest.mark.fast, pytest.mark.git_repo] or [pytest.mark.integration, pytest.mark.git_repo] as appropriate

WP05 — Charter integration suite regressions (FR-007 / #1307)

Write scope (six independent integration failures):

  1. Org-layer source name missing in lint output — src/specify_cli/charter_lint/
  2. Wrong error class from synthesize_without_charter_md — src/specify_cli/charter/ or src/specify_cli/cli/commands/charter/
  3. discover action blocks despite spec.md authored — src/specify_cli/next/ or runtime walk
  4. implement-review-retrospect smoke — cross-cutting
  5. Wrong branch in rejection-cycle handoff — src/specify_cli/tasks/move_task.py or implement/review path
  6. Substantive plan not auto-committed in specify-plan — src/specify_cli/cli/commands/

Investigation required: run each failing integration test individually with --tb=short to identify the minimal reproduction before touching production code.

Write-scope constraint — runtime_bridge.py: Item 3 ("discover action blocks despite spec.md authored") may require editing src/specify_cli/next/runtime_bridge.py. This file is also the primary target of WP06. Before editing it, the WP05 implementer MUST determine whether item 3 is rooted in the charter preflight logic (src/specify_cli/charter_preflight/) rather than runtime_bridge.py. If the fix is in runtime_bridge.py, WP05 must not begin that edit until WP06 has been reviewed and merged, or the two fixes must be coordinated in the same lane.

Write-scope constraint — move_task.py: Item 5 ("wrong branch in rejection-cycle handoff") may touch src/specify_cli/tasks/move_task.py. WP04 has exclusive ownership of move_task.py. If item 5 is rooted in move_task.py, fold it into WP04 rather than editing from two parallel lanes.

Acceptance: All six integration tests in test_charter_lint_lints_all_layers, test_charter_synthesize_fresh, test_documentation_runtime_walk, test_implement_review_retrospect_smoke, test_rejection_cycle, test_specify_plan_commit_boundary pass.

Test marks required: integration tests already carry [pytest.mark.integration, pytest.mark.git_repo]; any new tests follow the same pattern

WP06 — next CLI exit-code regressions (FR-008 / #1305)

Fix architecture: The exit-code mapping is already correct in src/specify_cli/cli/commands/next_cmd.py (if decision.kind == "blocked": raise typer.Exit(1); all other kinds exit 0). The fix target is src/specify_cli/next/runtime_bridge.py::decide_next_via_runtime — it is returning a Decision with the wrong kind for terminal states. The implementer must NOT change next_cmd.py's exit-code mapping; instead restore correct kind values from decide_next_via_runtime.

The test_query_mode_unit mock failure (decide_next mock not invoked) indicates a call-path bypass — the mock was likely patching the old entry point before the decide_next_via_runtime delegation was introduced. Investigate whether the test needs to patch runtime_bridge.decide_next_via_runtime directly.

Write scope:

  • src/specify_cli/next/runtime_bridge.py — restore correct Decision.kind for terminal and blocked states
  • tests/next/test_query_mode_unit.py — update mock target if the patching path changed

Investigation required: run pytest tests/next/ -v --tb=long to identify the divergence point. Confirm the current mock patch target vs. the actual import path.

Acceptance: All four failing tests in test_next_command_integration (exit codes, advancing mode) and test_query_mode_unit (mock invoked) pass.

Test marks required: new tests (if any) must carry [pytest.mark.fast, pytest.mark.unit]
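
The suspected call-path bypass behind the test_query_mode_unit failure can be illustrated with a small, self-contained model. The two namespaces below are stand-ins for modules, and query_mode is a hypothetical caller; the real import layout must be confirmed by the investigation step above.

```python
from types import SimpleNamespace
from unittest import mock

# Stand-ins for an old entry point and the runtime bridge; the real layout may differ.
legacy_entry = SimpleNamespace(decide_next=lambda: "legacy decision")
runtime_bridge = SimpleNamespace(decide_next_via_runtime=lambda: "runtime decision")


def query_mode() -> str:
    """Hypothetical caller that delegates to the runtime bridge."""
    return runtime_bridge.decide_next_via_runtime()


def run_with_patch(namespace: SimpleNamespace, attribute: str) -> tuple[str, bool]:
    """Patch one attribute, run the caller, and report whether the mock was hit."""
    with mock.patch.object(namespace, attribute, return_value="mocked") as patched:
        result = query_mode()
        return result, patched.called
```

Patching legacy_entry.decide_next leaves the mock uncalled because the caller no longer goes through it, which matches the "mock not invoked" symptom; patching the attribute the caller actually looks up is what makes the mock fire.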

WP07 — Shared-package events drift residual (FR-009 / #1301)

Write scope:

  • src/specify_cli/sync/restart.py — add to daemon-allowlist or refactor unauthorized call site
  • tests/sync/ — fix test_lifecycle_readiness (BuildRegistered queued at init) and test_event_queued_when_no_websocket (MissionOriginBound queued)
  • tests/contract/test_handoff_fixtures.py — add actor and wp_title fields to WPCreated payload
  • src/specify_cli/spec_kitty_events/ — remove vendored events tree if it re-appeared
  • Contract fixture YAML — add # pydantic_model: frontmatter to the flagged example

Investigation required: confirm current spec_kitty_events package version vs. uv.lock pin; run uv sync --frozen if mismatched before editing test code.

Acceptance: All six tests listed in #1301 pass.

Test marks required: sync tests already carry [pytest.mark.fast] or [pytest.mark.integration]; follow existing convention

WP08 — Charter synthesizer determinism (FR-010 / #1303)

Correct source path: The charter synthesizer lives at src/charter/synthesizer/ (not src/specify_cli/charter/). WP05 item 2 ("wrong error class from synthesize_without_charter_md") touches the CLI adapter in src/specify_cli/cli/commands/charter/synthesize.py — NOT src/charter/synthesizer/errors.py. WP08 must not touch the CLI adapter; WP05 must not touch src/charter/synthesizer/errors.py. This separation prevents parallel-lane conflict.

Write scope:

  • src/charter/synthesizer/manifest.py (or equivalent) — fix manifest hash determinism (sort file lists before hashing to produce deterministic traversal order)
  • src/charter/synthesizer/path_guard.py — enforce the chokepoint so direct write primitives cannot bypass it
  • Test fixtures — refresh stored manifest hashes after the determinism fix

Investigation required: run pytest tests/charter/synthesizer/ -v --tb=long to identify the exact hash computation path and confirm the path_guard.py chokepoint location.

Acceptance: All five test_bundle_validate_extension assertions pass.

Test marks required: charter tests already carry marks; new tests use [pytest.mark.fast, pytest.mark.unit]
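
The "sort before hashing" fix named in the write scope can be sketched independently of the synthesizer. This is an illustration under assumptions, not the code in src/charter/synthesizer/: the function name, the SHA-256 choice, and the (relative path, content) input shape are all hypothetical.

```python
import hashlib
from collections.abc import Iterable


def manifest_hash(entries: Iterable[tuple[str, bytes]]) -> str:
    """Hash (relative_path, content) pairs independently of the order they arrive in."""
    digest = hashlib.sha256()
    # Sorting by path removes any dependence on filesystem traversal order.
    for rel_path, content in sorted(entries, key=lambda item: item[0]):
        path_bytes = rel_path.encode("utf-8")
        # Length prefixes keep adjacent fields from running into each other.
        digest.update(len(path_bytes).to_bytes(8, "big"))
        digest.update(path_bytes)
        digest.update(len(content).to_bytes(8, "big"))
        digest.update(content)
    return digest.hexdigest()
```

With this shape, two traversals that yield the same files in a different order produce the same digest, so stored fixture hashes only need refreshing once after the fix.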

WP09 — Misc debt — auth / invocation / mypy / mission switching (FR-011 / #1310)

Write scope (in-scope items):

  • src/specify_cli/auth/ — fix auth integration exit-code returning 2 instead of expected value; identified test: tests.auth.integration.test_refresh_through_transport
  • src/specify_cli/cli/commands/invocations_cmd.py (or related) — prevent logged_out_on_connected_teamspace noise from leaking into JSON output; identified tests: tests.specify_cli.invocation.cli.test_do, test_profiles, test_record
  • src/specify_cli/mission_step_contracts/executor.py — fix mypy --strict failures; identified test: tests.cross_cutting.test_mypy_strict_mission_step_contracts::test_mission_step_contracts_executor_is_mypy_strict_clean
  • Legacy kitty-specs/ WP files — fix or exclude from Pydantic validation the 6 WP files failing tests.specify_cli.status.test_wp_metadata::test_all_kitty_specs_wp_files_validate. Self-referential trap: by the time WP09 runs, this mission's own WP files will also be checked by the glob. Before attempting fixes, run the test to see the current failure list and confirm this mission's own WP files pass. If they do not, fix them first.
  • Mission-switching tests — identify and fix the blocking condition; identified tests: tests.missions.test_mission_switching_integration ×2

Re-defer (per C-008):

  • spec-kitty.checklist skill package restoration — file a new sub-issue; this requires dedicated template work outside this mission's scope
  • Schema-version wording drift — file a new sub-issue for a CHANGELOG-tracked fix

Acceptance (specific test IDs):

  • tests.auth.integration.test_refresh_through_transport passes
  • tests.specify_cli.invocation.cli.test_do, test_profiles, test_record pass
  • tests.cross_cutting.test_mypy_strict_mission_step_contracts::test_mission_step_contracts_executor_is_mypy_strict_clean passes
  • tests.specify_cli.status.test_wp_metadata::test_all_kitty_specs_wp_files_validate passes (or pre-existing failures are reduced to those owned by re-deferred items)
  • tests.missions.test_mission_switching_integration (both parameterizations) pass
  • Two new GitHub issues filed for re-deferred items before WP09 is closed

Test marks required: new tests carry [pytest.mark.fast, pytest.mark.unit] or [pytest.mark.integration] as appropriate

WP10 — CI test-mark audit (FR-012)

Pre-condition: WP01–WP09 must all be merged into the feature branch before WP10 begins; this WP touches every module directory that any earlier WP touched.

Write scope:

  • All test files in modules touched by WP01–WP09: verify each has a pytestmark with at least one canonical CI-quality mark
  • tests/agent/test_context_unit.py — add missing pytestmark (likely pytest.mark.fast)
  • tests/specify_cli/test_lane_regression_guard.py — add a category mark (e.g. pytest.mark.unit) alongside the existing pytest.mark.non_sandbox; the existing mark is not a CI-quality split mark and will be excluded from fast runs without a second mark

Acceptance: No test file in a touched directory is missing a CI-quality pytestmark; the existing guard tests/architectural/test_pytest_marker_convention.py::test_every_test_file_declares_a_pytestmark_marker passes without modification; tests/agent/test_context_unit.py is tagged.

No new guard test file. The architectural guard already exists at tests/architectural/test_pytest_marker_convention.py. Do not create a new file for this.

Test marks required: no new test files added in this WP; only pytestmark additions to existing files.

WP11 — Full-suite re-baseline + issue closeout (FR-013) — planning lane

Write scope (planning lane — no worktree):

  • docs/01KSMG8Y-closeout/baseline.md — record full pytest tests/ -q --tb=no output and failure list post-merge
  • GitHub issues — close #1301–#1310 with linking commits, or re-defer with new sub-issues
  • #1298 — post closing comment with final delta
  • kitty-specs/pre-doctrine-test-stabilization-01KSMG8Y/ — final status update

Gate: Failure count ≤75. If > 75, identify remaining clusters, file DIR-013 issues, document in baseline.md, and declare mission complete with the known gap and a follow-on issue.
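
The gate check can be automated against the final summary line of pytest tests/ -q --tb=no. This is a minimal sketch, assuming the summary keeps pytest's usual "N failed, M passed in T s" wording; the helper names are hypothetical, and whether collection errors should also count toward the ceiling is left open here.

```python
import re

FAILURE_CEILING = 75  # from the gate above


def failed_count(summary_line: str) -> int:
    """Extract the 'N failed' figure from a pytest summary line (0 if absent)."""
    match = re.search(r"(\d+) failed", summary_line)
    return int(match.group(1)) if match else 0


def gate_passes(summary_line: str) -> bool:
    """True when the failure count is at or below the ceiling."""
    return failed_count(summary_line) <= FAILURE_CEILING
```

Recording the raw summary line in baseline.md alongside the parsed count keeps the gate decision auditable.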


Phase 0: Research

See research.md for consolidated findings from the pre-mission cross-examination.

Key pre-confirmed root causes requiring no further research:

  • FR-001: rg '\.py$' at implement.md:168 → unescaped \ in TOML multi-line basic string
  • FR-002: README.md has 0 occurrences of ## Governance layer
  • FR-003: wp_files.py:92 → frontmatter.get("lane")
  • FR-004: commands/__init__.py:78 → add_typer(doctrine_module.app, name="doctrine")

Items requiring targeted investigation before editing (WP03–WP09):

  • FR-005: exact anchor context files + tactic schema errors → run tests with --tb=long
  • FR-006: four independent call paths → read each failing test before editing
  • FR-007: six integration failures → run each in isolation before editing
  • FR-008: decide_next_via_runtime exit-code propagation path
  • FR-009: daemon-allowlist entries + payload schema for WPCreated
  • FR-010: synthesizer hash computation source
  • FR-011: auth exit-code source, JSON output filtering point, mypy errors in executor.py

Phase 1: Design Notes

This mission is purely corrective. There are no new data models, API contracts, or external-facing schemas. The design notes capture invariants that each implementer must respect:

Invariant: Phase-2 lane authority (C-003)

No code path may read frontmatter.get("lane") or frontmatter.get("status") as the WP lane value. The canonical read is specify_cli.status.lane_reader.get_wp_lane(feature_dir, wp_id). Any test that was previously asserting frontmatter lane values must be updated to assert against the event-log-derived lane.

Invariant: chokepoint routing (C-004)

All meta.json mutations must pass through src/specify_cli/feature_metadata/mission_metadata.py. Any code path that writes a meta.json key directly (e.g., (Path(...) / "meta.json").write_text(...)) is a regression.

Invariant: test-mark contract (FR-012)

Every test_*.py file in a directory touched by this mission must have:

pytestmark = [pytest.mark.<primary_mark>]
# or
pytestmark = [pytest.mark.<primary_mark>, pytest.mark.<secondary_mark>]

Where <primary_mark> is one of: fast, unit, integration, e2e, slow, contract, architectural, doctrine.

Tests that require a real git repository also add pytest.mark.git_repo. Tests that spawn subprocesses or touch the network add pytest.mark.non_sandbox.
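
While auditing files by hand, the contract above can be checked with a short script. The sketch below is illustrative and does not replace the existing architectural guard: the function names are hypothetical, and it only recognises a module-level pytestmark assignment built from pytest.mark.<name> expressions.

```python
import ast

PRIMARY_MARKS = frozenset(
    {"fast", "unit", "integration", "e2e", "slow", "contract", "architectural", "doctrine"}
)


def declared_marks(source: str) -> set[str]:
    """Return mark names used in a module-level `pytestmark = ...` assignment."""
    marks: set[str] = set()
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign):
            continue
        if not any(isinstance(t, ast.Name) and t.id == "pytestmark" for t in node.targets):
            continue
        for sub in ast.walk(node.value):
            # Matches `<anything>.mark.<name>`, e.g. pytest.mark.fast.
            if (
                isinstance(sub, ast.Attribute)
                and isinstance(sub.value, ast.Attribute)
                and sub.value.attr == "mark"
            ):
                marks.add(sub.attr)
    return marks


def satisfies_mark_contract(source: str) -> bool:
    """True when at least one declared mark is a primary CI-quality mark."""
    return bool(declared_marks(source) & PRIMARY_MARKS)
```

A file whose only mark is pytest.mark.non_sandbox fails this check, which mirrors the situation described for tests/specify_cli/test_lane_regression_guard.py.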

Doctrine CLI removal safety (FR-004)

Removing doctrine from add_typer in __init__.py must not be confused with the charter group (which must remain). The doctrine.py command module file may remain on disk (it is imported nowhere after the deregistration); deleting it is optional and out of scope.


Gates

| Gate | Condition |
|------|-----------|
| Spec committed + substantive | ✅ — committed at 587f2c3da |
| No NEEDS CLARIFICATION markers | ✅ — decision verify returned status: clean |
| Plan committed + substantive | Pending — this document |
| All WP11 baseline gates | ≤75 failures, all sub-issues resolved |

Branch Contract (repeated per protocol)

  • Current branch at plan start: feat/pre-doctrine-stabilization-remediation
  • Planning/base branch: feat/pre-doctrine-stabilization-remediation
  • Final merge target: feat/pre-doctrine-stabilization-remediation
  • branch_matches_target: true

Next step: /spec-kitty.tasks to generate WP files and finalize lanes.