Research: Pre-Doctrine Test Stabilization
Phase 0 output | Mission 01KSMG8Y | 2026-05-27
This document consolidates findings from the pre-mission cross-examination of upstream/main against the 01KSF9HJ triage documentation. All root-cause confirmations were made by reading source files and running targeted Python expressions against the current codebase.
FR-001 / #1302 — TOML escape bug
Decision: Fix the source template at line 168; do not change the rendering engine.
Root cause confirmed:
# Confirmed by render_command_template() invocation:
# gemini: TOML ERROR: Unescaped '\' in a string (at line 146, column 68)
# qwen: TOML ERROR: Unescaped '\' in a string (at line 146, column 68)
Source line 168 of src/specify_cli/missions/software-dev/command-templates/implement.md:
CHANGED_PY=$(git diff --name-only --diff-filter=AMR HEAD | rg '\.py$' || true)
The \. in the rg pattern contains a literal backslash. When rendered into a TOML multi-line basic string (used by gemini and qwen formats), unescaped backslashes are invalid.
Fix: Replace rg '\.py$' with grep -E '[.]py$' (character class avoids the backslash entirely).
Rationale: grep -E '[.]py$' is universally available and contains no backslash character, eliminating the TOML escape problem. The \. form — even with grep -E — still contains a literal backslash which is illegal in a TOML multi-line basic string. The character-class form [.] matches exactly the same input but avoids the backslash entirely. rg is also not universally installed; the fallback || true already handles the case where a command is absent. Switching to grep -E '[.]py$' removes both the dependency and the TOML escape problem.
Alternatives considered:
Markdown agents that render the template literally.
string quoting; the fix must be in the template content, not the renderer.
- Double-escape in template (
rg '\\\\.py$') — rejected: produces correct TOML but breaks - TOML literal strings (
'''...''') — rejected: the template renderer does not control TOML
Snapshot refresh: After fixing the template, run:
PYTEST_UPDATE_SNAPSHOTS=1 pytest tests/specify_cli/regression/ -v
All 13 non-migrated agents will produce a diff in their baseline; the diff should be identical across all agents (only the rg → grep -E substitution).
FR-002 / #1308 — README Governance layer section
Decision: Add the section directly to README.md; no new files needed.
Root cause confirmed: grep '## Governance' README.md → 0 matches.
Test expectations — all six tests in tests/specify_cli/docs/test_readme_governance.py: 1. test_governance_section_exists — heading ## Governance layer present in README.md 2. test_trail_model_linked — docs/trail-model.md linked within the section 3. test_host_surface_parity_linked — docs/host-surface-parity.md linked within the section 4. test_governance_section_mentions_commands — the substrings spec-kitty advise, spec-kitty ask, and spec-kitty do all appear within the section 5. test_advise_skill_links_resolve — every relative .md link in .agents/skills/spec-kitty.advise/SKILL.md resolves to an existing file 6. test_runtime_next_skill_links_resolve — every relative .md link in src/doctrine/skills/spec-kitty-runtime-next/SKILL.md resolves to an existing file
Content guidance: The section must reference the trail model and host-surface parity docs AND include the three command references (spec-kitty advise, spec-kitty ask, spec-kitty do). Tests 5 and 6 are link-integrity checks on existing skill files — they pass as long as those files have no broken relative links, independently of what is written in README.md. Implementer must verify tests 5 and 6 by running them before and after editing README.md; if they fail before the edit, there is a pre-existing link-rot in a skill file that must be fixed separately (file a DIR-013 issue).
FR-003 / #1309 — Frontmatter lane regression in wp_files.py
Decision: Replace frontmatter.get("lane") with lane_reader.get_wp_lane().
Root cause confirmed (src/specify_cli/audit/classifiers/wp_files.py:92):
lane = frontmatter.get("lane") or frontmatter.get("status")
This is a Phase-2 regression — frontmatter lane was retired as the authority in 3.0 / mission 060. The canonical read is specify_cli.status.lane_reader.get_wp_lane().
Fix surface: wp_files.py:92 — replace the two frontmatter.get() calls with a guarded call to get_wp_lane(feature_dir, wp_id). The feature_dir and wp_id must be derivable from the file path context already available in the classifier.
Critical: classify_wp_files() has a "never raises" contract. get_wp_lane() raises CanonicalStatusNotFoundError for missions that have no status.events.jsonl (pre-3.0 missions, or missions that have not yet run finalize-tasks). The fix must guard against this. Recommended pattern:
from specify_cli.status.lane_reader import get_wp_lane
from specify_cli.status.store import has_event_log
from specify_cli.status.models import CanonicalStatusNotFoundError
# inside classify_wp_files():
if has_event_log(feature_dir):
try:
lane = get_wp_lane(feature_dir, wp_id)
except CanonicalStatusNotFoundError:
lane = None
else:
lane = None
An alternative using only try/except (without the has_event_log pre-check) is also acceptable, as long as CanonicalStatusNotFoundError is caught and lane falls back to None.
Tests:
wp_files.py must be clean after the fix.
mission directory that contains WP files but has no status.events.jsonl (simulating a pre-3.0 or unfinalised mission). This directly tests the "never raises" contract.
tests/specify_cli/test_lane_regression_guard.pyparameterises over source files;- A new test must verify that
classify_wp_files()does not raise when called on a
FR-004 / #1310 (partial) — Doctrine CLI group still registered
Decision: Remove the doctrine group registration; leave the doctrine.py file on disk.
Root cause confirmed (src/specify_cli/cli/commands/__init__.py):
- Line 40:
from . import doctrine as doctrine_module - Line 78:
app.add_typer(doctrine_module.app, name="doctrine", help="Manage org-layer doctrine packs")
The original removal was committed to by mission excise-doctrine-curation-and-inline-references-01KP54J6 (Phase 1, WP01 per tests/specify_cli/cli/test_doctrine_cli_removed.py docstring). The re-registration is a regression.
Fix: Remove lines 40 and 78. The doctrine.py module may remain; it is not imported elsewhere. No downstream breakage expected — charter remains registered separately.
Risk: None identified. The doctrine.py module registering a Typer sub-app is self-contained.
FR-005 / #1304 — Doctrine / glossary anchor drift
Decision: Add missing anchors and fix tactic schema in-place.
Root cause (per 01KSF9HJ triage):
- Two missing glossary anchors:
doctrine-packandplatform-darwin--platform-linux five-paradigm-parallel-debugging.tactic.yaml: schema invalid + unresolved refs
Investigation needed (WP03 implementer):
pytest tests/doctrine/test_glossary_link_integrity.py -v --tb=long 2>&1 | head -60
pytest tests/doctrine/test_tactic_compliance.py -v --tb=long 2>&1 | head -60
Output will name the exact context YAML files and which fields are unresolved.
Known tactic file: src/doctrine/tactics/built-in/five-paradigm-parallel-debugging.tactic.yaml exists — fix is in-place schema correction.
FR-006 / #1306 — Status / lifecycle event drift
Decision: Four targeted fixes; each is independent.
| Sub-issue | Surface | Fix direction |
|---|---|---|
SpecifyStarted not emitted | src/specify_cli/core/mission_creation.py or emit.py | Emit the event at the correct call site |
| Atomic commit leaves artifacts dirty | src/specify_cli/git/ (atomic commit helpers) | Ensure status artifact is committed atomically |
| Wrong commit message on lane branch | src/specify_cli/tasks/move_task.py | Trace the commit message propagation path |
implement does not block on alloc failure | src/specify_cli/cli/commands/implement.py | Add the guard |
Investigation needed (WP04 implementer): run each failing test with --tb=long before editing production code to identify the exact divergence point.
FR-007 / #1307 — Charter integration suite regressions
Decision: Six independent integration fixes; each requires test-driven investigation.
Investigation needed (WP05 implementer): run each test in isolation:
pytest tests/integration/test_charter_lint_lints_all_layers.py -v --tb=long
pytest tests/integration/test_charter_synthesize_fresh.py::test_synthesize_without_charter_md_fails_actionably -v --tb=long
pytest tests/integration/test_documentation_runtime_walk.py::test_full_advancement_through_six_actions -v --tb=long
pytest tests/integration/test_implement_review_retrospect_smoke.py::test_reject_fix_next_retrospect_smoke -v --tb=long
pytest tests/integration/test_rejection_cycle.py::test_implement_uses_review_cycle_artifact_after_review_claim -v --tb=long
pytest tests/integration/test_specify_plan_commit_boundary.py::test_setup_plan_commits_substantive_plan -v --tb=long
Integration tests are slow; use -x (fail-fast) when debugging one at a time.
FR-008 / #1305 — next CLI exit-code regressions
Decision: Fix exit-code propagation in the runtime bridge; do not change the Decision model.
Root cause (per 01KSF9HJ triage): decide_next_via_runtime in src/specify_cli/next/runtime_bridge.py returns a Decision object; the exit-code mapping (if decision.kind == "blocked": raise typer.Exit(1)) lives in src/specify_cli/cli/commands/next_cmd.py. The exit-code mapping itself is not broken; the regression is that decide_next_via_runtime returns a wrong Decision.kind value for terminal states, OR that mocks for decide_next are no longer invoked (call-path bypass). The implementer must not change the exit-code mapping in next_cmd.py — fix only the return value of decide_next_via_runtime in runtime_bridge.py and/or the mock target.
Investigation needed (WP06 implementer):
pytest tests/next/test_next_command_integration.py tests/next/test_query_mode_unit.py -v --tb=long
Read the mock setup in each test to identify the current patch target and confirm whether the mock is actually being hit. If the mock target is runtime_bridge.decide_next, it may need to be changed to runtime_bridge.decide_next_via_runtime or the internal function it delegates to.
FR-009 / #1301 — Shared-package events drift residual
Decision: Fix the six residual items in-place without upgrading the spec_kitty_events package version.
Context: 01KSF9HJ WP02 fixed the primary cascade (~130 failures) by aligning the installed package version. The six residual items in #1301 are structural issues that survived the version fix.
Items and fix directions:
| Item | Surface | Fix direction |
|---|---|---|
restart.py daemon-allowlist | tests/sync/test_daemon_intent_gate.py allowlist | Add src/specify_cli/sync/restart.py to the allowlist or refactor the unauthorized call |
BuildRegistered not queued at init | src/specify_cli/sync/ init path | Emit the event at init |
MissionOriginBound not queued without WebSocket | src/specify_cli/sync/ | Queue to offline queue when WebSocket is absent |
WPCreated missing actor/wp_title | tests/contract/test_handoff_fixtures.py fixture | Add fields to the fixture |
| Vendored events tree re-appeared | src/specify_cli/spec_kitty_events/ | Delete the directory |
YAML codeblock missing # pydantic_model: | Contract example fixture YAML | Add frontmatter comment |
FR-010 / #1303 — Charter synthesizer determinism
Decision: Fix manifest hash computation to be deterministic; add path_guard.py enforcement.
Root cause: Synthesizer manifest hashes are computed from file traversal order, which may vary across runs. Fix: sort all file lists before hashing.
Investigation needed (WP08 implementer): identify which hash computation in the synthesizer uses non-deterministic traversal, then verify the fix by running the test suite twice and confirming identical hash output.
FR-011 / #1310 (remainder) — Misc debt
Decision: Fix five in-scope items; re-defer two with new issues.
| Item | Status | Fix direction |
|---|---|---|
Auth exit-code (test_refresh_through_transport returning 2) | Fix in-scope | Trace exit-code propagation in src/specify_cli/auth/ |
JSON output noise (logged_out_on_connected_teamspace) | Fix in-scope | Find the print/echo call and guard with --json flag check |
mypy --strict on executor.py | Fix in-scope | Run mypy --strict src/specify_cli/mission_step_contracts/executor.py and fix errors |
| Legacy kitty-specs WP Pydantic validation | Fix in-scope | Either fix the 6 legacy WP files or add to the validator's exclude list with rationale |
| Mission switching blocked | Fix in-scope | Run pytest tests/missions/test_mission_switching_integration.py -v --tb=long |
spec-kitty.checklist skill package missing | Re-defer | File new sub-issue; requires template work outside scope |
| Schema-version wording | Re-defer | File new sub-issue; minor UX change |
Test-mark inventory for touched directories
| Directory | Current CI job | Mark required |
|---|---|---|
tests/specify_cli/regression/ | kernel (inferred) | pytest.mark.unit |
tests/specify_cli/docs/ | kernel (inferred) | pytest.mark.unit |
tests/specify_cli/test_lane_regression_guard.py | kernel | pytest.mark.unit |
tests/specify_cli/cli/ | fast-tests-cli | pytest.mark.fast |
tests/doctrine/ | fast-tests-doctrine | pytest.mark.doctrine or pytest.mark.fast |
tests/integration/ | doctrine integration (inferred) | pytest.mark.integration |
tests/next/ | fast-tests-next | pytest.mark.fast |
tests/sync/ | fast-tests-sync | pytest.mark.fast |
tests/contract/ | kernel (inferred) | pytest.mark.contract |
tests/agent/ | fast-tests-agent | pytest.mark.fast |
Action for WP10: for each directory, confirm the current pytestmark in every test_*.py file matches the expected mark for its CI job. Files missing a mark get one added; files with a wrong primary mark get it corrected.