Pre-Doctrine Test Stabilization

Mission ID: 01KSMG8Y0V5V2ZM4YEQMKFWN2K Mission slug: pre-doctrine-test-stabilization-01KSMG8Y Mission type: software-dev Target branch: feat/pre-doctrine-stabilization-remediation Parent triage: #1298 / mission test-stabilization-and-debt-pass-01KSF9HJ Sub-issues addressed: #1301, #1302, #1303, #1304, #1305, #1306, #1307, #1308, #1309, #1310


Overview

The test-stabilization-and-debt-pass-01KSF9HJ (01KSF9HJ) mission completed triage and approved 12 work packages addressing a 249-failure pytest baseline. All 12 WPs reached approved status but none of the lane code was ever merged into upstream/main. The 10 DIR-013 sub-issues (#1301–#1310) were filed for pre-existing clusters that 01KSF9HJ explicitly deferred, and they remain open and unresolved in the current codebase.

Cross-examination of the current main confirmed four issues are immediately verifiable:

IssueConfirmed broken behaviour
#1302render_command_template for gemini/qwen raises TOMLDecodeError at line 146 col 68 — unescaped \ from rg '\.py$' in implement.md:168
#1308README.md has no ## Governance layer section; four tests in test_readme_governance.py fail
#1309audit/classifiers/wp_files.py:92 reads frontmatter.get("lane") — Phase-2 (3.x) frontmatter lane regression
#1310 (partial)commands/__init__.py:78 still registers doctrine via add_typer; test_doctrine_parent_group_is_unregistered fails with assert 0 != 0

The remaining six clusters (#1301, #1303–#1307) are structurally confirmed by the 01KSF9HJ triage document and require per-cluster targeted fixes.

This mission resolves all ten sub-issues in four ordered waves, enforces CI test-mark hygiene across every touched directory, and declares a ≤75-failure baseline as the prerequisite gate before doctrine/charter/glossary-pack feature work resumes.


Stakeholder Context

Primary driver: Doctrine and charter feature missions cannot proceed reliably while the test suite has ~200+ unexplained failures — the noise-to-signal ratio makes it impossible to distinguish regressions from pre-existing debt.

Secondary driver: CI test-mark gaps mean some tests are silently excluded from fast CI runs. Fixing marks restores confidence that green CI actually reflects green code.

Release gate: This mission is scoped under the 3.2.x stabilization effort. Its outcome is a prerequisite for issuing a stable 3.2.0 release and for resuming active feature development on doctrine/charter subsystems.


User Scenarios & Testing

Primary scenario — Developer runs the full test suite and sees a clean baseline

1. Developer clones the repo and runs PWHEADLESS=1 pytest tests/ -q --tb=no. 2. The command completes with ≤75 failures (down from ~194–249 before this mission). 3. All 10 DIR-013 sub-issues are either closed (fix landed) or explicitly re-deferred with a filed follow-on issue and rationale.

Secondary scenario — CI fast-run picks up every new test

1. Developer writes a new unit test for a bug fix, tagging it [pytest.mark.fast]. 2. The CI fast-tests-* job for the relevant module runs it and it passes. 3. No new test is silently excluded because of a missing or incorrect mark.

Edge scenario — Pre-existing failure discovered during implementation

1. Implementer encounters a failure outside this mission's scope. 2. Per DIR-013, a GitHub issue is filed before the failure is treated as accepted baseline context. 3. The failure is not fixed within this mission's scope unless it belongs to #1301–#1310.


Functional Requirements

Wave A — Zero-risk quick fixes

IDRequirementStatus
FR-001The source template src/specify_cli/missions/software-dev/command-templates/implement.md must be updated so that the rendered TOML output for gemini and qwen agents parses without TOMLDecodeError. The fix must address the unescaped backslash in the rg '\.py$' bash snippet at source line 168 by replacing it with a backslash-free equivalent (e.g. grep -E '[.]py$' or rg '[.]py$'). The twelve-agent parity snapshot must be refreshed after the fix. Closes #1302.Proposed
FR-002README.md must gain a ## Governance layer subsection satisfying all six assertions in tests/specify_cli/docs/test_readme_governance.py: (1) heading ## Governance layer present; (2) link to docs/trail-model.md; (3) link to docs/host-surface-parity.md; (4) substrings spec-kitty advise, spec-kitty ask, and spec-kitty do all present within the section; (5) every relative .md link in .agents/skills/spec-kitty.advise/SKILL.md resolves to an existing file; (6) every relative .md link in src/doctrine/skills/spec-kitty-runtime-next/SKILL.md resolves to an existing file. Closes #1308.Proposed
FR-003src/specify_cli/audit/classifiers/wp_files.py must not read the frontmatter lane or status keys directly. The replacement must use specify_cli.status.lane_reader.get_wp_lane() wrapped in a guard (either has_event_log() pre-check or try/except CanonicalStatusNotFoundError) so that the classifier does not raise for pre-3.0 missions or missions that have not yet run finalize-tasks. The classify_wp_files() function's "never raises" contract must be preserved. The test_lane_regression_guard test for wp_files.py must pass, and a new test must confirm classify_wp_files() does not raise when called on a mission directory with WP files but no event log. Closes #1309 (frontmatter lane regression).Proposed
FR-004The doctrine top-level CLI group must not be registered in the root Typer application. The import and add_typer call in src/specify_cli/cli/commands/__init__.py must be removed. All three assertions in tests/specify_cli/cli/test_doctrine_cli_removed.py must pass. Closes #1310 (doctrine group registration part).Proposed

Wave B — Structural fixes

IDRequirementStatus
FR-005The doctrine glossary must include the anchors doctrine-pack and platform-darwin--platform-linux in the contexts where test_glossary_link_integrity expects them. The five-paradigm-parallel-debugging tactic schema must be valid and all referenced terms must resolve. All four tests.doctrine.test_glossary_link_integrity and test_tactic_compliance failures must be eliminated. Closes #1304.Proposed
FR-006Status and lifecycle event emission must be restored to the correct contract: (a) SpecifyStarted event must be emitted at mission-create time (#1067 regression); (b) atomic commit flow must not leave status artifacts dirty after move_task; (c) move_task must not surface the wrong commit message to the lane branch; (d) implement must block when lane allocation fails. The four affected tests in test_atomic_status_commits_unit, test_mission_creation_specify_started, test_move_task_git_validation_unit, and test_status_emit_on_alloc_failure must pass. Closes #1306.Proposed
FR-007The charter integration suite must be restored: (a) org-layer source name must appear in lint output; (b) synthesize_without_charter_md must surface the correct error class; (c) the discover action in the runtime walk must not block when spec.md is already authored; (d) the implement-review-retrospect smoke must pass; (e) the rejection cycle must report the correct branch in the handoff; (f) the specify-plan commit boundary test must observe auto-commit of a substantive plan. All six affected integration tests must pass. Closes #1307.Proposed
FR-008The next CLI exit-code contract must match the documented spec: terminal states return exit 0; blocked states return exit 1; decide_next mocks must be invoked when the code path reaches them. The four failing tests in test_next_command_integration and test_query_mode_unit must pass. Closes #1305.Proposed

Wave C — Shared-package and architectural fixes

IDRequirementStatus
FR-009The shared-package events residual cluster must be resolved: (a) src/specify_cli/sync/restart.py must be added to the daemon-allowlist or refactored so the daemon-intent gate test passes; (b) BuildRegistered must be queued at init time; (c) MissionOriginBound must be queued when no WebSocket is available; (d) the WPCreated handoff fixture must carry actor and wp_title fields; (e) no vendored events tree must exist under src/specify_cli/spec_kitty_events; (f) the YAML codeblock in the example round-trip fixture must carry # pydantic_model: frontmatter. All six tests.sync and tests.contract failures listed in #1301 must pass. Closes #1301.Proposed
FR-010The charter synthesizer must produce deterministic manifest hashes. Direct write primitives must not leak outside path_guard.py. The chokepoint coverage gap must be closed. All five tests.charter.synthesizer.test_bundle_validate_extension assertions must pass. Closes #1303.Proposed
FR-011The miscellaneous debt cluster from #1310 must be resolved or formally re-deferred per issue. In-scope items for this mission: (a) auth integration exit-code must return the correct value; (b) logged_out_on_connected_teamspace noise must not leak into JSON CLI output; (c) mission_step_contracts/executor.py must be mypy-strict-clean; (d) WP files in legacy kitty-specs that fail Pydantic validation must be fixed or explicitly excluded from the validator; (e) mission switching must not be blocked. Items that cannot be fixed without a dedicated mission (e.g. schema-version wording drift, spec-kitty.checklist skill package) are re-deferred with a filed follow-on issue. Closes #1310 (partially; re-deferred items get new issues).Proposed
FR-014Test-created mission directories must never persist in kitty-specs/. Two obligations: (1) Immediate cleanup — all existing stray mission directories matching the test-feature- pattern (currently 10 directories: test-feature-01KRR081 through test-feature-01KSHQ1R) must be deleted from the repository; (2) Structural fix — every test that creates a mission directory during execution must unconditionally remove it on teardown (pass or fail) using a pytest fixture with yield-and-cleanup or equivalent. A .gitignore rule for kitty-specs/test-feature-/ must be added as a safety net so that future test-created missions cannot be accidentally staged. No test-feature-* directory may remain under kitty-specs/ after this fix lands.Proposed

Wave D — Closeout

IDRequirementStatus
FR-012Every test file in directories touched by FR-001 through FR-011 must carry a module-level pytestmark with at least one of the canonical CI-quality marks (fast, unit, integration, e2e, slow, contract, architectural, doctrine). Additionally, tests/agent/test_context_unit.py (the one currently untagged test file) must receive the appropriate mark. A new or updated guard test must verify that no test file in a touched module directory is missing a pytestmark.Proposed
FR-013A full PWHEADLESS=1 pytest tests/ -q --tb=no run on the post-mission commit must produce ≤75 failures. The run output and failure list must be committed to docs/01KSMG8Y-closeout/baseline.md. All ten parent sub-issues (#1301–#1310) must be either closed (linked to the fixing commit) or re-deferred with a new filed issue. #1298 must receive a closing comment with the final delta.Proposed

Non-Functional Requirements

IDRequirementThreshold
NFR-001Full-suite failure count after mission merge≤75 failures (pytest tests/ -q --tb=no)
NFR-002No new failures introduced by Wave A fixes0 — Wave A changes must not increase the failure count vs. the pre-fix baseline
NFR-003CI fast-run coverage of touched modulesEvery test file in a fixed module must have at least one test marked fast or unit that runs in the appropriate fast-tests-<module> CI job
NFR-004Test suite runtime stabilityThe full suite run time must not regress by more than 10% vs. the pre-mission baseline (~924s)

Constraints

IDConstraint
C-001All fixes are behaviour-preserving. No feature extension, no new CLI surfaces, no schema changes outside what is required to make existing tests pass.
C-002SPEC_KITTY_TEST_MODE=1 bypass is permitted only in ceremony-commit guards (as established in commit 64ddadc5f). It must not be introduced in production logic paths.
C-003Frontmatter lane reads are illegal in Phase-2 (3.x) runtime code. Any fix that reads lane state must use specify_cli.status.lane_reader.get_wp_lane().
C-004All meta.json mutations must route through src/specify_cli/feature_metadata/mission_metadata.py. No direct meta.json file writes in new or fixed code.
C-005The charter top-level CLI group must remain registered. FR-004 removes only the doctrine group. Removing charter is out of scope.
C-006DIR-013 applies to all WPs in this mission. Any new pre-existing failure discovered during implementation must be filed as a GitHub issue before it is treated as accepted baseline context.
C-007The stale brief files (.kittify/mission-brief.md, .kittify/ticket-context.md, .kittify/brief-source.yaml) must be deleted after this spec is committed. They belong to a different mission context (01KRJGKH4DJCSF277K9QV3WBE7).
C-008Wave C item FR-011 may re-defer items that require dedicated missions (e.g. spec-kitty.checklist skill package restoration). Each re-deferred item must have a filed follow-on GitHub issue before FR-011 is declared complete.
C-009The twelve-agent parity snapshot refresh (FR-001) must be committed alongside the template fix, not as a separate follow-on.

Assumptions

1. The uv.lock pin for spec_kitty_events is already at 5.2.0 in the current codebase. FR-009 addresses structural residual items, not the package version itself (which was aligned by 01KSF9HJ WP02 planning-lane work). 2. The doctrine glossary anchor additions (FR-005) are content additions to existing YAML files, not schema changes. No glossary schema version bump is needed. 3. The implement.md backslash fix (FR-001) applies equally to all 13 TOML-format agents rendered from the same source. The snapshot refresh will cover all affected agents in one pass. 4. "Test-mark audit" (FR-012) targets module directories touched by FR-001–FR-011. It does not require a full audit of the entire test suite beyond the one known untagged file (tests/agent/test_context_unit.py). 5. Wave ordering (A → B → C → D) is the recommended sequencing but lanes within each wave may be parallelised where write-scope permits.


Success Criteria

1. A full pytest tests/ -q --tb=no run reports ≤75 failures after mission merge. 2. All ten GitHub sub-issues (#1301–#1310) are either closed with a linked commit or re-deferred with a new filed issue. 3. #1298 receives a closing comment documenting the final delta. 4. Every CI fast-tests-<module> job that covers a module touched by this mission runs at least one test and passes. 5. tests/specify_cli/cli/test_doctrine_cli_removed.py — all three assertions pass. 6. tests/specify_cli/docs/test_readme_governance.py — all four assertions pass. 7. tests/specify_cli/regression/test_twelve_agent_parity.py::test_toml_command_output_is_parseable[implement-gemini] and [implement-qwen] pass. 8. tests/specify_cli/test_lane_regression_guard.py for src/specify_cli/audit/classifiers/wp_files.py passes. 9. The closeout document docs/01KSMG8Y-closeout/baseline.md is committed with the final failure list.


Out of Scope

  • Any new feature work (doctrine packs, charter vocabulary, glossary UI).
  • The Quality and DevEx Hardening mission (01KRJGKH4DJCSF277K9QV3WBE7, epic #822) — separate mission tracking mypy strict, Sonar gate, stale-lane auto-rebase, and upgrade-check UX.
  • Fixes for the test-stabilization-and-debt-pass-01KSF9HJ WP05–WP08 code (LD-1, MS-1, LD-3, LD-5 architectural debt) — those are approved but unmerged; they may be cherry-picked or re-implemented if they directly resolve a sub-issue, but the architectural restructuring work itself is not in scope.
  • Full Sonar quality-gate remediation (coverage uplift, hotspot review) — in scope of the separate quality/devex mission.
  • Any change to release versioning or CHANGELOG entries beyond what the test assertions require.