Tasks: Review/Merge Gate Hardening (3.2.x)

Mission: review-merge-gate-hardening-3-2-x-01KRC57C | Date: 2026-05-12 Branch: fix/3.2.x-review-merge-gate-hardening Spec: spec.md | Plan: plan.md

8 work packages, 43 subtasks. Target prompt size 200–500 lines per WP; sized to 4–7 subtasks each.

Subtask Index (reference table; not a tracking surface)

IDDescriptionWPParallel
T001Audit and classify every uv run pytest reference (docs, templates, skill renderer) as release-gate or developer-convenienceWP01[P]
T002Replace release-gate references with uv run python -m pytest; leave dev convenience as-isWP01
T003Add preflight helper module asserting pytest is importable in project venvWP01[P]
T004Wire preflight into gate invocation paths; emit MISSION_REVIEW_TEST_EXTRA_MISSINGWP01
T005Regression test: subprocess fixture with venv lacking test extra; assert diagnostic firesWP01
T006Add or confirm filelock dependency in pyproject.tomlWP02[D]
T007Wrap .pytest_cache/spec-kitty-test-venv creation in conftest with file lock + 60s timeoutWP02
T008On lock-acquire timeout, emit diagnostic naming the lock file path so operator can cleanWP02[D]
T009Regression test: parallel contract + architectural suites complete deterministicallyWP02
T010Document the fixture's concurrency contract in tests/README.md or similarWP02[D]
T011Create src/specify_cli/cli/commands/review/__init__.py package; re-export review_mission public symbolWP07
T012Extract Step 1 (WP lane check) into review/_lane_gate.pyWP07
T013Extract Step 2 (dead-code scan) into review/_dead_code.pyWP07[P]
T014Extract Step 3 (BLE001 audit) into review/_ble001_audit.pyWP07[P]
T015Extract Step 4 (report writer) into review/_report.pyWP07[P]
T016Delete the original review.py; verify existing review tests pass unchanged (NFR-005)WP07
T017Add review/_diagnostics.py with MissionReviewDiagnostic StrEnum; class docstring references ERROR_CODES.mdWP03[P]
T018Implement mode resolution + --mode flag in review/_mode.py; mode-mismatch diagnostic body with 3 remediation options (FR-005, FR-006, FR-023)WP03
T019Implement review/_issue_matrix.py validator: mandatory + named-optional columns, alias normalization, verdict allow-list, single-table rule, body-cell rules (FR-006, FR-028–FR-031)WP03
T020Extend review/_report.py for new frontmatter (mode, gates_recorded, issue_matrix_present, mission_exception_present); record GateRecord per gate (FR-007)WP03
T021Remediate the 6 existing issue-matrix.md files on main: auto-normalize trivial drift with provenance note; surface structural drift via diagnostic (FR-032)WP03
T022Author src/specify_cli/cli/commands/review/ERROR_CODES.md with one section per MissionReviewDiagnostic member (FR-033, NFR-008)WP03[P]
T023Add glossary entries for: lightweight mode, post-merge mode, mode mismatch, issue-matrix schema drift (FR-034)WP03[P]
T024Add mission_number_baked: bool = False to MergeState; ensure JSON serializer carries itWP04
T025Implement idempotency check in _run_lane_based_merge: read meta.json.mission_number before assignment; skip + set flag when equal (FR-010, FR-011)WP04
T026Update merge --resume flow to short-circuit when mission_number_baked == TrueWP04
T027Regression test simulating partial-merge resume scenario: assert no empty mission-number commit; flag flips to True (FR-012)WP04
T028Add get_status_read_root() helper preferring current worktree (FR-013)WP05[P]
T029Audit every get_main_repo_root() call site in status-read code paths; tag read-only vs read-write; route read-only ones through the new helperWP05
T030Implement fail-loud path: if a status command cannot serve from a detached worktree, emit diagnostic naming the constraint (FR-014)WP05
T031Regression fixture with two worktrees + divergent status.events.jsonl; from each worktree, agent tasks status --json reflects its own events (FR-015)WP05
T032Promote charset-normalizer to direct dep in pyproject.toml pinned >=3.4,<4; verify uv.lock resolves cleanlyWP06[P]
T033Create src/charter/_diagnostics.py with CharterEncodingDiagnostic StrEnum; class docstring references ERROR_CODES.mdWP06[P]
T034Create src/charter/_io.py exposing load_charter_file() and load_charter_bytes(); detection order per contract; provenance write to dual-storage routing (FR-016, FR-017, FR-022)WP06
T035Retrofit 3 ingest sites: compiler.py:594, sync.py:151, interview.py:283,398 (NFR-004 budget)WP06
T036Implement --unsafe bypass on the loader; record bypass_used: true in provenance (FR-021)WP06
T037Author src/charter/ERROR_CODES.md (FR-033, NFR-008); add glossary entries for encoding chokepoint, encoding provenance, unsafe bypass (FR-034)WP06[P]
T038Regression tests covering cp1252 ingest, mixed-content AMBIGUOUS, --unsafe bypass, per-mission vs centralized routingWP06
T039Create spec-kitty migrate charter-encoding subcommand at src/specify_cli/cli/commands/migrate/charter_encoding.pyWP08
T040Implement corpus scan: inventory kitty-specs//charter/ + .kittify/charter/* (extensions .yaml, .md, .txt); detect encoding via WP06 chokepoint per fileWP08
T041Implement interactive mode (prompt before each non-UTF-8 file) + --dry-run and --yes flagsWP08[D]
T042Implement idempotency check: re-running on already-normalized corpus is a no-op (NFR-006)WP08
T043JSON-stable summary report (FR-027); regression test for legacy mission migration + idempotency checkWP08

Work Packages

WP01 — Hermetic mission-review gate invocation

Goal: Mission-review gate commands cannot fall through to a globally installed pytest. Replace bare uv run pytest … with hermetic invocation in release-gate contexts; add a preflight assertion that fails fast if the project .venv lacks the test extra.

Priority: P1 — blocks the cross-surface fixture harness (#992 Phase 0) and the fresh-clone smoke from epic #822's stable-release-gate.

Independent test: Fresh shallow clone with uv sync (no --extra test); run documented mission-review gate command; verify exit non-zero with MISSION_REVIEW_TEST_EXTRA_MISSING and never invokes /opt/homebrew/bin/pytest or any system pytest.

Source bug: #987

Dependencies: WP07 (preflight helper lives in the post-refactor review/ package neighborhood, but at a sibling path not owned by WP07 to avoid overlap)

Estimated prompt size: ~300 lines

Subtasks
  • ✅ T001 [P] Audit and classify every uv run pytest reference (docs, templates, skill renderer) as release-gate or developer-convenience (WP01)
  • ✅ T002 Replace release-gate references with uv run python -m pytest; leave dev convenience as-is (WP01)
  • ✅ T003 [P] Add preflight helper module asserting pytest is importable in project venv (WP01)
  • ✅ T004 Wire preflight into gate invocation paths; emit MISSION_REVIEW_TEST_EXTRA_MISSING (WP01)
  • ✅ T005 Regression test: subprocess fixture with venv lacking test extra; assert diagnostic fires (WP01)
Risks
  • A doc/template change is missed → another agent reproduces the bug post-fix. Mitigated by T001's classification pass producing an audit table that becomes part of the PR.
Parallel opportunities

T001 and T003 in parallel (different files); T002 follows T001; T004 follows T003; T005 follows T002 and T004.

Prompt: tasks/WP01-hermetic-gate-invocation.md


WP02 — Concurrency-safe pytest-venv fixture

Goal: The shared .pytest_cache/spec-kitty-test-venv fixture creation is safe to run from parallel pytest invocations (contract + architectural gates). Add a file lock around creation; emit operator-actionable diagnostics on timeout.

Priority: P1 — release-gate parallelization is documented behavior; false hard-gate failures undermine release confidence.

Independent test: Concurrent execution of tests/contract/ and tests/architectural/ in two parallel pytest processes; both complete deterministically without ensurepip errors or partial-venv observations.

Source bug: #986

Dependencies: none

Estimated prompt size: ~280 lines

Subtasks
  • ✅ T006 [P] Add or confirm filelock dependency in pyproject.toml (WP02)
  • ✅ T007 Wrap .pytest_cache/spec-kitty-test-venv creation in conftest with file lock + 60s timeout (WP02)
  • ✅ T008 [P] On lock-acquire timeout, emit diagnostic naming the lock file path so operator can clean (WP02)
  • ✅ T009 Regression test: parallel contract + architectural suites complete deterministically (WP02)
  • ✅ T010 [P] Document the fixture's concurrency contract in tests/README.md or similar (WP02)
Risks
  • Lock timeout too aggressive → flaky CI under load. Mitigated by 60s window and stale-lock detection. If 60s is insufficient on slow CI runners, tune up; do not remove the lock.
Parallel opportunities

T006, T008, T010 are parallelizable (different files / no shared state). T007 sequenced after T006. T009 sequenced after T007.

Prompt: tasks/WP02-pytest-venv-concurrency.md


WP07 — review.py hygiene refactor

Goal: Split src/specify_cli/cli/commands/review.py into a package of sibling files (commands/review/ + _lane_gate.py, _dead_code.py, _ble001_audit.py, _report.py). Mechanical extraction only — no new abstractions, no domain modeling, no behavior change (NFR-005). Prerequisite to WP03 so the contract enforcement code lands into a clean structure.

Priority: P1 — sequencing prerequisite. Doing this after WP03 means rebuilding WP03 against a refactor.

Independent test: Run the existing review test suite against the post-WP07 codebase with no test modifications; every test passes (same inputs → same outputs → same exit codes → same artifacts).

Source: HiC directive 2026-05-12 (WP03 Q3 resolution)

Dependencies: none

Estimated prompt size: ~350 lines

Subtasks
  • ✅ T011 Create src/specify_cli/cli/commands/review/__init__.py package; re-export review_mission public symbol (WP07)
  • ✅ T012 Extract Step 1 (WP lane check) into review/_lane_gate.py (WP07)
  • ✅ T013 [P] Extract Step 2 (dead-code scan) into review/_dead_code.py (WP07)
  • ✅ T014 [P] Extract Step 3 (BLE001 audit) into review/_ble001_audit.py (WP07)
  • ✅ T015 [P] Extract Step 4 (report writer) into review/_report.py (WP07)
  • ✅ T016 Delete the original review.py; verify existing review tests pass unchanged (NFR-005) (WP07)
Risks
  • Scope creep: an implementer is tempted to "improve while moving" — fix a small bug or rename a function. NFR-005 forbids any behavior change. Implementer agent must resist; any improvement opportunity is a follow-up ticket, not WP07 scope.
Parallel opportunities

T013, T014, T015 in parallel. T012 sequenced after T011 (Step 1 uses helpers exported from __init__.py). T016 last.

Prompt: tasks/WP07-review-py-refactor.md


WP03 — Mission-review mode contract, validator, and existing-matrix remediation

Goal: Add --mode {lightweight|post-merge} with auto-detect default; enforce the audit-derived issue-matrix.md schema; record Gate 1–4 with command/exit_code/result; remediate the 6 existing matrices on main. Ship JSON-stable diagnostics with remediation guidance.

Priority: P0 — primary release-blocker fix.

Independent test: (1) Post-merge mission lacking issue-matrix.md fails with MISSION_REVIEW_ISSUE_MATRIX_MISSING. (2) Lightweight invocation writes a report whose body explicitly says "Lightweight consistency check; not a release gate". (3) Mode-mismatch invocation fails with MISSION_REVIEW_MODE_MISMATCH and body lists three remediation options.

Source bug: #985 + HiC audit directive 2026-05-12

Dependencies: WP01 (hermetic gates), WP02 (parallel-safe fixtures), WP07 (refactored review/ package)

Estimated prompt size: ~480 lines

Subtasks
  • ✅ T017 [P] Add review/_diagnostics.py with MissionReviewDiagnostic StrEnum; class docstring references ERROR_CODES.md (WP03)
  • ✅ T018 Implement mode resolution + --mode flag in review/_mode.py; mode-mismatch diagnostic body with 3 remediation options (FR-005, FR-006, FR-023) (WP03)
  • ✅ T019 Implement review/_issue_matrix.py validator: mandatory + named-optional columns, alias normalization, verdict allow-list, single-table rule, body-cell rules (FR-006, FR-028–FR-031) (WP03)
  • ✅ T020 Extend review/_report.py for new frontmatter (mode, gates_recorded, issue_matrix_present, mission_exception_present); record GateRecord per gate (FR-007) (WP03)
  • ✅ T021 Remediate the 6 existing issue-matrix.md files on main: auto-normalize trivial drift with provenance note; surface structural drift via diagnostic (FR-032) (WP03)
  • ✅ T022 [P] Author src/specify_cli/cli/commands/review/ERROR_CODES.md with one section per MissionReviewDiagnostic member (FR-033, NFR-008) (WP03)
  • ✅ T023 [P] Add glossary entries for: lightweight mode, post-merge mode, mode mismatch, issue-matrix schema drift (FR-034) (WP03)
Risks
  • The existing 6 matrices have varied shapes (see spec §Existing-mission audit findings). Auto-normalization MUST write a one-line provenance note inside the file before modifying any content; never silent rewrite. Implementer must inspect each file and decide per the contract.
  • Diagnostic-code namespace drift: if the StrEnum and ERROR_CODES.md get out of sync, NFR-008 fires. T022 produces the doc and T024 (in tests) asserts the cross-reference holds.
Parallel opportunities

T017, T022, T023 in parallel (different files/concerns). T018 + T019 sequential within mode-then-validator chain. T020 after T017 (uses Diagnostic enum) and T019 (writes validator result). T021 last.

Prompt: tasks/WP03-mission-review-contract.md


WP04 — Idempotent mission-number assignment

Goal: Resume of a partial-merge after mission_number was committed must succeed without empty mission-number commits. Add mission_number_baked flag to merge-state; short-circuit assignment when already-equal.

Priority: P1 — recovery story for release-blocking missions.

Independent test: Simulate partial-merge (mission_number=N written, then merge fails); rerun merge --resume; verify no empty commit; assignment step skipped; flag is True.

Source bug: #983

Dependencies: none

Estimated prompt size: ~260 lines

Subtasks
  • ✅ T024 Add mission_number_baked: bool = False to MergeState; ensure JSON serializer carries it (WP04)
  • ✅ T025 Implement idempotency check in _run_lane_based_merge: read meta.json.mission_number before assignment; skip + set flag when equal (FR-010, FR-011) (WP04)
  • ✅ T026 Update merge --resume flow to short-circuit when mission_number_baked == True (WP04)
  • ✅ T027 Regression test simulating partial-merge resume scenario: assert no empty mission-number commit; flag flips to True (FR-012) (WP04)
Risks
  • Subtle concurrency: the mission-number lock must still serialize concurrent merges. The idempotency check happens INSIDE the lock; do not move it outside.
Parallel opportunities

T024 first; T025 + T026 sequential; T027 last.

Prompt: tasks/WP04-mission-number-idempotency.md


WP05 — Worktree-aware status read resolution

Goal: spec-kitty agent tasks status from a detached worktree reads that worktree's status.events.jsonl, not the primary checkout's. Add get_status_read_root() helper; route read-only paths through it.

Priority: P1 — post-merge SHA verification is core to release-gate trust.

Independent test: Two-worktree fixture with divergent event logs; from each worktree, agent tasks status --json reflects its own events.

Source bug: #984

Dependencies: none

Estimated prompt size: ~270 lines

Subtasks
  • ✅ T028 [P] Add get_status_read_root() helper preferring current worktree (FR-013) (WP05)
  • ✅ T029 Audit every get_main_repo_root() call site in status-read code paths; tag read-only vs read-write; route read-only ones through the new helper (WP05)
  • ✅ T030 Implement fail-loud path: if a status command cannot serve from a detached worktree, emit diagnostic naming the constraint (FR-014) (WP05)
  • ✅ T031 Regression fixture with two worktrees + divergent status.events.jsonl; from each worktree, agent tasks status --json reflects its own events (FR-015) (WP05)
Risks
  • Audit miss: a write path is mistakenly routed through the new helper. Mitigated by explicit tagging in T029 and code review for "any function that writes to status MUST keep main-repo-root resolution".
Parallel opportunities

T028 in parallel with audit prep; T029 sequenced after T028 (uses the helper); T030 + T031 sequential.

Prompt: tasks/WP05-status-read-worktree-resolution.md


WP06 — Charter encoding chokepoint

Goal: Single ingestion chokepoint at three boundaries (compiler.py, sync.py, interview.py) with charset-normalizer-backed detection, provenance recording in dual storage, --unsafe bypass with audit, and fail-loud on ambiguity. NFR-004 caps the module budget at 5.

Priority: P2 — addresses #644 within #822's narrowed-slice constraint.

Independent test: Ingest a cp1252-encoded charter; assert provenance recorded with normalization_applied=True; ingest a mixed-content file and assert CHARTER_ENCODING_AMBIGUOUS; same with --unsafe succeeds and records bypass_used=true.

Source bug: #644 (narrowed)

Dependencies: none

Estimated prompt size: ~450 lines

Subtasks
  • ✅ T032 [P] Promote charset-normalizer to direct dep in pyproject.toml pinned >=3.4,<4; verify uv.lock resolves cleanly (WP06)
  • ✅ T033 [P] Create src/charter/_diagnostics.py with CharterEncodingDiagnostic StrEnum; class docstring references ERROR_CODES.md (WP06)
  • ✅ T034 Create src/charter/_io.py exposing load_charter_file() and load_charter_bytes(); detection order per contract; provenance write to dual-storage routing (FR-016, FR-017, FR-022) (WP06)
  • ✅ T035 Retrofit 3 ingest sites: compiler.py:594, sync.py:151, interview.py:283,398 (NFR-004 budget) (WP06)
  • ✅ T036 Implement --unsafe bypass on the loader; record bypass_used: true in provenance (FR-021) (WP06)
  • ✅ T037 [P] Author src/charter/ERROR_CODES.md (FR-033, NFR-008); add glossary entries for encoding chokepoint, encoding provenance, unsafe bypass (FR-034) (WP06)
  • ✅ T038 Regression tests covering cp1252 ingest, mixed-content AMBIGUOUS, --unsafe bypass, per-mission vs centralized routing (WP06)
Risks
  • Scope creep: implementer notices the 5 deferred re-read sites also use read_text(encoding="utf-8") and "fixes" them as a courtesy. NFR-004's >5-module guardrail explicitly forbids this. Implementer must STOP and escalate if more than 5 modules would be touched.
  • charset-normalizer detection on small files (< 100 bytes) may misclassify. Detection order checks BOM and strict UTF-8 first; only ambiguous content falls to charset-normalizer.
Parallel opportunities

T032, T033, T037 in parallel. T034 follows T032/T033 (uses both). T035 follows T034. T036 modifies T034's loader. T038 last.

Prompt: tasks/WP06-charter-encoding-chokepoint.md


WP08 — Charter-content encoding migration flow

Goal: Scan every existing mission's charter content for non-UTF-8 encodings; normalize-or-fail-loud preemptively so WP06's chokepoint doesn't surface false regressions on legacy files. Idempotent (NFR-006); JSON-stable summary report.

Priority: P2 — operational hygiene; runs in CI to keep the chokepoint trustworthy.

Independent test: Synthesize a legacy mission directory with a cp1252 charter file; run spec-kitty migrate charter-encoding; verify the file is normalized with provenance and a second invocation is a no-op.

Source: HiC directive 2026-05-12 (WP06 Q2 resolution)

Dependencies: WP06 (uses the chokepoint's detection + provenance)

Estimated prompt size: ~320 lines

Subtasks
  • ✅ T039 Create spec-kitty migrate charter-encoding subcommand at src/specify_cli/cli/commands/migrate/charter_encoding.py (WP08)
  • ✅ T040 Implement corpus scan: inventory kitty-specs//charter/ + .kittify/charter/* (extensions .yaml, .md, .txt); detect encoding via WP06 chokepoint per file (WP08)
  • ✅ T041 [P] Implement interactive mode (prompt before each non-UTF-8 file) + --dry-run and --yes flags (WP08)
  • ✅ T042 Implement idempotency check: re-running on already-normalized corpus is a no-op (NFR-006) (WP08)
  • ✅ T043 JSON-stable summary report (FR-027); regression test for legacy mission migration + idempotency check (WP08)
Risks
  • Operator runs --yes mode in CI on a corpus that contains a genuinely ambiguous file → CI fails. Acceptable: CI should fail loudly when ambiguity exists, surfacing the file for human repair. Do not auto-bypass ambiguous content in --yes mode.
Parallel opportunities

T041 in parallel with T040 (different flag/CLI surface vs scan logic). T042 follows T040. T043 last.

Prompt: tasks/WP08-charter-encoding-migration.md


Dependency Graph

WP07 (review.py refactor) ──┐
                            ├─► WP03 (mode contract + validator + remediation)
WP01 (hermetic gates) ──────┤
WP02 (pytest-venv lock) ────┘

WP04 (mission-number idempotency)   independent
WP05 (status read worktree)         independent

WP06 (encoding chokepoint) ──► WP08 (charter-content migration)

MVP Scope

WP01 + WP02 + WP07 + WP03 = the release-gate apparatus (the primary release-blocker pattern). WP04 + WP05 = recovery and verification correctness. WP06 + WP08 = the narrowed encoding hygiene slice.

Recommended sprint sequencing:

1. Sprint A (parallel): WP07, WP02, WP04, WP05, WP06. 2. Sprint B (after WP07/WP01/WP02 finish): WP03. 3. Sprint C (after WP06 finishes): WP08. 4. WP01 starts when WP07's package layout exists; can run in parallel with WP02 since it owns different files.

Acceptance for the Mission

  • All 43 subtasks marked complete.
  • All 34 FRs (FR-001 through FR-034) have at least one passing regression test.
  • All 8 NFRs hold.
  • The eat-our-own-dogfood smoke (NFR-003): this mission's own mission-review-report.md is generated by the new WP03 post-merge mode and passes its own contract.
  • Decision verifier reports clean.
  • Mission-review hard gates 1–4 pass on main after the mission is merged.

Next

Run /spec-kitty.implement (or the spec-kitty-implement-review skill) when ready to dispatch lane agents.