- rev 1 (2026-05-12 AM): initial proposal, three open sub-questions.
- rev 2 (2026-05-12 PM): HiC resolved Q1 (hard-block with remediation guidance) and Q3 (keep gate logic near review.py but pre-emptively refactor the file for god-class risk — see new WP07 in mission spec). Q2 remains open pending the drift-impact analysis below.
Date: 2026-05-12
Deciders: Architect Alphonso (proposer), HiC (final decision)
Technical Story:
- Mission
review-merge-gate-hardening-3-2-x-01KRC57CWP03 - Source bug: Priivacy-ai/spec-kitty#985
- Parent epic: #822, #992 (WS-5)
Context and Problem Statement
spec-kitty review (implemented at src/specify_cli/cli/commands/review.py:235) is documented as a four-step post-merge validation (WP lane check, dead-code scan, BLE001 audit, report writer). In practice the command can also be invoked before merge to perform consistency checks against the working tree — and there is no explicit signal in code or surface telling the two intentions apart.
The mission-review doctrine for post-merge runs requires:
issue-matrix.md(hand-authored Markdown table, conventional in 7 prior stable/release missions, no generator)- Gate 1–4 records (currently inline in
review.py:291–446, not surfaced as report fields) mission-exception.mdwhen cross-repo E2E has an environmental xfail- A non-trivial
mission-review-report.mdwhose YAML frontmatter (verdict/reviewed_at/findings) accurately reflects the gates that ran
The bug (#985) is that the existing command can emit verdict: pass / findings: 0 from a post-merge intent invocation while none of the required artifacts exist — and operators take that as authoritative release evidence.
The mission spec FR-005 calls for two explicit modes. The architectural question is: what is the canonical signal that selects the mode, and what is the report contract per mode?
Decision Drivers
- Operator transparency. A reviewer must never mistake a lightweight check for a release gate. The mode must be visible in stdout, JSON, and the persisted report.
- Backward compatibility. Existing operator workflows (
spec-kitty review --mission <slug>without a flag) should not break; behavior may change but not in a way that turns a currently-passing legitimate invocation into a stderr-only spew. - Existing signals.
meta.json.baseline_merge_commitis already populated by post-083 mission merge and is the only explicit post-merge signal in code today (review.py:284–285). Reusing it avoids inventing a new lifecycle concept that #992 WS-1 may want to define centrally. - JSON stability. FR-009 requires diagnostic codes to be JSON-stable for the cross-surface fixture harness (#992 Phase 0). Mode names and failure codes must be machine-parseable strings.
- Lightweight mode must remain useful. Pre-merge consistency checks are valuable during mission iteration; the contract must not push operators away from them.
Considered Options
- (A) Implicit mode via
baseline_merge_commitonly — no CLI flag; command auto-selects. - (B) Explicit
--mode {lightweight|post-merge}flag with auto-detect default — flag overrides; default is auto-detect frombaseline_merge_commit. - (C) Two separate subcommands —
spec-kitty review consistencyvsspec-kitty review mission. - (D) New
mission_statefield inmeta.json— explicit lifecycle field (proposed | in_progress | merged | accepted); command reads it.
Proposed Decision Outcome
Recommended option: (B) Explicit --mode {lightweight|post-merge} flag with auto-detect default, because it (i) satisfies FR-005 with the smallest CLI surface change, (ii) preserves backward compatibility via auto-detect from the already-existing baseline_merge_commit signal, (iii) gives operators an explicit override for the legitimate edge case of running lightweight checks on an already-merged mission (e.g., from a detached verification worktree), and (iv) does not pre-empt the centralized lifecycle authority work in #992 WS-1.
Concrete contract proposal (subject to HiC adjustment):
- Mode resolution order:
--modeCLI flag if present.- Else
lightweightifmeta.json.baseline_merge_commitis absent. - Else
post-merge.
- Lightweight mode contract. Stdout/JSON output explicitly states
mode: lightweight (not a release gate). The persistedmission-review-report.mdfrontmatter addsmode: lightweightand writes "Lightweight consistency check; not a release gate" as the first line of the body. Lightweight mode runs only Step 1 (WP lane consistency) and Step 4 (report writer); skips dead-code scan and BLE001 audit. Verdict can still bepassbut the report's wording makes the limitation unambiguous. - Post-merge mode contract. Stdout/JSON output explicitly states
mode: post-merge (release gate). The report frontmatter addsmode: post-mergeand the following keys become required for verdictpassorpass_with_notes:issue_matrix_present: true(artifactissue-matrix.mdexists in the mission dir, or was auto-generated this run — see point 6 below)gates_recorded: [gate_1, gate_2, gate_3, gate_4](each withcommand,exit_code,result)mission_exception_present: true | not_applicable(whenmission-exception.mdis required and validated) Absence of any required key → exit non-zero with a diagnostic.
- Diagnostic codes (JSON-stable, per FR-009). Proposed namespace
MISSION_REVIEW_*:MISSION_REVIEW_ISSUE_MATRIX_MISSINGMISSION_REVIEW_GATE_RECORD_MISSING(with subcodegate_1…gate_4)MISSION_REVIEW_MISSION_EXCEPTION_INVALIDMISSION_REVIEW_MODE_MISMATCH(e.g.,--mode post-mergeinvoked whenbaseline_merge_commitis absent — explicit operator override of the auto-detect signal; warn-or-block per HiC choice in §"Open sub-questions for HiC" below)
mission-review-report.mdschema becomes explicit. Today the YAML fields (verdict,reviewed_at,findings) are implicit in writer code. WP03 promotes them to a documented schema indocs/(and validates them in tests) so review tooling, dashboards, and the cross-surface harness can rely on a stable contract.issue-matrix.mdhandling. Current pattern is hand-authored. WP03 adds (i) a documented Markdown schema (header columns:Issue | Scope | Verdict | Evidence ref), (ii) a passive validator (parse the table; fail if header drifts), (iii) NO auto-generation in this WP (deferred to a successor mission once tracked-issue ingest is decided). Post-merge mode requires the file to exist; auto-generation is an explicit non-goal here per #822 anti-scope.
Consequences if approved
Positive
- Operator can never mistake a lightweight check for a release gate (mode is in stdout, JSON, and the persisted report).
- The auto-detect default preserves backward compatibility for the canonical happy-path invocation.
issue-matrix.mdbecomes a typed artifact (schema, validator) without inflating WP03 with auto-generation work.- Diagnostic codes plug directly into the #992 Phase 0 cross-surface fixture harness with stable names.
baseline_merge_commitbecomes a real lifecycle signal — usable by #992 WS-1 later.
Negative
- One new CLI flag (
--mode) onspec-kitty review. - Operator may run
--mode post-mergeon a pre-merge mission expecting a quick sanity check and get a hard fail. Mitigation: theMISSION_REVIEW_MODE_MISMATCHdiagnostic is explicit. (HiC: choose warn vs block — see open sub-question.)
Neutral
- The bare invocation
spec-kitty review --mission <slug>continues to work; mode is implicit.
Confirmation
- Regression: synthetic mission with
baseline_merge_commitset, noissue-matrix.md→ barespec-kitty reviewexits non-zero withMISSION_REVIEW_ISSUE_MATRIX_MISSING. - Regression: same mission,
--mode lightweightoverride → exit 0, report hasmode: lightweight, body begins with "Lightweight consistency check; not a release gate". - Eat-our-own-dogfood (NFR-003): this mission's own final review runs in
--mode post-merge, produces anissue-matrix.md, and records Gate 1–4.
Pros and Cons of the Options
(A) Implicit mode via baseline_merge_commit only
No CLI flag; command always auto-selects mode from the meta field.
Pros:
- Zero new CLI surface.
- Smallest diff.
Cons:
- No operator override. Verifying a merged mission from a detached worktree with a quick lightweight check becomes impossible.
- "Magic" behavior: the same command does different things in different contexts with no visible switch.
- Pre-083 missions (no
baseline_merge_commitfield) always run lightweight, even when run post-merge — silently.
(B) Explicit --mode flag with auto-detect default
Recommended above.
Pros: see "Consequences if approved" above.
Cons: see "Consequences if approved" above. The only real cost is one new flag.
(C) Two separate subcommands
spec-kitty review consistency vs spec-kitty review mission.
Pros:
- Maximal clarity. Hard to misuse.
Cons:
- Breaks every existing invocation of
spec-kitty review. Hidden in agents, prompts, mission templates, doctrine, docs. Out of proportion for a stabilization mission. - Violates #822 anti-scope clause on "no new CLI surface beyond what a single bug fix requires."
(D) New mission_state field in meta.json
Add mission_state: proposed | in_progress | merged | accepted to meta.json; command reads it.
Pros:
- Cleanest semantic model.
- Aligns with #992 WS-1 future direction.
Cons:
- Cross-cutting: every status-mutating command (next, agent action, move-task, merge) would need to write the field consistently. That is exactly the WorkPackageLifecycle authority work the #992 epic is meant to do. Doing it inside WP03 pre-empts and inflates a separate workstream.
- Migration: existing missions need backfill.
- This is a successor mission's design, not WP03's.
Resolved sub-questions (HiC, 2026-05-12)
Q1 — Mode mismatch behavior → RESOLVED: hard-block with remediation guidance
HiC accepted the recommendation: when an operator runs --mode post-merge against a mission whose meta.json.baseline_merge_commit is absent, the command blocks with MISSION_REVIEW_MODE_MISMATCH. Implementation detail added by HiC: the diagnostic must contain (i) a clear description of what is going on (i.e., "this mission has not been merged; post-merge review requires a baseline commit"), and (ii) a suggested remediation path the operator can follow.
Diagnostic message contract (encoded as part of WP03 acceptance):
ERROR: MISSION_REVIEW_MODE_MISMATCH
Mission '<slug>' has no baseline_merge_commit recorded in meta.json.
Post-merge mission review requires the mission to be merged first
(the merge step records the baseline commit that the review compares against).
What this means:
--mode post-merge expects to validate a merged mission against its
release-gate doctrine. Running it on a pre-merge mission would either
inspect an arbitrary point in the working tree (false confidence) or
fail the dead-code scan trivially.
Remediation options:
1. If the mission is ready to merge: run 'spec-kitty merge --mission <slug>'
and re-run this command after the merge completes.
2. If you want a pre-merge sanity check: re-run with --mode lightweight.
3. If meta.json baseline_merge_commit is missing because of a pre-083
legacy mission: see docs/migration/mission-id-canonical-identity.md
and run 'spec-kitty migrate backfill-identity' first.
The diagnostic is JSON-stable: machine-parseable code (MISSION_REVIEW_MODE_MISMATCH) plus human-readable body. Stdout writes the code; the body goes to stderr. JSON output includes both as structured fields.
Q3 — Gate source-of-truth → RESOLVED: keep gate logic near review.py, but pre-emptively split the file
HiC accepted the "leave inline for now" recommendation with a structural caveat: src/specify_cli/cli/commands/review.py is at risk of becoming a god-module (multiple concerns, growing LOC) once WP03's contract enforcement code lands on top of the existing gate logic. To prevent that, a hygiene refactor of review.py must happen before WP03's contract code is added.
Important scope distinction: this refactor is a mechanical file/method split for hygiene, not a domain-context extraction. The full domain split (e.g., extracting a ReleaseEvidence aggregate) belongs to #992 WS-5 ReleaseEvidence in a successor mission. WP03's refactor scope is strictly:
- Split
review.pyinto a small package:src/specify_cli/cli/commands/review/__init__.py+ sibling files for each Gate. - Extract generic helpers (e.g., report writer, lane-check, dead-code scanner, BLE001 audit) into named functions in dedicated sibling files (
_report.py,_lane_gate.py,_dead_code.py,_ble001_audit.py). - The public
review_mission()entry stays where it is so all existing imports remain valid. - No new abstractions, no new public types, no domain modeling. Anything more sophisticated than "move methods to sibling files and stop the LOC bleed" is out of scope for WP03's refactor and belongs to the WS-5 successor mission.
This refactor is captured as WP07 in the mission spec (prerequisite to WP03). See mission spec for FRs.
Open sub-questions for HiC
Q2 — issue-matrix.md validator strictness → still open; HiC requested more background
HiC: "I need more background to make an informed decision here. Explain the issue, and what the issue-matrix drift would mean in practice."
What issue-matrix.md is, in concrete terms
issue-matrix.md is the per-mission audit ledger that records, for each GitHub issue in the mission's scope, the resolution verdict and the evidence supporting that verdict. It is hand-authored today (no generator). The conventional shape — established by the mission stability-and-hygiene-hardening-2026-04-01KQ4ARB on 2026-04-26 — is a Markdown table.
Observed drift in three recent missions (real examples on origin/main)
The shape today is NOT identical across missions. Examined three:
Shape A (stable-320-release-blocker-cleanup-01KQW4DF/issue-matrix.md):
| Issue | Scope | Verdict | Evidence ref |
|-------|-------|----------|--------------|
| #952 | … | fixed | WP01; <files>; <tests>; CI <jobs> passed on PR #981 |
Shape B (release-3-2-0a5-tranche-1-01KQ7YXH/issue-matrix.md):
| Issue | FR | WP | Verdict | evidence_ref |
|-------|----|----|---------|--------------|
| #705 | FR-002 | WP01 | `fixed` | … |
Shape C (charter-golden-path-e2e-tranche-1-01KQ806X/issue-matrix.md):
No leading table at all — narrative prose first explaining the verdict legend, then the table appears further down with column names Issue | Verdict | Evidence.
Three drift dimensions are already in the wild:
- Column count and order. Shape A has 4 columns; Shape B has 5 (
FRandWPcolumns added). - Column-name capitalization and spacing.
Evidence refvsevidence_ref— same intent, different identifiers. - Document structure. Shape C puts narrative ahead of the table, which a naive Markdown-table parser would either skip or fail on depending on its rules.
What "drift" means in practice for the new contract
WP03 introduces tooling that depends on issue-matrix.md being machine-readable:
- Cross-surface fixture harness (#992 Phase 0): asserts the matrix exists and can be parsed for a synthesized post-merge fixture.
- Dashboard rendering: if/when the dashboard ever surfaces issue-matrix verdicts per mission, it needs predictable column semantics.
- The
gates_recordedreport field (FR-007): WP03 wants to record the per-issue verdict alongside Gate 1–4 records; the join key is the matrix'sIssuecolumn.
If WP03 ships with a strict validator (hard-fail on header drift):
- Positive: breakage is loud and immediate. The first mission that drifts can't pass post-merge review until the matrix conforms. The schema stabilizes for downstream consumers.
- Negative: legitimate evolution of the matrix (e.g., a future "Severity" column) requires a coordinated schema bump. Past missions whose matrix shape predates the validator would fail when re-reviewed (though re-review is rare).
If WP03 ships with a loose validator (warn-and-parse-what-you-can):
- Positive: tolerates the existing shape variation; doesn't break re-review of past missions.
- Negative: downstream consumers (harness, dashboard, future tooling) get unpredictable data. The drift problem persists because "warn" is widely ignored. We end up with
feedback://vsreview-cycle://all over again (cf. #962 — silent pointer-scheme drift that took a dedicated bug to fix).
Architect Alphonso revised recommendation
Hard-fail (strict) on the header row only, with a single forward-compatibility rule: columns may be added without breaking validation, but the four canonical columns Issue | Scope | Verdict | Evidence ref must be present in that order. Renames, removals, or capitalization changes hard-fail with MISSION_REVIEW_ISSUE_MATRIX_SCHEMA_DRIFT. The body rows are parsed loosely (free-form Markdown in cells) to keep authoring practical.
This (i) stabilizes the canonical contract that downstream consumers depend on, (ii) makes legitimate forward evolution (extra columns) possible without a coordinated schema bump, (iii) catches the high-cost drift modes (rename, reorder, remove) loudly. The existing missions whose shape diverges are not currently re-reviewed by tooling, so the constraint only fires forward from WP03 onward; if they ever are re-reviewed, the operator gets a clear diagnostic naming the missing canonical column.
HiC decision needed: approve "strict on header, additive-tolerant, loose on body" — or override with a different stance (looser, stricter, or different forward-compat rule).
More Information
- Source bug body: #985
- Code reference:
src/specify_cli/cli/commands/review.py:235(review_missionentry point);src/specify_cli/cli/commands/review.py:284–285(baseline_merge_commitcheck);src/specify_cli/cli/commands/review.py:465–470(current YAML frontmatter writer). - Mission spec FR-005 through FR-009 and NFR-001 through NFR-003 in
kitty-specs/review-merge-gate-hardening-3-2-x-01KRC57C/spec.md.