Merge Abort, Review, and Status Hardening Sprint

Mission ID: 01KQFF35BPH2H8971KR0TEY8ST Mission Type: software-dev Status: Specifying Created: 2026-04-30

Purpose

Fix two confirmed bugs and add five targeted enhancements to the spec-kitty CLI. All seven were observed or witnessed firsthand during mission auth-tranche-2-5-cli-contract-consumption-01KQEJZK (PR #902).

The bugs leave operators stuck with orphaned lock files and silently inconsistent review artifacts. The enhancements close systematic gaps in review quality, error message usability, linting discipline, status-board visibility, and post-merge lifecycle governance.

Background

After a squash merge crashed mid-run, operators discovered that spec-kitty merge --abort did not clean up the runtime lock file, requiring a manual rm to unblock the next merge attempt. Separately, a force-approval of a work package left its review artifact showing verdict: rejected with no warning — the inconsistency was only caught by a manual audit.

Five additional improvements were identified in the same session: lane guard error messages gave no hint that planning artifacts were correctly present on the planning branch; the WP review checklist had no explicit check for error-path test reachability; BLE001 suppressions in security-critical modules lacked justification comments; stalled reviewer agents were invisible in the status board; and there was no structured CLI gate between merge and release.

Primary Actors

spec-kitty next; the primary person or agent affected by bugs #903 and #904

mission review reports; affected by issues #905, #907, #908, #909

  • Orchestrator agent — drives spec-kitty merge, move-task, and
  • Reviewer agent — runs WP review flows and is affected by issue #906
  • Developer / operator — reads status output, lane guard errors, and

Scope

In scope

  • spec-kitty merge --abort cleanup behaviour (#903)
  • Validation and warning when force-approving a WP with a rejected verdict (#904)
  • Lane guard error message improvement to name the planning branch (#905)
  • WP review DoD template text addition for error-path test reachability (#906)
  • BLE001 suppression audit and justification in auth/ and cli/commands/ (#907)
  • spec-kitty review --mission <slug> command (MVP scope only) (#908)
  • Stalled-reviewer detection in spec-kitty agent tasks status and spec-kitty next (#909)

Out of scope

per-file-ignores + inline justification comments are in scope

.amazonq/prompts/, or any other agent directory

  • Changes to the merge execution logic (only the --abort cleanup path changes)
  • Auto-updating review artifact verdicts on force-approve (Option B from #904 context dump)
  • Custom ruff rules or pre-commit hooks for BLE001 (Option B from #907); only
  • Any SaaS or remote-sync integration for the mission review command
  • Changes to any agent-generated copies under .claude/commands/,

User Scenarios and Testing

Scenario 1 — Clean abort after a crashed merge

An orchestrator agent runs spec-kitty merge for a feature. The process crashes mid-run. The agent runs spec-kitty merge --abort. The abort succeeds with exit code 0, the lock file is gone, the merge-state JSON is gone, and any in-progress git merge is aborted. Running spec-kitty merge --abort a second time also exits 0 with no error.

Scenario 2 — Force-approve blocked by stale rejected verdict

An orchestrator agent runs move-task WP05 --to approved --force. The most recent review-cycle artifact for WP05 contains verdict: rejected. The command prints a warning naming the artifact file and the verdict, then exits non-zero. The WP remains in its current lane. Running the same command with --skip-review-artifact-check bypasses the warning and succeeds.

Scenario 3 — Status board warns on stale approved WP

A WP is in done lane but its most recent review-cycle artifact still shows verdict: rejected. Running spec-kitty agent tasks status displays a warning next to that WP.

Scenario 4 — Lane guard names the planning branch

A reviewer agent is on a lane branch and tries to commit a change under kitty-specs/. The lane guard error message names the planning branch and provides a git show <planning-branch>:kitty-specs/ command to verify the file exists there.

Scenario 5 — WP review rejects untested error path

A reviewer follows the updated WP review checklist. The checklist now includes an explicit step requiring that each error-path test is verified to fail when the implementation fix is deleted. A test that only validates the exception handler's structure is flagged as insufficient.

Scenario 6 — Mission review command runs cleanly

An operator runs spec-kitty review --mission merge-review-status-hardening-sprint-01KQFF35. All WPs are in done. No new public symbols are unreferenced. No unjustified BLE001 suppressions are found. The command writes kitty-specs/<slug>/mission-review-report.md with verdict: pass and exits 0.

Scenario 7 — Stalled reviewer surfaced in status

A WP has been in in_review with no status event for 45 minutes (above the 30-minute default threshold). Running spec-kitty agent tasks status shows ⚠ STALLED — no move-task in 45m next to that WP. Running spec-kitty next --agent claude surfaces it as an actionable item with the two intervention commands.

Functional Requirements

IDDescriptionStatus
FR-001spec-kitty merge --abort removes .kittify/runtime/merge/__global_merge__/lock when it existsAccepted
FR-002spec-kitty merge --abort removes .kittify/merge-state.json when it existsAccepted
FR-003spec-kitty merge --abort aborts any in-progress git merge (equivalent to git merge --abort when git is in a merging state)Accepted
FR-004Every step of spec-kitty merge --abort is idempotent — the command exits 0 whether or not each artifact was presentAccepted
FR-005move-task <WP> --to approved --force checks the most recent review-cycle-N.md for that WP; if verdict: rejected is found, the command prints a named warning and exits non-zeroAccepted
FR-006move-task <WP> --to approved --force --skip-review-artifact-check bypasses the rejected-verdict check and proceedsAccepted
FR-007move-task <WP> --to done --force applies the same rejected-verdict check as FR-005Accepted
FR-008spec-kitty agent tasks status emits a per-WP warning when a WP in approved or done lane has a most-recent review-cycle-N.md with verdict: rejectedAccepted
FR-009The verdict field in review-cycle-N.md is validated against the enumerated set: approved, approved_after_orchestrator_fix, arbiter_override, rejectedAccepted
FR-010The lane guard error message names the planning_base_branch and includes a git show <planning-branch>:kitty-specs/ command exampleAccepted
FR-011The WP review checklist template in the software-dev mission command-templates includes a "deletion test" item under error-path test coverageAccepted
FR-012All # noqa: BLE001 suppressions in src/specify_cli/auth/ and src/specify_cli/cli/commands/ either have an inline justification comment or are removed and replaced with explicit exception propagationAccepted
FR-013spec-kitty review --mission <slug> verifies all WPs for the named mission are in done lane; exits non-zero with a list of non-done WPs if notAccepted
FR-014spec-kitty review --mission <slug> checks for new public functions and classes introduced in the mission diff that have no non-test callers in src/; flags each oneAccepted
FR-015spec-kitty review --mission <slug> checks for # noqa: BLE001 suppressions in src/specify_cli/auth/ and src/specify_cli/cli/commands/ that lack inline justification text; flags each oneAccepted
FR-016spec-kitty review --mission <slug> writes kitty-specs/<slug>/mission-review-report.md containing verdict (pass / pass_with_notes / fail), reviewed_at (ISO timestamp), and findings countAccepted
FR-017The kanban status board displays ⚠ STALLED — no move-task in Xm next to any WP in in_review lane whose most recent status event is older than the stall threshold (default 30 minutes)Accepted
FR-018spec-kitty next --agent <name> surfaces stalled in_review WPs as actionable items with the two intervention commands (force-approve and reject-with-feedback)Accepted

Non-Functional Requirements

IDDescriptionThresholdStatus
NFR-001All new CLI commands (FR-013 through FR-016) have at least one integration test100% of new commands coveredAccepted
NFR-002Line coverage for all new and modified source files≥90%Accepted
NFR-003The stall detection threshold is operator-configurable via .kittify/config.yaml under review.stall_threshold_minutesDefault 30; any positive integer acceptedAccepted
NFR-004spec-kitty merge --abort completes in under 2 seconds regardless of filesystem state≤2 seconds wall timeAccepted
NFR-005spec-kitty review --mission completes in under 10 seconds for a mission with fewer than 20 WPs and a diff of fewer than 5,000 lines≤10 secondsAccepted

Constraints

IDDescriptionStatus
C-001Template changes must be made to src/specify_cli/missions/*/command-templates/ only — generated agent copies under .claude/commands/, .amazonq/prompts/, and other agent directories must not be editedAccepted
C-002Status events in status.events.jsonl are append-only; no existing event lines may be mutated by any code in this missionAccepted
C-003mission_id (ULID) is canonical identity in all new code; mission_number is display-only and must not be used for lookup, locking, or routingAccepted
C-004All modified Python files must pass mypy --strict with zero errorsAccepted
C-005The stall detection for FR-017 and FR-018 is read-only; no new state is written to disk as part of the stall checkAccepted
C-006The --skip-review-artifact-check bypass flag from FR-006 must be documented in the CLI --help outputAccepted

Key Entities

EntityDescription
Lock file.kittify/runtime/merge/__global_merge__/lock — mutex that prevents concurrent merges
Merge-state JSON.kittify/merge-state.json — resumable merge progress record
Review-cycle artifactkitty-specs/<slug>/tasks/<WP-dir>/review-cycle-N.md — frontmatter-bearing review outcome file; verdict is the key field
Status event logkitty-specs/<slug>/status.events.jsonl — append-only record of all WP lane transitions
Mission review reportkitty-specs/<slug>/mission-review-report.md — output of spec-kitty review --mission
Planning base branchThe main-line branch that holds all planning artifacts; named in lane guard errors
Stall thresholdOperator-configurable duration (default 30 min) after which an in_review WP with no recent event is flagged

Success Criteria

  • Operators can recover from a crashed merge by running spec-kitty merge --abort without manual filesystem cleanup, 100% of the time
  • Force-approving a WP with a rejected verdict is blocked by default; zero silent inconsistencies reach the done lane
  • Lane guard errors contain enough context that a reviewer can locate planning artifacts in one additional command without asking for help
  • Every WP review checklist includes an explicit error-path reachability check, reducing the class of "tests pass but implementation is untested" bugs reaching review
  • All # noqa: BLE001 suppressions in auth and CLI command code carry an inline justification, making security audits faster
  • Operators have a single CLI command to confirm a merged mission is clean before cutting a release
  • Stalled review agents are surfaced within the first spec-kitty agent tasks status or spec-kitty next invocation after the stall threshold is crossed

Assumptions

  • The lock file constant __global_merge__ is already defined somewhere in src/specify_cli/merge/ (likely executor.py) and does not need to be changed
  • The baseline_merge_commit field in meta.json is populated for all missions created after the mission identity migration (083); missions without it will produce a warning from the review command rather than a hard failure
  • The review.stall_threshold_minutes config key does not currently exist in .kittify/config.yaml; it will be added as an optional key with a default of 30
  • Option A (warn + block on force-approve) is chosen for FR-005 rather than Option B (auto-update verdict); this preserves human oversight
  • Option A (per-file-ignores + inline comments) is chosen for FR-012 rather than a custom ruff rule; this is sufficient for the documented use case and avoids plugin complexity

Open Questions

None — all decisions resolved during discovery.