Spec: Workflow Parity Fixes 988/989/991

Purpose

Operators and AI agents use spec-kitty next --json, spec-kitty review --mode lightweight, and spec-kitty merge --dry-run as the canonical readiness signals for whether a mission can move forward, ship, or merge. Today, each of these three surfaces silently disagrees with the underlying real command it is meant to preview:

  • next --json can omit the concrete work package (WP) identity it could legitimately claim, making missions look "blocked" when the explicit implement action would succeed.
  • lightweight review can emit a clean pass while skipping the dead-code scan whenever a modern mission lacks baseline_merge_commit, hiding missing release evidence behind a green verdict.
  • merge --dry-run can preview a normal merge for a mission where the real merge would reject with REJECTED_REVIEW_ARTIFACT_CONFLICT, because dry-run skips the review-artifact consistency gate that real merge runs.

This mission restores parity so each preview surface reflects the real surface it claims to represent. After the fix, an operator can trust that a green dry-run, a green lightweight review, and an actionable next --json mean what they say.

Intent Summary

  • Primary actors: human operators driving the CLI; AI agents driving spec-kitty agent flows.
  • Trigger: operator or agent invokes one of the three readiness surfaces against a mission.
  • Desired outcome: each preview surface returns the same actionable verdict — including the same structured failure codes — that the real surface would return for that mission state.
  • Always-true rule: a dry-run / preview / lightweight surface MUST NOT report a green or actionable result that the corresponding real command would not produce.
  • Canonical domain terms: "work package" (WP), "mission", "lane", "baseline_merge_commit", "review cycle", "review artifact", "consistency gate", REJECTED_REVIEW_ARTIFACT_CONFLICT, MISSION_REVIEW_MODE_MISMATCH.

User Scenarios & Testing

Scenario A — Claimable WP visible in next --json (issue #988)

1. Operator runs spec-kitty next --mission <slug> --json against a mission whose state allows the explicit agent action implement to claim a WP. 2. The JSON payload reports mission_state: implement, planned_wps >= 1, preview_step: implement. 3. The JSON payload includes the concrete wp_id that agent action implement would claim, not null. 4. If selection is intentionally impossible (e.g. dependencies unsatisfied, all WPs already claimed by other agents), the JSON payload includes a structured reason explaining why selection was suppressed.

Scenario B — Lightweight review with missing baseline_merge_commit (issue #989)

1. Operator runs spec-kitty review --mission <slug> --mode lightweight against a modern numbered mission whose meta.json has baseline_merge_commit: null. 2. The command does not emit a clean pass while silently skipping the dead-code scan. 3. Either the command exits non-zero with a structured diagnostic and remediation guidance, or it returns a verdict whose payload makes the missing dead-code scan explicit and non-passing. 4. Genuinely historical / legacy missions whose schema never carried baseline_merge_commit retain their existing behavior, with explicit messaging that names the legacy path.

Scenario C — Dry-run merge surfaces review-artifact conflict (issue #991)

1. Mission fixture: WP01 has lane approved; the latest review-cycle artifact for that WP has verdict: rejected. 2. Operator runs spec-kitty merge --mission <slug> --dry-run --json. 3. The dry-run output reports REJECTED_REVIEW_ARTIFACT_CONFLICT in both human and JSON form, exiting non-zero (matching real merge semantics) and pointing at the offending WP and review cycle. 4. The same gate fires for the human (non-JSON) dry-run output. 5. Real merge consistency tests continue to pass unchanged.

Edge Cases

  • A mission with multiple planned WPs where the first is blocked but a later WP is claimable: next --json should expose whichever WP agent action implement would actually claim, with a deterministic selection rule documented in the implementation.
  • Lightweight review against a mission with baseline_merge_commit populated and a clean dead-code scan: behavior unchanged (still passes).
  • merge --dry-run against a mission with all review cycles approved and no artifact conflicts: still previews a normal merge (no false positives introduced).

Functional Requirements

IDDescriptionStatus
FR-001next --json MUST serialize the concrete claimable wp_id whenever spec-kitty agent action implement against the same mission would auto-claim a WP.Active
FR-002next --json MUST include a structured selection_reason (or equivalent named field) explaining why no wp_id was selected, whenever wp_id is omitted but preview_step == "implement".Active
FR-003next --json claimability discovery MUST share a single implementation path with the explicit agent action implement claim logic, with no divergent forking.Active
FR-004review --mode lightweight MUST NOT emit a passing verdict for a modern numbered mission when baseline_merge_commit is null and the dead-code scan was therefore skipped.Active
FR-005review --mode lightweight MUST emit a structured failure code (e.g. LIGHTWEIGHT_REVIEW_MISSING_BASELINE or named-equivalent) with remediation guidance when it cannot run the dead-code scan on a modern mission.Active
FR-006review --mode lightweight MUST preserve existing behavior for explicitly-legacy missions (those marked as pre-baseline schema), with a clearly-named diagnostic path.Active
FR-007merge --dry-run MUST invoke the same review-artifact consistency gate as real merge, including the path that emits REJECTED_REVIEW_ARTIFACT_CONFLICT.Active
FR-008merge --dry-run --json MUST surface REJECTED_REVIEW_ARTIFACT_CONFLICT in JSON output (under the same key real merge emits) when the gate fires.Active
FR-009merge --dry-run human output MUST surface a clearly-labeled REJECTED_REVIEW_ARTIFACT_CONFLICT blocker when the gate fires.Active
FR-010The three fixes MUST add regression tests that fail before the fix and pass after it, for the exact mission states described in scenarios A, B, and C.Active

Non-Functional Requirements

IDDescriptionThresholdStatus
NFR-001The fixes MUST NOT degrade existing test pass rate on the touched modules.100% of pre-existing tests in tests/specify_cli/cli/commands/test_merge.py, tests/post_merge/test_review_artifact_consistency.py, tests/specify_cli/cli/commands/test_review.py, and tests/specify_cli/cli/commands/review/test_mode_resolution.py continue to pass.Active
NFR-002Added regression tests MUST execute quickly to keep the focused suite snappy.New tests for this mission complete in under 5 seconds aggregate when run with pytest -q.Active
NFR-003The fixes MUST keep the dry-run / lightweight code paths CLI-safe in the absence of network access.All new tests run with no network connectivity and no SPEC_KITTY_ENABLE_SAAS_SYNC requirement.Active
NFR-004Structured diagnostic codes MUST be discoverable.Every new error code is referenced at least once in docstrings or test assertion strings so rg can locate it.Active

Constraints

IDDescriptionStatus
C-001Do not change the wire shape of next --json for non-implement states; only the implement-path payload may gain new fields.Active
C-002Do not change the success path of merge --dry-run; the gate only fires when real merge would reject.Active
C-003Do not modify behavior for Priivacy-ai/spec-kitty#971, #771, or #825 (out of scope per start-here.md).Active
C-004Reuse the existing REJECTED_REVIEW_ARTIFACT_CONFLICT and MISSION_REVIEW_MODE_MISMATCH error codes; do not invent parallel codes for the same conditions.Active
C-005Keep the lightweight-review legacy path opt-in via mission schema signal, not by silent fallback.Active

Domain Language

  • Work package (WP): a unit of mission scope identified by WP##; transitions through the 9-lane state machine documented in CLAUDE.md.
  • Claimable WP: a WP that the agent action implement flow would auto-claim for the calling agent (typically lane planned with satisfied dependencies and no competing in-flight claim).
  • Baseline merge commit: the commit SHA recorded in meta.json#baseline_merge_commit that anchors the dead-code scan diff for a mission.
  • Review cycle: a tasks/<WP>/review-cycle-N.md artifact with a verdict (approved | rejected | pending).
  • Review-artifact consistency gate: the preflight that verifies the latest review cycle's verdict is compatible with the WP's current lane before merge.
  • Lightweight review mode: review verdict produced without re-running the heavy post-merge scans; intended for fast iteration but MUST NOT mask missing release evidence.

Assumptions

  • The existing real-merge consistency gate (tests/post_merge/test_review_artifact_consistency.py) already covers the canonical failure shape, so the dry-run fix is primarily a wiring/factoring change rather than a new gate.
  • Modern missions are detectable from meta.json schema (e.g. presence of mission_id or mission_number keys, or schema version), so the lightweight-review fix can distinguish modern vs. legacy missions deterministically without a heuristic.
  • spec-kitty agent action implement --mission <slug> --agent <name> already encapsulates the canonical claim algorithm; next --json can call into that algorithm in discovery-only mode rather than re-deriving it.

Success Criteria

  • SC-001: For a mission whose state lets agent action implement claim a WP, next --json reports the concrete wp_id in 100% of invocations (covered by regression test).
  • SC-002: For a modern numbered mission with baseline_merge_commit: null, review --mode lightweight returns a non-passing structured verdict in 100% of invocations (covered by regression test).
  • SC-003: For a mission with a rejected latest review-cycle artifact on an approved WP, merge --dry-run exits non-zero and emits REJECTED_REVIEW_ARTIFACT_CONFLICT in 100% of invocations across human and JSON output (covered by regression test).
  • SC-004: All four pre-existing focused test files (test_merge.py, test_review_artifact_consistency.py, test_review.py, test_mode_resolution.py) continue to pass at 100%.

Key Entities

  • MissionMeta (meta.json): identity + schema fields including mission_id, mission_number, baseline_merge_commit, mission_slug.
  • WP (work package): lane state + dependencies; lane history in status.events.jsonl.
  • ReviewCycle (tasks/<WP>/review-cycle-N.md): YAML frontmatter with verdict.
  • NextJsonPayload: the JSON document emitted by spec-kitty next --json.
  • MergeDryRunPayload: the JSON document emitted by spec-kitty merge --dry-run --json.
  • LightweightReviewVerdict: the structured result emitted by spec-kitty review --mode lightweight.

Dependencies

  • Existing real-merge review-artifact consistency gate implementation (referenced by tests/post_merge/test_review_artifact_consistency.py).
  • Existing agent action implement claim algorithm.
  • Existing MISSION_REVIEW_MODE_MISMATCH diagnostic path used by post-merge review.

Out of Scope

  • TeamSpace MVP scope.
  • Issues assigned to stijn-dejongh, in particular #971, #771, #825.
  • Refactoring of the broader next, review, or merge command surfaces beyond what is necessary to fix the three parity bugs.
  • Changes to SaaS sync, hosted auth, or tracker behavior.