Feature Specification: 3.2.0 Workflow Reliability Blockers

Feature Branch: release-320-workflow-reliability-01KQKV85
Created: 2026-05-02
Status: Draft
Input: User description: "Create and implement one software-dev mission for the highest-priority 3.2.0 stable-release workflow reliability blockers from mission-state-audit-01KQHRB8, covering issues #945, #949, #950, #951, #952, #953, #904, and verification of #944."

User Scenarios & Testing (mandatory)

User Story 1 - Trust Status Transitions (Priority: P1)

An agent executing a work-package transition needs every successful transition command to leave an observable, durable event so downstream implement, review, merge, and dashboard flows all agree on the work-package state.

Why this priority: If state mutation success can be reported without an event, every later workflow step can make a decision from a false state.

Independent Test: A focused workflow fixture can run a transition command from the repository root and from a worktree/subagent context, then verify that each successful response has a corresponding status event and that missing event persistence returns a non-zero result with a precise diagnostic.

Acceptance Scenarios:

1. Given a planned work package and a valid transition request, When the transition command reports success, Then the expected event is present in the mission status event log and the materialized state reflects the new lane.
2. Given an event write or readback failure, When the transition command completes, Then it fails loudly, names the missing transition evidence, and does not allow callers to treat the mutation as successful.
3. Given an implement or review command that is backgrounded, interrupted, or slow, When the workflow resumes or is inspected, Then no work package is left stranded in the claimed lane without an actionable recovery state.
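A minimal Python sketch of the write-then-readback contract behind scenarios 1 and 2, assuming the status log is a JSON-lines file. `StatusEvent`, `append_and_verify`, `require_event`, `run_transition`, and the log layout are hypothetical names for illustration, not Spec Kitty's actual API. Success is reported only after the exact event id is read back; any write or readback failure yields a non-zero exit code and a diagnostic naming the mission, work package, and missing event.

```python
# Hypothetical sketch: names and log layout are illustrative, not Spec Kitty's API.
import json
import os
import sys
from dataclasses import asdict, dataclass
from pathlib import Path


class TransitionEvidenceError(RuntimeError):
    """The transition could not be proven by a durable status event."""


@dataclass(frozen=True)
class StatusEvent:
    event_id: str
    mission: str
    wp_id: str
    from_lane: str
    to_lane: str


def require_event(log_path: Path, event: StatusEvent) -> None:
    """Read the log back and raise unless the exact event id is present."""
    try:
        lines = log_path.read_text(encoding="utf-8").splitlines()
    except OSError as exc:
        raise TransitionEvidenceError(
            f"mission={event.mission} wp={event.wp_id}: cannot read status log: {exc}"
        ) from exc
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # a torn or corrupt line never counts as evidence
        if isinstance(record, dict) and record.get("event_id") == event.event_id:
            return
    raise TransitionEvidenceError(
        f"mission={event.mission} wp={event.wp_id}: transition "
        f"{event.from_lane}->{event.to_lane} has no status event {event.event_id}"
    )


def append_and_verify(log_path: Path, event: StatusEvent) -> StatusEvent:
    """Append the event as one JSON line, fsync it, then prove it via readback."""
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(event), sort_keys=True) + "\n")
        fh.flush()
        os.fsync(fh.fileno())
    require_event(log_path, event)
    return event


def run_transition(log_path: Path, event: StatusEvent) -> int:
    """Exit-code wrapper: 0 only when the event is proven durable."""
    try:
        append_and_verify(log_path, event)
    except (OSError, TransitionEvidenceError) as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1
    return 0
```

Matching on a unique event id, rather than on the last line of the log, keeps a concurrent writer's event from being mistaken for this transition's evidence.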


User Story 2 - Review the Correct Work (Priority: P1)

A reviewer needs every generated review prompt and diff command to identify the correct repository, mission, work package, worktree, branch, and base reference, even when multiple missions or repositories are active at the same time.

Why this priority: A reviewer acting on the wrong prompt or reconstructed branch name can approve or reject unrelated work.

Independent Test: Concurrent review prompt fixtures can create review requests for two repos or missions and assert that prompt paths are collision-proof, prompt metadata self-identifies the requested work, and generated diff instructions use canonical mission state.

Acceptance Scenarios:

1. Given two concurrent missions or repos that request reviews, When prompts are generated, Then each prompt has a unique per-repo, per-mission, per-work-package, per-invocation location.
2. Given a generated prompt that names a different repo, mission, work package, or worktree than the requested review, When the reviewer dispatch step validates it, Then dispatch fails closed before any review begins.
3. Given a mission slug that begins with mission-, When review diff instructions are generated, Then they use canonical state references rather than reconstructed slug conventions.
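A sketch of collision-proof prompt locations and fail-closed dispatch validation, assuming prompts carry their identity as flat metadata. `PromptIdentity`, `new_identity`, `prompt_path`, and `validate_before_dispatch` are hypothetical; the directory layout (hashed repo key, then mission, work package, and a random invocation id) is one possible scheme, not a documented one.

```python
# Hypothetical sketch: the prompt layout and metadata fields are assumptions.
import hashlib
import uuid
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Mapping

IDENTITY_FIELDS = ("repo", "mission", "wp_id", "worktree", "branch", "base_ref", "invocation_id")


class PromptMismatchError(RuntimeError):
    """Prompt metadata does not match the requested review; dispatch must stop."""


@dataclass(frozen=True)
class PromptIdentity:
    repo: str
    mission: str
    wp_id: str
    worktree: str
    branch: str
    base_ref: str
    invocation_id: str


def new_identity(repo: str, mission: str, wp_id: str, worktree: str,
                 branch: str, base_ref: str) -> PromptIdentity:
    # A random invocation id (not a timestamp) keeps two prompts created in the
    # same second for the same work package from colliding.
    return PromptIdentity(repo, mission, wp_id, worktree, branch, base_ref, uuid.uuid4().hex)


def prompt_path(root: Path, ident: PromptIdentity) -> Path:
    # Hash the full repo path so different repos never share a directory,
    # even when their basenames match.
    repo_key = hashlib.sha256(ident.repo.encode("utf-8")).hexdigest()[:16]
    return root / repo_key / ident.mission / ident.wp_id / f"{ident.invocation_id}.md"


def validate_before_dispatch(requested: PromptIdentity, prompt_meta: Mapping[str, str]) -> None:
    """Fail closed: any missing or mismatched identity field blocks dispatch."""
    expected = asdict(requested)
    mismatches = [f for f in IDENTITY_FIELDS if prompt_meta.get(f) != expected[f]]
    if mismatches:
        raise PromptMismatchError(
            f"mission={requested.mission} wp={requested.wp_id}: prompt metadata "
            f"mismatch on {', '.join(mismatches)}"
        )
```

Because the invocation id comes from `uuid.uuid4()` instead of a clock, two prompts generated in the same second for the same work package still get distinct paths, and a missing metadata field counts as a mismatch rather than a pass.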


User Story 3 - Enforce Active Work Ownership (Priority: P1)

An implementer working through sequential work packages in a shared lane needs file ownership checks to follow the active work package, not a stale work package that previously occupied the lane.

Why this priority: Stale ownership can either block legitimate work or allow changes outside the current work package's scope.

Independent Test: A shared-lane fixture can move from one work package to another with disjoint owned files and prove the guard uses the currently active work package's ownership set.

Acceptance Scenarios:

1. Given a shared lane has completed one work package and started another, When an ownership guard runs, Then it validates changed files against the active work package's owned_files.
2. Given guard context is stale or ambiguous, When the guard cannot prove the active work package, Then it reports a guard-context problem distinct from a true scope violation.
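The distinction in scenario 2 can be sketched as a guard with three outcomes, so that an unprovable context is never reported as a scope violation. `check_ownership`, the `GuardOutcome` values, and the assumption that owned_files entries are glob patterns are all hypothetical; how the active work-package ids are derived from canonical status is left out.

```python
# Hypothetical sketch: owned_files are assumed to be glob patterns per WP.
from dataclasses import dataclass
from enum import Enum
from fnmatch import fnmatch
from typing import Mapping, Optional, Sequence


class GuardOutcome(Enum):
    OK = "ok"
    SCOPE_VIOLATION = "scope_violation"
    GUARD_CONTEXT_PROBLEM = "guard_context_problem"


@dataclass(frozen=True)
class GuardResult:
    outcome: GuardOutcome
    wp_id: Optional[str]
    detail: str


def check_ownership(active_wp_ids: Sequence[str],
                    owned_files: Mapping[str, Sequence[str]],
                    changed_files: Sequence[str]) -> GuardResult:
    """Validate changed files against the single active work package only."""
    if len(active_wp_ids) != 1:
        # Zero or several candidates: the active WP cannot be proven, so this is
        # a context problem, not evidence that the implementer overstepped.
        return GuardResult(GuardOutcome.GUARD_CONTEXT_PROBLEM, None,
                           f"expected one active work package, found {list(active_wp_ids)}")
    wp_id = active_wp_ids[0]
    patterns = owned_files.get(wp_id)
    if not patterns:
        return GuardResult(GuardOutcome.GUARD_CONTEXT_PROBLEM, wp_id,
                           "no owned_files recorded for the active work package")
    outside = [p for p in changed_files if not any(fnmatch(p, pat) for pat in patterns)]
    if outside:
        return GuardResult(GuardOutcome.SCOPE_VIOLATION, wp_id,
                           f"files outside {wp_id} owned_files: {outside}")
    return GuardResult(GuardOutcome.OK, wp_id, "all changed files are owned")
```

Note that `fnmatch` lets `*` match across `/`, so `src/b/*` also covers nested files; a stricter matcher may be preferable for real ownership sets.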


User Story 4 - Preserve Parseable Successful Commands (Priority: P2)

An agent or script consuming command output needs successful local mutations to remain parseable and non-fatal, even if final SaaS or sync cleanup reports a recoverable problem after the local state is already durable.

Why this priority: Red failure output or corrupted JSON after a successful local mutation can cause automation to retry, misclassify, or abandon a valid workflow step.

Independent Test: A fixture can force a final-sync failure after a successful local mutation and assert that command status, stdout, stderr, and JSON output preserve the local success contract while surfacing explicit non-fatal sync diagnostics.

Acceptance Scenarios:

1. Given a local state mutation succeeds and final sync fails, When the command exits, Then the local success remains machine-readable and the sync issue is marked non-fatal.
2. Given a JSON command surface is requested, When non-fatal sync diagnostics exist, Then stdout remains valid JSON and diagnostics appear only in an explicit field or on stderr according to the command contract.
3. Given repeated sync-lock or interpreter-shutdown messages occur in one invocation, When diagnostics are rendered, Then duplicate messages are collapsed so the operator sees one actionable summary.
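A minimal sketch of the local-success contract in these scenarios: once the local mutation is durable the exit code is 0, JSON mode keeps diagnostics in an explicit field so stdout stays one parseable document, and text mode sends them to stderr. `SyncDiagnostics`, `emit_result`, and the `sync_diagnostics` field name are assumptions for illustration, not the command contract itself.

```python
# Hypothetical sketch: field names and output format are illustrative only.
import json
import sys
from typing import Any, Mapping, Optional, TextIO


class SyncDiagnostics:
    """Collects non-fatal final-sync messages, collapsing duplicates per invocation."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self.messages: list[str] = []

    def add(self, message: str) -> None:
        key = " ".join(message.split())  # normalize whitespace before comparing
        if key and key not in self._seen:
            self._seen.add(key)
            self.messages.append(key)


def emit_result(local_result: Mapping[str, Any], diags: SyncDiagnostics, as_json: bool,
                out: Optional[TextIO] = None, err: Optional[TextIO] = None) -> int:
    """Render a durable local result; sync trouble never changes the exit code."""
    out = out if out is not None else sys.stdout
    err = err if err is not None else sys.stderr
    if as_json:
        payload = dict(local_result)
        # Diagnostics live in an explicit field so stdout stays one JSON document.
        payload["sync_diagnostics"] = [
            {"severity": "non-fatal", "message": m} for m in diags.messages
        ]
        out.write(json.dumps(payload) + "\n")
    else:
        out.write(f"ok: {local_result.get('summary', 'local state saved')}\n")
        for m in diags.messages:
            err.write(f"warning (non-fatal sync): {m}\n")
    return 0
```

Deduplication keys on whitespace-normalized text, so repeated sync-lock or interpreter-shutdown messages within one invocation collapse into a single entry.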


User Story 5 - Recover Safely from Merge and Review Inconsistency (Priority: P2)

A release operator needs merge, mission-review, and ship workflows to detect unsafe local branch state and stale rejected review artifacts before claiming a mission is ready for release.

Why this priority: A diverged main or contradictory review artifact can invalidate release signoff even when work-package lanes look approved or done.

Independent Test: A merge/ship preflight fixture can simulate local main divergence and stale rejected review-cycle frontmatter, then assert deterministic remediation before release signoff.

Acceptance Scenarios:

1. Given local main has diverged from origin/main, When merge or ship preflight runs, Then the workflow blocks unsafe continuation and provides a deterministic path to a focused PR branch based on mission-owned changes.
2. Given a work package is approved or done but its latest review artifact still says verdict: rejected, When mission-review or ship signoff runs, Then the workflow emits a hard warning or fails until the contradiction is resolved.
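The divergence check in scenario 1 can be built on `git rev-list --left-right --count main...origin/main`, which prints the number of commits reachable only from local main and the number reachable only from origin/main. A sketch, assuming origin/main has already been fetched; `Divergence`, `measure`, `remediation`, and the focus-branch name are hypothetical, and treating a behind-only main as unsafe is a conservative policy choice rather than something this spec mandates.

```python
# Hypothetical sketch: helper names and the focused-PR recipe are assumptions.
import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import List, Sequence


@dataclass(frozen=True)
class Divergence:
    ahead: int   # commits on local main that origin/main lacks
    behind: int  # commits on origin/main that local main lacks

    @property
    def safe(self) -> bool:
        # Conservative policy: any difference in either direction blocks release.
        return self.ahead == 0 and self.behind == 0


def parse_left_right(output: str) -> Divergence:
    """Parse `git rev-list --left-right --count main...origin/main` output."""
    left, right = output.split()
    return Divergence(ahead=int(left), behind=int(right))


def measure(repo: Path) -> Divergence:
    # Assumes `git fetch origin` already ran so origin/main is current.
    proc = subprocess.run(
        ["git", "rev-list", "--left-right", "--count", "main...origin/main"],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return parse_left_right(proc.stdout)


def remediation(div: Divergence, focus_branch: str, mission_commits: Sequence[str]) -> List[str]:
    """Return [] when main is shippable, else a deterministic focused-PR recipe."""
    if div.safe:
        return []
    steps = [f"git switch -c {focus_branch} origin/main"]
    # Mission-owned commits, oldest first, replayed onto a clean base.
    steps += [f"git cherry-pick {sha}" for sha in mission_commits]
    steps.append(f"git push -u origin {focus_branch}")
    return steps
```

Given the same divergence and the same oldest-first commit list, the recipe is always identical, which is the determinism FR-013 asks for; the commit list itself would have to come from mission-owned state.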

Edge Cases

  • A transition writes frontmatter or materialized state but the event append fails or is unreadable.
  • A command runs from a lane worktree whose relative paths do not resolve to the canonical mission directory.
  • Two review prompts are created in the same second for different repositories or work packages.
  • A mission slug already contains the mission- prefix and would be misparsed by string reconstruction.
  • A shared lane moves from WP01 to WP04 with no overlapping owned files.
  • Hosted sync is enabled but the remote service, lock file, or interpreter shutdown path fails after local persistence.
  • Local main contains unrelated commits, missing origin commits, or both.
  • The latest review-cycle artifact contradicts the canonical work-package lane.
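The mission- prefix edge case shows why C-004 prefers canonical state. In this sketch, a hypothetical reconstruction rule turns a slug such as mission-state-audit-01KQHRB8 into a branch name with a doubled prefix, while the canonical lookup returns whatever refs were recorded and refuses to guess when they are missing. The state dictionary shape, `reconstructed_branch`, `canonical_refs`, and `review_diff_command` are assumptions for illustration.

```python
# Hypothetical sketch: the canonical state shape is assumed, not Spec Kitty's.
from typing import Any, List, Mapping, Tuple


class MissingCanonicalRef(LookupError):
    """Canonical state has no recorded ref; refuse to guess one."""


def reconstructed_branch(mission_slug: str, wp_id: str) -> str:
    # Fragile, hypothetical convention shown only for contrast: it blindly adds
    # a prefix, so a slug that already starts with "mission-" gets it twice.
    return f"mission-{mission_slug}-{wp_id}"


def canonical_refs(state: Mapping[str, Any], mission_id: str, wp_id: str) -> Tuple[str, str]:
    """Return (base_ref, head_branch) exactly as recorded in canonical state."""
    try:
        wp = state["missions"][mission_id]["work_packages"][wp_id]
        return wp["base_ref"], wp["branch"]
    except KeyError as exc:
        raise MissingCanonicalRef(
            f"mission={mission_id} wp={wp_id}: canonical state lacks {exc}"
        ) from exc


def review_diff_command(state: Mapping[str, Any], mission_id: str, wp_id: str) -> List[str]:
    base, head = canonical_refs(state, mission_id, wp_id)
    # Three-dot diff: changes on head since it diverged from base.
    return ["git", "diff", f"{base}...{head}"]
```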

Domain Language (include when terminology precision matters)

  • Canonical terms:
      • Mission: A complete Spec Kitty workflow from specify through ship for this stabilization effort.
      • Work package: A planned, independently reviewable unit of mission work, identified as WP##.
      • Lane: The current workflow state of a work package as represented by canonical status events and materialized views.
      • Status event: The durable event record that proves a work-package state transition occurred.
      • Review prompt: The generated reviewer instruction artifact for a specific repo, mission, work package, worktree, and invocation.
      • Final sync diagnostic: A non-fatal hosted sync or cleanup issue reported after successful local persistence.
  • Avoid / ambiguous synonyms:
      • Do not use "feature" as the canonical identity for this work; use "mission".
      • Do not treat "approved", "done", and "rejected" as interchangeable review states.
      • Do not describe final sync failures as command failures when the local mutation succeeded.

Requirements (mandatory)

Functional Requirements

ID | Title | User Story | Priority | Status
FR-001 | Atomic transition evidence | As an agent executing a work-package transition, I want every successful transition to have a corresponding durable status event so that downstream workflow state is trustworthy. | High | Open
FR-002 | Loud transition failure | As an agent executing a transition, I want missing event persistence or readback to fail the command with a precise diagnostic so that no caller treats an unproven mutation as successful. | High | Open
FR-003 | Interrupted action recovery | As an operator verifying prior fixes, I want backgrounded, interrupted, or slow implement/review actions to avoid stranding work packages in claimed so that workflows remain recoverable. | High | Open
FR-004 | Isolated review prompt identity | As a reviewer, I want review prompts to be collision-proof and self-identifying across repo, mission, work package, worktree, branch, base ref, and invocation so that I review the intended work. | High | Open
FR-005 | Review prompt fail-closed validation | As a review dispatcher, I want prompt metadata mismatches to block reviewer dispatch so that wrong-repo or wrong-work-package reviews cannot proceed silently. | High | Open
FR-006 | Canonical review diff refs | As a reviewer, I want diff commands to use canonical mission and lane references so that slug naming edge cases cannot point review at the wrong comparison. | High | Open
FR-007 | Active work-package ownership | As an implementer in a shared lane, I want ownership guards to validate against the active work package's owned files so that stale lane context cannot block or permit the wrong changes. | High | Open
FR-008 | Guard context diagnostics | As an operator diagnosing guard output, I want stale or ambiguous guard context to be reported separately from true scope violations so that remediation is clear. | Medium | Open
FR-009 | Non-fatal final sync reporting | As an automation consumer, I want successful local mutations to remain successful when final sync fails non-fatally so that scripts do not retry or abort valid state changes. | High | Open
FR-010 | Parseable command surfaces | As an automation consumer, I want JSON/stdout command surfaces to remain parseable when diagnostics are present so that command consumers can reliably inspect results. | High | Open
FR-011 | Sync diagnostic deduplication | As an operator, I want repeated final-sync cleanup messages deduplicated per invocation so that logs remain actionable. | Medium | Open
FR-012 | Diverged main preflight | As a release operator, I want merge or ship preflight to detect local main divergence from origin/main so that unsafe release work stops before state is reconstructed manually. | High | Open
FR-013 | Focused PR branch path | As a release operator, I want a deterministic focused PR branch synthesis path when local main is not shippable so that mission-owned work can still be prepared safely. | High | Open
FR-014 | Review artifact consistency gate | As a mission reviewer, I want approved or done work packages to be checked against the latest review artifact verdict so that stale rejected verdicts cannot coexist silently with release-ready state. | High | Open
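FR-014's consistency gate can be sketched as a comparison between the canonical lane and the verdict field in the latest review artifact's frontmatter. This assumes the frontmatter is flat `key: value` text between `---` fences and that the caller has already selected the latest review-cycle artifact; `read_frontmatter`, `review_conflict`, and `signoff_gate` are hypothetical names.

```python
# Hypothetical sketch: a minimal frontmatter reader, not a full YAML parser.
from typing import Dict, Iterable, List, Optional, Tuple

RELEASE_READY_LANES = frozenset({"approved", "done"})


def read_frontmatter(text: str) -> Dict[str, str]:
    """Parse flat `key: value` frontmatter between leading `---` fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip("\"'")
    return fields


def review_conflict(wp_id: str, lane: str, latest_review: str) -> Optional[str]:
    """Return a diagnostic when a release-ready lane meets a rejected verdict."""
    verdict = read_frontmatter(latest_review).get("verdict")
    if lane in RELEASE_READY_LANES and verdict == "rejected":
        return (f"{wp_id}: lane is {lane!r} but the latest review artifact "
                f"still says verdict: rejected")
    return None


def signoff_gate(wps: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Collect every contradiction; ship signoff should not pass if any remain."""
    return [c for wp_id, lane, text in wps if (c := review_conflict(wp_id, lane, text))]
```

A real implementation would likely read frontmatter with a YAML parser; the point of the sketch is that the gate collects every contradiction instead of stopping at the first, so one signoff run reports them all.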

Non-Functional Requirements

ID | Title | Requirement | Category | Priority | Status
NFR-001 | Regression coverage | Each primary blocker issue (#945, #949, #950, #951, #952, #953, #904) and verification issue #944 MUST have at least one focused automated regression test or an explicit documented deferral before the mission can be accepted. | Testability | High | Open
NFR-002 | No unintended network dependence | Regression tests MUST avoid real network calls except for an explicitly scoped hosted-sync test path; external hosted behavior MUST be mocked or isolated for deterministic local runs. | Reliability | High | Open
NFR-003 | Local command determinism | Focused workflow fixtures MUST produce the same pass/fail result across at least three consecutive local runs on an unchanged checkout. | Reliability | High | Open
NFR-004 | Parseability validation | JSON command-output tests MUST validate stdout with a JSON parser and fail if non-JSON diagnostic text appears on stdout. | Compatibility | High | Open
NFR-005 | Diagnostic specificity | Hard-failure diagnostics for transition, prompt, ownership, branch, and review-artifact gates MUST name the mission, work package, and violated invariant when those identities are known. | Operability | Medium | Open
NFR-006 | Coverage quality bar | New code introduced by this mission MUST include pytest coverage for new behavior and maintain the project coverage expectation for touched areas. | Testability | Medium | Open
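Test helpers for NFR-004 and NFR-003, as a sketch: `json.loads` rejects trailing text with an "Extra data" error, so a diagnostic line leaking onto stdout fails the parse, and the determinism check compares outcomes across three runs. `parse_json_stdout`, `run_json_command`, and `assert_deterministic` are hypothetical helper names.

```python
# Hypothetical test helpers; names are illustrative, not an existing test API.
import json
import subprocess
from typing import Any, Callable, Hashable, Sequence


def parse_json_stdout(stdout: str) -> Any:
    """NFR-004: stdout must be exactly one JSON document (whitespace allowed)."""
    try:
        return json.loads(stdout)
    except json.JSONDecodeError as exc:
        # "Extra data" here usually means diagnostic text leaked onto stdout.
        raise AssertionError(f"stdout is not valid JSON ({exc}): {stdout[:200]!r}") from exc


def run_json_command(args: Sequence[str]) -> Any:
    """Run a CLI command and return its parsed JSON stdout."""
    proc = subprocess.run(list(args), capture_output=True, text=True, check=True)
    return parse_json_stdout(proc.stdout)


def assert_deterministic(run: Callable[[], Hashable], runs: int = 3) -> Hashable:
    """NFR-003: the same fixture must give the same outcome on every run."""
    outcomes = [run() for _ in range(runs)]
    if len(set(outcomes)) != 1:
        raise AssertionError(f"non-deterministic outcome across {runs} runs: {outcomes}")
    return outcomes[0]
```

For example, `run_json_command(["spec-kitty", "agent", "mission", "branch-context", "--json"])` would fail the test if anything other than a single JSON document reached stdout.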

Constraints

ID | Title | Constraint | Category | Priority | Status
C-001 | Primary repo boundary | The primary implementation repo for this mission is spec-kitty; spec-kitty-saas and spec-kitty-tracker are context repos unless a work package explicitly scopes changes there. | Scope | High | Open
C-002 | SaaS sync flag on this machine | Commands that exercise SaaS, tracker, hosted auth, or sync flows during testing on this computer MUST run with SPEC_KITTY_ENABLE_SAAS_SYNC=1. | Environment | High | Open
C-003 | Local planning root | Specify, plan, and tasks artifacts MUST be created from the spec-kitty repository root checkout, not a worktree. | Workflow | High | Open
C-004 | Canonical state over reconstruction | Workflow logic MUST prefer canonical mission, lane, and status state over reconstructed slug or path conventions. | Governance | High | Open
C-005 | No silent release contradictions | Mission-review and ship readiness MUST NOT silently pass when canonical state and latest review artifact verdict conflict. | Governance | High | Open
C-006 | Issue linkage | The mission MUST maintain traceability to parent issue #822 and blocker issues #945, #949, #950, #951, #952, #953, #904, plus verification-only issue #944. | Traceability | Medium | Open
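One way to honor C-002 in tests is to pass an explicit environment to the subprocess under test instead of mutating the parent process. A minimal sketch; `saas_sync_env` is a hypothetical helper name.

```python
# Hypothetical helper: builds a child-process environment with the C-002 flag set.
import os
from typing import Dict, Mapping, Optional

SAAS_SYNC_FLAG = "SPEC_KITTY_ENABLE_SAAS_SYNC"


def saas_sync_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the environment (or `base`) and force the SaaS sync flag on.

    Copying leaves the caller's environment untouched, so only the subprocess
    that exercises a SaaS, tracker, hosted auth, or sync flow sees the flag.
    """
    env = dict(os.environ if base is None else base)
    env[SAAS_SYNC_FLAG] = "1"
    return env
```

A SaaS-sync test could then call `subprocess.run(cmd, env=saas_sync_env(), check=True)`; inside pytest, `monkeypatch.setenv("SPEC_KITTY_ENABLE_SAAS_SYNC", "1")` achieves the same for in-process code.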

Key Entities (include if feature involves data)

  • Mission Identity: The canonical mission id, slug, title, base branch, and merge target for this stabilization sprint.
  • Work Package State: The current and historical status of a work package, including event evidence and materialized lane.
  • Review Prompt Invocation: A unique generated review request with metadata binding it to one repo, mission, work package, worktree, branch, base ref, and invocation.
  • Ownership Context: The active work package id and its owned file set at the moment an implement, review, or commit guard runs.
  • Final Sync Diagnostic: A structured non-fatal report about hosted sync or cleanup issues after local persistence has succeeded.
  • Release Preflight Result: A merge/ship readiness decision that records branch divergence, suggested remediation, and review-artifact consistency.
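The entities above can be sketched as frozen dataclasses. Field names follow the descriptions, the WP## check follows the domain language, and the `ready` rule (no divergence and no review conflicts) mirrors FR-012 and FR-014; none of these are existing Spec Kitty types, and the shapes are assumptions.

```python
# Hypothetical data model: field names follow the spec text, types are assumed.
import re
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

WP_ID = re.compile(r"WP\d{2}")


@dataclass(frozen=True)
class MissionIdentity:
    mission_id: str
    slug: str
    title: str
    base_branch: str
    merge_target: str


@dataclass(frozen=True)
class WorkPackageState:
    wp_id: str
    lane: str
    event_ids: Tuple[str, ...] = ()  # status events that prove the history

    def __post_init__(self) -> None:
        if not WP_ID.fullmatch(self.wp_id):
            raise ValueError(f"work package id must look like WP##, got {self.wp_id!r}")


@dataclass(frozen=True)
class OwnershipContext:
    active_wp_id: Optional[str]  # None when the active WP cannot be proven
    owned_files: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class ReleasePreflightResult:
    ahead: int
    behind: int
    remediation: Tuple[str, ...] = ()
    review_conflicts: Tuple[str, ...] = ()

    @property
    def ready(self) -> bool:
        # Ready only with no branch divergence and no review contradictions.
        return self.ahead == 0 and self.behind == 0 and not self.review_conflicts
```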

Assumptions & Open Questions (include when discovery leaves documented defaults or deferred decisions)

Assumptions

  • The mission is a software-dev mission because the requested outcome is focused code and test changes.
  • The landing branch is main: spec-kitty agent mission branch-context --json reports current branch main, planning/base branch main, merge target main, and branch_matches_target: true.
  • The suggested work-package split from start-here.md is the intended planning shape unless /spec-kitty.plan finds a stronger split.
  • Closed issue #944 is verification-only unless regression testing shows the fix no longer holds.
  • Hosted sync behavior should be tested without real network calls except where a work package explicitly scopes a SaaS sync path and uses this machine's required SPEC_KITTY_ENABLE_SAAS_SYNC=1 flag.

Open Questions

  • None. The provided start-here.md request is detailed enough to proceed without deferred clarification markers.

Success Criteria (mandatory)

Measurable Outcomes

  • SC-001: A fresh workflow smoke test covering init -> specify -> plan -> tasks -> implement/review -> merge -> PR completes without manual Python status-event emission.
  • SC-002: 100% of transition commands covered by this mission either write the expected durable event on success or return non-zero with a diagnostic naming the missing invariant.
  • SC-003: Review prompt regression tests prove correct repo, mission, work package, and worktree identity for at least two concurrent prompt-generation scenarios.
  • SC-004: Shared-lane ownership regression tests prove the guard uses the active work package's owned files after moving between at least two disjoint work packages.
  • SC-005: Non-fatal final-sync failure tests prove local success output remains parseable and is not rendered as a red command failure.
  • SC-006: Merge/ship preflight tests prove local main divergence is detected before unsafe release continuation and that a focused PR branch path is presented.
  • SC-007: Mission-review or ship readiness tests prove approved/done work-package state cannot silently coexist with a latest review artifact whose verdict remains rejected.
  • SC-008: Each linked blocker issue has either a focused fix validated by tests or an explicit deferral note before mission acceptance.