Spec: Planning Pipeline Integrity and Runtime Reliability

Mission: 069-planning-pipeline-integrity
Version: 1.0
Status: Draft
Target Branch: main


Overview

Spec-kitty's planning and runtime surfaces exhibit four structural fragilities discovered during active development of mission 068. Each makes the tool unreliable for agents running automated workflows, for CI pipelines checking repo cleanliness, and for humans authoring mission plans.

Problem summary:

1. Dirty-git reads (#524): Read-only CLI commands silently modify the git working tree by unconditionally rewriting a derived status cache file on every invocation.
2. Corrupted lane assignments (#525): Work package dependency extraction silently produces wrong lane assignments by scanning unbounded prose regions in tasks.md.
3. Ghost completions (#526): The mission state machine advances on every bare invocation of next, even when no work was completed.
4. Slug validator mismatch (#527): The specify command rejects all feature slugs that follow spec-kitty's own NNN-* naming convention.

This feature resolves all four.


Actors

| Actor | Description |
|---|---|
| Agent | An automated AI coding assistant executing spec-kitty commands in a worktree or main repository checkout. |
| Human operator | A developer or release engineer running spec-kitty commands interactively. |
| CI pipeline | An automated pipeline that asserts git tree cleanliness after spec-kitty command runs. |
| Planner | An LLM generating mission planning artifacts (wps.yaml, tasks.md, WP prompt files). |

User Scenarios

SC-001: Agent reads task status and worktree stays clean

An agent running inside a git worktree calls the task status command to check which work packages are in which lane. After the call, git status shows no modified or untracked files. The agent can continue working without needing to stash or reset unexpected changes.

Acceptance: git status --porcelain output is empty after running any status, query-mode next, or dashboard command against a previously-clean repository.

SC-002: Planner writes prose-heavy WP prompt files without corrupting dependencies

A planner generates a wps.yaml manifest declaring WP05: dependencies: []. It also writes a rich WP05 prompt file containing a "Dependency Graph" section that cross-references other WPs. When finalize-tasks runs, WP05's dependencies remain empty as declared. The lane planner assigns WP05 to a parallel lane.

Acceptance: After finalize-tasks, WP05 is not assigned any dependency derived from prose content in any WP prompt file or tasks.md.

SC-003: Disoriented agent uses next for orientation

An agent recovering from an error calls spec-kitty next (without --result) to find out where it is in the mission. The command returns the current step name and pending action. The output begins with [QUERY — no result provided, state not advanced]. The state machine does not advance. The agent sees the label and understands that it must pass --result success only once it has actually completed a step.

Acceptance: The mission state machine step counter is identical before and after a bare spec-kitty next call.

SC-004: Human specifies a numbered feature slug

A human operator runs spec-kitty specify 070-new-feature to create a new mission following spec-kitty's own naming convention. The command does not error on the slug. It proceeds to mission creation or discovery input.

Acceptance: No slug validation error is raised for any slug matching the pattern NNN-* where N is a digit.

SC-005: Legacy mission without wps.yaml continues to work

A human operator runs finalize-tasks against an older mission that has only tasks.md and per-WP prompt files (no wps.yaml). The command falls back to the prose parser and completes without error, with dependency behavior unchanged from the previous release.

Acceptance: finalize-tasks completes successfully for missions created before wps.yaml was introduced.

SC-006: CI pipeline clean-tree check passes after spec-kitty commands

A CI pipeline runs a status check command and then asserts no files were modified. The pipeline passes. Previously, this pipeline would fail due to the status cache file being unconditionally rewritten.

Acceptance: git diff --exit-code exits with code 0 after any read-only spec-kitty command on a clean repository.
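
The clean-tree acceptance checks in SC-001 and SC-006 could be automated along the following lines. This is a minimal sketch, not part of spec-kitty: the helper names dirty_paths and assert_clean_tree are hypothetical. It relies on git status --porcelain rather than git diff --exit-code because git diff does not report untracked files, and SC-001 requires that none appear.

```python
import subprocess


def dirty_paths(porcelain: str) -> list:
    """Extract paths from `git status --porcelain` (v1) output.

    Each non-empty line is two status characters, a space, then the path.
    """
    return [line[3:] for line in porcelain.splitlines() if line.strip()]


def assert_clean_tree(repo_dir: str) -> None:
    """Raise AssertionError if the working tree has modified or untracked files."""
    completed = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    paths = dirty_paths(completed.stdout)
    if paths:
        raise AssertionError(f"working tree is not clean: {paths}")
```

A CI job would call assert_clean_tree once after each read-only spec-kitty command it exercises.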


Functional Requirements

Status Cache Idempotency — Problem 1 (#524)

| ID | Requirement | Status |
|---|---|---|
| FR-001 | The status cache file is only written when the underlying event log has changed since the last materialization. | Proposed |
| FR-002 | The materialized_at timestamp in the status cache is derived from the timestamp of the last event in the event log, not from the wall clock at read time. | Proposed |
| FR-003 | All read-only spec-kitty commands (task status, query-mode next, dashboard rendering) leave zero modified files in the git working tree on a previously-clean repository. | Proposed |
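
One way FR-001 and FR-002 could fit together is sketched below. It is illustrative only and does not describe the existing materializer: the function name materialize_status and the event fields wp_id, lane, and at are assumptions, as is the shape of the cache. The point is that a timestamp taken from the last event makes the rendered cache deterministic, so an unchanged event log yields identical bytes and the write can be skipped.

```python
import json
from pathlib import Path


def materialize_status(events_path: Path, cache_path: Path) -> bool:
    """Rebuild the status cache; return True only if the file was written."""
    events = [
        json.loads(line)
        for line in events_path.read_text(encoding="utf-8").splitlines()
        if line.strip()
    ]
    lanes = {}
    for event in events:
        # Hypothetical event shape: the last event for a WP decides its lane.
        lanes[event["wp_id"]] = event["lane"]
    snapshot = {
        # Taken from the last event, never from the wall clock (FR-002).
        "materialized_at": events[-1]["at"] if events else None,
        "lanes": lanes,
    }
    rendered = json.dumps(snapshot, indent=2, sort_keys=True) + "\n"
    # Skip the write when the content on disk already matches (FR-001).
    if cache_path.exists() and cache_path.read_text(encoding="utf-8") == rendered:
        return False
    cache_path.write_text(rendered, encoding="utf-8")
    return True
```

Under these assumptions, a second call against an unchanged event log returns False and leaves the file untouched, which is the behavior FR-003 depends on.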

Structured WP Manifest — Problem 2 (#525)

| ID | Requirement | Status |
|---|---|---|
| FR-004 | A structured WP manifest file (wps.yaml) is defined at kitty-specs/<slug>/wps.yaml. Each entry contains: id, title, dependencies, owned_files, requirement_refs, subtasks, and prompt_file. Fields priority, execution_mode, and authoritative_surface from the #525 design are deferred to a future feature. | Proposed |
| FR-005 | wps.yaml has a published JSON Schema at src/specify_cli/schemas/wps.schema.json. | Proposed |
| FR-006 | When wps.yaml is present, finalize-tasks derives all WP metadata (id, title, dependencies, owned_files) exclusively from wps.yaml. It does not scan tasks.md prose for dependency or file-ownership patterns. | Proposed |
| FR-007 | A dependencies field that is present in wps.yaml (including an empty list) is never modified by the planning pipeline. A missing dependencies field may be populated during the tasks-packages step. | Proposed |
| FR-008 | When wps.yaml is present, tasks.md is generated by finalize-tasks as a presentation artifact derived from wps.yaml. The LLM planning prompt does not produce tasks.md; the file is a system output. | Proposed |
| FR-009 | The /spec-kitty.tasks-outline planning prompt produces wps.yaml only. It does not produce tasks.md; that file is generated by finalize-tasks as a post-step. | Proposed |
| FR-010 | The /spec-kitty.tasks-packages planning prompt updates wps.yaml with per-WP details (owned_files, requirement_refs, subtasks, prompt_file). | Proposed |
| FR-011 | When wps.yaml is present, finalize-tasks always regenerates tasks.md from the manifest, overwriting any prior tasks.md content. A manually edited tasks.md is not an authoritative source and may be overwritten. | Proposed |
| FR-012 | Missions without wps.yaml continue to function using the existing prose parser as a fallback. No behavior change for these missions. | Proposed |
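
The distinction FR-007 draws between a present-but-empty dependencies field and a missing one can be sketched as follows. This is an illustration under assumptions: manifest entries are modeled as plain dicts already parsed from wps.yaml, and the function name fill_missing_dependencies and the inferred mapping are hypothetical.

```python
from copy import deepcopy


def fill_missing_dependencies(manifest_wps, inferred):
    """Return copies of the WP entries with dependencies filled only where absent.

    `inferred` maps a WP id to dependencies proposed by a later planning step.
    """
    result = []
    for wp in manifest_wps:
        entry = deepcopy(wp)
        # Key presence is the test, not truthiness: an explicit empty list
        # counts as declared and is left untouched.
        if "dependencies" not in entry:
            entry["dependencies"] = list(inferred.get(entry["id"], []))
        result.append(entry)
    return result
```

Testing key presence rather than truthiness is the design choice that matters here; a check such as `if not entry.get("dependencies")` would treat an explicit `dependencies: []` as missing and reintroduce the corruption described in #525.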

Query-Mode Safety for next — Problem 3 (#526)

| ID | Requirement | Status |
|---|---|---|
| FR-013 | spec-kitty next called without a --result argument enters query mode: it returns the current step identifier and pending action description without advancing the state machine. | Proposed |
| FR-014 | Query mode output is prefixed with [QUERY — no result provided, state not advanced] so agents and humans can distinguish it from an advancement response. | Proposed |
| FR-015 | spec-kitty next --result success retains its current behavior: advances the state machine to the next step. | Proposed |
| FR-016 | spec-kitty next --result failed and spec-kitty next --result blocked retain their current advancing behaviors. | Proposed |
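
The query-mode split in FR-013 through FR-015 can be modeled with a small sketch. The MissionState class and next_step function are hypothetical stand-ins, not spec-kitty's real state machine, and the sketch deliberately leaves out the failed and blocked results because this spec does not describe their existing behavior.

```python
from dataclasses import dataclass
from typing import Optional

QUERY_PREFIX = "[QUERY — no result provided, state not advanced]"


@dataclass
class MissionState:
    """Hypothetical stand-in for the mission state machine."""
    steps: list
    index: int = 0


def next_step(state: MissionState, result: Optional[str] = None) -> str:
    current = state.steps[state.index]
    if result is None:
        # Query mode: report the current position and leave the counter alone.
        return f"{QUERY_PREFIX}\ncurrent step: {current}"
    if result == "success":
        # Advance, clamping at the final step.
        state.index = min(state.index + 1, len(state.steps) - 1)
        return f"advanced to: {state.steps[state.index]}"
    # "failed" and "blocked" keep their existing behavior, which is not modeled.
    raise NotImplementedError(f"result {result!r} is not modeled in this sketch")
```

For example, calling next_step(state) repeatedly returns the labeled output and never changes state.index, while next_step(state, "success") moves it forward by one.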

Slug Validator Fix — Problem 4 (#527)

| ID | Requirement | Status |
|---|---|---|
| FR-017 | The slug validation in the mission creation module accepts slugs that begin with one or more digits followed by a hyphen and additional alphanumeric-and-hyphen segments (e.g., 068-post-merge-reliability, 001-foo). The fix applies to the single validation point in src/specify_cli/core/mission_creation.py. | Proposed |
| FR-018 | The slug validator continues to reject slugs containing uppercase letters, spaces, or special characters other than hyphens. | Proposed |
| FR-019 | The slug validator continues to reject empty slugs and slugs consisting solely of hyphens. | Proposed |
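
A validator satisfying FR-017 through FR-019 might look like the sketch below. The pattern and the name is_valid_slug are assumptions and are not taken from mission_creation.py. Note that this pattern also rejects leading, trailing, and doubled hyphens, which is stricter than the requirements above strictly demand.

```python
import re

# Lowercase alphanumeric segments joined by single hyphens.
# A leading digit run such as "068" is an ordinary segment.
_SLUG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_slug(slug: str) -> bool:
    """Return True when the whole slug matches the pattern."""
    return _SLUG_PATTERN.fullmatch(slug) is not None
```

With this pattern, 068-post-merge-reliability and 001-foo are accepted, while an empty string, a hyphen-only string, and any slug containing uppercase letters, spaces, or underscores are rejected.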

Non-Functional Requirements

| ID | Requirement | Threshold | Status |
|---|---|---|---|
| NFR-001 | Read-only spec-kitty commands complete without modifying the git working tree. | Zero files appear in git status --porcelain on a clean repository after any read-only command. | Proposed |
| NFR-002 | wps.yaml schema validation produces a clear, actionable error message when a manifest is malformed. | Error message names the failing field and expected value within 1 second of invocation. | Proposed |
| NFR-003 | The wps.yaml presence check adds no measurable overhead for legacy missions that lack the file. | Fallback detection completes in under 10ms per mission. | Proposed |
| NFR-004 | spec-kitty next in query mode returns output within the same latency bounds as the current advancing call. | No measurable regression in p95 response time compared to the current release. | Proposed |
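
The fallback detection referred to in NFR-003 can be as cheap as a single filesystem check, as in this sketch. The function name select_wp_source and its return values are hypothetical; only the wps.yaml filename comes from this spec.

```python
from pathlib import Path


def select_wp_source(mission_dir: Path) -> str:
    """Return "manifest" when wps.yaml exists in the mission directory, else "prose"."""
    # One stat call decides the code path; nothing is parsed at this stage.
    return "manifest" if (mission_dir / "wps.yaml").is_file() else "prose"
```

Because the check does not open or parse any file, a legacy mission pays only the cost of one path lookup before the existing prose parser runs.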

Constraints

| ID | Constraint | Status |
|---|---|---|
| C-001 | status.events.jsonl remains the sole authority for WP status. status.json is and must remain a derived cache that can be fully regenerated at any time from the event log alone. | Required |
| C-002 | wps.yaml is a new file format. A JSON Schema must be published and documented before finalize-tasks begins consuming it. | Required |
| C-003 | Existing missions without wps.yaml must continue to function without any required modification to their artifacts. | Required |
| C-004 | No new required network calls are introduced by any of the four fixes. | Required |
| C-005 | spec-kitty next --result success retains its advancing behavior. The change affects only calls that omit --result. | Required |
| C-006 | WP prompt files (tasks/WP01-*.md) remain unrestricted authoring surfaces. Arbitrary prose, cross-references, and summary sections in prompt files must not influence dependency or lane assignment. | Required |
| C-007 | tasks.md generated from wps.yaml must preserve all information needed for a human to understand the WP breakdown and sequencing. It may not omit WP titles, dependencies, or subtask counts. | Required |
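
To make C-007 concrete, the sketch below renders a tasks.md body from manifest entries while keeping the three items the constraint names: titles, dependencies, and subtask counts. The output layout and the function name render_tasks_md are assumptions; this spec does not fix the generated format, and entries are modeled as dicts already parsed from wps.yaml.

```python
def render_tasks_md(wps) -> str:
    """Render a human-readable WP breakdown from manifest entries."""
    lines = ["# Tasks", ""]
    for wp in wps:
        deps = ", ".join(wp.get("dependencies", [])) or "none"
        subtask_count = len(wp.get("subtasks", []))
        lines.append(f"## {wp['id']}: {wp['title']}")
        lines.append(f"- Dependencies: {deps}")
        lines.append(f"- Subtasks: {subtask_count}")
        lines.append("")
    return "\n".join(lines)
```

Because the function depends only on the manifest, regenerating the file on every finalize-tasks run (FR-011) produces the same output for the same wps.yaml.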

Success Criteria

1. Running any read-only spec-kitty command (agent tasks status, next without --result, dashboard) against a clean repository leaves zero modified files in git status.
2. A planner can write arbitrary prose in any WP prompt file, including sections that cross-reference other WPs, without risk of corrupting dependency or lane assignments.
3. Calling spec-kitty next without --result returns the current step with a [QUERY — no result provided, state not advanced] label and does not advance the mission state machine.
4. spec-kitty specify 070-any-slug completes slug validation without error.
5. A mission with dependencies: [] explicitly declared in wps.yaml cannot have those dependencies overwritten or augmented by the planning pipeline.
6. Missions created before wps.yaml was introduced complete finalize-tasks without behavioral regression.


Assumptions

  • The three candidate approaches to fix #524 (deterministic materialized_at, skip-write-on-no-change, exclude from git tracking) are implementation choices. The spec requires only the behavioral outcome: clean reads. The chosen implementation must not break any consumer that reads status.json as a cache.
  • The JSON Schema for wps.yaml is stored at src/specify_cli/schemas/wps.schema.json (as specified in FR-005).
  • The legacy prose parser is retained as a fallback code path until all active missions have migrated; it is not deleted in this feature.
  • Query mode for spec-kitty next is activated by the absence of --result, not by a new --query flag, to minimize changes to existing agent invocation patterns.
  • The slug validator has a single validation call site (src/specify_cli/core/mission_creation.py). The error message examples at that site will be updated to remove the "starts with number" invalid example.

Out of Scope

  • Automatic migration of existing missions from tasks.md-only format to wps.yaml.
  • Sunset timeline or removal of the legacy prose parser.
  • Changes to the status.events.jsonl event schema or the 7-lane state machine.
  • Restrictions on how WP prompt file prose may be authored or structured.
  • Validation of owned_files glob patterns against the actual filesystem (pattern syntax validation only, not file existence checks).
  • Changes to the lane planner algorithm beyond consuming wps.yaml as the authoritative dependency source.
  • Downstream WP metadata consumers (agent context resolve, dashboard scanner, doctor command, kanban renderer) continue to read from WP frontmatter written by finalize-tasks. Direct wps.yaml reads by these consumers are out of scope for this feature. The data flow is: wps.yaml → finalize-tasks → WP frontmatter → all downstream consumers (unchanged).
  • wps.yaml fields priority, execution_mode, and authoritative_surface (deferred from #525 design).