Contracts

cli-contracts.md

CLI Contracts: 3.1.1 Post-555 Release Hardening

Mission: 079-post-555-release-hardening
Purpose: Per-track CLI behavior contracts. These are the user-facing command surfaces that the implementation MUST satisfy. Each contract maps to one or more FRs in spec.md.

This is a CLI contract document, not a REST/OpenAPI spec, because this mission's external surface is a CLI tool. The contracts below describe inputs, outputs, exit codes, side effects, and absences (what the command MUST NOT do).


Track 1 — spec-kitty init

Contract C1.1: spec-kitty init <name> --ai <agent> --non-interactive against an empty directory

Maps to: FR-001, FR-002, FR-003, FR-004, FR-005, FR-008

Inputs:

  • <name>: string, valid filesystem name
  • --ai <agent>: one of the 12 supported agents
  • --non-interactive: present
  • Working directory: an empty directory the user has cd'd into, OR a parent directory under which <name> will be created

Side effects (REQUIRED):

  • Create <target_dir>/.kittify/config.yaml with the selected agent set.
  • Create <target_dir>/.kittify/metadata.yaml with the current spec-kitty version.
  • Create the per-agent slash-command files under the agent's directory (e.g., .codex/prompts/spec-kitty.*.md for --ai codex).
  • Print next-step guidance to stdout that names spec-kitty next (the loop) and spec-kitty agent action implement / spec-kitty agent action review (the per-decision verbs).

Side effects (FORBIDDEN):

  • MUST NOT create <target_dir>/.git/ under any flag combination. There is no --git opt-in.
  • MUST NOT run git init, git add, or git commit under any flag combination.
  • MUST NOT produce any commit, including any commit titled "Initial commit from Specify template" or any equivalent, under any flag combination.
  • MUST NOT create <target_dir>/.agents/skills/ or its contents.
  • MUST NOT name top-level spec-kitty implement as the canonical implementation entrypoint in any printed text.
  • MUST NOT accept the --no-git flag (the flag is removed in 3.1.1; passing it MUST produce a typer "no such option" error).
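A minimal sketch of a post-init filesystem check for C1.1's required and forbidden paths. It assumes init has already run against target_dir. The function name is hypothetical, and it covers only paths; the printed-text and flag rules are not checked here.

```python
from pathlib import Path

# Files C1.1 requires init to create, relative to the target directory.
REQUIRED = [".kittify/config.yaml", ".kittify/metadata.yaml"]
# Paths C1.1 forbids init from creating under any flag combination.
FORBIDDEN = [".git", ".agents/skills"]


def init_side_effect_violations(target_dir: Path) -> list[str]:
    """Return C1.1 filesystem violations; an empty list means compliant."""
    problems = []
    for rel in REQUIRED:
        if not (target_dir / rel).is_file():
            problems.append(f"missing required file: {rel}")
    for rel in FORBIDDEN:
        if (target_dir / rel).exists():
            problems.append(f"forbidden path present: {rel}")
    return problems
```

A test would call it as assert init_side_effect_violations(tmp_path / "demo") == [].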

Output to stdout:

  • A "Next steps" panel that names spec-kitty next --agent <agent> --mission <slug> as the canonical loop entry, and names spec-kitty agent action implement / spec-kitty agent action review as the per-decision verbs the agent invokes. The panel MUST NOT contain a line that names spec-kitty implement (the top-level CLI command) as a canonical user-facing command.
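The panel rule can be checked mechanically. The sketch below is one possible approach, with a hypothetical helper name. It requires the three canonical phrases and flags any line that names top-level spec-kitty implement. The lookbehind keeps the slash command /spec-kitty.implement and spec-kitty agent action implement from matching.

```python
import re

# Matches "spec-kitty implement" as a top-level invocation only.
TOP_LEVEL_IMPLEMENT = re.compile(r"(?<![\w/.-])spec-kitty implement\b")
REQUIRED_PHRASES = (
    "spec-kitty next",
    "spec-kitty agent action implement",
    "spec-kitty agent action review",
)


def next_steps_violations(panel_text: str) -> list[str]:
    """Return problems with a printed next-steps panel (empty list = OK)."""
    problems = [f"missing: {p}" for p in REQUIRED_PHRASES if p not in panel_text]
    for lineno, line in enumerate(panel_text.splitlines(), start=1):
        if TOP_LEVEL_IMPLEMENT.search(line):
            problems.append(f"line {lineno} names top-level spec-kitty implement")
    return problems
```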

Exit code: 0 on success.


Contract C1.2: spec-kitty init re-run in an already-initialized directory

Maps to: FR-006

Inputs: Same as C1.1, but the target directory already contains .kittify/config.yaml from a prior init.

Acceptable behaviors (one of):

  • Idempotent: exit 0, produce no changes, print a clear message that the directory is already initialized.
  • Fail-fast: exit non-zero with a clear error message naming the conflict (e.g., Error: <target_dir>/.kittify/config.yaml already exists. Run `spec-kitty upgrade` to migrate or delete the directory and re-run.).

Forbidden:

  • Silent merge or overwrite of existing state.
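The fail-fast option reduces to a guard that runs before any file is written. This is a sketch only: the exception class and function name are hypothetical, and the message mirrors the example error text above.

```python
from pathlib import Path


class AlreadyInitializedError(Exception):
    """Raised instead of silently merging into existing .kittify state."""


def guard_reinit(target_dir: Path) -> None:
    """Refuse to initialize a directory that already has .kittify/config.yaml."""
    config = target_dir / ".kittify" / "config.yaml"
    if config.exists():
        raise AlreadyInitializedError(
            f"Error: {config} already exists. Run `spec-kitty upgrade` to migrate "
            "or delete the directory and re-run."
        )
```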

Contract C1.3: spec-kitty init invoked inside an existing git repository

Maps to: FR-007

Inputs: Same as C1.1, but the working directory (or a parent of it) is inside an existing git working tree.

Side effects (REQUIRED):

  • Same as C1.1 — file creation only.

Side effects (FORBIDDEN):

  • MUST NOT call git init, git add, git commit, git checkout, or any git-state-mutating command.
  • MUST NOT touch the existing git tree's HEAD, branches, or staged state.
  • MUST NOT modify .gitignore of the parent repo unless the new files would otherwise be untracked and the project explicitly opts into .gitignore management (NOT enabled by default).

Contract C1.4: spec-kitty init --help

Maps to: FR-008

Output to stdout:

  • The help text MUST describe the new model accurately:
    • No git initialization (and no --git opt-in flag)
    • No automatic commit
    • The --no-git flag from pre-3.1.1 versions is removed
    • Next-step guidance names spec-kitty next and spec-kitty agent action implement/review

Exit code: 0.


Track 2 — spec-kitty agent mission finalize-tasks (planning-artifact producer)

Contract C2.1: Lane assignment for a feature with planning-artifact WPs

Maps to: FR-101, FR-102, FR-103, FR-104

Inputs:

  • --mission <slug>: a mission whose tasks.md declares at least one planning-artifact WP and at least one code WP.

Behavior:

  • compute_lanes() MUST return a LanesManifest whose lanes list includes a lane with lane_id == "lane-planning" containing all planning-artifact WPs.
  • The lane assignment for code WPs MUST be unaffected by this change.
  • LanesManifest.planning_artifact_wps MAY remain as a derived view (for backward compatibility), but MUST be derivable from the lane assignments.
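A sketch of the lane-planning addition, assuming lanes are plain dicts shaped like the lanes.json entries in file-format-contracts.md F2. The real compute_lanes() returns a LanesManifest, and the helper name here is hypothetical. Code lanes are copied through unchanged, and the planning lane is only added when planning-artifact WPs exist.

```python
def with_planning_lane(code_lanes: list[dict], planning_wps: list[str],
                       mission_slug: str) -> list[dict]:
    """Return the code lanes plus a canonical lane-planning entry if needed."""
    lanes = [dict(lane) for lane in code_lanes]  # code lanes are left as computed
    if not planning_wps:
        return lanes  # lane-planning is conditional on planning-artifact WPs
    lanes.append({
        "lane_id": "lane-planning",  # canonical id, not derived from the slug
        "wp_ids": sorted(planning_wps),
        "write_scope": [f"kitty-specs/{mission_slug}/**"],
        "predicted_surfaces": ["planning"],
        "depends_on_lanes": [],
        "parallel_group": 0,
    })
    return lanes
```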

Side effects (REQUIRED):

  • Write kitty-specs/<mission_slug>/lanes.json containing the new manifest.

Side effects (FORBIDDEN):

  • MUST NOT skip planning-artifact WPs from the lane assignment loop.
  • MUST NOT call git worktree add for the lane-planning lane.

Contract C2.2: spec-kitty implement WP## for a planning-artifact WP

Maps to: FR-105 (consumer uniformity)

Inputs:

  • WP##: a planning-artifact work package id from a finalize-tasks-completed mission.
  • --mission <slug>: the mission slug.

Behavior:

  • The command MUST resolve the WP's lane via the uniform lane lookup (no if execution_mode == "planning_artifact" branch in context/resolver.py).
  • The lane lookup MUST return the lane-planning lane.
  • The resolved workspace path MUST be the main repo checkout (paths.get_main_repo_root()), NOT a .worktrees/... directory.
  • The command MUST exit 0 with a clear "workspace = main repo checkout" indication.
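A sketch of the uniform lookup. Every WP, whatever its type, goes through the same WP-to-lane search, and the workspace is then chosen from the lane id rather than from an execution_mode field. Function names are hypothetical. The .worktrees/<mission_slug>-<lane_id> naming is inferred from the forbidden path listed below and may not match the real layout exactly.

```python
from pathlib import Path

PLANNING_LANE = "lane-planning"


def lane_for_wp(lanes: list[dict], wp_id: str) -> str:
    """Single lookup path used for every WP type."""
    for lane in lanes:
        if wp_id in lane["wp_ids"]:
            return lane["lane_id"]
    raise LookupError(f"{wp_id} has no lane assignment")


def workspace_for_wp(lanes: list[dict], wp_id: str, main_repo_root: Path,
                     mission_slug: str) -> Path:
    lane_id = lane_for_wp(lanes, wp_id)
    if lane_id == PLANNING_LANE:
        return main_repo_root  # no worktree is allocated for lane-planning
    return main_repo_root / ".worktrees" / f"{mission_slug}-{lane_id}"
```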

Side effects (FORBIDDEN):

  • MUST NOT create a .worktrees/<mission_slug>-lane-planning directory.
  • MUST NOT branch on a planning_artifact WP-type sentinel at the lane-contract boundary.

Track 3 — spec-kitty agent mission create (identity hardening)

Contract C3.1: spec-kitty agent mission create <slug> --json

Maps to: FR-201, FR-204, FR-205

Inputs:

  • <slug>: a kebab-case unnumbered slug
  • --json: present

Behavior:

  • The command MUST mint a ULID mission_id at creation time.
  • The command MUST persist mission_id in kitty-specs/<numbered_slug>/meta.json.
  • The command MUST NOT use the get_next_feature_number() scan as the source of canonical identity (it MAY continue to be used to compute the display-friendly numeric prefix).
  • The output JSON MUST include mission_id as a top-level field.
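A dependency-free sketch of minting a ULID: a 48-bit millisecond timestamp followed by 80 random bits, encoded as 26 Crockford base32 characters. The implementation may well use a ULID library instead, and mint_ulid is a hypothetical name. This sketch does not provide monotonic ordering for two IDs minted in the same millisecond.

```python
import os
import time
from typing import Optional

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # excludes I, L, O, U


def mint_ulid(now_ms: Optional[int] = None) -> str:
    """Return a 26-character ULID string (timestamp bits first, so it sorts)."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    value = (now_ms << 80) | int.from_bytes(os.urandom(10), "big")
    # 26 chars x 5 bits = 130 bits; the top 2 bits are always zero.
    return "".join(CROCKFORD[(value >> (5 * i)) & 31] for i in reversed(range(26)))
```

Because the alphabet is in ASCII order and the timestamp occupies the leading characters, plain string comparison orders IDs by creation time.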

Output (stdout, JSON):

{
  "result": "success",
  "mission_id": "<ulid string>",
  "mission_slug": "<###-slug>",
  "mission_number": "<###>",
  "...": "..."
}

Concurrency:

  • Two concurrent invocations of this command from two checkouts of the same repository MUST NOT produce colliding mission_id values. (ULID collision resistance is sufficient — no explicit lock is required for uniqueness, but the meta.json write SHOULD use the existing feature_status_lock_path for write atomicity.)
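For write atomicity, one common pattern is to write to a temporary file in the same directory and then rename it over meta.json. The sketch assumes the caller already holds the feature_status_lock_path lock, which is not shown, and the function name is hypothetical. os.replace is atomic on POSIX when source and destination are on the same filesystem.

```python
import json
import os
import tempfile
from pathlib import Path


def write_meta_atomically(meta_path: Path, meta: dict) -> None:
    """Replace meta.json in one step so readers never see a partial file."""
    # Same directory as the target so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=meta_path.parent, prefix=".meta-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(meta, fh, indent=2)
            fh.write("\n")
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, meta_path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```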

Track 4 — spec-kitty agent mission finalize-tasks (parser hotfix)

Contract C4.1: Bounded final WP section

Maps to: FR-301, FR-302, FR-303, FR-304

Inputs:

  • A tasks.md file authored by an operator that contains:
    • A final WP section (## WP## or ## Work Package WP##)
    • Trailing prose after the final WP section that mentions other WPs in Depends on WP## form
    • The final WP carries an explicit dependencies: declaration in its frontmatter

Behavior:

  • The dependency parser MUST bound the final WP section so that the trailing prose is NOT included in the final WP's body.
  • The bound MUST trigger at: (a) the next WP header, (b) a top-level ## markdown heading whose text is not a WP id, or (c) EOF.
  • Sub-headings (### ) inside the WP section MUST NOT trigger the bound.
  • The explicit dependencies: declaration MUST be preserved verbatim in the finalized manifest.

Forbidden:

  • MUST NOT inject WPs from trailing prose into the final WP's parsed dependencies.
  • MUST NOT overwrite explicit dependencies: declarations with parser-derived values.
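The bounding rule can be sketched as a line-based section splitter. A section ends at the next WP header, at any other top-level "## " heading, or at EOF. "### " sub-headings do not match "## " and stay inside the section. The function name is hypothetical, and this sketch does not skip "## " lines that appear inside fenced code blocks.

```python
import re

WP_HEADER = re.compile(r"^## (?:Work Package )?(WP\d+)\b")


def split_wp_sections(tasks_md: str) -> dict[str, str]:
    """Map each WP id to its bounded section body."""
    sections: dict[str, str] = {}
    current = None
    buf: list[str] = []
    for line in tasks_md.splitlines():
        match = WP_HEADER.match(line)
        if match or line.startswith("## "):  # any top-level heading closes a section
            if current is not None:
                sections[current] = "\n".join(buf)
            current = match.group(1) if match else None  # non-WP heading: stop collecting
            buf = []
        elif current is not None:
            buf.append(line)
    if current is not None:  # EOF closes the final section
        sections[current] = "\n".join(buf)
    return sections
```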

Track 5 — Auth refresh

Contract C5.1: refresh_tokens() lock contract

Maps to: FR-401, FR-402, FR-403, FR-404

Behavior (internal — not a user-facing CLI command, but a contract for the function):

  • auth.refresh_tokens() MUST acquire the cross-process filelock.FileLock at function entry.
  • The lock MUST be held for the FULL transaction: read on-disk credentials → HTTP POST to /token/refresh/ → parse response → persist new credentials (or handle failure).
  • The lock MUST be released in a finally block.
  • On 401 from the refresh endpoint:
    • The function MUST re-read on-disk credentials under the held lock.
    • If the on-disk refresh token differs from the value read at function entry, the 401 is stale: exit cleanly without clearing.
    • If the on-disk refresh token is unchanged, the 401 is terminal: clear credentials (under the held lock) and raise AuthenticationError.
  • Inner load() / save() calls made inside the locked transaction MUST reacquire the lock reentrantly, so the nested acquisitions are no-ops rather than deadlocks.
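The transaction above can be sketched with its collaborators injected. In production, lock would be a filelock.FileLock on ~/.spec-kitty/credentials.lock. Here it can be any reentrant context-manager lock, so the sketch runs without filelock. The load/save/clear callables and RefreshResponse are hypothetical stand-ins for the CredentialStore and HTTP layer. The with block releases the lock on every exit path, which is equivalent to the required finally block.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class AuthenticationError(Exception):
    """The refresh token was rejected and credentials were cleared."""


@dataclass
class RefreshResponse:
    status: int
    tokens: Optional[dict] = None


def refresh_tokens(lock, load: Callable[[], dict], save: Callable[[dict], None],
                   clear: Callable[[], None],
                   post_refresh: Callable[[str], RefreshResponse]) -> dict:
    with lock:  # held for read -> POST -> parse -> persist
        entry_refresh = load()["refresh"]
        response = post_refresh(entry_refresh)
        if response.status == 200:
            save(response.tokens)
            return response.tokens
        if response.status == 401:
            on_disk = load()  # re-read under the same held lock
            if on_disk.get("refresh") != entry_refresh:
                return on_disk  # stale 401: another process already rotated
            clear()  # terminal 401: token unchanged, so clear and fail
            raise AuthenticationError("refresh token rejected by server")
        raise RuntimeError(f"unexpected refresh status {response.status}")
```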

User-facing observable behavior (under contention):

  • When two concurrent CLI invocations race a refresh and one of them successfully rotates the refresh token, the user MUST remain logged in. The losing process MUST observe the rotated token on its re-read and exit cleanly.

Track 6 — Top-level spec-kitty implement and canonical-path teach-out

Contract C6.1: spec-kitty implement --help

Maps to: FR-503

Output to stdout:

  • The help text MUST mark spec-kitty implement as internal infrastructure — an implementation detail of spec-kitty agent action implement, not a user-facing canonical path.
  • The help text MUST direct callers to spec-kitty next for the loop and spec-kitty agent action implement for the per-WP verb.

Example acceptable docstring:

> Internal: allocate or reuse the lane worktree for a work package. This is an implementation detail of `spec-kitty agent action implement` and is not a canonical user-facing command. Users should invoke `spec-kitty next --agent <name> --mission <slug>` to drive a mission; agents should invoke `spec-kitty agent action implement <WP> --agent <name>` for per-WP work. This command remains as a compatibility surface for direct callers.


Contract C6.2: spec-kitty implement WP## (compatibility execution)

Maps to: FR-505

Behavior:

  • The command MUST still run for direct invokers.
  • The command MUST allocate or reuse the lane worktree exactly as it does today.
  • The command MUST NOT print a deprecation banner on every invocation (banners on every run are forbidden by R-6 in plan.md).

Exit code: 0 on success, non-zero on error (unchanged behavior).


Contract C6.3: spec-kitty init next-steps output (overlap with Track 1)

Maps to: FR-501, FR-502 (init's printed output) and FR-504 (the slash-command guidance shipped by init)

Output to stdout:

  • The next-steps panel MUST NOT name top-level spec-kitty implement as a canonical user-facing path.
  • The next-steps panel MUST direct users at the spec-kitty next loop and the spec-kitty agent action implement / spec-kitty agent action review per-decision verbs (the actual post-#555 canonical user workflow per D-4).

Contract C6.4: README.md canonical workflow text

Maps to: FR-502

Static content rule:

  • README.md lines 8-9 (the canonical workflow line) MUST NOT name implement as a step in a way that implies top-level spec-kitty implement is the canonical command-line invocation.
  • The mermaid / ASCII diagram (lines 64-80) MUST NOT name top-level spec-kitty implement as a step.
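A sketch of a static check over the two guarded README ranges. It only catches the literal top-level invocation. Judging whether prose like "implement" implies the top-level command still needs review. Hard-coded line ranges drift as the README changes, so anchoring on surrounding content would be more robust. The helper name is hypothetical.

```python
import re

TOP_LEVEL_IMPLEMENT = re.compile(r"(?<![\w/.-])spec-kitty implement\b")
# 1-based inclusive line ranges named by C6.4.
GUARDED_RANGES = [(8, 9), (64, 80)]


def readme_violations(readme_text: str) -> list[int]:
    """Return guarded line numbers that name top-level spec-kitty implement."""
    lines = readme_text.splitlines()
    hits = []
    for start, end in GUARDED_RANGES:
        for lineno in range(start, min(end, len(lines)) + 1):
            if TOP_LEVEL_IMPLEMENT.search(lines[lineno - 1]):
                hits.append(lineno)
    return hits
```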

Track 7 — Release-hygiene CLI surface

Contract C7.1: scripts/release/validate_release.py in branch mode

Maps to: FR-601, FR-602, FR-606

Behavior:

  • The script MUST validate that pyproject.toml [project].version equals .kittify/metadata.yaml spec_kitty.version.
  • If the two versions disagree, the script MUST exit non-zero and emit a clear error message that names both files and both values.
  • The script MUST also call changelog_has_entry(changelog, target_version) and fail the cut if the entry is missing.

Exit code: 0 if all gates pass; non-zero (1 by convention) on any failure.

Error message format (example):

ERROR: Version mismatch
  pyproject.toml          line 3:  3.1.1
  .kittify/metadata.yaml  line 6:  3.1.1a3
Both files must report the same version before the release can be cut.
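A dependency-free sketch of the version gate that produces the message format above. The line scanners are deliberately minimal and are not TOML or YAML parsers. A real script might use tomllib and a YAML library, and would still need line numbers for the message. All function names are hypothetical.

```python
import re


def pyproject_version(text: str) -> tuple[str, int]:
    """Return ([project].version, 1-based line number) via a line scan."""
    in_project = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("["):
            in_project = stripped == "[project]"
        elif in_project:
            m = re.match(r'version\s*=\s*"([^"]+)"', stripped)
            if m:
                return m.group(1), lineno
    raise ValueError("pyproject.toml: [project].version not found")


def metadata_version(text: str) -> tuple[str, int]:
    """Return (spec_kitty.version, 1-based line number) via a line scan."""
    in_block = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line and not line[0].isspace():
            in_block = line.rstrip() == "spec_kitty:"
        elif in_block:
            m = re.match(r"\s+version:\s*([^\s#]+)", line)
            if m:
                return m.group(1).strip("'\""), lineno
    raise ValueError(".kittify/metadata.yaml: spec_kitty.version not found")


def version_gate(pyproject_text: str, metadata_text: str) -> tuple[int, str]:
    """Return (exit code, error message); (0, "") when the versions agree."""
    py_ver, py_line = pyproject_version(pyproject_text)
    md_ver, md_line = metadata_version(metadata_text)
    if py_ver == md_ver:
        return 0, ""
    return 1, "\n".join([
        "ERROR: Version mismatch",
        f"  pyproject.toml          line {py_line}:  {py_ver}",
        f"  .kittify/metadata.yaml  line {md_line}:  {md_ver}",
        "Both files must report the same version before the release can be cut.",
    ])
```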

Contract C7.2: spec-kitty agent release prep --channel stable --json

Maps to: FR-605

Behavior:

  • The command MUST produce a structured JSON payload representing the proposed release prep.
  • The payload MUST include a proposed_changelog_block field whose value is a non-empty markdown block whose header references the target version (e.g., a header that starts with "## [3.1.1]").
  • The payload MUST NOT mutate any file in the working tree (it is a draft generator, not a mutator).
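A sketch of checking the one payload field C7.2 names. The payload's other fields are not specified here, so they are ignored. The no-mutation rule would need a separate before/after comparison of the working tree, which is not shown. The helper name is hypothetical.

```python
import json


def check_prep_payload(raw_json: str, version: str) -> list[str]:
    """Validate proposed_changelog_block in a release-prep JSON payload."""
    payload = json.loads(raw_json)
    block = payload.get("proposed_changelog_block")
    if not isinstance(block, str) or not block.strip():
        return ["proposed_changelog_block is missing or empty"]
    header = block.lstrip().splitlines()[0]
    # The closing bracket keeps "## [3.1.1a3]" from passing for "3.1.1".
    if not header.startswith(f"## [{version}]"):
        return [f"header does not reference {version}: {header!r}"]
    return []
```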

Exit code: 0 on success.


Contract C7.3: Dogfood command set against /private/tmp/311/spec-kitty

Maps to: FR-603, FR-604

Behavior:

  • A fresh shell, with the CLI installed from the release commit, MUST be able to run each of the following commands and observe exit code 0 with no error rooted in version skew:
    1. spec-kitty --version
    2. spec-kitty init demo --ai codex --non-interactive (against a fresh empty directory under /tmp/)
    3. spec-kitty agent mission create dogfood --json (against /private/tmp/311/spec-kitty)
    4. spec-kitty agent mission finalize-tasks --mission dogfood (against /private/tmp/311/spec-kitty)
    5. spec-kitty agent tasks status --mission dogfood (against /private/tmp/311/spec-kitty)

Acceptance: All five commands exit 0 and produce no error output naming version mismatch.
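The dogfood pass can be scripted so its acceptance rule is applied consistently. In this sketch, the runner is injectable, so it can be exercised without the CLI installed. The function names are hypothetical, and running spec-kitty --version from the fresh directory is an assumption, since the list above does not say where it runs.

```python
import subprocess
from typing import Callable

REPO = "/private/tmp/311/spec-kitty"


def dogfood_plan(fresh_dir: str) -> list[tuple[list[str], str]]:
    """The five C7.3 commands paired with the directory each runs in."""
    return [
        (["spec-kitty", "--version"], fresh_dir),
        (["spec-kitty", "init", "demo", "--ai", "codex", "--non-interactive"], fresh_dir),
        (["spec-kitty", "agent", "mission", "create", "dogfood", "--json"], REPO),
        (["spec-kitty", "agent", "mission", "finalize-tasks", "--mission", "dogfood"], REPO),
        (["spec-kitty", "agent", "tasks", "status", "--mission", "dogfood"], REPO),
    ]


def run_dogfood(fresh_dir: str, run: Callable[..., object] = subprocess.run) -> list[str]:
    """Return the commands that failed: non-zero exit or a version-mismatch message."""
    failures = []
    for argv, cwd in dogfood_plan(fresh_dir):
        proc = run(argv, cwd=cwd, capture_output=True, text=True)
        output = (proc.stdout or "") + (proc.stderr or "")
        if proc.returncode != 0 or "version mismatch" in output.lower():
            failures.append(" ".join(argv))
    return failures
```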


Out-of-band contracts

The following contracts are listed for completeness but are NOT enforced as CLI behavior — they are documentation / scope assertions:

  • FR-206 (no historical mission identity backfill): asserted via PR review, not via CLI behavior.
  • FR-305 (no full manifest redesign): asserted via PR review, not via CLI behavior.
  • FR-506 (no #538/#540/#542 stabilization): asserted via PR review, not via CLI behavior.

file-format-contracts.md

File Format Contracts: 3.1.1 Post-555 Release Hardening

Mission: 079-post-555-release-hardening
Purpose: Persistent file format expectations the mission must satisfy. Each contract names the file, its current shape, the post-mission shape, and the migration story (if any).

The mission edits a small number of file formats. Most are additive (new fields, no removed fields) so they remain backward-compatible.


F1. meta.json (per-mission identity)

Path: kitty-specs/<mission_slug>/meta.json

Owner: core.mission_creation (write), mission_metadata.resolve_mission_identity (read)

Pre-mission shape (Track 3 baseline)

{
  "mission_number": "079",
  "slug": "079-post-555-release-hardening",
  "mission_slug": "079-post-555-release-hardening",
  "friendly_name": "post 555 release hardening",
  "mission_type": "software-dev",
  "target_branch": "main",
  "created_at": "2026-04-09T06:12:24.358591+00:00"
}

Post-mission shape (Track 3 lands)

{
  "mission_id": "01HXYZ0123456789ABCDEFGHJK",
  "mission_number": "079",
  "slug": "079-post-555-release-hardening",
  "mission_slug": "079-post-555-release-hardening",
  "friendly_name": "post 555 release hardening",
  "mission_type": "software-dev",
  "target_branch": "main",
  "created_at": "2026-04-09T06:12:24.358591+00:00",
  "vcs": "git"
}

Changes:

  • NEW required field for new missions: mission_id — string, ULID format (26-character Crockford base32, lexicographically sortable).
  • NEW recommended field: vcs — string, default "git".
  • All existing fields are preserved.

Validation rules:

  • For missions created after Track 3 lands: mission_id MUST be present, MUST be a non-empty string, and MUST parse as a valid ULID.
  • mission_id MUST be immutable. Any code that overwrites it is a contract violation.
  • For historical missions (created before Track 3 lands): mission_id MAY be absent. Loaders MUST tolerate the absence (per NG-1 / NG-2). New machine-facing flows MUST treat absence as an error.
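A sketch of the two rules that apply to missions created after Track 3: mission_id is a structurally valid ULID, and it never changes once written. Both helpers are hypothetical. The ULID check is strict about uppercase, which is an assumption, and loaders for historical missions would skip these checks when mission_id is absent.

```python
from typing import Optional

CROCKFORD = set("0123456789ABCDEFGHJKMNPQRSTVWXYZ")


def is_valid_ulid(value: object) -> bool:
    return (
        isinstance(value, str)
        and len(value) == 26
        and set(value) <= CROCKFORD
        and value[0] <= "7"  # 26 chars carry 130 bits; the top 2 must be zero
    )


def check_mission_id(old_meta: Optional[dict], new_meta: dict) -> None:
    """Raise ValueError if new_meta breaks the mission_id rules."""
    new_id = new_meta.get("mission_id")
    if not is_valid_ulid(new_id):
        raise ValueError(f"mission_id is missing or not a valid ULID: {new_id!r}")
    if old_meta and "mission_id" in old_meta and old_meta["mission_id"] != new_id:
        raise ValueError("mission_id is immutable and must not be overwritten")
```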

Migration:

  • Mission 079 itself currently has no mission_id. Track 3's first WP MUST add it to mission 079's own meta.json so the mission dogfoods the new identity model.
  • No bulk migration of historical missions is performed (NG-2).

F2. lanes.json (per-mission lane manifest)

Path: kitty-specs/<mission_slug>/lanes.json

Owner: lanes.persistence.write_lanes_json (write), lanes.persistence.require_lanes_json (read)

Pre-mission shape (Track 2 baseline)

{
  "feature_slug": "<mission_slug>",
  "mission_id": "<may or may not be present>",
  "target_branch": "main",
  "lanes": [
    {
      "lane_id": "lane-a",
      "wp_ids": ["WP01", "WP02"],
      "write_scope": ["src/foo/**", "tests/foo/**"],
      "predicted_surfaces": ["surface-a"],
      "depends_on_lanes": [],
      "parallel_group": 1
    }
  ],
  "planning_artifact_wps": ["WP03"],
  "collapse_report": null,
  "computed_at": "<timestamp>",
  "computed_from": "finalize-tasks"
}

Post-mission shape (Track 2 lands)

{
  "feature_slug": "<mission_slug>",
  "mission_id": "<from meta.json — Track 3>",
  "target_branch": "main",
  "lanes": [
    {
      "lane_id": "lane-a",
      "wp_ids": ["WP01", "WP02"],
      "write_scope": ["src/foo/**", "tests/foo/**"],
      "predicted_surfaces": ["surface-a"],
      "depends_on_lanes": [],
      "parallel_group": 1
    },
    {
      "lane_id": "lane-planning",
      "wp_ids": ["WP03"],
      "write_scope": ["kitty-specs/<mission_slug>/**"],
      "predicted_surfaces": ["planning"],
      "depends_on_lanes": [],
      "parallel_group": 0
    }
  ],
  "planning_artifact_wps": ["WP03"],
  "collapse_report": null,
  "computed_at": "<timestamp>",
  "computed_from": "finalize-tasks"
}

Changes:

  • NEW canonical lane: lane-planning is added to the lanes list when the mission has at least one planning-artifact WP.
  • planning_artifact_wps is preserved as a derived view (for backward compatibility with historical lanes.json consumers). Producers SHOULD treat it as derivable from the lane assignments; consumers SHOULD prefer reading from lanes directly.
  • Existing lane-a/lane-b entries are unchanged.

Validation rules:

  • For any mission whose tasks.md declares at least one planning-artifact WP, lanes.json MUST contain a lane with lane_id == "lane-planning" containing all planning-artifact WP ids.
  • For any mission with no planning-artifact WPs, lane-planning MUST NOT appear (it is conditional on actual planning-artifact WP presence).
  • Every WP id appearing in any WP frontmatter MUST appear in exactly one lane's wp_ids list (no orphan WPs after Track 2).
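The three validation rules can be checked together. The sketch assumes the frontmatter WP ids and planning-artifact WP ids have already been collected from tasks.md, and the function name is hypothetical.

```python
from collections import Counter


def lanes_violations(manifest: dict, frontmatter_wp_ids: set[str],
                     planning_wp_ids: set[str]) -> list[str]:
    """Check lanes.json against the F2 validation rules."""
    problems = []
    counts = Counter(wp for lane in manifest["lanes"] for wp in lane["wp_ids"])
    for wp in sorted(frontmatter_wp_ids):
        if counts[wp] != 1:  # Counter yields 0 for WPs that appear in no lane
            problems.append(f"{wp} appears in {counts[wp]} lanes (expected exactly 1)")
    planning = [lane for lane in manifest["lanes"] if lane["lane_id"] == "lane-planning"]
    if planning_wp_ids:
        if not planning or not planning_wp_ids <= set(planning[0]["wp_ids"]):
            problems.append("lane-planning is missing or lacks a planning-artifact WP")
    elif planning:
        problems.append("lane-planning present but mission has no planning-artifact WPs")
    return problems
```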

Migration:

  • Historical lanes.json files written before Track 2 lands continue to be readable. The reader treats absence of lane-planning as "this manifest predates Track 2; the planning-artifact WPs are listed in the planning_artifact_wps field instead". This is the only legacy-tolerance hook for lanes.json, and it exists per NG-1.
  • Re-running finalize-tasks on a historical mission rewrites the manifest with the new shape.
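A consumer-side sketch of that tolerance hook, with a hypothetical helper name. It reads planning WPs from lane-planning when the lane is present and falls back to the legacy planning_artifact_wps field only for manifests that lack it.

```python
def planning_wps(manifest: dict) -> list[str]:
    """Planning-artifact WP ids, preferring the canonical lane over the legacy field."""
    for lane in manifest.get("lanes", []):
        if lane.get("lane_id") == "lane-planning":
            return list(lane["wp_ids"])  # canonical source after Track 2
    # Pre-Track-2 manifest: fall back to the derived/legacy field (NG-1 hook).
    return list(manifest.get("planning_artifact_wps", []))
```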

F3. .kittify/metadata.yaml (project metadata)

Path: .kittify/metadata.yaml

Owner: core.project_metadata (write), various readers

Pre-mission shape (Track 7 baseline)

spec_kitty:
  version: 3.1.1a2          # ⚠ STALE
  initialized_at: <iso8601>
  last_upgraded_at: <iso8601>
environment:
  python_version: <string>
  platform: <string>
  platform_version: <string>
migrations:
  applied:
    - <migration_id>
    - ...

Post-mission shape (Track 7 lands)

spec_kitty:
  version: 3.1.1            # synced to pyproject.toml at the release cut
  initialized_at: <iso8601>
  last_upgraded_at: <iso8601>
environment:
  python_version: <string>
  platform: <string>
  platform_version: <string>
migrations:
  applied:
    - <migration_id>
    - ...

Changes:

  • spec_kitty.version is bumped to match pyproject.toml. No schema change.

Validation rules:

  • At the release commit, .kittify/metadata.yaml:spec_kitty.version MUST equal pyproject.toml:[project].version.
  • The pre-release validation step MUST fail the cut if the two values disagree.

Migration:

  • Track 7's first WP performs the explicit bump. No automation; this is an intentional human/agent commit.

F4. pyproject.toml (Python package version)

Path: pyproject.toml

Owner: human release engineer (with mission 079 facilitating the release-cut WP)

Field of interest

[project]
name = "spec-kitty-cli"
version = "3.1.1"   # at the release commit; pre-release alphas use 3.1.1a3, 3.1.1a4, ...

Changes:

  • The version field is bumped to 3.1.1 (stripping the alpha suffix) at the release cut WP.

Validation rules:

  • The pre-release validation step (extended scripts/release/validate_release.py) MUST assert this field equals .kittify/metadata.yaml:spec_kitty.version.
  • The validation step MUST also assert that CHANGELOG.md contains an entry whose header matches ## [<version>] (FR-606).

F5. CHANGELOG.md (release narrative)

Path: CHANGELOG.md

Owner: human release engineer (mission 079 may produce structured draft inputs but does NOT author final prose, per C-012 / FR-605)

Format

Keep a Changelog (https://keepachangelog.com/) + Semantic Versioning. Each entry has the form:

## [VERSION] - DATE

### Added
- ...

### Changed
- ...

### Fixed
- ...

Pre-mission state

The file currently has entries for 3.1.1a3, 3.1.1a2, and 3.1.1a1. No 3.1.1 (stable) entry.

Post-mission state

The human release engineer adds a ## [3.1.1] - <date> entry summarizing the seven tracks of mission 079. Mission 079 itself produces a structured draft input via spec-kitty agent release prep --channel stable --json (FR-605); the human takes that draft and produces the final prose.

Validation rules:

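  • The pre-release validation step (FR-606) MUST assert that CHANGELOG.md contains an entry whose header starts with ## [3.1.1] and that the entry has a non-empty body. The closing bracket matters: a bare prefix match on ## [3.1.1 would also accept the existing ## [3.1.1a3] entry.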
  • The validation step does NOT validate narrative quality or wording — only presence and basic structural shape.
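A sketch of the presence-and-shape check. This is a hypothetical stand-in, since the matching rules of the existing changelog_has_entry() helper are not described here. The header must match the exact bracketed version, and the body (up to the next "## " heading) must contain something other than whitespace.

```python
import re


def has_release_entry(changelog: str, version: str) -> bool:
    """True if CHANGELOG has '## [version]' followed by a non-empty body."""
    header = re.compile(rf"^## \[{re.escape(version)}\](?:[ \t][^\n]*)?$", re.MULTILINE)
    match = header.search(changelog)
    if not match:
        return False
    rest = changelog[match.end():]
    next_entry = re.search(r"^## ", rest, re.MULTILINE)  # "### Added" does not match
    body = rest[: next_entry.start()] if next_entry else rest
    return bool(body.strip())
```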

Mission 079 contract:

  • Mission 079 MUST NOT author the final ## [3.1.1] entry prose.
  • Mission 079 MUST produce a structured draft via the existing build_release_prep_payload() helper.
  • Mission 079 MUST add the validation step that fails the cut if the entry is missing.

F6. ~/.spec-kitty/credentials (auth state)

Path: ~/.spec-kitty/credentials (TOML format), with sibling lock file ~/.spec-kitty/credentials.lock.

Owner: sync.auth.CredentialStore

Format (no schema change in this mission)

[tokens]
access = "..."
refresh = "..."
access_expires_at = "2099-01-01T00:00:00+00:00"
refresh_expires_at = "2099-01-01T00:00:00+00:00"

[user]
username = "..."
team_slug = "..."  # optional

[server]
url = "https://..."

Changes:

  • No schema change. Track 5 changes the lock-scope contract around refresh_tokens(), not the file format.

Validation rules (post-Track 5):

  • The lock on ~/.spec-kitty/credentials.lock MUST be held across the FULL refresh transaction (read → network → persist), not acquired separately around each I/O call.
  • On 401, the refresh function MUST re-read the credentials file under the same lock and compare to the entry-time refresh token before treating the 401 as authoritative grounds for clearing.
  • See contracts/cli-contracts.md Contract C5.1 for the function-level contract.

F7. tasks.md (per-mission work-package narrative)

Path: kitty-specs/<mission_slug>/tasks.md

Owner: human / agent author (write), core.dependency_parser (read)

Format (no schema change in this mission)

tasks.md continues to use the same narrative structure: a ## Plan overview followed by per-WP sections (## WP01, ## WP02, ..., ## WPnn), each with frontmatter (dependencies:, owned_files:, etc.) and prose.

Changes:

  • No format change. Track 4 changes the parser bound behavior on the existing format, not the format itself.

Validation rules (post-Track 4):

  • The dependency parser MUST bound the final WP section at: (a) the next WP header, (b) a top-level ## heading whose text is not a WP id, or (c) EOF.
  • Trailing prose past the final WP section MUST NOT be parsed for dependencies of the final WP.

Authoring guidance (post-Track 4):

  • Authors MAY add trailing sections (e.g., ## Notes, ## Appendix, ## References) after the final WP without risking false-positive dependency inference.
  • Sub-headings (### ) inside a WP section continue to be preserved.

F8. kitty-specs/<mission_slug>/status.events.jsonl (status event log)

Path: kitty-specs/<mission_slug>/status.events.jsonl

Owner: status.store (write), status.reducer (read)

Format

Each line is a JSON object with sorted keys per the existing 3.0 model. See the project CLAUDE.md "Status Model Patterns" section for the full schema.

Changes:

  • No schema change in this mission. The status model is already canonical post-3.0 (feature 060).
  • Track 3 may add mission_id to event payloads emitted by emit_mission_created() (in sync/events.py), but the status event log itself stores per-WP events that already use feature_slug and wp_id. No change needed here.

Validation rules:

  • (unchanged)

F9. Slash-command source templates

Path: src/specify_cli/missions/software-dev/command-templates/{specify,plan,tasks,tasks-packages,implement}.md

Owner: mission template authors (mission 079 edits these as part of Track 6)

Format

Markdown files with <!-- spec-kitty-command-version: ... --> headers and templated body content. The CLI copies these into per-agent directories during init.

Changes:

  • Track 6 edits the body content of these files to:
    • Remove top-level spec-kitty implement as the canonical CLI invocation in user-facing examples.
    • Replace inline CLI invocations with their slash-command equivalents.
  • The <!-- spec-kitty-command-version: ... --> headers are preserved.

Validation rules:

  • Per the project CLAUDE.md: edit SOURCE templates only, NOT generated agent copies under .claude/, .codex/, etc.
  • The migration mechanism that deploys updated templates to existing projects (upgrade/migrations/) MUST pick up the changes. Track 6's tests verify this end-to-end.

Cross-file invariants

Invariant | Files involved | Enforced by
pyproject.toml version == .kittify/metadata.yaml version | pyproject.toml, .kittify/metadata.yaml | scripts/release/validate_release.py (Track 7 extension)
CHANGELOG.md has entry for pyproject.toml version | pyproject.toml, CHANGELOG.md | scripts/release/validate_release.py (existing changelog_has_entry, ensured to run in branch mode)
Every WP in tasks.md has a lane assignment in lanes.json | tasks.md, lanes.json | compute_lanes() (Track 2)
meta.json:mission_id exists for every mission created after Track 3 lands | meta.json | core.mission_creation (Track 3) + acceptance test

test-contracts.md

Test Contracts: 3.1.1 Post-555 Release Hardening

Mission: 079-post-555-release-hardening
Purpose: Regression test scenarios that the implementation MUST satisfy. Each test contract maps to one or more FRs and is the acceptance gate for its track.

This document is the contract layer. The actual test code lives in tests/ (per the file paths in plan.md §4 and §8). Reviewers and the /spec-kitty.review command use this document as the canonical "did the implementation satisfy the spec?" gate.


Conventions

  • Test IDs are T<track>.<n> where <track> is 1-7 and <n> is the test ordinal within that track.
  • All tests MUST run under PWHEADLESS=1 pytest tests/ and complete within the NFR-004 budget (< 60 s aggregate for new tests added by this mission).
  • All tests MUST be deterministic. Concurrent / race tests use explicit synchronization primitives, not time.sleep-based races.
  • All file system tests MUST use tmp_path fixtures from pytest; they MUST NOT touch the working repo or the user's home directory.
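One way to start racing workers without time.sleep is a barrier that releases every worker at once. The sketch uses threads and a hypothetical helper name. Cross-process races, such as the C5.1 file-lock contention, would use multiprocessing.Barrier instead. The barrier removes timing from the start but does not fix the interleaving afterwards, so assertions must hold for any order.

```python
import threading
from typing import Callable, List


def run_concurrently(*workers: Callable[[], object], timeout: float = 10.0) -> List[object]:
    """Release all workers together via a Barrier; return results in worker order."""
    barrier = threading.Barrier(len(workers))
    results: List[object] = [None] * len(workers)
    errors: List[BaseException] = []

    def wrap(index: int, fn: Callable[[], object]) -> None:
        try:
            barrier.wait(timeout)  # blocks until every worker thread arrives
            results[index] = fn()
        except BaseException as exc:  # surface worker failures to the caller
            errors.append(exc)

    threads = [threading.Thread(target=wrap, args=(i, fn)) for i, fn in enumerate(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout)
    if any(t.is_alive() for t in threads):
        raise TimeoutError("a worker did not finish in time")
    if errors:
        raise errors[0]
    return results
```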

Track 1 — init coherence

T1.1 — init does not create .git/

File: tests/init/test_init_minimal_integration.py (extension)
Maps to: FR-001
Setup: An empty tmp_path.
Action: Run spec-kitty init demo --ai codex --non-interactive against tmp_path.
Assertions:

  • (tmp_path / "demo" / ".git").exists() == False
  • (tmp_path / ".git").exists() == False (in case the target was tmp_path itself)

T1.2 — init does not produce a commit

File: tests/init/test_init_minimal_integration.py (extension)
Maps to: FR-002
Setup: An empty tmp_path.
Action: Run spec-kitty init demo --ai codex --non-interactive.
Assertions:

  • subprocess.run(["git", "log"], cwd=tmp_path/"demo", capture_output=True).returncode != 0 (no git repo) OR
  • The literal string "Initial commit from Specify template" does NOT appear anywhere under tmp_path/"demo" (grep across the directory).

T1.3 — init does not create .agents/skills/

File: tests/init/test_init_minimal_integration.py (extension)
Maps to: FR-003
Action: Run spec-kitty init demo --ai codex --non-interactive.
Assertions:

  • (tmp_path / "demo" / ".agents" / "skills").exists() == False

T1.4 — init next-steps does not name top-level spec-kitty implement

File: tests/init/test_init_next_steps.py (NEW)
Maps to: FR-004, FR-501
Action: Capture stdout from spec-kitty init demo --ai codex --non-interactive.
Assertions:

  • The captured stdout does NOT contain a line that names spec-kitty implement (the top-level CLI invocation) as a canonical implementation step.
  • The captured stdout DOES name spec-kitty next as the canonical loop entry and spec-kitty agent action implement / spec-kitty agent action review as the per-decision verbs.
  • Slash-command file names like /spec-kitty.implement MAY appear in the output (they refer to slash commands surfaced in the agent runtime, not top-level CLI invocations). What is FORBIDDEN is the literal string spec-kitty implement WP or any prose teaching top-level spec-kitty implement as a canonical user-facing CLI invocation.
  • The captured stdout DOES NOT contain Initial commit from Specify template.

T1.5 — init is idempotent on re-run

File: tests/init/test_init_idempotent.py (NEW)
Maps to: FR-006
Setup: A tmp_path that has already been initialized once.
Action: Run spec-kitty init demo --ai codex --non-interactive a second time.
Assertions: One of:

  • Exit code 0, file set unchanged, message "already initialized" or equivalent. OR
  • Exit code != 0, error message names the conflict.

T1.6 — init does not touch existing git state

File: tests/init/test_init_in_existing_repo.py (NEW)
Maps to: FR-007
Setup: A tmp_path initialized as a git repo with one user-authored commit.
Action: Run spec-kitty init demo --ai codex --non-interactive against the same path.
Assertions:

  • The git HEAD commit hash is unchanged before vs. after.
  • The git branch is unchanged.
  • git status --porcelain shows only the new files created by init (no modified existing files).
  • No new commits exist in the git log.

T1.7 — init --help describes the new model

File: tests/init/test_init_help.py (NEW or extend)
Maps to: FR-008
Action: Capture stdout from spec-kitty init --help.
Assertions:

  • Help text mentions "no automatic git initialization" (or equivalent phrasing).
  • Help text mentions "no automatic commit" (or equivalent).
  • Help text names spec-kitty next and spec-kitty agent action implement/review as the canonical user-facing path.
  • Help text confirms the --no-git flag is no longer present (P1: removed in 3.1.1).

Track 2 — Planning-artifact producer correctness

T2.1 — compute_lanes includes planning-artifact WPs

File: tests/lanes/test_compute_planning_artifact.py (NEW)
Maps to: FR-101
Setup: A test fixture mission with one code WP and one planning-artifact WP. Build the dependency graph and ownership manifests.
Action: Call compute_lanes(dependency_graph, ownership_manifests, mission_slug, target_branch, ...).
Assertions:

  • The returned LanesManifest.lanes list contains a lane whose lane_id == "lane-planning".
  • That lane's wp_ids includes the planning-artifact WP.

T2.2 — Planning lane has canonical id

File: tests/lanes/test_compute_planning_artifact.py (NEW) Maps to: FR-102 Action: Same as T2.1. Assertions:

  • The lane id is exactly "lane-planning" (canonical, not derived from the mission slug).

T2.3 — lane_branch_name for lane-planning returns the planning branch

File: tests/lanes/test_branch_naming_planning.py (NEW) Maps to: FR-103 Action: Call lane_branch_name("079-post-555-release-hardening", "lane-planning"). Assertions:

  • The returned string equals the mission's target_branch value (e.g., "main"), NOT "kitty/mission-079-post-555-release-hardening-lane-planning".
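Here is a minimal sketch of the special case, assuming the function can see the mission's target branch. The call above passes only the mission slug and lane id, so the target_branch keyword is an assumption: the real function may read it from mission metadata instead. lane_branch_name_sketch is a hypothetical name. The kitty/mission-<slug>-<lane_id> shape for ordinary lanes is taken from the example above.

```python
PLANNING_LANE_ID = "lane-planning"


def lane_branch_name_sketch(mission_slug: str, lane_id: str, *, target_branch: str) -> str:
    """Return the git branch a lane works on.

    Planning-artifact work happens directly on the mission's target branch,
    so the canonical planning lane never gets a dedicated lane branch.
    """
    if lane_id == PLANNING_LANE_ID:
        return target_branch
    # Ordinary lanes get their own branch, named after the mission and lane.
    return f"kitty/mission-{mission_slug}-{lane_id}"
```

This sketch keys the special case on the canonical lane id, not on each WP's execution_mode. That matches T2.4's requirement that the lane lookup site not branch on execution_mode.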

T2.4 — Resolver returns coherent ref for planning-artifact WP via lane lookup

File: tests/context/test_resolver_planning_artifact.py (NEW) Maps to: FR-104, FR-105 Setup: A mission with at least one planning-artifact WP, lanes computed. Action: Call context.resolver.resolve_authoritative_ref(feature_dir, mission_slug, planning_wp_id). Assertions:

  • The returned ref is the planning branch (e.g., "main").
  • The function does NOT raise MissingIdentityError.
  • The function does NOT branch on execution_mode == "planning_artifact" at the lane lookup site (verifiable via code review or by injecting a mock that asserts the call path).

T2.5 — implement dispatch resolves planning-artifact WP to main repo checkout

File: tests/agent/cli/commands/test_implement_planning_artifact.py (NEW) Maps to: FR-105 Setup: A mission with at least one planning-artifact WP, lanes computed. Action: Invoke spec-kitty implement <planning_wp_id> --mission <slug>. Assertions:

  • The resolved workspace path is paths.get_main_repo_root(working_dir), NOT .worktrees/<slug>-lane-planning.
  • No .worktrees/<slug>-lane-planning directory was created.
  • Exit code 0.

T2.6 — Code WPs continue to receive normal lane assignments

File: tests/lanes/test_compute_planning_artifact.py (NEW) Maps to: FR-106 (regression coverage for the unchanged path) Setup: Same as T2.1. Action: Same as T2.1. Assertions:

  • The code WP receives a lane-a/lane-b style lane id (NOT lane-planning).
  • The code WP's lane has a non-empty write_scope.

Track 3 — Mission identity Phase 1

T3.1 — mission_id is minted at creation

File: tests/core/test_mission_creation_identity.py (NEW) Maps to: FR-201 Action: Call core.mission_creation.create_mission_core(repo_root, "test-identity") against a tmp_path repo. Assertions:

  • Read <feature_dir>/meta.json.
  • meta["mission_id"] exists, is a string, is non-empty.
  • The string parses as a valid ULID via ulid.ULID.from_str(meta["mission_id"]).

T3.2 — mission_id does not depend on numeric prefix scan

File: tests/core/test_mission_creation_identity.py (NEW) Maps to: FR-204 Setup: A tmp_path repo with two existing missions at kitty-specs/001-foo/ and kitty-specs/099-bar/. Action: Call core.mission_creation.create_mission_core(tmp_path, "new-mission"). Assertions:

  • The new mission's mission_id is a valid ULID.
  • The mission_id value does NOT depend on the numeric prefix value (i.e., creating the same mission name in a different repo with different prefixes yields a different mission_id).
  • The numeric prefix is "100" (display-friendly, max+1) — this confirms get_next_feature_number() still works for display, but the mission_id is independently generated.
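T3.1 and T3.2 rely on a ULID library (ulid.ULID.from_str) as the authoritative parse check. To make the format concrete, here is a stdlib-only sketch of minting and validating a ULID. The format is a 48-bit millisecond timestamp followed by 80 random bits, encoded as 26 Crockford base32 characters. The helper names are hypothetical. Nothing in the sketch reads kitty-specs/. That is the property T3.2 checks: the id comes from the clock and a random source, never from the numeric prefix scan.

```python
import os
import re
import time

# Crockford base32 alphabet (no I, L, O, U), as used by the ULID format.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
# 26 chars; the first is 0-7 because only 128 of the 130 encoded bits are used.
ULID_RE = re.compile(r"^[0-7][0-9A-HJKMNP-TV-Z]{25}$")


def mint_ulid_sketch() -> str:
    """Mint a ULID: 48-bit ms timestamp + 80 random bits, base32-encoded."""
    timestamp_ms = time.time_ns() // 1_000_000
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 bits
    value = (timestamp_ms << 80) | randomness  # one 128-bit integer
    chars = []
    for _ in range(26):  # 26 chars * 5 bits = 130 bits; top 2 bits stay zero
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def is_valid_ulid_sketch(candidate: str) -> bool:
    """Check length, alphabet, and that the value fits in 128 bits."""
    return bool(ULID_RE.match(candidate))
```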

T3.3 — Concurrent mission creation does not collide

File: tests/core/test_mission_creation_concurrent.py (NEW) Maps to: FR-205 Setup: Two independent tmp_path repos OR two threads operating on the same repo with different slugs. Action: Spawn two threads that each call create_mission_core with distinct slugs. Assertions:

  • Both meta.json files exist.
  • Both have mission_id values.
  • The two mission_id values are different.
  • Both threads exit cleanly (no exceptions, no truncated writes).
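The concurrency check in T3.3 can be sketched with stdlib threads. create_mission_stub is a hypothetical stand-in for create_mission_core. To keep the block short, it mints the id with uuid.uuid4() instead of a ULID. It writes meta.json through a temporary file plus os.replace, so a reader never sees a truncated file.

```python
import json
import os
import tempfile
import threading
import uuid
from pathlib import Path


def create_mission_stub(repo_root: Path, slug: str) -> Path:
    """Hypothetical stand-in for create_mission_core: mint an id, write meta.json."""
    feature_dir = repo_root / "kitty-specs" / slug
    feature_dir.mkdir(parents=True, exist_ok=True)
    meta = {"mission_slug": slug, "mission_id": uuid.uuid4().hex}
    # Write to a temp file in the same directory, then atomically swap it in.
    fd, tmp_name = tempfile.mkstemp(dir=feature_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(meta, handle)
    os.replace(tmp_name, feature_dir / "meta.json")
    return feature_dir


def run_concurrently(repo_root: Path, slugs: list) -> list:
    """Create one mission per slug in parallel threads; return the parsed meta files."""
    errors = []

    def worker(slug: str) -> None:
        try:
            create_mission_stub(repo_root, slug)
        except BaseException as exc:  # surface thread failures to the caller
            errors.append(exc)

    threads = [threading.Thread(target=worker, args=(s,)) for s in slugs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    if errors:
        raise errors[0]
    return [
        json.loads((repo_root / "kitty-specs" / s / "meta.json").read_text())
        for s in slugs
    ]
```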

T3.4 — MissionIdentity exposes mission_id

File: tests/mission_metadata/test_mission_identity_includes_id.py (NEW or extend existing) Maps to: FR-202 Setup: A mission meta.json with mission_id. Action: Call mission_metadata.resolve_mission_identity(feature_dir). Assertions:

  • The returned MissionIdentity has a mission_id attribute.
  • The attribute equals the value from meta.json.

T3.5 — MissionIdentity tolerates legacy missions without mission_id

File: tests/mission_metadata/test_mission_identity_legacy.py (NEW or extend) Maps to: NG-2 (legacy tolerance for display only) Setup: A mission meta.json WITHOUT mission_id (simulating a historical mission). Action: Call mission_metadata.resolve_mission_identity(feature_dir). Assertions:

  • The returned MissionIdentity has mission_id set to None.
  • Other fields (mission_slug, mission_number, mission_type) are populated normally.
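T3.4 and T3.5 together pin down one shape: mission_id is optional on read, so historical missions still resolve. Here is a minimal sketch of that shape. It assumes meta.json is a flat JSON object whose keys match the attribute names. The field types are also assumptions. MissionIdentitySketch and identity_from_meta are hypothetical names, and the real MissionIdentity may carry more fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MissionIdentitySketch:
    mission_slug: str
    mission_number: Optional[str]
    mission_type: Optional[str]
    # None for legacy missions created before mission_id was minted.
    mission_id: Optional[str] = None


def identity_from_meta(meta: dict) -> MissionIdentitySketch:
    """Build an identity from parsed meta.json, tolerating a missing mission_id."""
    return MissionIdentitySketch(
        mission_slug=meta["mission_slug"],
        mission_number=meta.get("mission_number"),
        mission_type=meta.get("mission_type"),
        mission_id=meta.get("mission_id"),
    )
```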

T3.6 — emit_mission_created payload includes mission_id

File: tests/sync/test_emit_mission_created_includes_mission_id.py (NEW) Maps to: FR-202 (machine-facing flows identify by mission_id) Action: Call sync.events.emit_mission_created(mission_slug=..., mission_number=..., mission_id=..., ...). Assertions:

  • The emitted payload (JSON) contains mission_id as a top-level field.
  • The value matches the input.

Track 4 — Tasks/finalize hotfix

T4.1 — Final WP section bounds at top-level non-WP heading

File: tests/core/test_dependency_parser.py (extension) Maps to: FR-301 Setup:

```markdown
## WP01

**Dependencies**: WP00

Body of WP01.

## WP02

**Dependencies**: []

Body of WP02.

## Notes

This phase depends on WP01 being complete and signed off. Depends on WP01.
```

Action: Call parse_dependencies_from_tasks_md(content). Assertions:

  • result == {"WP01": ["WP00"], "WP02": []}
  • Specifically: "WP02": [] — the trailing prose under ## Notes is NOT parsed into WP02's dependencies.

T4.2 — Trailing prose without a ## heading still bounds at EOF

File: tests/core/test_dependency_parser.py (extension) Maps to: FR-301 (edge case) Setup:

```markdown
## WP01

**Dependencies**: []

Body of WP01.

This is an unstructured trailing paragraph that mentions WP02 informally.
```

Action: Call parse_dependencies_from_tasks_md(content). Expected behavior: Today, the parser treats trailing prose after the final WP as part of that WP's section when no ## heading introduces it (this is the underlying #525 issue). Track 4's narrow slice bounds sections at ## headings only, so trailing free-form prose remains a known limitation. research.md records this under rejected alternatives. This test is therefore a documentation test rather than a strict assertion. It records the known limitation and tolerates both the current behavior (the informal WP02 mention is picked up as a dependency) and an improved parser that ignores it. The test name should make this clear. Assertion: The test passes if result["WP01"] is either [] or ["WP02"]. The point is for the test to exist as a tripwire: any other result fails it. Once a future change tightens the bound, narrowing the assertion to [] lets it catch a regression in the other direction.

(Reviewer note: T4.2 is a deliberate documentation test for the known #525 boundary; the strict bound is FR-301, which T4.1 covers. Discuss with the reviewer whether T4.2 should be removed for clarity; keeping it preserves the test as a sentinel for the next iteration.)

T4.3 — Sub-headings inside a WP section do NOT trigger the bound

File: tests/core/test_dependency_parser.py (extension) Maps to: FR-301 (edge case) Setup:

```markdown
## WP01

**Dependencies**: WP00

### Implementation notes

Some implementation notes here.

### Test plan

- Depends on WP02

## WP02

Body.
```

Action: Call parse_dependencies_from_tasks_md(content). Assertions:

  • result["WP01"] includes both WP00 AND WP02 (the bullet list under ### Test plan is inside WP01's section because ### does not trigger the bound).
  • This validates that sub-headings within a WP section are preserved.
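T4.1 and T4.3 pin down a bounding rule: a WP section ends at the next level-2 heading of any kind, while ### sub-headings stay inside it. The rule can be sketched as follows. This is an illustration, not the real parse_dependencies_from_tasks_md. parse_dependencies_sketch is a hypothetical name. The sketch treats only **Dependencies**: lines and "depends on" phrases as dependency sources, which is an assumption made for the example.

```python
import re
from typing import Dict, List, Optional

WP_HEADING = re.compile(r"^##\s+(WP\d+)\b")
LEVEL2_HEADING = re.compile(r"^##\s")  # "### ..." does not match: no space after "##"
DEPENDENCY_LINE = re.compile(r"\*\*Dependencies\*\*:|\bdepends on\b", re.IGNORECASE)
WP_ID = re.compile(r"\bWP\d+\b")


def parse_dependencies_sketch(content: str) -> Dict[str, List[str]]:
    """Map each WP id to its dependencies, bounding sections at any '## ' heading."""
    result: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in content.splitlines():
        if LEVEL2_HEADING.match(line):
            heading = WP_HEADING.match(line)
            # A non-WP level-2 heading (e.g. "## Notes") closes the current section.
            current = heading.group(1) if heading else None
            if current:
                result.setdefault(current, [])
            continue
        if current and DEPENDENCY_LINE.search(line):
            for wp in WP_ID.findall(line):
                if wp != current and wp not in result[current]:
                    result[current].append(wp)
    return result
```

Run against the T4.2 fixture, this sketch returns [] for WP01, because the trailing paragraph has no dependency phrasing. T4.2 accepts that outcome.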

T4.4 — Explicit dependencies: declaration is preserved verbatim

File: tests/core/test_dependency_parser.py (extension) Maps to: FR-302, FR-303 Setup: A tasks.md where WP02 has explicit dependencies: [] in frontmatter and no inline Depends on text. Action: Run finalize-tasks against the file. Assertions:

  • The finalized manifest's WP02 has dependencies: [].
  • The disagree-loud check did not fire (no error).

Track 5 — Auth refresh race fix

T5.1 — refresh_tokens holds the lock for the full transaction

File: tests/sync/test_auth_concurrent_refresh.py (NEW) Maps to: FR-401 Action: Patch the HTTP client to introduce an artificial delay; in a second thread, attempt to acquire the same FileLock and assert it blocks until the refresh completes. Assertions:

  • The second thread's lock acquisition blocks for at least the duration of the network delay.
  • After the refresh completes, the second thread acquires the lock and proceeds.
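The blocking check in T5.1 can be sketched with threads and a delayed fake network call. This sketch uses a threading.Lock as a stand-in for the cross-process FileLock, and refresh_tokens_sketch is hypothetical. The point is only that the lock is held across the network call, so a second acquirer waits roughly as long as the injected delay.

```python
import threading
import time

NETWORK_DELAY_S = 0.2


def refresh_tokens_sketch(lock: threading.Lock, started: threading.Event) -> None:
    """Hold the lock for the whole load -> network -> save transaction."""
    with lock:
        started.set()  # signal that the transaction (and the lock) is in progress
        time.sleep(NETWORK_DELAY_S)  # stands in for the patched, delayed HTTP call


def measure_blocked_seconds() -> float:
    """Start a refresh, then time how long a second acquirer waits for the lock."""
    lock = threading.Lock()
    started = threading.Event()
    refresher = threading.Thread(target=refresh_tokens_sketch, args=(lock, started))
    refresher.start()
    started.wait()
    t0 = time.monotonic()
    with lock:  # blocks until the refresh transaction releases the lock
        waited = time.monotonic() - t0
    refresher.join()
    return waited
```

A generous lower bound, such as half the delay, keeps the assertion from flaking on thread-scheduling jitter.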

T5.2 — Stale 401 does not clear credentials

File: tests/sync/test_auth_concurrent_refresh.py (NEW) Maps to: FR-402, FR-403 Setup: Two threads. Thread A starts a refresh (mocked HTTP returns 200 with new tokens). Thread B starts a refresh (mocked HTTP returns 401) where, between B's lock release at function entry and B's network call return, A has already rotated the token on disk. Action: Both threads run. Assertions:

  • Thread A: completes successfully, on-disk token is the new rotated value.
  • Thread B: detects that the on-disk token has changed since function entry and exits cleanly without clearing credentials.
  • After both threads complete, the credentials file still exists and contains Thread A's rotated tokens.

T5.3 — Real (non-stale) 401 still clears credentials

File: tests/sync/test_auth_concurrent_refresh.py (NEW) Maps to: FR-403 Setup: A single thread. Mocked HTTP returns 401. No concurrent rotation occurs. Action: Call refresh_tokens(). Assertions:

  • The function reads the on-disk credentials, finds them unchanged since function entry, treats the 401 as authoritative, and clears the credentials.
  • Raises AuthenticationError.
  • The credentials file no longer exists.
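T5.2 and T5.3 hinge on one decision made after a 401: compare the credentials on disk with the snapshot taken at function entry. Here is a minimal sketch of that decision. The names on_refresh_401, read_disk_token, and clear_credentials are hypothetical. A local AuthenticationError stands in for the real exception class.

```python
from typing import Callable, Optional


class AuthenticationError(Exception):
    """Local stand-in for the real AuthenticationError."""


def on_refresh_401(
    entry_token: Optional[str],
    read_disk_token: Callable[[], Optional[str]],
    clear_credentials: Callable[[], None],
) -> None:
    """Handle a 401 from the refresh endpoint.

    Returns normally, clearing nothing, if another refresher rotated the token
    since function entry. Otherwise it treats the 401 as authoritative.
    """
    if read_disk_token() != entry_token:
        # Stale 401: the token we sent was already rotated by someone else.
        return
    clear_credentials()
    raise AuthenticationError("refresh token rejected; credentials cleared")
```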

T5.4 — Reentrancy: inner load/save calls are no-op lock acquisitions

File: tests/sync/test_auth_concurrent_refresh.py (NEW) Maps to: FR-401 Action: Mock the lock to count acquisitions; call refresh_tokens(). Assertions:

  • The lock is acquired exactly once at function entry by the same thread.
  • Inner load() / save() calls do NOT cause additional cross-process lock acquisitions. Instead, they re-enter the lock the same thread already holds, which is tracked via thread-local state, without blocking.
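The reentrancy property in T5.4 can be sketched as a thin wrapper that keeps a per-thread depth counter and touches the underlying lock only on the outermost acquire. Here a threading.Lock again stands in for the cross-process file lock. The class and method names are hypothetical. Counting real acquisitions is what lets a test assert exactly one.

```python
import threading


class ReentrantProcessLockSketch:
    """Per-thread reentrant wrapper around a non-reentrant (process) lock."""

    def __init__(self) -> None:
        self._inner = threading.Lock()  # stand-in for the cross-process FileLock
        self._local = threading.local()
        self.real_acquisitions = 0

    def __enter__(self) -> "ReentrantProcessLockSketch":
        depth = getattr(self._local, "depth", 0)
        if depth == 0:
            self._inner.acquire()  # only the outermost entry really blocks
            self.real_acquisitions += 1
        self._local.depth = depth + 1
        return self

    def __exit__(self, *exc_info: object) -> None:
        self._local.depth -= 1
        if self._local.depth == 0:
            self._inner.release()


class CredentialStoreSketch:
    def __init__(self, lock: ReentrantProcessLockSketch) -> None:
        self.lock = lock
        self.token = "old"

    def load(self) -> str:
        with self.lock:  # nested inside refresh_tokens: no new real acquisition
            return self.token

    def save(self, token: str) -> None:
        with self.lock:
            self.token = token

    def refresh_tokens(self) -> None:
        with self.lock:  # outermost: the single real acquisition
            current = self.load()
            self.save(current + "-rotated")
```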

Track 6 — Top-level implement de-emphasis

T6.1 — spec-kitty implement --help marks compatibility surface

File: tests/agent/cli/commands/test_implement_help.py (NEW) Maps to: FR-503 Action: Capture stdout from spec-kitty implement --help. Assertions:

  • The text contains the phrase "internal infrastructure" or "implementation detail" (marking the command as not part of the canonical user-facing path).
  • The text contains a literal reference to spec-kitty next (the canonical loop) AND a literal reference to spec-kitty agent action implement (the canonical per-WP verb).
  • The text MAY also describe the command as a "compatibility surface" for direct invokers, but the primary framing MUST be "internal infrastructure".

T6.2 — spec-kitty implement still runs

File: tests/agent/cli/commands/test_implement_runs.py (extend existing or NEW) Maps to: FR-505 Setup: A mission with finalized lanes containing at least one code WP. Action: Invoke spec-kitty implement WP01 --mission <slug>. Assertions:

  • Exit code 0.
  • The lane worktree was allocated or reused.
  • The command produced its expected output (JSON or human, per mode).

T6.3 — init next-steps does not name top-level spec-kitty implement

(Same as T1.4 — overlap with Track 1.)

T6.4 — README.md does not name top-level spec-kitty implement in canonical workflow

File: tests/docs/test_readme_canonical_path.py (NEW) Maps to: FR-502 Setup: Read README.md from the working repo. Assertions:

  • The string "`implement`" (the word implement wrapped in backticks) does NOT appear in the canonical workflow line at lines ~8-9 (or wherever the canonical workflow line ends up after the rewrite).
  • The mermaid / ASCII diagram (lines ~64-80) does NOT name spec-kitty implement as a step.
  • A more semantic check: the README's "getting started" section refers users to spec-kitty next (the loop) and spec-kitty agent action implement / spec-kitty agent action review (the per-decision verbs), not to top-level spec-kitty implement.

T6.5 — Slash-command source templates do not teach top-level spec-kitty implement

File: tests/missions/test_command_templates_canonical_path.py (NEW) Maps to: FR-504 Setup: Read each file under src/specify_cli/missions/software-dev/command-templates/. Assertions:

  • The literal string spec-kitty implement WP (top-level CLI invocation form) does NOT appear in tasks.md, tasks-packages.md, specify.md, plan.md, or implement.md as a canonical user-facing example.
  • Where the templates need to reference the implement step, they use spec-kitty agent action implement <WP> --agent <name> (the agent-facing wrapper that handles workspace creation internally) and/or spec-kitty next --agent <name> --mission <slug> (the loop entry).
  • The slash-command file /spec-kitty.implement MAY remain as a slash command, but its body MUST resolve to spec-kitty agent action implement invocation, not to the top-level spec-kitty implement invocation.
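The literal-string check in T6.5 can be sketched as a small scanner. The needle and file names come from the assertions above, and find_forbidden_examples is a hypothetical helper. A plain substring match is an assumption: it also flags occurrences in prose, which a stricter test could exempt. The agent-facing form "spec-kitty agent action implement WP01" does not contain the needle, so it passes.

```python
from pathlib import Path

FORBIDDEN = "spec-kitty implement WP"
CHECKED_TEMPLATES = ("tasks.md", "tasks-packages.md", "specify.md", "plan.md", "implement.md")


def find_forbidden_examples(template_dir: Path) -> list:
    """Return 'file:line' locations where a template teaches top-level implement."""
    hits = []
    for name in CHECKED_TEMPLATES:
        path = template_dir / name
        if not path.exists():
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if FORBIDDEN in line:
                hits.append(f"{name}:{lineno}")
    return hits
```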

Track 7 — Repo dogfood / version coherence

T7.1 — validate_release.py fails on metadata-yaml ↔ pyproject mismatch

File: tests/release/test_validate_metadata_yaml_sync.py (NEW) Maps to: FR-601, FR-602 Setup: A tmp_path repo with pyproject.toml (version = "3.1.1") and .kittify/metadata.yaml (spec_kitty.version: 3.1.1a3). Action: Run python scripts/release/validate_release.py against the temp repo. Assertions:

  • Exit code != 0.
  • stderr or stdout contains both file paths and both version values in the error message.

T7.2 — validate_release.py passes when versions match

File: tests/release/test_validate_metadata_yaml_sync.py (NEW) Maps to: FR-601 Setup: A tmp_path repo with both files reporting 3.1.1, plus a CHANGELOG.md that has a ## [3.1.1] entry. Action: Run python scripts/release/validate_release.py. Assertions:

  • Exit code 0.

T7.3 — validate_release.py fails when CHANGELOG entry is missing

File: tests/release/test_validate_changelog_entry.py (NEW or extend existing) Maps to: FR-606 Setup: A tmp_path repo with matching versions but a CHANGELOG.md that has NO ## [3.1.1] entry. Action: Run python scripts/release/validate_release.py in branch mode. Assertions:

  • Exit code != 0.
  • The error message names the missing entry.

T7.4 — build_release_prep_payload produces a valid draft

File: tests/release/test_release_payload_draft.py (NEW) Maps to: FR-605 Setup: A tmp_path repo with at least one accepted WP under kitty-specs/<some-mission>/tasks/done/. Action: Call build_release_prep_payload(channel="stable", repo_root=tmp_path). Assertions:

  • The returned payload is a dict.
  • The payload has a proposed_changelog_block key.
  • The value of proposed_changelog_block is a non-empty string.
  • The string contains a header that references a stable version (e.g., starts with ## [3.1.1).

T7.5 — Dogfood command set runs cleanly

File: tests/release/test_dogfood_command_set.py (NEW) Maps to: FR-603, FR-604 Setup: Use the working repo path (/private/tmp/311/spec-kitty) at the release commit. The test is gated on os.environ.get("SPEC_KITTY_DOGFOOD_TEST") == "1" so it runs only in CI / explicit dogfood mode (it touches the real filesystem). Action: Invoke each command from the dogfood set:

  1. spec-kitty --version
  2. spec-kitty init demo --ai codex --non-interactive (in a tmp_path)
  3. spec-kitty agent mission create dogfood-test --json (in /private/tmp/311/spec-kitty)
  4. spec-kitty agent mission finalize-tasks --mission dogfood-test
  5. spec-kitty agent tasks status --mission dogfood-test

Assertions:

  • Each command exits 0.
  • No command output reports a version mismatch, i.e., no output contains the substring "version" followed by a version string other than the release version.
  • After the test completes, the dogfood-test mission is cleaned up from /private/tmp/311/spec-kitty/kitty-specs/.
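The opt-in gate and command sequence in T7.5 can be sketched with the standard library's unittest.skipUnless; a pytest suite would use pytest.mark.skipif the same way. The commands are copied from the steps above. dogfood_enabled and DogfoodCommandSetSketch are hypothetical names. For brevity, the sketch leaves out the init step, which runs in its own tmp_path, and the cleanup of the dogfood-test mission.

```python
import os
import subprocess
import unittest

DOGFOOD_REPO = "/private/tmp/311/spec-kitty"
DOGFOOD_COMMANDS = [
    ["spec-kitty", "--version"],
    ["spec-kitty", "agent", "mission", "create", "dogfood-test", "--json"],
    ["spec-kitty", "agent", "mission", "finalize-tasks", "--mission", "dogfood-test"],
    ["spec-kitty", "agent", "tasks", "status", "--mission", "dogfood-test"],
]


def dogfood_enabled(environ=os.environ) -> bool:
    """Opt-in gate: only run when SPEC_KITTY_DOGFOOD_TEST is exactly "1"."""
    return environ.get("SPEC_KITTY_DOGFOOD_TEST") == "1"


@unittest.skipUnless(dogfood_enabled(), "set SPEC_KITTY_DOGFOOD_TEST=1 to run")
class DogfoodCommandSetSketch(unittest.TestCase):
    def test_commands_exit_zero(self) -> None:
        for cmd in DOGFOOD_COMMANDS:
            with self.subTest(cmd=" ".join(cmd)):
                result = subprocess.run(cmd, cwd=DOGFOOD_REPO, capture_output=True, text=True)
                self.assertEqual(result.returncode, 0, result.stderr)
```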

Cross-track integration tests

TX.1 — quickstart.md walkthrough completes

File: tests/integration/test_quickstart_walkthrough.py (NEW) Maps to: All tracks Setup: Same gating as T7.5 (SPEC_KITTY_DOGFOOD_TEST=1). Action: Execute every step in quickstart.md against /private/tmp/311/spec-kitty at the release commit. Assertions:

  • Every step exits as expected (most exit 0; the deliberate-failure steps exit non-zero with the expected error message).
  • The walkthrough completes within 5 minutes.

Aggregate runtime budget

| Test category | Estimated runtime |
|---|---|
| Track 1 (5 tests) | < 5 s |
| Track 2 (6 tests) | < 8 s |
| Track 3 (6 tests) | < 6 s |
| Track 4 (4 tests) | < 2 s |
| Track 5 (4 tests) | < 10 s (concurrent threading + lock waits) |
| Track 6 (5 tests) | < 5 s |
| Track 7 (4 tests, T7.5 gated) | < 5 s (T7.5 + TX.1 are gated) |
| Total (un-gated, in pre-commit gate) | < 41 s; within NFR-004 budget (< 60 s) |
| TX.1 + T7.5 (CI / dogfood mode only) | < 5 minutes |

Out-of-scope assertions (PR-review only, no test code)

| Requirement | Assertion | Enforced by |
|---|---|---|
| FR-206 | No backfill of historical missions' identity | PR review |
| FR-305 | No full manifest redesign for #525 | PR review |
| FR-506 | No partial fix for #538/#540/#542 | PR review |
| NG-1 | No kitty-specs/** archaeology | PR review + scope-audit (RG-8) |
| NG-6 | No SaaS contract surface area added | PR review |
| C-012 | No final CHANGELOG.md prose authored by mission | PR review |