Contracts
cli-contracts.md
CLI Contracts: 3.1.1 Post-555 Release Hardening
Mission: 079-post-555-release-hardening
Purpose: Per-track CLI behavior contracts. These are the user-facing command surfaces that the implementation MUST satisfy. Each contract maps to one or more FRs in spec.md.
This is a CLI contract document, not a REST/OpenAPI spec, because this mission's external surface is a CLI tool. The contracts below describe inputs, outputs, exit codes, side effects, and absences (what the command MUST NOT do).
Track 1 — spec-kitty init
Contract C1.1: spec-kitty init <name> --ai <agent> --non-interactive against an empty directory
Maps to: FR-001, FR-002, FR-003, FR-004, FR-005, FR-008
Inputs:
- `<name>`: string, valid filesystem name
- `--ai <agent>`: one of the 12 supported agents
- `--non-interactive`: present
- Working directory: an empty directory the user has `cd`'d into, OR a parent directory under which `<name>` will be created
Side effects (REQUIRED):
- Create `<target_dir>/.kittify/config.yaml` with the selected agent set.
- Create `<target_dir>/.kittify/metadata.yaml` with the current spec-kitty version.
- Create the per-agent slash-command files under the agent's directory (e.g., `.codex/prompts/spec-kitty.*.md` for `--ai codex`).
- Print next-step guidance to stdout that names `spec-kitty next` (the loop) and `spec-kitty agent action implement` / `spec-kitty agent action review` (the per-decision verbs).
Side effects (FORBIDDEN):
- MUST NOT create `<target_dir>/.git/` under any flag combination. There is no `--git` opt-in.
- MUST NOT run `git init`, `git add`, or `git commit` under any flag combination.
- MUST NOT produce any commit, including any commit titled `"Initial commit from Specify template"` or any equivalent, under any flag combination.
- MUST NOT create `<target_dir>/.agents/skills/` or its contents.
- MUST NOT name top-level `spec-kitty implement` as the canonical implementation entrypoint in any printed text.
- MUST NOT accept the `--no-git` flag (the flag is removed in 3.1.1; passing it MUST produce a typer "no such option" error).
Output to stdout:
- A "Next steps" panel that names `spec-kitty next --agent <agent> --mission <slug>` as the canonical loop entry, and names `spec-kitty agent action implement` / `spec-kitty agent action review` as the per-decision verbs the agent invokes. The panel MUST NOT contain a line that names `spec-kitty implement` (the top-level CLI command) as a canonical user-facing command.
Exit code: 0 on success.
Contract C1.2: spec-kitty init re-run in an already-initialized directory
Maps to: FR-006
Inputs: Same as C1.1, but the target directory already contains .kittify/config.yaml from a prior init.
Acceptable behaviors (one of):
- Idempotent: exit 0, produce no changes, print a clear message that the directory is already initialized.
- Fail-fast: exit non-zero with a clear error message naming the conflict (e.g., `` Error: <target_dir>/.kittify/config.yaml already exists. Run `spec-kitty upgrade` to migrate or delete the directory and re-run. ``).
Forbidden:
- Silent merge or overwrite of existing state.
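The fail-fast option above can be sketched as a small pre-flight check. This is a minimal illustration only: `check_reinit` is a hypothetical helper name, not a function confirmed to exist in spec-kitty, and the exit code 1 is an assumed choice of non-zero value.

```python
from pathlib import Path


def check_reinit(target_dir: Path) -> tuple[int, str]:
    """Hypothetical pre-flight check for the fail-fast variant of C1.2.

    Returns (exit_code, message). The function only reads the filesystem,
    so it cannot merge into or overwrite existing state.
    """
    config = target_dir / ".kittify" / "config.yaml"
    if config.exists():
        # Name the conflicting file so the user knows what to fix.
        return 1, (
            f"Error: {config} already exists. "
            "Run `spec-kitty upgrade` to migrate or delete the directory and re-run."
        )
    return 0, "ok"
```

The idempotent option would differ only in returning exit code 0 with an "already initialized" message; in both variants the existing files are left untouched.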
Contract C1.3: spec-kitty init invoked inside an existing git repository
Maps to: FR-007
Inputs: Same as C1.1, but the working directory (or a parent of it) is inside an existing git working tree.
Side effects (REQUIRED):
- Same as C1.1 — file creation only.
Side effects (FORBIDDEN):
- MUST NOT call `git init`, `git add`, `git commit`, `git checkout`, or any git-state-mutating command.
- MUST NOT touch the existing git tree's HEAD, branches, or staged state.
- MUST NOT modify `.gitignore` of the parent repo unless the new files would otherwise be untracked and the project explicitly opts into `.gitignore` management (NOT enabled by default).
Contract C1.4: spec-kitty init --help
Maps to: FR-008
Output to stdout:
- The help text MUST describe the new model accurately:
  - No git initialization (and no `--git` opt-in flag)
  - No automatic commit
  - The `--no-git` flag from pre-3.1.1 versions is removed
  - Next-step guidance names `spec-kitty next` and `spec-kitty agent action implement` / `review`
Exit code: 0.
Track 2 — spec-kitty agent mission finalize-tasks (planning-artifact producer)
Contract C2.1: Lane assignment for a feature with planning-artifact WPs
Maps to: FR-101, FR-102, FR-103, FR-104
Inputs:
- `--mission <slug>`: a mission whose `tasks.md` declares at least one planning-artifact WP and at least one code WP.
Behavior:
- `compute_lanes()` MUST return a `LanesManifest` whose `lanes` list includes a lane with `lane_id == "lane-planning"` containing all planning-artifact WPs.
- The lane assignment for code WPs MUST be unaffected by this change.
- `LanesManifest.planning_artifact_wps` MAY remain as a derived view (for backward-compat), but MUST be derivable from the lane assignments.
Side effects (REQUIRED):
- Write `kitty-specs/<mission_slug>/lanes.json` containing the new manifest.
Side effects (FORBIDDEN):
- MUST NOT exclude planning-artifact WPs from the lane assignment loop.
- MUST NOT call `git worktree add` for the `lane-planning` lane.
Contract C2.2: spec-kitty implement WP## for a planning-artifact WP
Maps to: FR-105 (consumer uniformity)
Inputs:
- `WP##`: a planning-artifact work package id from a finalize-tasks-completed mission.
- `--mission <slug>`: the mission slug.
Behavior:
- The command MUST resolve the WP's lane via the uniform lane lookup (no `if execution_mode == "planning_artifact"` branch in `context/resolver.py`).
- The lane lookup MUST return the `lane-planning` lane.
- The resolved workspace path MUST be the main repo checkout (`paths.get_main_repo_root()`), NOT a `.worktrees/...` directory.
- The command MUST exit 0 with a clear "workspace = main repo checkout" indication.
Side effects (FORBIDDEN):
- MUST NOT create a `.worktrees/<mission_slug>-lane-planning` directory.
- MUST NOT branch on a `planning_artifact` WP-type sentinel at the lane-contract boundary.
Track 3 — spec-kitty agent mission create (identity hardening)
Contract C3.1: spec-kitty agent mission create <slug> --json
Maps to: FR-201, FR-204, FR-205
Inputs:
- `<slug>`: a kebab-case unnumbered slug
- `--json`: present
Behavior:
- The command MUST mint a ULID `mission_id` at creation time.
- The command MUST persist `mission_id` in `kitty-specs/<numbered_slug>/meta.json`.
- The command MUST NOT use the `get_next_feature_number()` scan as the source of canonical identity (it MAY continue to be used to compute the display-friendly numeric prefix).
- The output JSON MUST include `mission_id` as a top-level field.
Output (stdout, JSON):
{
"result": "success",
"mission_id": "<ulid string>",
"mission_slug": "<###-slug>",
"mission_number": "<###>",
"...": "..."
}
Concurrency:
- Two concurrent invocations of this command from two checkouts of the same repository MUST NOT produce colliding `mission_id` values. (ULID collision resistance is sufficient: no explicit lock is required for uniqueness, but the `meta.json` write SHOULD use the existing `feature_status_lock_path` for write atomicity.)
Track 4 — spec-kitty agent mission finalize-tasks (parser hotfix)
Contract C4.1: Bounded final WP section
Maps to: FR-301, FR-302, FR-303, FR-304
Inputs:
- A `tasks.md` file authored by an operator that contains:
  - A final WP section (`## WP##` or `## Work Package WP##`)
  - Trailing prose after the final WP section that mentions other WPs in `Depends on WP##` form
  - An explicit `dependencies:` declaration in the final WP's frontmatter
Behavior:
- The dependency parser MUST bound the final WP section so that the trailing prose is NOT included in the final WP's body.
- The bound MUST trigger at: (a) the next WP header, (b) a top-level `##` markdown heading whose text is not a WP id, or (c) EOF.
- Sub-headings (`###`) inside the WP section MUST NOT trigger the bound.
- The explicit `dependencies:` declaration MUST be preserved verbatim in the finalized manifest.
Forbidden:
- MUST NOT inject WPs from trailing prose into the final WP's parsed dependencies.
- MUST NOT overwrite explicit `dependencies:` declarations with parser-derived values.
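The three bound triggers in C4.1 can be illustrated with a line-oriented splitter. This is a sketch under stated assumptions, not the project's parser: `split_wp_sections` and both regexes are hypothetical names, and WP ids are assumed to look like `WP` followed by digits.

```python
import re

# Assumed header shapes, taken from the contract: "## WP##" or "## Work Package WP##".
WP_HEADER = re.compile(r"^## (?:Work Package )?(WP\d+)\b")
# Exactly two hashes followed by a space; "### ..." sub-headings do not match.
TOP_LEVEL_HEADING = re.compile(r"^## ")


def split_wp_sections(text: str) -> dict[str, str]:
    """Return {wp_id: body}, bounding each section per C4.1."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        header = WP_HEADER.match(line)
        if header:
            # (a) the next WP header closes the previous section and opens a new one
            current = header.group(1)
            sections[current] = []
        elif TOP_LEVEL_HEADING.match(line):
            # (b) a top-level heading that is not a WP id closes the section
            current = None
        elif current is not None:
            # Body lines, including "###" sub-headings, stay in the section
            sections[current].append(line)
    # (c) EOF closes whatever section is still open
    return {wp: "\n".join(body) for wp, body in sections.items()}
```

With this shape, prose under a trailing `## Notes` heading never reaches the final WP's body, so a `Depends on WP##` mention there cannot be picked up as a dependency of the final WP.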
Track 5 — Auth refresh
Contract C5.1: refresh_tokens() lock contract
Maps to: FR-401, FR-402, FR-403, FR-404
Behavior (internal — not a user-facing CLI command, but a contract for the function):
- `auth.refresh_tokens()` MUST acquire the cross-process `filelock.FileLock` at function entry.
- The lock MUST be held for the FULL transaction: read on-disk credentials → HTTP POST to `/token/refresh/` → parse response → persist new credentials (or handle failure).
- The lock MUST be released in a `finally` block.
- On 401 from the refresh endpoint:
  - The function MUST re-read on-disk credentials under the held lock.
  - If the on-disk refresh token differs from the value read at function entry, the 401 is stale: exit cleanly without clearing.
  - If the on-disk refresh token is unchanged, the 401 is terminal: clear credentials (under the held lock) and raise `AuthenticationError`.
- Inner `load()` / `save()` calls inside the locked transaction reacquire the lock as no-ops (reentrancy).
User-facing observable behavior (under contention):
- Two concurrent CLI invocations that race a refresh, where one rotates the refresh token successfully and the other races, MUST result in the user remaining logged in. The losing process MUST observe the rotated token on its re-read and exit cleanly.
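The stale-versus-terminal 401 decision in C5.1 can be sketched with in-process stand-ins. Everything here is an assumption for illustration: `InMemoryStore` replaces the on-disk credential file, a `threading.RLock` stands in for the cross-process `filelock.FileLock`, `post_refresh` is a hypothetical callable wrapping the HTTP POST, and the string return values are invented for the sketch.

```python
import threading


class AuthenticationError(Exception):
    """Raised when the refresh token is definitively rejected."""


class InMemoryStore:
    """Hypothetical stand-in for the on-disk credential store."""

    def __init__(self, refresh):
        self.refresh = refresh
        # Reentrant, so inner load()/save() calls reacquire without blocking.
        self.lock = threading.RLock()

    def load(self):
        with self.lock:
            return self.refresh

    def save(self, refresh):
        with self.lock:
            self.refresh = refresh

    def clear(self):
        with self.lock:
            self.refresh = None


def refresh_tokens(store, post_refresh):
    """post_refresh(token) -> (status_code, new_token_or_None)."""
    # The with-block holds the lock for the whole transaction and releases
    # it on every exit path, which is what a try/finally would do.
    with store.lock:
        entry_token = store.load()
        status, new_token = post_refresh(entry_token)
        if status == 200:
            store.save(new_token)
            return "refreshed"
        if status == 401:
            # Re-read under the held lock before trusting the 401.
            if store.load() != entry_token:
                return "stale"  # someone else rotated; keep credentials
            store.clear()
            raise AuthenticationError("refresh token rejected")
        raise RuntimeError(f"unexpected status {status}")
```

The design point is that credentials are cleared only when the re-read confirms the rejected token is still the current one, so the process that loses a rotation race leaves the user logged in.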
Track 6 — Top-level spec-kitty implement and canonical-path teach-out
Contract C6.1: spec-kitty implement --help
Maps to: FR-503
Output to stdout:
- The help text MUST mark `spec-kitty implement` as internal infrastructure: an implementation detail of `spec-kitty agent action implement`, not a user-facing canonical path.
- The help text MUST direct callers to `spec-kitty next` for the loop and `spec-kitty agent action implement` for the per-WP verb.
Example acceptable docstring:
> Internal: allocate or reuse the lane worktree for a work package. This is an implementation detail of `spec-kitty agent action implement` and is not a canonical user-facing command. Users should invoke `spec-kitty next --agent <name> --mission <slug>` to drive a mission; agents should invoke `spec-kitty agent action implement <WP> --agent <name>` for per-WP work. This command remains as a compatibility surface for direct callers.
Contract C6.2: spec-kitty implement WP## (compatibility execution)
Maps to: FR-505
Behavior:
- The command MUST still run for direct invokers.
- The command MUST allocate or reuse the lane worktree exactly as it does today.
- The command MUST NOT print a deprecation banner on every invocation (banners on every run are forbidden by R-6 in plan.md).
Exit code: 0 on success, non-zero on error (unchanged behavior).
Contract C6.3: spec-kitty init next-steps output (overlap with Track 1)
Maps to: FR-501, FR-502 (init's printed output) and FR-504 (the slash-command guidance shipped by init)
Output to stdout:
- The next-steps panel MUST NOT name top-level `spec-kitty implement` as a canonical user-facing path.
- The next-steps panel MUST direct users to the `spec-kitty next` loop and the `spec-kitty agent action implement` / `spec-kitty agent action review` per-decision verbs (the actual post-#555 canonical user workflow per D-4).
Contract C6.4: README.md canonical workflow text
Maps to: FR-502
Static content rule:
- `README.md` lines 8-9 (the canonical workflow line) MUST NOT name `implement` as a step in a way that implies top-level `spec-kitty implement` is the canonical command-line invocation.
- The mermaid / ASCII diagram (lines 64-80) MUST NOT name top-level `spec-kitty implement` as a step.
Track 7 — Release-hygiene CLI surface
Contract C7.1: scripts/release/validate_release.py in branch mode
Maps to: FR-601, FR-602, FR-606
Behavior:
- The script MUST validate that `pyproject.toml` `[project].version` equals `.kittify/metadata.yaml` `spec_kitty.version`.
- If the two versions disagree, the script MUST exit non-zero and emit a clear error message that names both files and both values.
- The script MUST also call `changelog_has_entry(changelog, target_version)` and fail the cut if the entry is missing.
Exit code: 0 if all gates pass; non-zero (1 by convention) on any failure.
Error message format (example):
ERROR: Version mismatch
pyproject.toml line 3: 3.1.1
.kittify/metadata.yaml line 6: 3.1.1a3
Both files must report the same version before the release can be cut.
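A minimal sketch of the comparison behind that message, assuming the two version strings have already been read from their files. `version_gate` is a hypothetical name, and this sketch omits the line numbers shown in the example above because locating them depends on how the real script parses each file.

```python
def version_gate(pyproject_version: str, metadata_version: str) -> list[str]:
    """Return error lines for a version mismatch, or an empty list if the gate passes."""
    if pyproject_version == metadata_version:
        return []
    # Name both files and both values, as C7.1 requires.
    return [
        "ERROR: Version mismatch",
        f"  pyproject.toml: {pyproject_version}",
        f"  .kittify/metadata.yaml: {metadata_version}",
        "Both files must report the same version before the release can be cut.",
    ]


def exit_code_for(errors: list[str]) -> int:
    """0 if all gates pass; 1 by convention on any failure."""
    return 1 if errors else 0
```

The comparison is plain string equality, so `3.1.1` and `3.1.1a3` are treated as different versions, which matches the mismatch in the example.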
Contract C7.2: spec-kitty agent release prep --channel stable --json
Maps to: FR-605
Behavior:
- The command MUST produce a structured JSON payload representing the proposed release prep.
- The payload MUST include a `proposed_changelog_block` field whose value is a non-empty markdown block whose header references the target version (e.g., starts with `## [3.1.1`).
- The command MUST NOT mutate any file in the working tree (it is a draft generator, not a mutator).
Exit code: 0 on success.
Contract C7.3: Dogfood command set against /private/tmp/311/spec-kitty
Maps to: FR-603, FR-604
Behavior:
A fresh shell, with the CLI installed from the release commit, MUST be able to run each of the following commands and observe exit code 0 with no error rooted in version skew:
1. `spec-kitty --version`
2. `spec-kitty init demo --ai codex --non-interactive` (against a fresh empty directory under `/tmp/`)
3. `spec-kitty agent mission create dogfood --json` (against `/private/tmp/311/spec-kitty`)
4. `spec-kitty agent mission finalize-tasks --mission dogfood` (against `/private/tmp/311/spec-kitty`)
5. `spec-kitty agent tasks status --mission dogfood` (against `/private/tmp/311/spec-kitty`)
Acceptance: All five commands exit 0 and produce no error output naming version mismatch.
Out-of-band contracts
The following contracts are listed for completeness but are NOT enforced as CLI behavior — they are documentation / scope assertions:
- FR-206 (no historical mission identity backfill): asserted via PR review, not via CLI behavior.
- FR-305 (no full manifest redesign): asserted via PR review, not via CLI behavior.
- FR-506 (no #538/#540/#542 stabilization): asserted via PR review, not via CLI behavior.
file-format-contracts.md
File Format Contracts: 3.1.1 Post-555 Release Hardening
Mission: 079-post-555-release-hardening
Purpose: Persistent file format expectations the mission must satisfy. Each contract names the file, its current shape, the post-mission shape, and the migration story (if any).
The mission edits a small number of file formats. Most are additive (new fields, no removed fields) so they remain backward-compatible.
F1. meta.json (per-mission identity)
Path: kitty-specs/<mission_slug>/meta.json
Owner: core.mission_creation (write), mission_metadata.resolve_mission_identity (read)
Pre-mission shape (Track 3 baseline)
{
"mission_number": "079",
"slug": "079-post-555-release-hardening",
"mission_slug": "079-post-555-release-hardening",
"friendly_name": "post 555 release hardening",
"mission_type": "software-dev",
"target_branch": "main",
"created_at": "2026-04-09T06:12:24.358591+00:00"
}
Post-mission shape (Track 3 lands)
{
"mission_id": "01HXYZ0123456789ABCDEFGHJK",
"mission_number": "079",
"slug": "079-post-555-release-hardening",
"mission_slug": "079-post-555-release-hardening",
"friendly_name": "post 555 release hardening",
"mission_type": "software-dev",
"target_branch": "main",
"created_at": "2026-04-09T06:12:24.358591+00:00",
"vcs": "git"
}
Changes:
- NEW required field for new missions: `mission_id`, a string in ULID format (26-character Crockford base32, lexicographically sortable).
- NEW recommended field: `vcs`, a string, default `"git"`.
- All existing fields are preserved.
Validation rules:
- For missions created after Track 3 lands: `mission_id` MUST be present, MUST be a non-empty string, and MUST parse as a valid ULID.
- `mission_id` MUST be immutable. Any code that overwrites it is a contract violation.
- For historical missions (created before Track 3 lands): `mission_id` MAY be absent. Loaders MUST tolerate the absence (per NG-1 / NG-2). New machine-facing flows MUST treat absence as an error.
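The "parses as a valid ULID" rule can be approximated without a third-party library by checking the textual shape. This is a sketch, not the project's validator: `is_valid_ulid` and `check_meta` are hypothetical names, and accepting only uppercase is an assumption (ULID decoding is usually case-insensitive).

```python
import re

# 26 characters of Crockford base32 (no I, L, O, U). The first character
# must be 0-7 because 26 * 5 = 130 bits and a ULID carries only 128.
ULID_RE = re.compile(r"[0-7][0-9A-HJKMNP-TV-Z]{25}")


def is_valid_ulid(value: object) -> bool:
    """True if value is a string in canonical uppercase ULID form."""
    return isinstance(value, str) and ULID_RE.fullmatch(value) is not None


def check_meta(meta: dict, *, historical: bool) -> list[str]:
    """Apply the F1 rules: tolerate absence only for historical missions."""
    if "mission_id" not in meta:
        return [] if historical else ["mission_id is required for new missions"]
    if not is_valid_ulid(meta["mission_id"]):
        return ["mission_id is not a valid ULID"]
    return []
```

A loader for historical missions would call `check_meta(meta, historical=True)`, while new machine-facing flows would pass `historical=False` so that absence is reported as an error.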
Migration:
- Mission 079 itself currently has no `mission_id`. Track 3's first WP MUST add it to mission 079's own `meta.json` so the mission dogfoods the new identity model.
- No bulk migration of historical missions is performed (NG-2).
F2. lanes.json (per-mission lane manifest)
Path: kitty-specs/<mission_slug>/lanes.json
Owner: lanes.persistence.write_lanes_json (write), lanes.persistence.require_lanes_json (read)
Pre-mission shape (Track 2 baseline)
{
"feature_slug": "<mission_slug>",
"mission_id": "<may or may not be present>",
"target_branch": "main",
"lanes": [
{
"lane_id": "lane-a",
"wp_ids": ["WP01", "WP02"],
"write_scope": ["src/foo/**", "tests/foo/**"],
"predicted_surfaces": ["surface-a"],
"depends_on_lanes": [],
"parallel_group": 1
}
],
"planning_artifact_wps": ["WP03"],
"collapse_report": null,
"computed_at": "<timestamp>",
"computed_from": "finalize-tasks"
}
Post-mission shape (Track 2 lands)
{
"feature_slug": "<mission_slug>",
"mission_id": "<from meta.json — Track 3>",
"target_branch": "main",
"lanes": [
{
"lane_id": "lane-a",
"wp_ids": ["WP01", "WP02"],
"write_scope": ["src/foo/**", "tests/foo/**"],
"predicted_surfaces": ["surface-a"],
"depends_on_lanes": [],
"parallel_group": 1
},
{
"lane_id": "lane-planning",
"wp_ids": ["WP03"],
"write_scope": ["kitty-specs/<mission_slug>/**"],
"predicted_surfaces": ["planning"],
"depends_on_lanes": [],
"parallel_group": 0
}
],
"planning_artifact_wps": ["WP03"],
"collapse_report": null,
"computed_at": "<timestamp>",
"computed_from": "finalize-tasks"
}
Changes:
- NEW canonical lane: `lane-planning` is added to the `lanes` list when the mission has at least one planning-artifact WP.
- `planning_artifact_wps` is preserved as a derived view (for backward compatibility with historical `lanes.json` consumers). Producers SHOULD treat it as derivable from the lane assignments; consumers SHOULD prefer reading from `lanes` directly.
- Existing `lane-a` / `lane-b` entries are unchanged.
Validation rules:
- For any mission whose `tasks.md` declares at least one planning-artifact WP, `lanes.json` MUST contain a lane with `lane_id == "lane-planning"` containing all planning-artifact WP ids.
- For any mission with no planning-artifact WPs, `lane-planning` MUST NOT appear (it is conditional on actual planning-artifact WP presence).
- Every WP id appearing in any WP frontmatter MUST appear in exactly one lane's `wp_ids` list (no orphan WPs after Track 2).
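The three validation rules above can be checked mechanically against a parsed manifest. A minimal sketch, assuming the manifest is already loaded as a dict and that the caller supplies the WP id sets from the WP frontmatter; `validate_lanes` is a hypothetical name, not a confirmed project function.

```python
from collections import Counter


def validate_lanes(manifest: dict, all_wp_ids: set, planning_wp_ids: set) -> list:
    """Return a list of human-readable violations of the F2 rules."""
    errors = []
    lanes = manifest.get("lanes", [])

    # Rule: every WP appears in exactly one lane (no orphans, no duplicates).
    counts = Counter(wp for lane in lanes for wp in lane["wp_ids"])
    for wp in sorted(all_wp_ids):
        if counts[wp] != 1:
            errors.append(f"{wp} appears in {counts[wp]} lanes (expected exactly 1)")

    planning = next((l for l in lanes if l["lane_id"] == "lane-planning"), None)
    if planning_wp_ids:
        # Rule: lane-planning exists and contains all planning-artifact WPs.
        if planning is None:
            errors.append("lane-planning is missing")
        elif not planning_wp_ids <= set(planning["wp_ids"]):
            errors.append("lane-planning does not contain all planning-artifact WPs")
    elif planning is not None:
        # Rule: lane-planning is conditional on planning-artifact WPs existing.
        errors.append("lane-planning present without planning-artifact WPs")
    return errors
```

Run against the pre-mission shape shown earlier (only `lane-a`, with `WP03` listed in `planning_artifact_wps`), this reports both an orphan `WP03` and a missing `lane-planning`; the post-mission shape yields no violations.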
Migration:
- Historical `lanes.json` files written before Track 2 lands continue to be readable. The reader treats absence of `lane-planning` as "this manifest predates Track 2; the planning-artifact WPs are listed in the `planning_artifact_wps` field instead". This is the only legacy-tolerance hook for `lanes.json`, and it exists per NG-1.
- Re-running `finalize-tasks` on a historical mission rewrites the manifest with the new shape.
F3. .kittify/metadata.yaml (project metadata)
Path: .kittify/metadata.yaml
Owner: core.project_metadata (write), various readers
Pre-mission shape (Track 7 baseline)
spec_kitty:
version: 3.1.1a2 # ⚠ STALE
initialized_at: <iso8601>
last_upgraded_at: <iso8601>
environment:
python_version: <string>
platform: <string>
platform_version: <string>
migrations:
applied:
- <migration_id>
- ...
Post-mission shape (Track 7 lands)
spec_kitty:
version: 3.1.1 # synced to pyproject.toml at the release cut
initialized_at: <iso8601>
last_upgraded_at: <iso8601>
environment:
python_version: <string>
platform: <string>
platform_version: <string>
migrations:
applied:
- <migration_id>
- ...
Changes:
- `spec_kitty.version` is bumped to match `pyproject.toml`. No schema change.
Validation rules:
- At the release commit, `.kittify/metadata.yaml:spec_kitty.version` MUST equal `pyproject.toml:[project].version`.
- The pre-release validation step MUST fail the cut if the two values disagree.
Migration:
- Track 7's first WP performs the explicit bump. No automation; this is an intentional human/agent commit.
F4. pyproject.toml (Python package version)
Path: pyproject.toml
Owner: human release engineer (with mission 079 facilitating the release-cut WP)
Field of interest
[project]
name = "spec-kitty-cli"
version = "3.1.1" # at the release commit; pre-release alphas use 3.1.1a3, 3.1.1a4, ...
Changes:
- The version field is bumped to `3.1.1` (stripping the alpha suffix) at the release cut WP.
Validation rules:
- The pre-release validation step (extended `scripts/release/validate_release.py`) MUST assert this field equals `.kittify/metadata.yaml:spec_kitty.version`.
- The validation step MUST also assert that `CHANGELOG.md` contains an entry whose header matches `## [<version>]` (FR-606).
F5. CHANGELOG.md (release narrative)
Path: CHANGELOG.md
Owner: human release engineer (mission 079 may produce structured draft inputs but does NOT author final prose, per C-012 / FR-605)
Format
Keep a Changelog (https://keepachangelog.com/) + Semantic Versioning. Each entry has the form:
## [VERSION] - DATE
### Added
- ...
### Changed
- ...
### Fixed
- ...
Pre-mission state
The file currently has entries for 3.1.1a3, 3.1.1a2, and 3.1.1a1. No 3.1.1 (stable) entry.
Post-mission state
The human release engineer adds a ## [3.1.1] - <date> entry summarizing the seven tracks of mission 079. Mission 079 itself produces a structured draft input via spec-kitty agent release prep --channel stable --json (FR-605); the human takes that draft and produces the final prose.
Validation rules:
- The pre-release validation step (FR-606) MUST assert that an entry whose header matches `## [3.1.1` exists in `CHANGELOG.md` and has a non-empty body.
- The validation step does NOT validate narrative quality or wording, only presence and basic structural shape.
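A presence-and-shape check of this kind can be sketched as follows. `changelog_entry_has_body` is a hypothetical helper, not the existing `changelog_has_entry`; it assumes the header form `## [<version>]` from F4 and requires the closing bracket so that a `3.1.1a3` entry does not satisfy a check for `3.1.1`.

```python
import re


def changelog_entry_has_body(changelog: str, version: str) -> bool:
    """True if a '## [<version>]' entry exists and has at least one non-blank body line."""
    header = re.compile(r"^## \[" + re.escape(version) + r"\]")
    lines = changelog.splitlines()
    for i, line in enumerate(lines):
        if not header.match(line):
            continue
        for body_line in lines[i + 1:]:
            if body_line.startswith("## "):
                break  # next release entry: this entry's body has ended
            if body_line.strip():
                return True  # any non-blank line counts, including "### Added"
        return False  # header found but nothing under it
    return False  # no header for this version
```

The check deliberately says nothing about wording. An entry consisting of a single sub-heading would pass, which is the kind of gap a stricter structural rule could close if the project wants one.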
Mission 079 contract:
- Mission 079 MUST NOT author the final `## [3.1.1]` entry prose.
- Mission 079 MUST produce a structured draft via the existing `build_release_prep_payload()` helper.
- Mission 079 MUST add the validation step that fails the cut if the entry is missing.
F6. ~/.spec-kitty/credentials (auth state)
Path: ~/.spec-kitty/credentials (TOML format), with sibling lock file ~/.spec-kitty/credentials.lock.
Owner: sync.auth.CredentialStore
Format (no schema change in this mission)
[tokens]
access = "..."
refresh = "..."
access_expires_at = "2099-01-01T00:00:00+00:00"
refresh_expires_at = "2099-01-01T00:00:00+00:00"
[user]
username = "..."
team_slug = "..." # optional
[server]
url = "https://..."
Changes:
- No schema change. Track 5 changes the lock-scope contract around `refresh_tokens()`, not the file format.
Validation rules (post-Track 5):
- The lock file MUST be acquired across the FULL refresh transaction (read → network → persist), not only per-I/O.
- On 401, the refresh function MUST re-read the credentials file under the same lock and compare to the entry-time refresh token before treating the 401 as authoritative grounds for clearing.
- See `contracts/cli-contracts.md` Contract C5.1 for the function-level contract.
F7. tasks.md (per-mission work-package narrative)
Path: kitty-specs/<mission_slug>/tasks.md
Owner: human / agent author (write), core.dependency_parser (read)
Format (no schema change in this mission)
tasks.md continues to use the same narrative structure: a ## Plan overview followed by per-WP sections (## WP01, ## WP02, ..., ## WPnn), each with frontmatter (dependencies:, owned_files:, etc.) and prose.
Changes:
- No format change. Track 4 changes the parser bound behavior on the existing format, not the format itself.
Validation rules (post-Track 4):
- The dependency parser MUST bound the final WP section at: (a) the next WP header, (b) a top-level `##` heading whose text is not a WP id, or (c) EOF.
- Trailing prose past the final WP section MUST NOT be parsed for dependencies of the final WP.
Authoring guidance (post-Track 4):
- Authors MAY add trailing sections (e.g., `## Notes`, `## Appendix`, `## References`) after the final WP without risking false-positive dependency inference.
- Sub-headings (`###`) inside a WP section continue to be preserved.
F8. kitty-specs/<mission_slug>/status.events.jsonl (status event log)
Path: kitty-specs/<mission_slug>/status.events.jsonl
Owner: status.store (write), status.reducer (read)
Format
Each line is a JSON object with sorted keys per the existing 3.0 model. See the project CLAUDE.md "Status Model Patterns" section for the full schema.
Changes:
- No schema change in this mission. The status model is already canonical post-3.0 (feature 060).
- Track 3 may add `mission_id` to event payloads emitted by `emit_mission_created()` (in `sync/events.py`), but the status event log itself stores per-WP events that already use `feature_slug` and `wp_id`. No change needed here.
Validation rules:
- (unchanged)
F9. Slash-command source templates
Path: src/specify_cli/missions/software-dev/command-templates/{specify,plan,tasks,tasks-packages,implement}.md
Owner: mission template authors (mission 079 edits these as part of Track 6)
Format
Markdown files with <!-- spec-kitty-command-version: ... --> headers and templated body content. The CLI copies these into per-agent directories during init.
Changes:
- Track 6 edits the body content of these files to:
  - Remove top-level `spec-kitty implement` as the canonical CLI invocation in user-facing examples.
  - Replace inline CLI invocations with the slash-command equivalent.
- The `<!-- spec-kitty-command-version: ... -->` headers are preserved.
Validation rules:
- Per the project `CLAUDE.md`: edit SOURCE templates only, NOT generated agent copies under `.claude/`, `.codex/`, etc.
- The migration mechanism that deploys updated templates to existing projects (`upgrade/migrations/`) MUST pick up the changes. Track 6's tests verify this end-to-end.
Cross-file invariants
| Invariant | Files involved | Enforced by |
|---|---|---|
| `pyproject.toml` version == `.kittify/metadata.yaml` version | `pyproject.toml`, `.kittify/metadata.yaml` | `scripts/release/validate_release.py` (Track 7 extension) |
| `CHANGELOG.md` has entry for `pyproject.toml` version | `pyproject.toml`, `CHANGELOG.md` | `scripts/release/validate_release.py` (existing `changelog_has_entry`, ensured to run in branch mode) |
| Every WP in `tasks.md` has a lane assignment in `lanes.json` | `tasks.md`, `lanes.json` | `compute_lanes()` (Track 2) |
| `meta.json:mission_id` exists for every mission created after Track 3 lands | `meta.json` | `core.mission_creation` (Track 3) + acceptance test |
test-contracts.md
Test Contracts: 3.1.1 Post-555 Release Hardening
Mission: 079-post-555-release-hardening
Purpose: Regression test scenarios that the implementation MUST satisfy. Each test contract maps to one or more FRs and is the acceptance gate for its track.
This document is the contract layer. The actual test code lives in tests/ (per the file paths in plan.md §4 and §8). Reviewers and the /spec-kitty.review command use this document as the canonical "did the implementation satisfy the spec?" gate.
Conventions
- Test IDs are `T<track>.<n>` where `<track>` is 1-7 and `<n>` is the test ordinal within that track.
- All tests MUST run under `PWHEADLESS=1 pytest tests/` and complete within the NFR-004 budget (< 60 s aggregate for new tests added by this mission).
- All tests MUST be deterministic. Concurrent / race tests use explicit synchronization primitives; no `time.sleep`-based races.
- All file system tests MUST use `tmp_path` fixtures from pytest; they MUST NOT touch the working repo or the user's home directory.
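One way to meet the "explicit synchronization primitives" convention is to release all workers from a `threading.Barrier` rather than staggering them with sleeps. The helper below is a sketch; `run_in_lockstep` is a hypothetical name and is not part of the mission's test suite.

```python
import threading


def run_in_lockstep(worker, n=2):
    """Start n threads that all begin worker(i) only once every thread is ready.

    Returns the workers' results indexed by thread number, so assertions
    do not depend on scheduling order.
    """
    barrier = threading.Barrier(n)
    results = [None] * n

    def target(i):
        barrier.wait()  # every thread blocks here until all n have arrived
        results[i] = worker(i)

    threads = [threading.Thread(target=target, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # join instead of sleeping to wait for completion
    return results
```

A race test would pass a worker that performs the contended operation (for example, a refresh attempt against a shared store) and then assert on the final state, which stays the same regardless of which thread wins.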
Track 1 — init coherence
T1.1 — init does not create .git/
File: `tests/init/test_init_minimal_integration.py` (extension)
Maps to: FR-001
Setup: An empty `tmp_path`.
Action: Run `spec-kitty init demo --ai codex --non-interactive` against `tmp_path`.
Assertions:
- `(tmp_path / "demo" / ".git").exists() == False`
- `(tmp_path / ".git").exists() == False` (in case the target was `tmp_path` itself)
T1.2 — init does not produce a commit
File: `tests/init/test_init_minimal_integration.py` (extension)
Maps to: FR-002
Setup: An empty `tmp_path`.
Action: Run `spec-kitty init demo --ai codex --non-interactive`.
Assertions:
- `subprocess.run(["git", "log"], cwd=tmp_path/"demo", capture_output=True).returncode != 0` (no git repo), OR
- The literal string `"Initial commit from Specify template"` does NOT appear anywhere under `tmp_path/"demo"` (grep across the directory).
T1.3 — init does not create .agents/skills/
File: `tests/init/test_init_minimal_integration.py` (extension)
Maps to: FR-003
Action: Run `spec-kitty init demo --ai codex --non-interactive`.
Assertions:
- `(tmp_path / "demo" / ".agents" / "skills").exists() == False`
T1.4 — init next-steps does not name top-level spec-kitty implement
File: `tests/init/test_init_next_steps.py` (NEW)
Maps to: FR-004, FR-501
Action: Capture stdout from `spec-kitty init demo --ai codex --non-interactive`.
Assertions:
- The captured stdout does NOT contain a line that names `spec-kitty implement` (the top-level CLI invocation) as a canonical implementation step.
- The captured stdout DOES name `spec-kitty next` as the canonical loop entry and `spec-kitty agent action implement` / `spec-kitty agent action review` as the per-decision verbs.
- Slash-command file names like `/spec-kitty.implement` MAY appear in the output (they refer to slash commands surfaced in the agent runtime, not top-level CLI invocations). What is FORBIDDEN is the literal string `spec-kitty implement WP` or any prose teaching top-level `spec-kitty implement` as a canonical user-facing CLI invocation.
- The captured stdout DOES NOT contain `Initial commit from Specify template`.
T1.5 — init is idempotent on re-run
File: `tests/init/test_init_idempotent.py` (NEW)
Maps to: FR-006
Setup: A `tmp_path` that has already been initialized once.
Action: Run `spec-kitty init demo --ai codex --non-interactive` a second time.
Assertions: One of:
- Exit code 0, file set unchanged, message "already initialized" or equivalent, OR
- Exit code != 0, error message names the conflict.
T1.6 — init does not touch existing git state
File: `tests/init/test_init_in_existing_repo.py` (NEW)
Maps to: FR-007
Setup: A `tmp_path` initialized as a git repo with one user-authored commit.
Action: Run `spec-kitty init demo --ai codex --non-interactive` against the same path.
Assertions:
- The git HEAD commit hash is unchanged before vs. after.
- The git branch is unchanged.
- `git status --porcelain` shows only the new files created by init (no modified existing files).
- No new commits exist in the git log.
T1.7 — init --help describes the new model
File: `tests/init/test_init_help.py` (NEW or extend)
Maps to: FR-008
Action: Capture stdout from `spec-kitty init --help`.
Assertions:
- Help text mentions "no automatic git initialization" (or equivalent phrasing).
- Help text mentions "no automatic commit" (or equivalent).
- Help text names `spec-kitty next` and `spec-kitty agent action implement` / `review` as the canonical user-facing path.
- Help text confirms the `--no-git` flag is no longer present (P1: removed in 3.1.1).
Track 2 — Planning-artifact producer correctness
T2.1 — compute_lanes includes planning-artifact WPs
File: `tests/lanes/test_compute_planning_artifact.py` (NEW)
Maps to: FR-101
Setup: A test fixture mission with one code WP and one planning-artifact WP. Build the dependency graph and ownership manifests.
Action: Call `compute_lanes(dependency_graph, ownership_manifests, mission_slug, target_branch, ...)`.
Assertions:
- The returned `LanesManifest.lanes` list contains a lane whose `lane_id == "lane-planning"`.
- That lane's `wp_ids` includes the planning-artifact WP.
T2.2 — Planning lane has canonical id
File: `tests/lanes/test_compute_planning_artifact.py` (NEW)
Maps to: FR-102
Action: Same as T2.1.
Assertions:
- The lane id is exactly `"lane-planning"` (canonical, not derived from the mission slug).
T2.3 — lane_branch_name for lane-planning returns the planning branch
File: `tests/lanes/test_branch_naming_planning.py` (NEW)
Maps to: FR-103
Action: Call `lane_branch_name("079-post-555-release-hardening", "lane-planning")`.
Assertions:
- The returned string equals the mission's `target_branch` value (e.g., `"main"`), NOT `"kitty/mission-079-post-555-release-hardening-lane-planning"`.
T2.4 — Resolver returns coherent ref for planning-artifact WP via lane lookup
File: `tests/context/test_resolver_planning_artifact.py` (NEW)
Maps to: FR-104, FR-105
Setup: A mission with at least one planning-artifact WP, lanes computed.
Action: Call `context.resolver.resolve_authoritative_ref(feature_dir, mission_slug, planning_wp_id)`.
Assertions:
- The returned ref is the planning branch (e.g., `"main"`).
- The function does NOT raise `MissingIdentityError`.
- The function does NOT branch on `execution_mode == "planning_artifact"` at the lane lookup site (verifiable via code review or by injecting a mock that asserts the call path).
T2.5 — implement dispatch resolves planning-artifact WP to main repo checkout
File: tests/agent/cli/commands/test_implement_planning_artifact.py (NEW) Maps to: FR-105 Setup: A mission with at least one planning-artifact WP, lanes computed. Action: Invoke spec-kitty implement <planning_wp_id> --mission <slug>. Assertions:
- The resolved workspace path is `paths.get_main_repo_root(working_dir)`, NOT `.worktrees/<slug>-lane-planning`.
- No `.worktrees/<slug>-lane-planning` directory was created.
- Exit code 0.
T2.6 — Code WPs continue to receive normal lane assignments
File: tests/lanes/test_compute_planning_artifact.py (NEW) Maps to: FR-106 (regression coverage for the unchanged path) Setup: Same as T2.1. Action: Same as T2.1. Assertions:
- The code WP receives a `lane-a`/`lane-b` style lane id (NOT `lane-planning`).
- The code WP's lane has a non-empty `write_scope`.
Track 3 — Mission identity Phase 1
T3.1 — mission_id is minted at creation
File: tests/core/test_mission_creation_identity.py (NEW) Maps to: FR-201 Action: Call core.mission_creation.create_mission_core(repo_root, "test-identity") against a tmp_path repo. Assertions:
- Read `<feature_dir>/meta.json`.
- `meta["mission_id"]` exists, is a string, is non-empty.
- The string parses as a valid ULID via `ulid.ULID.from_str(meta["mission_id"])`.
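T3.1 validates the id with `ulid.ULID.from_str`, which comes from a third-party package. Where that package is unavailable, a shape-only check can be sketched with the standard library. The helper name `looks_like_ulid` is hypothetical; it checks only the canonical text form (26 Crockford Base32 characters, first character at most `7`) and does not validate the embedded timestamp.

```python
import re

# Crockford Base32 excludes I, L, O and U. The first character is capped at 7
# because 26 characters could encode 130 bits but a ULID holds only 128.
_ULID_RE = re.compile(r"[0-7][0-9A-HJKMNP-TV-Z]{25}")


def looks_like_ulid(value: object) -> bool:
    """Return True if value has the canonical ULID text shape."""
    if not isinstance(value, str):
        return False
    # ULIDs are case-insensitive, so normalise before matching.
    return _ULID_RE.fullmatch(value.upper()) is not None
```

A numeric-prefix slug such as `079-post-555-release-hardening` fails this check, which is the distinction T3.1 and T3.2 care about.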
T3.2 — mission_id does not depend on numeric prefix scan
File: tests/core/test_mission_creation_identity.py (NEW) Maps to: FR-204 Setup: A tmp_path repo with two existing missions at kitty-specs/001-foo/ and kitty-specs/099-bar/. Action: Call core.mission_creation.create_mission_core(tmp_path, "new-mission"). Assertions:
- The new mission's `mission_id` is a valid ULID.
- The `mission_id` value does NOT depend on the numeric prefix value (i.e., creating the same mission name in a different repo with different prefixes yields a different `mission_id`).
- The numeric prefix is `"100"` (display-friendly, max+1). This confirms `get_next_feature_number()` still works for display, but the `mission_id` is independently generated.
T3.3 — Concurrent mission creation does not collide
File: tests/core/test_mission_creation_concurrent.py (NEW) Maps to: FR-205 Setup: Two independent tmp_path repos OR two threads operating on the same repo with different slugs. Action: Spawn two threads that each call create_mission_core with distinct slugs. Assertions:
- Both `meta.json` files exist.
- Both have `mission_id` values.
- The two `mission_id` values are different.
- Both threads exit cleanly (no exceptions, no truncated writes).
T3.4 — MissionIdentity exposes mission_id
File: tests/mission_metadata/test_mission_identity_includes_id.py (NEW or extend existing) Maps to: FR-202 Setup: A mission meta.json with mission_id. Action: Call mission_metadata.resolve_mission_identity(feature_dir). Assertions:
- The returned `MissionIdentity` has a `mission_id` attribute.
- The attribute equals the value from `meta.json`.
T3.5 — MissionIdentity tolerates legacy missions without mission_id
File: tests/mission_metadata/test_mission_identity_legacy.py (NEW or extend) Maps to: NG-2 (legacy tolerance for display only) Setup: A mission meta.json WITHOUT mission_id (simulating a historical mission). Action: Call mission_metadata.resolve_mission_identity(feature_dir). Assertions:
- The returned `MissionIdentity` has `mission_id is None`.
- Other fields (`mission_slug`, `mission_number`, `mission_type`) are populated normally.
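The legacy tolerance in T3.4 and T3.5 amounts to treating `mission_id` as optional on read. A minimal sketch, assuming `meta.json` stores the other fields under keys named after the attributes listed above (the key names, `MissionIdentitySketch`, and `resolve_mission_identity_sketch` are all hypothetical stand-ins, not the project's types):

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class MissionIdentitySketch:
    mission_slug: str
    mission_number: str
    mission_type: str
    mission_id: Optional[str]  # None for legacy missions (T3.5)


def resolve_mission_identity_sketch(feature_dir: Path) -> MissionIdentitySketch:
    meta = json.loads((feature_dir / "meta.json").read_text(encoding="utf-8"))
    return MissionIdentitySketch(
        mission_slug=meta["mission_slug"],      # assumed key name
        mission_number=meta["mission_number"],  # assumed key name
        mission_type=meta["mission_type"],      # assumed key name
        mission_id=meta.get("mission_id"),      # tolerate absence; never invent a value
    )


# Usage: a legacy meta.json without mission_id resolves to mission_id=None.
with tempfile.TemporaryDirectory() as tmp:
    legacy_dir = Path(tmp)
    (legacy_dir / "meta.json").write_text(
        json.dumps({"mission_slug": "001-foo", "mission_number": "001", "mission_type": "software-dev"}),
        encoding="utf-8",
    )
    legacy = resolve_mission_identity_sketch(legacy_dir)
```

Reading with `.get` rather than minting a value on the fly keeps the sketch consistent with the no-backfill constraint: a missing id stays missing.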
T3.6 — emit_mission_created payload includes mission_id
File: tests/sync/test_emit_mission_created_includes_mission_id.py (NEW) Maps to: FR-202 (machine-facing flows identify by mission_id) Action: Call sync.events.emit_mission_created(mission_slug=..., mission_number=..., mission_id=..., ...). Assertions:
- The emitted payload (JSON) contains `mission_id` as a top-level field.
- The value matches the input.
Track 4 — Tasks/finalize hotfix
T4.1 — Final WP section bounds at top-level non-WP heading
File: tests/core/test_dependency_parser.py (extension) Maps to: FR-301 Setup:
## WP01
**Dependencies**: WP00
Body of WP01.
## WP02
**Dependencies**: []
Body of WP02.
## Notes
This phase depends on WP01 being complete and signed off. Depends on WP01.
Action: Call parse_dependencies_from_tasks_md(content). Assertions:
- `result == {"WP01": ["WP00"], "WP02": []}`
- Specifically: `"WP02": []`. The trailing prose under `## Notes` is NOT parsed into WP02's dependencies.
T4.2 — Trailing prose without a ## heading still bounds at EOF
File: tests/core/test_dependency_parser.py (extension) Maps to: FR-301 (edge case) Setup:
## WP01
**Dependencies**: []
Body of WP01.
This is an unstructured trailing paragraph that mentions WP02 informally.
Action: Call parse_dependencies_from_tasks_md(content). Expected behavior: Today, trailing prose after the final WP that is not preceded by a ## heading is still parsed as part of that WP (this is the underlying #525 issue). Track 4's narrow slice bounds sections at ## headings only, so trailing free-form prose remains a known issue, documented under rejected alternatives in research.md. This test is a NEGATIVE assertion: it documents the known limitation and accepts either outcome, namely that the parser picks up WP02 from the trailing paragraph or that it returns [] because the parser has otherwise improved. The test name should make this clear. Assertion: The test passes if result["WP01"] is either [] or ["WP02"]. The point is to have the test exist as a tripwire: if a future change tightens the bound, this test catches the regression in the other direction.
(Reviewer note: T4.2 is a deliberate documentation test for the known #525 boundary; the strict bound is FR-301, which T4.1 covers. Discuss with the reviewer if T4.2 should be removed for clarity — keeping it preserves the test as a sentinel for the next iteration.)
T4.3 — Sub-headings inside a WP section do NOT trigger the bound
File: tests/core/test_dependency_parser.py (extension) Maps to: FR-301 (edge case) Setup:
## WP01
**Dependencies**: WP00
### Implementation notes
Some implementation notes here.
### Test plan
- Depends on WP02
## WP02
Body.
Action: Call parse_dependencies_from_tasks_md(content). Assertions:
- `result["WP01"]` includes both `WP00` AND `WP02` (the bullet list under `### Test plan` is inside WP01's section because `###` does not trigger the bound).
- This validates that sub-headings within a WP section are preserved.
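The bounding rule exercised by T4.1 and T4.3 can be sketched as a single pass over the lines: a `## WPnn` heading opens a section, any other `## ` heading closes it, and `###` sub-headings are ignored. This is an illustrative stand-in, not the body of `parse_dependencies_from_tasks_md`; the function name and the two recognised dependency phrasings (`**Dependencies**:` and `Depends on`) are assumptions taken from the fixtures above.

```python
import re

WP_HEADING = re.compile(r"^## (WP\d+)\b")  # opens a WP section
ANY_H2 = re.compile(r"^## ")               # any other level-2 heading closes it
WP_REF = re.compile(r"\bWP\d+\b")
DEP_LINE = re.compile(r"(\*\*Dependencies\*\*\s*:|Depends on\b)(.*)", re.IGNORECASE)


def parse_dependencies_sketch(content: str) -> dict[str, list[str]]:
    result: dict[str, list[str]] = {}
    current = None  # id of the WP section we are inside, if any
    for line in content.splitlines():
        heading = WP_HEADING.match(line)
        if heading:
            current = heading.group(1)
            result[current] = []
            continue
        if ANY_H2.match(line):
            # A non-WP "## " heading (e.g. "## Notes") ends the WP section.
            current = None
            continue
        if current is None:
            continue  # prose outside any WP section is never scanned
        dep = DEP_LINE.search(line)
        if dep:
            for ref in WP_REF.findall(dep.group(2)):
                if ref != current and ref not in result[current]:
                    result[current].append(ref)
    return result
```

Because `ANY_H2` requires a space after exactly two `#` characters, `### Test plan` does not match it and the section stays open, which is the behavior T4.3 pins down. Trailing prose with no heading at all (the T4.2 case) is still scanned, matching the known limitation.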
T4.4 — Explicit dependencies: declaration is preserved verbatim
File: tests/core/test_dependency_parser.py (extension) Maps to: FR-302, FR-303 Setup: A tasks.md where WP02 has explicit dependencies: [] in frontmatter and no inline Depends on text. Action: Run finalize-tasks against the file. Assertions:
- The finalized manifest's WP02 has `dependencies: []`.
- The disagree-loud check did not fire (no error).
Track 5 — Auth refresh race fix
T5.1 — refresh_tokens holds the lock for the full transaction
File: tests/sync/test_auth_concurrent_refresh.py (NEW) Maps to: FR-401 Action: Patch the HTTP client to introduce an artificial delay; in a second thread, attempt to acquire the same FileLock and assert it blocks until the refresh completes. Assertions:
- The second thread's lock acquisition blocks for at least the duration of the network delay.
- After the refresh completes, the second thread acquires the lock and proceeds.
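The shape of the T5.1 assertion can be sketched without the real credential store. The example below uses `threading.Lock` as a stand-in for the cross-process `FileLock` and `time.sleep` as a stand-in for the delayed HTTP call; `observe_lock_contention` is a hypothetical helper, and it uses an event plus a non-blocking acquire instead of wall-clock thresholds to reduce flakiness.

```python
import threading
import time


def observe_lock_contention(network_delay: float = 0.2) -> dict:
    lock = threading.Lock()  # stand-in for the cross-process FileLock
    refresh_started = threading.Event()
    seen = {}

    def refresher() -> None:
        with lock:  # held for the whole simulated refresh transaction
            refresh_started.set()
            time.sleep(network_delay)  # simulated slow HTTP call
            seen["refresh_done_at"] = time.monotonic()

    def contender() -> None:
        refresh_started.wait()
        got_it = lock.acquire(blocking=False)  # expected to fail mid-refresh
        if got_it:
            lock.release()
        seen["blocked_during_refresh"] = not got_it
        with lock:  # blocks until the refresher releases
            seen["acquired_at"] = time.monotonic()

    threads = [threading.Thread(target=refresher), threading.Thread(target=contender)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

A real test would swap the stand-ins for the patched HTTP client and the actual lock file, but the two observations stay the same: the contender cannot get the lock mid-refresh, and it does get it afterwards.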
T5.2 — Stale 401 does not clear credentials
File: tests/sync/test_auth_concurrent_refresh.py (NEW) Maps to: FR-402, FR-403 Setup: Two threads. Thread A starts a refresh (mocked HTTP returns 200 with new tokens). Thread B starts a refresh (mocked HTTP returns 401) where, between B's lock release at function entry and B's network call return, A has already rotated the token on disk. Action: Both threads run. Assertions:
- Thread A: completes successfully, on-disk token is the new rotated value.
- Thread B: detects the on-disk token has changed since function entry, exits cleanly without clearing.
- After both threads complete, the credentials file still exists and contains Thread A's rotated tokens.
T5.3 — Real (non-stale) 401 still clears credentials
File: tests/sync/test_auth_concurrent_refresh.py (NEW) Maps to: FR-403 Setup: A single thread. Mocked HTTP returns 401. No concurrent rotation occurs. Action: Call refresh_tokens(). Assertions:
- The function reads on-disk credentials, finds them unchanged from function entry, treats the 401 as authoritative, clears credentials.
- Raises `AuthenticationError`.
- The credentials file no longer exists.
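T5.2 and T5.3 differ only in whether the on-disk token changed between function entry and the 401. That decision can be sketched in isolation. Everything here is a stand-in: the JSON file layout with a `refresh_token` key, the helper `handle_401`, and the local `AuthenticationError` class are assumptions for illustration, not the project's credential format or API.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class AuthenticationError(Exception):
    """Stand-in for the project's exception of the same name."""


def _read_refresh_token(cred_path: Path) -> Optional[str]:
    if not cred_path.exists():
        return None
    return json.loads(cred_path.read_text(encoding="utf-8")).get("refresh_token")


def handle_401(cred_path: Path, token_at_entry: str) -> None:
    """Decide what a 401 from the refresh endpoint means."""
    on_disk = _read_refresh_token(cred_path)
    if on_disk is not None and on_disk != token_at_entry:
        # Another caller rotated the token in the meantime: the 401 is stale (T5.2).
        return
    # Token unchanged since entry: the 401 is authoritative (T5.3).
    cred_path.unlink(missing_ok=True)
    raise AuthenticationError("refresh token rejected; credentials cleared")


# Usage: a stale 401 keeps the file, a real 401 clears it and raises.
with tempfile.TemporaryDirectory() as tmp:
    creds = Path(tmp) / "credentials.json"
    creds.write_text(json.dumps({"refresh_token": "rotated-by-A"}), encoding="utf-8")
    handle_401(creds, token_at_entry="old-token")
    stale_kept_file = creds.exists()
    try:
        handle_401(creds, token_at_entry="rotated-by-A")
        real_401_raised = False
    except AuthenticationError:
        real_401_raised = True
    real_401_cleared_file = not creds.exists()
```

Comparing against the value captured at entry, rather than against the token sent in the request, keeps the check meaningful even if the request was built from a cached copy.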
T5.4 — Reentrancy: inner load/save calls are no-op lock acquisitions
File: tests/sync/test_auth_concurrent_refresh.py (NEW) Maps to: FR-401 Action: Mock the lock to count acquisitions; call refresh_tokens(). Assertions:
- The lock is acquired exactly once at function entry by the same thread.
- Inner `load()`/`save()` calls do NOT cause additional cross-process lock acquisitions (they reacquire the in-memory thread-local lock state without blocking).
Track 6 — Top-level implement de-emphasis
T6.1 — spec-kitty implement --help marks compatibility surface
File: tests/agent/cli/commands/test_implement_help.py (NEW) Maps to: FR-503 Action: Capture stdout from spec-kitty implement --help. Assertions:
- The text contains the phrase `"internal infrastructure"` or `"implementation detail"` (marking the command as not part of the canonical user-facing path).
- The text contains a literal reference to `spec-kitty next` (the canonical loop) AND a literal reference to `spec-kitty agent action implement` (the canonical per-WP verb).
- The text MAY also describe the command as a "compatibility surface" for direct invokers, but the primary framing MUST be "internal infrastructure".
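The literal-string parts of T6.1 can be expressed as a small checker over captured help text. `implement_help_problems` is a hypothetical helper; it covers only the substring requirements and does not judge whether "internal infrastructure" is the primary framing, which still needs a human read.

```python
def implement_help_problems(help_text: str) -> list[str]:
    """Return a list of unmet T6.1 substring requirements (empty if all are met)."""
    problems = []
    framings = ("internal infrastructure", "implementation detail")
    if not any(phrase in help_text for phrase in framings):
        problems.append("missing 'internal infrastructure' / 'implementation detail' framing")
    for required in ("spec-kitty next", "spec-kitty agent action implement"):
        if required not in help_text:
            problems.append(f"missing literal reference to {required!r}")
    return problems
```

Returning a list rather than a boolean lets the test's failure message say which requirement was missed.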
T6.2 — spec-kitty implement still runs
File: tests/agent/cli/commands/test_implement_runs.py (extend existing or NEW) Maps to: FR-505 Setup: A mission with finalized lanes containing at least one code WP. Action: Invoke spec-kitty implement WP01 --mission <slug>. Assertions:
- Exit code 0.
- The lane worktree was allocated or reused.
- The command produced its expected output (JSON or human, per mode).
T6.3 — init next-steps does not name top-level spec-kitty implement
(Same as T1.4 — overlap with Track 1.)
T6.4 — README.md does not name top-level spec-kitty implement in canonical workflow
File: tests/docs/test_readme_canonical_path.py (NEW) Maps to: FR-502 Setup: Read README.md from the working repo. Assertions:
- The string `implement` does NOT appear in the canonical workflow line at lines ~8-9 (or wherever the canonical workflow line ends up after the rewrite).
- The mermaid / ASCII diagram (lines ~64-80) does NOT name `spec-kitty implement` as a step.
- A more semantic check: the README's "getting started" section refers users to `spec-kitty next` (the loop) and `spec-kitty agent action implement` / `spec-kitty agent action review` (the per-decision verbs), not to top-level `spec-kitty implement`.
T6.5 — Slash-command source templates do not teach top-level spec-kitty implement
File: tests/missions/test_command_templates_canonical_path.py (NEW) Maps to: FR-504 Setup: Read each file under src/specify_cli/missions/software-dev/command-templates/. Assertions:
- The literal string `spec-kitty implement WP` (top-level CLI invocation form) does NOT appear in `tasks.md`, `tasks-packages.md`, `specify.md`, `plan.md`, or `implement.md` as a canonical user-facing example.
- Where the templates need to reference the implement step, they use `spec-kitty agent action implement <WP> --agent <name>` (the agent-facing wrapper that handles workspace creation internally) and/or `spec-kitty next --agent <name> --mission <slug>` (the loop entry).
- The slash-command file `/spec-kitty.implement` MAY remain as a slash command, but its body MUST resolve to `spec-kitty agent action implement` invocation, not to the top-level `spec-kitty implement` invocation.
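The first T6.5 assertion is a substring scan over a fixed set of template files. A minimal sketch follows; `find_forbidden_invocations` is a hypothetical helper, and it is deliberately stricter than the contract, since it flags every occurrence rather than only those presented "as a canonical user-facing example".

```python
import tempfile
from pathlib import Path

FORBIDDEN = "spec-kitty implement WP"  # top-level CLI invocation form
TEMPLATE_NAMES = ("tasks.md", "tasks-packages.md", "specify.md", "plan.md", "implement.md")


def find_forbidden_invocations(template_dir: Path) -> list[tuple[str, int]]:
    """Return (file name, 1-based line number) for every forbidden occurrence."""
    hits = []
    for name in TEMPLATE_NAMES:
        path = template_dir / name
        if not path.is_file():
            continue  # a missing template is not this check's concern
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if FORBIDDEN in line:
                hits.append((name, lineno))
    return hits


# Usage: the agent-facing form passes, the top-level form is flagged.
with tempfile.TemporaryDirectory() as tmp:
    templates = Path(tmp)
    (templates / "tasks.md").write_text(
        "Run spec-kitty agent action implement WP01 --agent codex\n", encoding="utf-8"
    )
    (templates / "plan.md").write_text(
        "Step 1\nRun spec-kitty implement WP01\n", encoding="utf-8"
    )
    hits = find_forbidden_invocations(templates)
```

The agent-facing command never contains the forbidden substring, because `agent action` sits between `spec-kitty` and `implement`, so the scan needs no allow-list.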
Track 7 — Repo dogfood / version coherence
T7.1 — validate_release.py fails on metadata-yaml ↔ pyproject mismatch
File: tests/release/test_validate_metadata_yaml_sync.py (NEW) Maps to: FR-601, FR-602 Setup: A tmp_path repo with pyproject.toml (version = "3.1.1") and .kittify/metadata.yaml (spec_kitty.version: 3.1.1a3). Action: Run python scripts/release/validate_release.py against the temp repo. Assertions:
- Exit code != 0.
- stderr or stdout contains both file paths and both version values in the error message.
T7.2 — validate_release.py passes when versions match
File: tests/release/test_validate_metadata_yaml_sync.py (NEW) Maps to: FR-601 Setup: A tmp_path repo with both files reporting 3.1.1, plus a CHANGELOG.md that has a ## [3.1.1] entry. Action: Run python scripts/release/validate_release.py. Assertions:
- Exit code 0.
T7.3 — validate_release.py fails when CHANGELOG entry is missing
File: tests/release/test_validate_changelog_entry.py (NEW or extend existing) Maps to: FR-606 Setup: A tmp_path repo with matching versions but a CHANGELOG.md that has NO ## [3.1.1] entry. Action: Run python scripts/release/validate_release.py in branch mode. Assertions:
- Exit code != 0.
- The error message names the missing entry.
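The checks in T7.1 through T7.3 can be sketched as one function over the three files' text. This is not `validate_release.py`: the helper names are hypothetical, the regex parsing is a simplification standing in for real TOML and YAML parsing, and the nested `spec_kitty:` / `version:` layout is assumed from the `spec_kitty.version` notation in T7.1.

```python
import re
from typing import Optional


def _pyproject_version(text: str) -> Optional[str]:
    m = re.search(r'^version\s*=\s*"([^"]+)"', text, re.MULTILINE)
    return m.group(1) if m else None


def _metadata_version(text: str) -> Optional[str]:
    # Assumes an indented "version:" key nested under "spec_kitty:".
    m = re.search(r'^[ \t]+version:\s*["\']?([^"\'\s#]+)', text, re.MULTILINE)
    return m.group(1) if m else None


def release_problems(pyproject: str, metadata: str, changelog: str) -> list[str]:
    """Return human-readable problems; an empty list means the release is coherent."""
    problems = []
    py_v = _pyproject_version(pyproject)
    meta_v = _metadata_version(metadata)
    if py_v != meta_v:
        # Name both files and both values, as T7.1 requires.
        problems.append(
            f"pyproject.toml has version {py_v} but .kittify/metadata.yaml has version {meta_v}"
        )
    if py_v and not re.search(rf"^## \[{re.escape(py_v)}\]", changelog, re.MULTILINE):
        problems.append(f"CHANGELOG.md has no '## [{py_v}]' entry")
    return problems
```

A script wrapper would print these problems and exit non-zero when the list is non-empty, which gives the exit-code behavior the three tests assert.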
T7.4 — build_release_prep_payload produces a valid draft
File: tests/release/test_release_payload_draft.py (NEW) Maps to: FR-605 Setup: A tmp_path repo with at least one accepted WP under kitty-specs/<some-mission>/tasks/done/. Action: Call build_release_prep_payload(channel="stable", repo_root=tmp_path). Assertions:
- The returned payload is a dict.
- The payload has a `proposed_changelog_block` key.
- The value of `proposed_changelog_block` is a non-empty string.
- The string contains a header that references a stable version (e.g., starts with `## [3.1.1`).
T7.5 — Dogfood command set runs cleanly
File: tests/release/test_dogfood_command_set.py (NEW) Maps to: FR-603, FR-604 Setup: Use the working repo path (/private/tmp/311/spec-kitty) at the release commit. The test is gated on os.environ.get("SPEC_KITTY_DOGFOOD_TEST") == "1" so it runs only in CI / explicit dogfood mode (it touches the real filesystem). Action: Invoke each command from the dogfood set:
1. `spec-kitty --version`
2. `spec-kitty init demo --ai codex --non-interactive` (in a tmp_path)
3. `spec-kitty agent mission create dogfood-test --json` (in /private/tmp/311/spec-kitty)
4. `spec-kitty agent mission finalize-tasks --mission dogfood-test`
5. `spec-kitty agent tasks status --mission dogfood-test`
Assertions:
- Each command exits 0.
- No command output contains the substring `version` followed by a mismatched version string.
- After the test completes, the `dogfood-test` mission is cleaned up from `/private/tmp/311/spec-kitty/kitty-specs/`.
Cross-track integration tests
TX.1 — quickstart.md walkthrough completes
File: tests/integration/test_quickstart_walkthrough.py (NEW) Maps to: All tracks Setup: Same gating as T7.5 (SPEC_KITTY_DOGFOOD_TEST=1). Action: Execute every step in quickstart.md against /private/tmp/311/spec-kitty at the release commit. Assertions:
- Every step exits as expected (most exit 0; the deliberate-failure steps exit non-zero with the expected error message).
- The walkthrough completes within 5 minutes.
Aggregate runtime budget
| Test category | Estimated runtime |
|---|---|
| Track 1 (5 tests) | < 5 s |
| Track 2 (6 tests) | < 8 s |
| Track 3 (6 tests) | < 6 s |
| Track 4 (4 tests) | < 2 s |
| Track 5 (4 tests) | < 10 s (concurrent threading + lock waits) |
| Track 6 (5 tests) | < 5 s |
| Track 7 (4 tests, T7.5 gated) | < 5 s (T7.5 + TX.1 are gated) |
| Total (un-gated, in pre-commit gate) | < 41 s — within NFR-004 budget (< 60 s) |
| TX.1 + T7.5 (CI / dogfood mode only) | < 5 minutes |
Out-of-scope assertions (PR-review only, no test code)
| FR | Assertion | Enforced by |
|---|---|---|
| FR-206 | No backfill of historical missions' identity | PR review |
| FR-305 | No full manifest redesign for #525 | PR review |
| FR-506 | No partial fix for #538/#540/#542 | PR review |
| NG-1 | No kitty-specs/** archaeology | PR review + scope-audit (RG-8) |
| NG-6 | No SaaS contract surface area added | PR review |
| C-012 | No final CHANGELOG.md prose authored by mission | PR review |