Spec Kitty

└─ kitty-specs
   └─ Post-Merge Reliability And Release Hardening

Contracts

diff_coverage_policy.md

Contract: WP03 Diff-Coverage Policy Validation

Owns: FR-010, FR-011, FR-012

Verification-first protocol

WP03 is verification-first. The order is locked:

1. FR-010 runs first: a written validation report against current ci-quality.yml behavior on a representative large PR sample. No code changes yet.
2. After the validation report, a fork:
   - FR-011 fires if validation shows current main already satisfies the policy intent → close #455 with evidence, tighten docs/messages, no workflow change.
   - FR-012 fires if validation shows residual mismatch → adjust the workflow so only the intended critical-path surface produces hard failures.

WP03 SHALL NOT modify .github/workflows/ci-quality.yml before authoring the validation report.

Validation report (FR-010)

File: kitty-specs/068-post-merge-reliability-and-release-hardening/wp03-validation-report.md

The report contains the DiffCoverageValidationReport dataclass rendered as markdown:

# WP03 Diff-Coverage Validation Report

**Validated at commit**: `<sha>`
**Workflow path**: `.github/workflows/ci-quality.yml`
**Sample PR**: #<id> — <description>

## Critical-path threshold (enforced)
- Threshold: <X>%
- Current behavior: hard-fails if diff coverage on critical-path files < <X>%
- Critical-path file list source: `<config-path-or-pattern>`

## Full-diff threshold (advisory)
- Threshold: <Y>%
- Current behavior: emits a separate report; never hard-fails

## Findings
- [ ] Critical-path enforce/advisory split is correctly implemented
- [ ] Hard-fail surfaces match the intended critical-path file set
- [ ] Advisory report is clearly labeled as advisory in CI output
- [ ] Large PRs that meet critical-path coverage but miss full-diff coverage pass the build

## Decision
**[ ] close_with_evidence** — current main already satisfies the policy intent. Issue #455 closed with link to this report.
**[ ] tighten_workflow** — residual mismatch found. Workflow adjusted per FR-012.

## Rationale
<one paragraph>

FR-011 path: close with evidence

If the validation report's decision is close_with_evidence:

1. WP03 closes issue #455 with a comment linking to:
   - This validation report
   - The current ci-quality.yml line ranges that implement the enforce/advisory split
   - The example large PR that passes correctly under the current policy
2. WP03 tightens documentation and CI output messages so future contributors immediately understand which surface is enforced and which is advisory:
   - Add a "Diff coverage policy" section to docs/explanation/ (or wherever CI policy is documented)
   - Update CI step names in ci-quality.yml to be self-explanatory (e.g., "diff-coverage (critical-path, enforced)" vs "diff-coverage (full-diff, advisory)")
3. No workflow logic changes.

FR-012 path: tighten workflow

If the validation report's decision is tighten_workflow:

1. WP03 modifies .github/workflows/ci-quality.yml so:
   - The hard-fail surface is exactly the intended critical-path file set
   - Full-diff coverage runs as advisory only
   - CI output identifies enforced vs advisory surfaces explicitly
2. WP03 adds an integration test (or a curated synthetic PR test) demonstrating that a large PR meeting critical-path coverage but missing full-diff coverage now passes.
3. WP03 closes issue #455 with a comment linking to the workflow diff and the new test.

Test surface

| Test | FR | Asserts |
|---|---|---|
| `test_validation_report_authored` | FR-010 | the validation report file exists and has all required sections |
| `test_decision_is_recorded` | FR-010 | the report has exactly one of `close_with_evidence` or `tighten_workflow` checked |
| `test_validation_report_close_path_populated` | FR-010, FR-011 | when `decision == close_with_evidence`, the report has a non-empty rationale (≥ 50 chars) AND either an empty findings list (no policy mismatches) or each finding carries an explicit "satisfied by" rationale. This is the content gate that prevents WP03 from shipping a vacuous report. |
| `test_close_with_evidence_does_not_modify_workflow` | FR-011 | if the decision is `close_with_evidence`, `git diff main -- .github/workflows/ci-quality.yml` is empty |
| `test_tighten_workflow_passes_large_pr_sample` | FR-012 | (only if FR-012 fires) a synthetic large PR meeting critical-path but missing full-diff passes |
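The "exactly one decision checked" assertion can be sketched as a small parser over the report markdown. This is a minimal illustration, assuming a checked box is written as `[x]` inside the bold decision marker shown in the report template; the helper names `checked_decisions` and `decision_is_recorded` are hypothetical and not part of the contract.

```python
import re

# Matches a checked decision marker such as "**[x] close_with_evidence**".
# Assumption: the report keeps the bold "[ ] name" layout from the template.
_CHECKED_DECISION = re.compile(
    r"\*\*\[[xX]\]\s+(close_with_evidence|tighten_workflow)\*\*"
)


def checked_decisions(report_markdown: str) -> list[str]:
    """Return the decision names whose checkbox is ticked, in document order."""
    return _CHECKED_DECISION.findall(report_markdown)


def decision_is_recorded(report_markdown: str) -> bool:
    """True only when exactly one of the two decisions is checked."""
    return len(checked_decisions(report_markdown)) == 1
```

An unfilled template (both boxes empty) and a report with both boxes ticked are each rejected, which is the property `test_decision_is_recorded` is described as asserting.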

NFR-006 interaction

NFR-006 is pinned to commit 7307389a. If WP03 takes the FR-012 path and changes the threshold or include-list, NFR-006 is re-evaluated against the post-WP03 threshold rather than blocking WP03's change. This carve-out is documented in NFR-006's body in spec.md.

merge_strategy.md

Contract: WP02 Merge Strategy + Status-Events Safe Commit

Owns: FR-005, FR-006, FR-007, FR-008, FR-009, FR-019, FR-020 + NFR-003

CLI surface (FR-005, FR-006)

File: src/specify_cli/cli/commands/merge.py

The existing --strategy typer parameter is currently declared but discarded before reaching the lane-merge implementation. WP02 wires it through.

from typing import Optional

import typer

from specify_cli.config import load_merge_config
from specify_cli.lanes.merge import MergeStrategy

# `app`, `repo_root`, and `_run_lane_based_merge` are defined elsewhere in this module.

@app.command()
def merge(
    feature: Optional[str] = typer.Option(None, "--feature"),
    strategy: Optional[MergeStrategy] = typer.Option(
        None,
        "--strategy",
        help="Merge strategy: merge | squash | rebase. Default: squash for mission→target.",
    ),
    resume: bool = typer.Option(False, "--resume"),
    abort: bool = typer.Option(False, "--abort"),
    dry_run: bool = typer.Option(False, "--dry-run"),
) -> None:
    """Run a feature merge with explicit strategy support."""

    # Resolution order: CLI flag > config > default(SQUASH)
    resolved_strategy = (
        strategy
        or load_merge_config(repo_root).strategy
        or MergeStrategy.SQUASH
    )

    # Pass the resolved strategy through instead of discarding it.
    _run_lane_based_merge(
        feature=feature,
        repo_root=repo_root,
        strategy=resolved_strategy,
        ...,
    )

Key contract: --strategy is no longer silently discarded. The flag value flows from the CLI into _run_lane_based_merge and determines the git command sequence used for the mission→target step.

Lane→mission semantics (FR-007)

Lane→mission merges retain their existing merge-commit behavior. They are local, never hit branch protection, and are valuable as preserved lane structure on the mission branch.

Implementation: the strategy parameter passed to _run_lane_based_merge applies ONLY to the final mission→target step. The internal _merge_lane_into_mission helper continues to use git merge (no-fast-forward) regardless of the strategy parameter.

Project config (FR-008)

File: .kittify/config.yaml (existing file, new merge section)

merge:
  strategy: squash    # one of: merge | squash | rebase

Module: src/specify_cli/config.py (existing) gains a MergeConfig accessor:

from dataclasses import dataclass
from pathlib import Path
from specify_cli.lanes.merge import MergeStrategy

@dataclass
class MergeConfig:
    strategy: MergeStrategy | None = None

def load_merge_config(repo_root: Path) -> MergeConfig:
    """Read .kittify/config.yaml and return the merge section."""
    ...

Validation: if merge.strategy is present but not one of the three allowed values, load_merge_config raises a startup error (not silent fallback).
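The validation rule can be sketched on an already-parsed config mapping, which keeps the example free of a YAML dependency. This is only an illustration under stated assumptions: the `MERGE` and `REBASE` member names, the `MergeConfigError` type, and the `parse_merge_config` helper are hypothetical (only `MergeStrategy.SQUASH` and `MergeConfig` appear in the contract), and the real `load_merge_config` reads `.kittify/config.yaml` itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Mapping, Optional


class MergeStrategy(str, Enum):
    """Assumed shape of the enum; the real one lives in specify_cli.lanes.merge."""

    MERGE = "merge"
    SQUASH = "squash"
    REBASE = "rebase"


class MergeConfigError(ValueError):
    """Hypothetical error type standing in for the contract's 'startup error'."""


@dataclass
class MergeConfig:
    strategy: Optional[MergeStrategy] = None


def parse_merge_config(raw: Mapping[str, Any]) -> MergeConfig:
    """Validate the `merge` section of an already-loaded config mapping."""
    section = raw.get("merge") or {}
    value = section.get("strategy")
    if value is None:
        # Absent key: let the caller fall through to the default strategy.
        return MergeConfig()
    try:
        return MergeConfig(strategy=MergeStrategy(value))
    except ValueError:
        allowed = ", ".join(member.value for member in MergeStrategy)
        # Present but invalid: fail loudly rather than fall back silently.
        raise MergeConfigError(
            f"merge.strategy must be one of: {allowed} (got {value!r})"
        ) from None
```

Returning `strategy=None` for a missing key (rather than substituting squash here) keeps the "CLI flag > config > default" resolution order in one place, at the call site.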

Push-error parser (FR-009)

Module: src/specify_cli/cli/commands/merge.py (new helper)

from rich.console import Console

LINEAR_HISTORY_REJECTION_TOKENS: tuple[str, ...] = (
    "merge commits",
    "linear history",
    "fast-forward only",
    "GH006",
    "non-fast-forward",
)

def _is_linear_history_rejection(stderr: str) -> bool:
    """Return True if git push stderr indicates a linear-history rejection."""
    haystack = stderr.lower()
    return any(token.lower() in haystack for token in LINEAR_HISTORY_REJECTION_TOKENS)

def _emit_remediation_hint(console: Console) -> None:
    console.print(
        "\n[yellow]Push rejected by linear-history protection.[/yellow]\n"
        "Try [cyan]spec-kitty merge --strategy squash[/cyan], or set "
        "[cyan]merge.strategy: squash[/cyan] in [cyan].kittify/config.yaml[/cyan].\n"
    )

Fail-open rule: if stderr does not match any token, NO hint is emitted. This prevents misleading hints on unrelated push failures.

Backstop role: with squash as the default (FR-006), this parser is a backstop for users who explicitly opt into --strategy merge. It is NOT expected to fire on the default path.
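To make the fail-open rule concrete, the sketch below repeats the token matcher in self-contained form and runs it over a few sample stderr strings. The sample messages are illustrative stand-ins written for this example, not verbatim output from Git or any hosting service.

```python
LINEAR_HISTORY_REJECTION_TOKENS: tuple[str, ...] = (
    "merge commits",
    "linear history",
    "fast-forward only",
    "GH006",
    "non-fast-forward",
)


def is_linear_history_rejection(stderr: str) -> bool:
    """Case-insensitive substring match against the known rejection tokens."""
    haystack = stderr.lower()
    return any(token.lower() in haystack for token in LINEAR_HISTORY_REJECTION_TOKENS)


# Illustrative inputs only; real push output will differ in wording.
SAMPLES = {
    "remote: error: GH006: protected branch update failed": True,
    "remote: this branch must not contain merge commits": True,
    "fatal: Authentication failed for 'https://example.invalid/repo.git/'": False,
    "ssh: connect to host example.invalid port 22: Connection timed out": False,
}

for message, expected in SAMPLES.items():
    # Unrelated failures produce no match, so no remediation hint is shown.
    assert is_linear_history_rejection(message) is expected
```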

Status-events safe_commit fix (FR-019)

File: src/specify_cli/cli/commands/merge.py inside _run_lane_based_merge

Insertion point: after the per-WP _mark_wp_merged_done loop and before the worktree-removal step.

from specify_cli.git import safe_commit

# ... (existing _mark_wp_merged_done loop) ...

# FR-019: Persist the done events to git so they survive any subsequent
# external merge rebuild (e.g., reset+squash for protected linear-history).
safe_commit(
    repo_path=main_repo,
    files_to_commit=[
        feature_dir / "status.events.jsonl",
        feature_dir / "status.json",
    ],
    commit_message=f"chore({mission_slug}): record done transitions for merged WPs",
    allow_empty=False,
)

# ... (existing worktree-removal step) ...

Out of scope (per spec "Scope (preempting 'what about MergeState?')" subsection):

.kittify/runtime/merge/<mission_id>/state.json — intentionally ephemeral
The cleanup_merge_workspace/clear_state calls at the end — runtime state, not the cause of the loss

Regression test (FR-020)

File: tests/cli/commands/test_merge_status_commit.py

import json
import subprocess
from pathlib import Path
from specify_cli.cli.commands.merge import _run_lane_based_merge

def test_done_events_committed_to_git(synthetic_mission_repo):
    """FR-020: after _run_lane_based_merge returns, the done events for every
    merged WP are present in git history at HEAD, not just on disk."""
    repo, mission_slug, wps = synthetic_mission_repo  # fixture creates 2+ WPs

    # Run the merge end-to-end
    _run_lane_based_merge(...)

    # The proof: read status.events.jsonl from git, not from the working tree
    result = subprocess.run(
        ["git", "show", f"HEAD:kitty-specs/{mission_slug}/status.events.jsonl"],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    events = [json.loads(line) for line in result.stdout.splitlines() if line.strip()]
    done_wps = {e["wp_id"] for e in events if e["to_lane"] == "done"}

    assert done_wps == set(wps), (
        f"Expected done events for every merged WP. "
        f"Got {done_wps}, expected {set(wps)}. "
        f"This regression would mean FR-019's safe_commit step was missed."
    )

This test proves FR-019's contract directly: events are durably committed at the time the merge command returns. It does NOT use git reset --hard HEAD because that's a no-op for a file that's already at HEAD (the previous draft of FR-020 had this logical hole; the simpler direct assertion is mechanically correct).

Test surface

| Test | FR / NFR | Asserts |
|---|---|---|
| `test_strategy_flag_flows_through` | FR-005 | `--strategy squash` passed to CLI reaches `_run_lane_based_merge` |
| `test_default_strategy_is_squash` | FR-006 | no flag, no config → squash applied |
| `test_lane_to_mission_uses_merge_commit` | FR-007 | `--strategy squash` does NOT change lane→mission semantics |
| `test_config_yaml_strategy_honored` | FR-008 | `merge.strategy: rebase` in config produces a rebase merge |
| `test_invalid_config_strategy_raises` | FR-008 | `merge.strategy: bogus` raises a startup error, not silent fallback |
| `test_push_rejection_emits_hint_for_known_tokens` | FR-009 | each of the 5 token strings triggers the remediation hint |
| `test_push_rejection_fails_open_for_unknown` | FR-009 | unrelated stderr does NOT emit a hint |
| `test_done_events_committed_to_git` | FR-019, FR-020 | end-to-end FR-020 regression |
| `test_protected_linear_history_succeeds_default` | NFR-003 | squash default succeeds against `require_linear_history = true` integration test |

recovery_extension.md

Contract: WP05 Recovery Extension + Verification + Mission Close Ledger

Owns: FR-016, FR-017, FR-018, FR-021

scan_recovery_state extension (FR-021)

File: src/specify_cli/lanes/recovery.py (existing function, lines 174-267)

Current behavior

scan_recovery_state(repo_root, mission_slug) iterates branches matching kitty/mission-{slug}* returned by _list_mission_branches. If no live branches exist (because they were merged-and-deleted), the function returns "nothing to recover", leaving the user with a blocked workflow and no supported way to resume it.

New behavior (FR-021)

from pathlib import Path
from specify_cli.status.reducer import materialize

# RecoveryState is defined alongside this function in recovery.py.

def scan_recovery_state(
    repo_root: Path,
    mission_slug: str,
    *,
    consult_status_events: bool = True,   # NEW parameter, defaults True
) -> RecoveryState:
    """Scan for in-progress lane workspaces and orphan claims.

    When consult_status_events=True (default after FR-021), the function:
      1. Reads kitty-specs/<mission>/status.events.jsonl
      2. Materializes the lane snapshot for every WP
      3. For WPs whose lane is `done` and whose branches are absent, marks
         them as `merged_and_deleted` rather than `missing`
      4. For downstream WPs whose dependencies are all `done`, returns
         "ready to start from target branch tip"

    Backwards compatibility: callers that pass `consult_status_events=False`
    get the legacy live-branch-only behavior, which still exists for any
    in-progress mission where branches haven't been deleted yet.
    """

New `RecoveryState` field

RecoveryState (existing dataclass in recovery.py) gains a new field:

from dataclasses import dataclass, field

@dataclass(frozen=True)
class RecoveryState:
    # ... existing fields ...
    ready_to_start_from_target: list[str] = field(default_factory=list)
    """List of WP IDs whose dependencies are all `done` and whose lane
    branches are merged-and-deleted. These WPs can be started fresh from
    the target branch tip via `spec-kitty implement WP## --base main`."""
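The classification described in the docstring steps can be sketched as a pure function over already-loaded events. This is a simplified illustration, not the real `scan_recovery_state`: the `RecoverySketch` type and both function names are hypothetical, the `wp_id` and `to_lane` event keys are taken from the FR-020 regression test, and the stricter reading is assumed, where a downstream WP is ready only when every dependency is `done` and its branch is gone.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RecoverySketch:
    """Hypothetical, trimmed-down stand-in for RecoveryState."""

    merged_and_deleted: list[str] = field(default_factory=list)
    ready_to_start_from_target: list[str] = field(default_factory=list)


def materialize_lanes(events: list[dict]) -> dict[str, str]:
    """Reduce the event log to the latest lane per WP (last event wins)."""
    lanes: dict[str, str] = {}
    for event in events:
        lanes[event["wp_id"]] = event["to_lane"]
    return lanes


def classify_wps(
    events: list[dict],
    live_branch_wps: set[str],
    dependencies: dict[str, list[str]],
) -> RecoverySketch:
    lanes = materialize_lanes(events)
    # Done WPs with no surviving branch are merged-and-deleted, not missing.
    merged = sorted(
        wp for wp, lane in lanes.items()
        if lane == "done" and wp not in live_branch_wps
    )
    merged_set = set(merged)
    # A not-yet-done WP is ready once all of its dependencies are merged-and-deleted.
    ready = sorted(
        wp for wp, deps in dependencies.items()
        if lanes.get(wp) != "done" and deps and all(d in merged_set for d in deps)
    )
    return RecoverySketch(merged_and_deleted=merged, ready_to_start_from_target=ready)
```

For example, with `done` events for WP01 and WP02, no live branches, and WP03 depending on both, the sketch reports WP03 as ready to start from the target branch tip, while a WP04 that depends on WP03 stays blocked.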

--base flag for `spec-kitty implement` (FR-021)

File: src/specify_cli/cli/commands/implement.py

import typer
from typing import Optional

@app.command()
def implement(
    wp_id: str,
    base: Optional[str] = typer.Option(
        None,
        "--base",
        help="Explicit base ref for the lane workspace (default: auto-detect)",
    ),
) -> None:
    """Implement a work package, optionally from an explicit base branch."""

    if base is not None:
        # Validate the ref exists locally
        validated_base = _resolve_git_ref(base, repo_root)
        # Create the lane workspace from the explicit base
        workspace = create_lane_workspace_from_base(wp_id, validated_base, repo_root)
    else:
        # Existing auto-detect path (unchanged)
        workspace = create_lane_workspace(wp_id, repo_root)

    # ... rest of implement logic unchanged ...

Validation:

--base accepts any local git ref (branch name, tag, commit SHA, HEAD).
If the ref does not resolve, the command fails with a clear error pointing at git fetch and git branch -a.
When omitted, the existing auto-detect logic runs unchanged.
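One way the ref validation could look is sketched below, using `git rev-parse --verify` to confirm the ref resolves to a commit in the local repository. The function name `resolve_git_ref`, the `BaseRefError` type, and the error wording are assumptions for illustration; the contract only fixes the behavior (any local ref is accepted, and failures point at `git fetch` and `git branch -a`).

```python
import subprocess
from pathlib import Path


class BaseRefError(ValueError):
    """Hypothetical error type for an unresolvable --base value."""


def resolve_git_ref(ref: str, repo_root: Path) -> str:
    """Return the commit SHA that `ref` resolves to locally, or raise BaseRefError."""
    try:
        result = subprocess.run(
            # "<ref>^{commit}" peels tags so that only commit-ish refs are accepted.
            ["git", "rev-parse", "--verify", "--quiet", f"{ref}^{{commit}}"],
            cwd=repo_root,
            capture_output=True,
            text=True,
        )
    except FileNotFoundError as exc:
        raise BaseRefError("git executable not found on PATH") from exc
    if result.returncode != 0:
        raise BaseRefError(
            f"Base ref {ref!r} does not resolve to a local commit. "
            "Run `git fetch` to update remote refs, then `git branch -a` "
            "to list the branches that are available."
        )
    return result.stdout.strip()
```

Resolving to a SHA up front means the workspace is created from a fixed commit even if the branch moves while the command is running.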

Verification report (FR-016)

File: kitty-specs/068-post-merge-reliability-and-release-hardening/wp05-verification-report.md

# WP05 Recovery Verification Report

**Authored**: <date>
**Validated against**: commit `<sha>`

## Coverage

This report accounts for every documented failure shape from issues #415 and #416, including the two pre-identified gaps from the Mission 067 Failure-Mode Evidence sections.

## Pre-identified gap 1 (#416 status-events loss)

**Failure shape**: `_run_lane_based_merge` writes `done` events to disk but never commits them. External merge rebuild discards them.

**Status**: `fixed_by_this_mission`

**Evidence**:
- Fix landed in WP02 via FR-019 (`safe_commit` call between mark-done loop and worktree-removal)
- Regression test: `tests/cli/commands/test_merge_status_commit.py::test_done_events_committed_to_git` (FR-020)
- Verified by reading `git show HEAD:kitty-specs/<mission>/status.events.jsonl` after a synthetic merge

## Pre-identified gap 2 (#415 post-merge recovery deadlock)

**Failure shape**: `scan_recovery_state` ignores merged-and-deleted dependency branches; `implement` does not accept `--base main`.

**Status**: `fixed_by_this_mission`

**Evidence**:
- `scan_recovery_state` extended (FR-021)
- `--base` flag added to `implement` (FR-021)
- Regression tests: `tests/lanes/test_recovery_post_merge.py`, `tests/cli/commands/test_implement_base_flag.py`
- Verified by reproducing the WP07-after-deps-merged scenario from #415 and observing it now succeeds

## Other failure shapes from #415/#416

| Shape | Source | Status | Evidence |
|---|---|---|---|
| (additional shapes added during verification, if any) | | | |

Mission close ledger (FR-018, C-005)

File: kitty-specs/068-post-merge-reliability-and-release-hardening/mission-close-ledger.md

# Mission 068 Close Ledger

**Authored at mission close**: <date>
**Validated DoD**: every issue from the Tracked GitHub Issues table appears below.

## Primary scope (must implement)

| Issue | Decision | Reference | Notes |
|---|---|---|---|
| Priivacy-ai/spec-kitty#454 | closed_with_evidence | <PR/commit link> | WP01 stale-assertion analyzer shipped |
| Priivacy-ai/spec-kitty#456 | closed_with_evidence | <PR/commit link> | WP02 strategy wiring + squash default + push-error parser |
| Priivacy-ai/spec-kitty#455 | closed_with_evidence | <PR/commit link or wp03-validation-report.md> | WP03 validation: <close_with_evidence \| tighten_workflow> |
| Priivacy-ai/spec-kitty#457 | closed_with_evidence | <PR/commit link> | WP04 release-prep CLI; FR-023 scope-cut documented |

## Verification-and-close scope

| Issue | Decision | Reference | Notes |
|---|---|---|---|
| Priivacy-ai/spec-kitty#415 | closed_with_evidence | <PR/commit link> | FR-021 fix landed (scan_recovery_state + --base) |
| Priivacy-ai/spec-kitty#416 | closed_with_evidence | <PR/commit link> | FR-019/FR-020 fix landed in WP02; verified by WP05 |

## Carve-outs filed as follow-ups

| Original concern | Follow-up issue | Notes |
|---|---|---|
| FSEvents debounce / `_worktree_removal_delay()` empirical timing | <new issue link> | Carved out per spec Assumptions section |
| Dirty classifier `git check-ignore` consultation | <new issue link> | Filed per spec Out-of-Scope; `--force` workaround documented |

Test surface

| Test | FR | Asserts |
|---|---|---|
| `test_scan_recovery_state_finds_merged_deleted_deps` | FR-021 | a synthetic mission with WP01–WP05 done-and-deleted lets WP06 be marked ready |
| `test_implement_base_flag_creates_workspace_from_ref` | FR-021 | `spec-kitty implement WP06 --base main` creates a worktree at the main branch tip |
| `test_implement_base_flag_invalid_ref_fails_clearly` | FR-021 | `--base bogus-ref` fails with a clear error pointing at remediation |
| `test_post_merge_unblocking_scenario_end_to_end` | FR-021, Scenario 7 | full Scenario 7: WP01–WP05 merged, WP06 starts cleanly without manual state edits |
| `test_verification_report_authored_at_mission_close` | FR-016 | `wp05-verification-report.md` exists with all required sections |
| `test_mission_close_ledger_complete` | FR-018, DoD-4 | every issue in the Tracked GitHub Issues table has exactly one ledger row |

NFR coverage

NFR-005: all FR-021 tests run without network access (uses local synthetic git repos via fixture)
NFR-006: mypy --strict passes on the new function signatures and dataclass field

release_prep.md

Contract: WP04 Release Prep CLI

Owns: FR-013, FR-014, FR-015, FR-023 + NFR-004

CLI surface

Command path: spec-kitty agent release prep

File: src/specify_cli/cli/commands/agent/release.py (currently a stub; WP04 populates it)

"""Release packaging commands for AI agents."""
import typer
from pathlib import Path
from rich.console import Console
from specify_cli.release.payload import (
    ReleaseChannel,
    ReleasePrepPayload,
    build_release_prep_payload,
)

app = typer.Typer(
    name="release",
    help="Release packaging commands for AI agents",
    no_args_is_help=True,
)
console = Console()

@app.command("prep")
def prep(
    channel: ReleaseChannel = typer.Option(..., "--channel", help="Release channel: alpha | beta | stable"),
    repo_root: Path = typer.Option(Path("."), "--repo", help="Repository root"),
    json_output: bool = typer.Option(False, "--json", help="Emit JSON instead of human-readable text"),
) -> None:
    """Prepare release artifacts (changelog draft, version bump, structured inputs)."""
    payload = build_release_prep_payload(
        channel=channel,
        repo_root=repo_root.resolve(),
    )
    if json_output:
        import json as _json
        from dataclasses import asdict
        console.print_json(_json.dumps(asdict(payload)))
    else:
        # Rich rendering: version diff, changelog block, mission slug list, structured inputs table
        ...

Internal package

Module tree: src/specify_cli/release/ (locked decision — not optional)

src/specify_cli/release/
├── __init__.py
├── changelog.py    # build_changelog_block(repo_root, since_tag=None) -> tuple[str, list[str]]
├── version.py      # propose_version(current: str, channel: ReleaseChannel) -> str
└── payload.py      # build_release_prep_payload(channel, repo_root) -> ReleasePrepPayload

The package split is committed at plan time. Three concerns (changelog, version, payload) map cleanly to three modules. The "if it stays small enough, inline it" optimization is rejected: it would only defer the decision to code time, where the WP04 implementer would be left to make it.

Library functions

# src/specify_cli/release/changelog.py
from pathlib import Path

def build_changelog_block(repo_root: Path, since_tag: str | None = None) -> tuple[str, list[str]]:
    """Build a draft changelog block from kitty-specs/ artifacts.

    Returns:
      (changelog_markdown, mission_slug_list)

    Algorithm:
      1. Find missions in kitty-specs/ accepted since `since_tag` (or since the most recent
         git tag if not specified)
      2. For each mission, read meta.json and spec.md to get title and friendly name
      3. For each mission, walk its tasks/ directory for accepted WP files and extract titles
      4. Render a markdown block grouping missions and their WPs

    No network calls. Uses git locally to determine the previous tag.
    """

# src/specify_cli/release/version.py
from typing import Literal

ReleaseChannel = Literal["alpha", "beta", "stable"]

def propose_version(current: str, channel: ReleaseChannel) -> str:
    """Compute the next version string per channel.

    Examples:
      propose_version("3.1.0a7", "alpha") == "3.1.0a8"
      propose_version("3.1.0a7", "beta")  == "3.1.0b1"
      propose_version("3.1.0a7", "stable") == "3.1.0"
      propose_version("3.1.0", "stable")   == "3.1.1"

    Bump-level rules:
      - alpha: increments alpha number (3.1.0a7 → 3.1.0a8)
      - beta: starts a fresh beta line if current is alpha (3.1.0a7 → 3.1.0b1),
        otherwise increments beta number (3.1.0b1 → 3.1.0b2)
      - stable: drops the prerelease suffix if current is alpha/beta
        (3.1.0a7 → 3.1.0); otherwise **always proposes a patch bump**
        (3.1.0 → 3.1.1)

    Stable→stable always proposes a patch bump. Minor or major bumps
    require manual editing of pyproject.toml before running release prep.
    This matches spec-kitty's actual release cadence (mostly alpha
    increments and patches); a `--bump-level` parameter would be dead
    weight 99% of the time and is intentionally omitted.
    """

# src/specify_cli/release/payload.py
from dataclasses import dataclass
from pathlib import Path
from .version import ReleaseChannel

@dataclass(frozen=True)
class ReleasePrepPayload:
    channel: ReleaseChannel
    current_version: str
    proposed_version: str
    changelog_block: str
    mission_slug_list: list[str]
    target_branch: str
    structured_inputs: dict[str, str]

def build_release_prep_payload(
    channel: ReleaseChannel,
    repo_root: Path,
) -> ReleasePrepPayload:
    """Assemble the full release-prep payload.

    Reads:
      - pyproject.toml for current version
      - kitty-specs/ for missions since previous tag
      - .git for the previous tag

    Returns: a fully-populated ReleasePrepPayload ready to render or serialize.
    Performance: ≤ 5 seconds wall-clock on a mission with up to 16 WPs (NFR-004).
    """

Local-only constraint (FR-014)

Every code path in WP04's package SHALL be testable without network access. Specifically:

build_changelog_block reads kitty-specs/ and git tag --list only.
propose_version is a pure function of the version string it is given; build_release_prep_payload reads that string from pyproject.toml.
build_release_prep_payload orchestrates the above; no GitHub API, no PyPI checks.

Network-touching steps (creating the actual release PR, pushing the tag, monitoring the workflow) are out of scope per FR-023. Maintainers do those steps manually with the structured_inputs payload as a guide.
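A local-only guarantee can also be checked at the socket level rather than by mocking individual HTTP clients. The sketch below is one possible test helper, offered as an alternative to the `requests.get`/`urlopen` mock named in the test surface; `network_blocked` and `NetworkAccessAttempted` are hypothetical names, and the approach only catches code that goes through Python's `socket` module.

```python
import socket
from contextlib import contextmanager
from unittest import mock


class NetworkAccessAttempted(AssertionError):
    """Raised when code under test tries to open a network connection."""


@contextmanager
def network_blocked():
    """Make any socket connection attempt fail loudly inside the block."""

    def _refuse(*args, **kwargs):
        raise NetworkAccessAttempted(
            "network access attempted in a local-only code path"
        )

    # Patch both the low-level connect method and the common helper function.
    with mock.patch.object(socket.socket, "connect", _refuse), \
            mock.patch.object(socket, "create_connection", _refuse):
        yield
```

A test would wrap the call under test, for instance `with network_blocked(): build_release_prep_payload(...)`, so that an accidental GitHub or PyPI lookup fails the test instead of silently succeeding on a networked machine.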

#457 close-comment scope-cut (FR-023)

When WP04 closes #457, the comment SHALL document exactly:

Automated by this mission:

Changelog draft (via build_changelog_block)
Version bump proposal (via propose_version)
Structured release-prep payload (structured_inputs)
JSON output mode for downstream automation

Still manual:

PR creation (use gh pr create with the changelog block)
Tag push (use git tag -a vX.Y.Z -m "..." && git push origin vX.Y.Z)
Release workflow monitoring (use gh run watch)

If #457's reporter requests automation of the still-manual steps, those SHALL be filed as a follow-up issue.

Test surface

File: tests/cli/commands/agent/test_release_prep.py

| Test | FR / NFR | Asserts |
|---|---|---|
| `test_prep_command_emits_text_by_default` | FR-013 | running `prep` produces a rich-formatted summary |
| `test_prep_command_emits_json_with_flag` | FR-015 | `--json` produces a parseable JSON document with all fields |
| `test_changelog_built_from_local_artifacts_only` | FR-014 | the test runs successfully with no network access (NFR-005) |
| `test_payload_no_github_api_calls` | FR-014, C-002 | a `requests.get`/`urlopen` mock asserts zero network calls |
| `test_propose_version_alpha_increments_alpha` | FR-013 | `3.1.0a7` + alpha → `3.1.0a8` |
| `test_propose_version_alpha_to_beta_starts_beta1` | FR-013 | `3.1.0a7` + beta → `3.1.0b1` |
| `test_propose_version_alpha_to_stable` | FR-013 | `3.1.0a7` + stable → `3.1.0` |
| `test_runs_within_5s_for_16_wps` | NFR-004 | benchmark fails if elapsed > 5s on a synthetic 16-WP mission |
| `test_close_comment_scope_cut_documented` | FR-023 | the rendered output (or a separate close-comment helper) lists automated vs manual steps |
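The version cases in the table follow from the bump-level rules in the `propose_version` docstring. A minimal sketch of those rules is shown below, assuming versions of the form `X.Y.Z`, `X.Y.ZaN`, or `X.Y.ZbN` only. Transitions the contract does not define (for example, requesting alpha from a stable or beta version) raise `ValueError` here, which is an assumption of this sketch rather than specified behavior.

```python
import re
from typing import Literal

ReleaseChannel = Literal["alpha", "beta", "stable"]

# X.Y.Z with an optional aN / bN prerelease suffix.
_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:(a|b)(\d+))?$")


def propose_version(current: str, channel: ReleaseChannel) -> str:
    match = _VERSION.match(current)
    if match is None:
        raise ValueError(f"unsupported version string: {current!r}")
    major, minor, patch = (int(part) for part in match.group(1, 2, 3))
    pre_kind, pre_num = match.group(4), match.group(5)
    base = f"{major}.{minor}.{patch}"

    if channel == "stable":
        if pre_kind is not None:
            return base  # drop the prerelease suffix
        return f"{major}.{minor}.{patch + 1}"  # stable -> stable is a patch bump
    if channel == "alpha" and pre_kind == "a":
        return f"{base}a{int(pre_num) + 1}"
    if channel == "beta" and pre_kind == "a":
        return f"{base}b1"  # start a fresh beta line
    if channel == "beta" and pre_kind == "b":
        return f"{base}b{int(pre_num) + 1}"
    # Assumption: anything else is outside the documented rules.
    raise ValueError(f"no documented rule for {current!r} -> {channel!r}")
```

Keeping the function a pure string-to-string mapping is what lets the three `test_propose_version_*` cases run with no filesystem or git access.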

stale_assertions.md

Contract: WP01 Stale-Assertion Analyzer

Owns: FR-001, FR-002, FR-003, FR-004, FR-022 + NFR-001, NFR-002

Library entry point

Module: src/specify_cli/post_merge/stale_assertions.py

from pathlib import Path

# StaleAssertionFinding and StaleAssertionReport are defined in this module.

def run_check(
    base_ref: str,
    head_ref: str,
    repo_root: Path,
) -> StaleAssertionReport:
    """Compare base_ref..head_ref and return likely-stale test assertions.

    Algorithm:
      1. git diff base_ref..head_ref -- '*.py' → list of changed source files + line ranges
      2. For each changed file, parse with ast and extract changed identifiers and string literals
      3. For each test file from `git ls-files 'tests/**/*.py'`, parse with ast
      4. Walk test ASTs for Constant/Name nodes referencing changed identifiers in
         assertion-bearing positions (Compare, Assert, Call(func=Attribute(attr='assert*')))
      5. Emit a StaleAssertionFinding for each match with appropriate confidence
      6. Compute findings_per_100_loc against the changed-line count

    Returns: a StaleAssertionReport with findings list, elapsed_seconds, files_scanned,
             and findings_per_100_loc populated.
    """

Re-exported from: src/specify_cli/post_merge/__init__.py:

from .stale_assertions import (
    run_check,
    StaleAssertionFinding,
    StaleAssertionReport,
)

__all__ = ["run_check", "StaleAssertionFinding", "StaleAssertionReport"]

CLI surface

Command path: spec-kitty agent tests stale-check

Module: src/specify_cli/cli/commands/agent/tests.py

import typer
from pathlib import Path
from rich.console import Console
from specify_cli.post_merge.stale_assertions import run_check

app = typer.Typer(name="tests", help="Test-related commands for AI agents", no_args_is_help=True)
console = Console()

@app.command("stale-check")
def stale_check(
    base: str = typer.Option(..., "--base", help="Base git ref for the diff"),
    head: str = typer.Option("HEAD", "--head", help="Head git ref for the diff"),
    repo_root: Path = typer.Option(Path("."), "--repo", help="Repository root"),
    json_output: bool = typer.Option(False, "--json", help="Emit JSON instead of human-readable text"),
) -> None:
    """Detect test assertions likely invalidated by source changes between two refs."""
    report = run_check(base_ref=base, head_ref=head, repo_root=repo_root.resolve())
    # Render rich text or JSON depending on json_output

Registration: src/specify_cli/cli/commands/agent/__init__.py adds:

from . import tests as tests_module
app.add_typer(tests_module.app, name="tests")

Merge runner integration

File: src/specify_cli/cli/commands/merge.py inside _run_lane_based_merge, after the FR-019 safe_commit step and before the merge summary print.

from specify_cli.post_merge.stale_assertions import run_check

# ... after _mark_wp_merged_done loop and after safe_commit (FR-019):
stale_report = run_check(
    base_ref=merge_base_sha,
    head_ref="HEAD",
    repo_root=repo_root,
)
# Append stale_report findings to the merge summary that's printed to console

Wiring contract (FR-004): the merge runner SHALL invoke run_check via direct library import, NOT by spawning the CLI subcommand as a subprocess. The CLI entry and the merge runner are two thin shims around the same library function.

Confidence assignment rules (FR-003)

| Confidence | Condition |
|---|---|
| `high` | Changed function/class name appears as an `Attribute(attr=...)` or `Name(id=...)` node directly inside an `Assert` test or `assertEqual`/`assertTrue`/etc. call |
| `medium` | Changed identifier appears in any `Compare` or `Assert` node anywhere in the test file |
| `low` | Changed string literal matches a `Constant(value="...")` node in an assertion-bearing position |

Forbidden: the analyzer SHALL NEVER produce a definitely_stale confidence (FR-003).

Self-monitoring (FR-022)

After every run_check call, the report's findings_per_100_loc is checked against 5.0 (the NFR-002 ceiling). If exceeded:

1. The CLI command emits a warning to stderr.
2. The merge runner emits a warning in the merge summary.
3. WP01's tests SHALL include a benchmark that fails the build if the curated benchmark exceeds the ceiling, forcing WP01 to narrow scope per FR-022.
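The ceiling check itself is simple arithmetic: findings are normalized per 100 changed lines and compared against 5.0. A minimal sketch follows, with hypothetical helper names; returning 0.0 for an empty diff is an assumption of the sketch, not something the contract states.

```python
FP_CEILING_PER_100_LOC = 5.0  # the NFR-002 ceiling named in the contract


def findings_per_100_loc(finding_count: int, changed_lines: int) -> float:
    """Normalize the finding count against the number of changed lines."""
    if changed_lines <= 0:
        # Assumption: an empty diff has nothing to flag, so report a zero rate.
        return 0.0
    return finding_count * 100.0 / changed_lines


def exceeds_ceiling(
    finding_count: int,
    changed_lines: int,
    ceiling: float = FP_CEILING_PER_100_LOC,
) -> bool:
    """True when the rate is strictly above the ceiling."""
    return findings_per_100_loc(finding_count, changed_lines) > ceiling
```

For instance, 12 findings over 200 changed lines gives a rate of 6.0 and would trigger the warning, while 10 findings over the same diff sits exactly at 5.0 and would not.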

Test surface

File: tests/post_merge/test_stale_assertions.py

| Test | FR / NFR | Asserts |
|---|---|---|
| `test_renamed_function_flagged_high_confidence` | FR-001, FR-003 | high-confidence finding for a renamed function reference in a test |
| `test_changed_string_literal_flagged_low_confidence` | FR-001, FR-002 | low-confidence finding for a changed literal that matches a `Constant` node |
| `test_string_literal_in_comment_not_flagged` | FR-002 worked example | comment-only mention of a literal does NOT produce a finding |
| `test_unchanged_use_of_string_not_flagged` | FR-002 worked example | new use of a literal (without modifying any existing literal) does NOT produce a finding |
| `test_no_test_suite_load` | FR-002 | analyzer does not import or execute any test file as code |
| `test_no_definitely_stale_confidence` | FR-003 | output never contains the literal `"definitely_stale"` |
| `test_cli_subcommand_invokes_library` | FR-004 | the CLI subcommand calls `run_check` and prints its findings |
| `test_merge_runner_imports_library_directly` | FR-004 | the merge runner does NOT use `subprocess` to invoke the CLI |
| `test_runs_within_30s_on_spec_kitty_core` | NFR-001 | benchmark fails the build if elapsed > 30s |
| `test_fp_ceiling_under_5_per_100_loc` | NFR-002, FR-022 | benchmark fails if FP rate > 5/100 LOC |
| `test_fr_022_fallback_narrows_scope` | FR-022 | when the FP ceiling is exceeded, the documented fallback path is exercised |
