Contracts

diff_coverage_policy.md

Contract: WP03 Diff-Coverage Policy Validation

Owns: FR-010, FR-011, FR-012

Verification-first protocol

WP03 is verification-first. The order is locked:

1. FR-010 runs first: a written validation report against current ci-quality.yml behavior on a representative large PR sample. No code changes yet.

2. After the validation report, a fork:

  • FR-011 fires if validation shows current main already satisfies the policy intent → close #455 with evidence, tighten docs/messages, no workflow change.
  • FR-012 fires if validation shows residual mismatch → adjust the workflow so only the intended critical-path surface produces hard failures.

WP03 SHALL NOT modify .github/workflows/ci-quality.yml before authoring the validation report.

Validation report (FR-010)

File: kitty-specs/068-post-merge-reliability-and-release-hardening/wp03-validation-report.md

The report contains the DiffCoverageValidationReport dataclass rendered as markdown:

# WP03 Diff-Coverage Validation Report

**Validated at commit**: `<sha>`
**Workflow path**: `.github/workflows/ci-quality.yml`
**Sample PR**: #<id> — <description>

## Critical-path threshold (enforced)
- Threshold: <X>%
- Current behavior: hard-fails if diff coverage on critical-path files < <X>%
- Critical-path file list source: `<config-path-or-pattern>`

## Full-diff threshold (advisory)
- Threshold: <Y>%
- Current behavior: emits a separate report; never hard-fails

## Findings
- [ ] Critical-path enforce/advisory split is correctly implemented
- [ ] Hard-fail surfaces match the intended critical-path file set
- [ ] Advisory report is clearly labeled as advisory in CI output
- [ ] Large PRs that meet critical-path coverage but miss full-diff coverage pass the build

## Decision
**[ ] close_with_evidence** — current main already satisfies the policy intent. Issue #455 closed with link to this report.
**[ ] tighten_workflow** — residual mismatch found. Workflow adjusted per FR-012.

## Rationale
<one paragraph>
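The Decision section of the template above lends itself to a mechanical check. The following is a minimal sketch of how a test such as test_decision_is_recorded might read the checkboxes; the function names and the regular expression are illustrative assumptions, not part of the contract.

```python
import re

DECISIONS = ("close_with_evidence", "tighten_workflow")

# Matches template lines of the form "**[x] close_with_evidence** ...".
_DECISION_LINE = re.compile(
    r"^\*\*\[(?P<mark>[ xX])\]\s*(?P<name>\w+)\*\*", re.MULTILINE
)


def checked_decisions(report_markdown: str) -> list[str]:
    """Return the names of the decisions whose checkbox is ticked."""
    return [
        m.group("name")
        for m in _DECISION_LINE.finditer(report_markdown)
        if m.group("mark").lower() == "x" and m.group("name") in DECISIONS
    ]


def recorded_decision(report_markdown: str) -> str:
    """Return the single recorded decision; raise if zero or both are ticked."""
    checked = checked_decisions(report_markdown)
    if len(checked) != 1:
        raise ValueError(f"expected exactly one checked decision, found {checked}")
    return checked[0]
```

A report with both boxes ticked, or neither, raises ValueError, which is the "exactly one" condition the test surface asks for.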

FR-011 path: close with evidence

If the validation report's decision is close_with_evidence:

1. WP03 closes issue #455 with a comment linking to:

  • This validation report
  • The current ci-quality.yml line ranges that implement the enforce/advisory split
  • The example large PR that passes correctly under the current policy

2. WP03 tightens documentation and CI output messages so future contributors immediately understand which surface is enforced and which is advisory:

  • Add a "Diff coverage policy" section to docs/explanation/ (or wherever CI policy is documented)
  • Update CI step names in ci-quality.yml to be self-explanatory (e.g., "diff-coverage (critical-path, enforced)" vs "diff-coverage (full-diff, advisory)")

3. No workflow logic changes.

FR-012 path: tighten workflow

If the validation report's decision is tighten_workflow:

1. WP03 modifies .github/workflows/ci-quality.yml so:

  • The hard-fail surface is exactly the intended critical-path file set
  • Full-diff coverage runs as advisory only
  • CI output identifies enforced vs advisory surfaces explicitly

2. WP03 adds an integration test (or a curated synthetic PR test) demonstrating that a large PR meeting critical-path coverage but missing full-diff coverage now passes.

3. WP03 closes issue #455 with a comment linking to the workflow diff and the new test.

Test surface

| Test | FR | Asserts |
|---|---|---|
| test_validation_report_authored | FR-010 | the validation report file exists and has all required sections |
| test_decision_is_recorded | FR-010 | the report has exactly one of close_with_evidence or tighten_workflow checked |
| test_validation_report_close_path_populated | FR-010, FR-011 | when decision == close_with_evidence, the report has a non-empty rationale (≥ 50 chars) AND either an empty findings list (no policy mismatches) or each finding carries an explicit "satisfied by" rationale. This is the content gate that prevents WP03 from shipping a vacuous report. |
| test_close_with_evidence_does_not_modify_workflow | FR-011 | if the decision is close_with_evidence, git diff main -- .github/workflows/ci-quality.yml is empty |
| test_tighten_workflow_passes_large_pr_sample | FR-012 | (only if FR-012 fires) a synthetic large PR meeting critical-path but missing full-diff passes |

NFR-006 interaction

NFR-006 is pinned to commit 7307389a. If WP03 takes the FR-012 path and changes the threshold or include-list, NFR-006 is re-evaluated against the post-WP03 threshold rather than blocking WP03's change. This carve-out is documented in NFR-006's body in spec.md.

merge_strategy.md

Contract: WP02 Merge Strategy + Status-Events Safe Commit

Owns: FR-005, FR-006, FR-007, FR-008, FR-009, FR-019, FR-020 + NFR-003

CLI surface (FR-005, FR-006)

File: src/specify_cli/cli/commands/merge.py

The existing --strategy typer parameter is currently declared but discarded before reaching the lane-merge implementation. WP02 wires it through.

from typing import Optional

import typer
from specify_cli.config import load_merge_config
from specify_cli.lanes.merge import MergeStrategy, run_lane_based_merge

@app.command()
def merge(
    feature: str = typer.Option(None, "--feature"),
    strategy: Optional[MergeStrategy] = typer.Option(
        None,
        "--strategy",
        help="Merge strategy: merge | squash | rebase. Default: squash for mission→target.",
    ),
    resume: bool = typer.Option(False, "--resume"),
    abort: bool = typer.Option(False, "--abort"),
    dry_run: bool = typer.Option(False, "--dry-run"),
) -> None:
    """Run a feature merge with explicit strategy support."""

    # Resolution order: CLI flag > config > default(SQUASH)
    resolved_strategy = (
        strategy
        or load_merge_config(repo_root).strategy
        or MergeStrategy.SQUASH
    )

    run_lane_based_merge(
        feature=feature,
        repo_root=repo_root,
        strategy=resolved_strategy,
        # ... remaining existing arguments unchanged ...
    )

Key contract: --strategy is no longer silently discarded. The flag value flows from the CLI into _run_lane_based_merge and determines the git command sequence used for the mission→target step.
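The resolution order (CLI flag, then config, then the squash default) can be isolated in a small pure function. This is a minimal sketch under stated assumptions: the MergeStrategy enum below is a stand-in for the real one in specify_cli.lanes.merge (its member values are assumed from the config example), and resolve_strategy is a hypothetical helper name.

```python
from enum import Enum
from typing import Optional


class MergeStrategy(str, Enum):
    """Stand-in for specify_cli.lanes.merge.MergeStrategy (values assumed)."""

    MERGE = "merge"
    SQUASH = "squash"
    REBASE = "rebase"


def resolve_strategy(
    cli_value: Optional[MergeStrategy],
    config_value: Optional[MergeStrategy],
) -> MergeStrategy:
    """Apply the precedence: CLI flag > config > default (squash)."""
    if cli_value is not None:
        return cli_value
    if config_value is not None:
        return config_value
    return MergeStrategy.SQUASH
```

Comparing against None explicitly, rather than chaining with `or`, keeps the precedence correct even if a strategy value were ever falsy.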

Lane→mission semantics (FR-007)

Lane→mission merges retain their existing merge-commit behavior. They are local, never hit branch protection, and are valuable as preserved lane structure on the mission branch.

Implementation: the strategy parameter passed to run_lane_based_merge applies ONLY to the final mission→target step. The internal _merge_lane_into_mission helper continues to use git merge (no-fast-forward) regardless of the strategy parameter.

Project config (FR-008)

File: .kittify/config.yaml (existing file, new merge section)

merge:
  strategy: squash    # one of: merge | squash | rebase

Module: src/specify_cli/config.py (existing) gains a MergeConfig accessor:

from dataclasses import dataclass
from pathlib import Path
from specify_cli.lanes.merge import MergeStrategy

@dataclass
class MergeConfig:
    strategy: MergeStrategy | None = None

def load_merge_config(repo_root: Path) -> MergeConfig:
    """Read .kittify/config.yaml and return the merge section."""
    ...

Validation: if merge.strategy is present but not one of the three allowed values, load_merge_config raises a startup error (not silent fallback).
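The validation rule can be sketched against an already-parsed config mapping. The helper name parse_merge_strategy and the MergeConfigError type are hypothetical; how load_merge_config actually reads the YAML file and which exception it raises are not specified here.

```python
from typing import Any, Mapping, Optional

ALLOWED_STRATEGIES = ("merge", "squash", "rebase")


class MergeConfigError(ValueError):
    """Hypothetical error type for an invalid merge section."""


def parse_merge_strategy(config: Mapping[str, Any]) -> Optional[str]:
    """Return merge.strategy from a parsed config, or None when it is absent.

    Raises MergeConfigError instead of silently falling back when the value
    is present but not one of the three allowed strategies.
    """
    merge_section = config.get("merge")
    if merge_section is None:
        return None
    if not isinstance(merge_section, Mapping):
        raise MergeConfigError("merge section must be a mapping")
    raw = merge_section.get("strategy")
    if raw is None:
        return None
    if raw not in ALLOWED_STRATEGIES:
        allowed = " | ".join(ALLOWED_STRATEGIES)
        raise MergeConfigError(
            f"merge.strategy is {raw!r}; expected one of: {allowed}"
        )
    return raw
```

An absent key yields None so that the caller can fall through to the default, while a present but invalid value stops the command at startup.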

Push-error parser (FR-009)

Module: src/specify_cli/cli/commands/merge.py (new helper)

LINEAR_HISTORY_REJECTION_TOKENS: tuple[str, ...] = (
    "merge commits",
    "linear history",
    "fast-forward only",
    "GH006",
    "non-fast-forward",
)

def _is_linear_history_rejection(stderr: str) -> bool:
    """Return True if git push stderr indicates a linear-history rejection."""
    haystack = stderr.lower()
    return any(token.lower() in haystack for token in LINEAR_HISTORY_REJECTION_TOKENS)

def _emit_remediation_hint(console: Console) -> None:
    console.print(
        "\n[yellow]Push rejected by linear-history protection.[/yellow]\n"
        "Try [cyan]spec-kitty merge --strategy squash[/cyan], or set "
        "[cyan]merge.strategy: squash[/cyan] in [cyan].kittify/config.yaml[/cyan].\n"
    )

Fail-open rule: if stderr does not match any token, NO hint is emitted. This prevents misleading hints on unrelated push failures.

Backstop role: with squash as the default (FR-006), this parser is a backstop for users who explicitly opt into --strategy merge. It is NOT expected to fire on the default path.

Status-events safe_commit fix (FR-019)

File: src/specify_cli/cli/commands/merge.py inside _run_lane_based_merge

Insertion point: after the per-WP _mark_wp_merged_done loop and before the worktree-removal step.

from specify_cli.git import safe_commit

# ... (existing _mark_wp_merged_done loop) ...

# FR-019: Persist the done events to git so they survive any subsequent
# external merge rebuild (e.g., reset+squash for protected linear-history).
safe_commit(
    repo_path=main_repo,
    files_to_commit=[
        feature_dir / "status.events.jsonl",
        feature_dir / "status.json",
    ],
    commit_message=f"chore({mission_slug}): record done transitions for merged WPs",
    allow_empty=False,
)

# ... (existing worktree-removal step) ...

Out of scope (per spec "Scope (preempting 'what about MergeState?')" subsection):

  • .kittify/runtime/merge/<mission_id>/state.json — intentionally ephemeral
  • The cleanup_merge_workspace/clear_state calls at the end — runtime state, not the cause of the loss

Regression test (FR-020)

File: tests/cli/commands/test_merge_status_commit.py

import json
import subprocess
from pathlib import Path
from specify_cli.cli.commands.merge import _run_lane_based_merge

def test_done_events_committed_to_git(synthetic_mission_repo):
    """FR-020: after _run_lane_based_merge returns, the done events for every
    merged WP are present in git history at HEAD, not just on disk."""
    repo, mission_slug, wps = synthetic_mission_repo  # fixture creates 2+ WPs

    # Run the merge end-to-end
    _run_lane_based_merge(...)

    # The proof: read status.events.jsonl from git, not from the working tree
    result = subprocess.run(
        ["git", "show", f"HEAD:kitty-specs/{mission_slug}/status.events.jsonl"],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    events = [json.loads(line) for line in result.stdout.splitlines() if line.strip()]
    done_wps = {e["wp_id"] for e in events if e["to_lane"] == "done"}

    assert done_wps == set(wps), (
        f"Expected done events for every merged WP. "
        f"Got {done_wps}, expected {set(wps)}. "
        f"This regression would mean FR-019's safe_commit step was missed."
    )

This test proves FR-019's contract directly: events are durably committed at the time the merge command returns. It does NOT use git reset --hard HEAD because that's a no-op for a file that's already at HEAD (the previous draft of FR-020 had this logical hole; the simpler direct assertion is mechanically correct).

Test surface

| Test | FR / NFR | Asserts |
|---|---|---|
| test_strategy_flag_flows_through | FR-005 | --strategy squash passed to CLI reaches _run_lane_based_merge |
| test_default_strategy_is_squash | FR-006 | no flag, no config → squash applied |
| test_lane_to_mission_uses_merge_commit | FR-007 | --strategy squash does NOT change lane→mission semantics |
| test_config_yaml_strategy_honored | FR-008 | merge.strategy: rebase in config produces a rebase merge |
| test_invalid_config_strategy_raises | FR-008 | merge.strategy: bogus raises a startup error, not silent fallback |
| test_push_rejection_emits_hint_for_known_tokens | FR-009 | each of the 5 token strings triggers the remediation hint |
| test_push_rejection_fails_open_for_unknown | FR-009 | unrelated stderr does NOT emit a hint |
| test_done_events_committed_to_git | FR-019, FR-020 | end-to-end FR-020 regression |
| test_protected_linear_history_succeeds_default | NFR-003 | squash default succeeds against require_linear_history = true (integration test) |

recovery_extension.md

Contract: WP05 Recovery Extension + Verification + Mission Close Ledger

Owns: FR-016, FR-017, FR-018, FR-021

scan_recovery_state extension (FR-021)

File: src/specify_cli/lanes/recovery.py (existing function, lines 174-267)

Current behavior

scan_recovery_state(repo_root, mission_slug) iterates branches matching kitty/mission-{slug}* returned by _list_mission_branches. If no live branches exist (because they were merged and deleted), the function returns "nothing to recover", leaving the user with a workflow that cannot be unblocked.

New behavior (FR-021)

from pathlib import Path
from specify_cli.lanes.recovery import scan_recovery_state, RecoveryState
from specify_cli.status.reducer import materialize

def scan_recovery_state(
    repo_root: Path,
    mission_slug: str,
    *,
    consult_status_events: bool = True,   # NEW parameter, defaults True
) -> RecoveryState:
    """Scan for in-progress lane workspaces and orphan claims.

    When consult_status_events=True (default after FR-021), the function:
      1. Reads kitty-specs/<mission>/status.events.jsonl
      2. Materializes the lane snapshot for every WP
      3. For WPs whose lane is `done` and whose branches are absent, marks
         them as `merged_and_deleted` rather than `missing`
      4. For downstream WPs whose dependencies are all `done`, returns
         "ready to start from target branch tip"

    Backwards compatibility: callers that pass `consult_status_events=False`
    get the legacy live-branch-only behavior, which still exists for any
    in-progress mission where branches haven't been deleted yet.
    """

New RecoveryState field

RecoveryState (existing dataclass in recovery.py) gains a new field:

from dataclasses import dataclass, field

@dataclass(frozen=True)
class RecoveryState:
    # ... existing fields ...
    ready_to_start_from_target: list[str] = field(default_factory=list)
    """List of WP IDs whose dependencies are all `done` and whose lane
    branches are merged-and-deleted. These WPs can be started fresh from
    the target branch tip via `spec-kitty implement WP## --base main`."""
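Steps 3 and 4 of the new behavior can be sketched as a pure function over an already-materialized lane snapshot. Everything here is an assumption for illustration: the PostMergeScan type, the classify_post_merge name, the shape of the inputs, and the reading that a WP is "ready" only when it is not done, has no live branch of its own, and has at least one dependency, all of which are merged-and-deleted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PostMergeScan:
    """Hypothetical result type; the real RecoveryState carries more fields."""

    merged_and_deleted: list[str] = field(default_factory=list)
    ready_to_start_from_target: list[str] = field(default_factory=list)


def classify_post_merge(
    lanes: dict[str, str],
    dependencies: dict[str, list[str]],
    live_branch_wps: set[str],
) -> PostMergeScan:
    """Classify WPs using the lane snapshot instead of live branches alone.

    lanes: WP ID -> current lane from the materialized status events.
    dependencies: WP ID -> WP IDs it depends on.
    live_branch_wps: WP IDs that still have a lane branch.
    """
    # Done WPs without a branch were merged and cleaned up, not lost.
    merged_and_deleted = sorted(
        wp for wp, lane in lanes.items()
        if lane == "done" and wp not in live_branch_wps
    )
    merged = set(merged_and_deleted)
    # Downstream WPs whose dependencies are all merged can start from the target tip.
    ready = sorted(
        wp for wp, lane in lanes.items()
        if lane != "done"
        and wp not in live_branch_wps
        and dependencies.get(wp)
        and all(dep in merged for dep in dependencies[wp])
    )
    return PostMergeScan(merged_and_deleted, ready)
```

With WP01 through WP05 done and their branches deleted, a WP06 that depends on all five lands in ready_to_start_from_target rather than producing "nothing to recover".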

--base flag for spec-kitty implement (FR-021)

File: src/specify_cli/cli/commands/implement.py

import typer
from typing import Optional

@app.command()
def implement(
    wp_id: str,
    base: Optional[str] = typer.Option(
        None,
        "--base",
        help="Explicit base ref for the lane workspace (default: auto-detect)",
    ),
) -> None:
    """Implement a work package, optionally from an explicit base branch."""

    if base is not None:
        # Validate the ref exists locally
        validated_base = _resolve_git_ref(base, repo_root)
        # Create the lane workspace from the explicit base
        workspace = create_lane_workspace_from_base(wp_id, validated_base, repo_root)
    else:
        # Existing auto-detect path (unchanged)
        workspace = create_lane_workspace(wp_id, repo_root)

    # ... rest of implement logic unchanged ...

Validation:

  • --base accepts any local git ref (branch name, tag, commit SHA, HEAD).
  • If the ref does not resolve, the command fails with a clear error pointing at git fetch and git branch -a.
  • When omitted, the existing auto-detect logic runs unchanged.
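The validation rules above could be implemented along these lines. This is a sketch only: resolve_git_ref and UnknownGitRefError are hypothetical names, the contract does not say what _resolve_git_ref returns (a commit SHA is assumed here), and the error wording is illustrative. The git invocation itself, git rev-parse --verify --quiet, is standard.

```python
import subprocess
from pathlib import Path


class UnknownGitRefError(ValueError):
    """Hypothetical error type for a ref that does not resolve locally."""


def resolve_git_ref(ref: str, repo_root: Path) -> str:
    """Return the commit SHA for a local ref, or raise with remediation advice."""
    if ref.startswith("-"):
        # Avoid having the ref interpreted as a git option.
        raise UnknownGitRefError(f"--base {ref!r} is not a valid ref name")
    result = subprocess.run(
        # "^{commit}" peels tags and rejects objects that are not commits.
        ["git", "rev-parse", "--verify", "--quiet", f"{ref}^{{commit}}"],
        cwd=repo_root,
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        raise UnknownGitRefError(
            f"--base {ref!r} does not resolve to a local commit. "
            "Try `git fetch`, then list candidates with `git branch -a`."
        )
    return result.stdout.strip()
```

Resolving to a SHA up front means the workspace is created from a fixed commit even if the branch moves while the command runs.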

Verification report (FR-016)

File: kitty-specs/068-post-merge-reliability-and-release-hardening/wp05-verification-report.md

# WP05 Recovery Verification Report

**Authored**: <date>
**Validated against**: commit `<sha>`

## Coverage

This report accounts for every documented failure shape from issues #415 and #416, including the two pre-identified gaps from the Mission 067 Failure-Mode Evidence sections.

## Pre-identified gap 1 (#416 status-events loss)

**Failure shape**: `_run_lane_based_merge` writes `done` events to disk but never commits them. External merge rebuild discards them.

**Status**: `fixed_by_this_mission`

**Evidence**:
- Fix landed in WP02 via FR-019 (`safe_commit` call between mark-done loop and worktree-removal)
- Regression test: `tests/cli/commands/test_merge_status_commit.py::test_done_events_committed_to_git` (FR-020)
- Verified by reading `git show HEAD:kitty-specs/<mission>/status.events.jsonl` after a synthetic merge

## Pre-identified gap 2 (#415 post-merge recovery deadlock)

**Failure shape**: `scan_recovery_state` ignores merged-and-deleted dependency branches; `implement` does not accept `--base main`.

**Status**: `fixed_by_this_mission`

**Evidence**:
- `scan_recovery_state` extended (FR-021)
- `--base` flag added to `implement` (FR-021)
- Regression tests: `tests/lanes/test_recovery_post_merge.py`, `tests/cli/commands/test_implement_base_flag.py`
- Verified by reproducing the WP07-after-deps-merged scenario from #415 and observing it now succeeds

## Other failure shapes from #415/#416

| Shape | Source | Status | Evidence |
|---|---|---|---|
| (additional shapes added during verification, if any) | | | |

Mission close ledger (FR-018, C-005)

File: kitty-specs/068-post-merge-reliability-and-release-hardening/mission-close-ledger.md

# Mission 068 Close Ledger

**Authored at mission close**: <date>
**Validated DoD**: every issue from the Tracked GitHub Issues table appears below.

## Primary scope (must implement)

| Issue | Decision | Reference | Notes |
|---|---|---|---|
| Priivacy-ai/spec-kitty#454 | closed_with_evidence | <PR/commit link> | WP01 stale-assertion analyzer shipped |
| Priivacy-ai/spec-kitty#456 | closed_with_evidence | <PR/commit link> | WP02 strategy wiring + squash default + push-error parser |
| Priivacy-ai/spec-kitty#455 | closed_with_evidence | <PR/commit link or wp03-validation-report.md> | WP03 validation: <close_with_evidence | tighten_workflow> |
| Priivacy-ai/spec-kitty#457 | closed_with_evidence | <PR/commit link> | WP04 release-prep CLI; FR-023 scope-cut documented |

## Verification-and-close scope

| Issue | Decision | Reference | Notes |
|---|---|---|---|
| Priivacy-ai/spec-kitty#415 | closed_with_evidence | <PR/commit link> | FR-021 fix landed (scan_recovery_state + --base) |
| Priivacy-ai/spec-kitty#416 | closed_with_evidence | <PR/commit link> | FR-019/FR-020 fix landed in WP02; verified by WP05 |

## Carve-outs filed as follow-ups

| Original concern | Follow-up issue | Notes |
|---|---|---|
| FSEvents debounce / `_worktree_removal_delay()` empirical timing | <new issue link> | Carved out per spec Assumptions section |
| Dirty classifier `git check-ignore` consultation | <new issue link> | Filed per spec Out-of-Scope; `--force` workaround documented |
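The ledger's definition of done (every tracked issue appears, each exactly once) can be checked mechanically. The sketch below is one possible shape for that check; the helper names are hypothetical, and it assumes each ledger row begins with an owner/repo#number reference as in the template above.

```python
import re
from collections import Counter

# First cell of a ledger row, e.g. "| Priivacy-ai/spec-kitty#454 | ...".
_ISSUE_ROW = re.compile(r"^\|\s*(?P<issue>[\w.-]+/[\w.-]+#\d+)\s*\|", re.MULTILINE)


def ledger_issue_counts(ledger_markdown: str) -> Counter:
    """Count how many ledger rows reference each issue."""
    return Counter(m.group("issue") for m in _ISSUE_ROW.finditer(ledger_markdown))


def ledger_problems(ledger_markdown: str, tracked_issues: list[str]) -> list[str]:
    """Return one message per tracked issue that lacks exactly one ledger row."""
    counts = ledger_issue_counts(ledger_markdown)
    return [
        f"{issue}: expected 1 ledger row, found {counts.get(issue, 0)}"
        for issue in tracked_issues
        if counts.get(issue, 0) != 1
    ]
```

An empty list from ledger_problems means the ledger is complete; header and separator rows are ignored because their first cell is not an issue reference.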

Test surface

| Test | FR | Asserts |
|---|---|---|
| test_scan_recovery_state_finds_merged_deleted_deps | FR-021 | a synthetic mission with WP01–WP05 done-and-deleted lets WP06 be marked ready |
| test_implement_base_flag_creates_workspace_from_ref | FR-021 | spec-kitty implement WP06 --base main creates a worktree at the main branch tip |
| test_implement_base_flag_invalid_ref_fails_clearly | FR-021 | --base bogus-ref fails with a clear error pointing at remediation |
| test_post_merge_unblocking_scenario_end_to_end | FR-021, Scenario 7 | full Scenario 7: WP01–WP05 merged, WP06 starts cleanly without manual state edits |
| test_verification_report_authored_at_mission_close | FR-016 | wp05-verification-report.md exists with all required sections |
| test_mission_close_ledger_complete | FR-018, DoD-4 | every issue in the Tracked GitHub Issues table has exactly one ledger row |

NFR coverage

  • NFR-005: all FR-021 tests run without network access (uses local synthetic git repos via fixture)
  • NFR-006: mypy --strict passes on the new function signatures and dataclass field

release_prep.md

Contract: WP04 Release Prep CLI

Owns: FR-013, FR-014, FR-015, FR-023 + NFR-004

CLI surface

Command path: spec-kitty agent release prep

File: src/specify_cli/cli/commands/agent/release.py (currently a stub; WP04 populates it)

"""Release packaging commands for AI agents."""
import typer
from pathlib import Path
from rich.console import Console
from specify_cli.release.payload import build_release_prep_payload
from specify_cli.release.payload import ReleasePrepPayload, ReleaseChannel

app = typer.Typer(
    name="release",
    help="Release packaging commands for AI agents",
    no_args_is_help=True,
)
console = Console()

@app.command("prep")
def prep(
    channel: ReleaseChannel = typer.Option(..., "--channel", help="Release channel: alpha | beta | stable"),
    repo_root: Path = typer.Option(Path("."), "--repo", help="Repository root"),
    json_output: bool = typer.Option(False, "--json", help="Emit JSON instead of human-readable text"),
) -> None:
    """Prepare release artifacts (changelog draft, version bump, structured inputs)."""
    payload = build_release_prep_payload(
        channel=channel,
        repo_root=repo_root.resolve(),
    )
    if json_output:
        import json as _json
        from dataclasses import asdict
        console.print_json(_json.dumps(asdict(payload)))
    else:
        # Rich rendering: version diff, changelog block, mission slug list, structured inputs table
        ...

Internal package

Module tree: src/specify_cli/release/ (locked decision — not optional)

src/specify_cli/release/
├── __init__.py
├── changelog.py    # build_changelog_block(repo_root, since_tag=None) -> tuple[str, list[str]]
├── version.py      # propose_version(current: str, channel: ReleaseChannel) -> str
└── payload.py      # build_release_prep_payload(channel, repo_root) -> ReleasePrepPayload

The package split is committed at plan time. Three concerns (changelog, version, payload) cleanly map to three modules. The "if it stays small enough, inline it" optimization is rejected as a deferred decision the WP04 implementer would face at code time and resent.

Library functions

# src/specify_cli/release/changelog.py
from pathlib import Path

def build_changelog_block(repo_root: Path, since_tag: str | None = None) -> tuple[str, list[str]]:
    """Build a draft changelog block from kitty-specs/ artifacts.

    Returns:
      (changelog_markdown, mission_slug_list)

    Algorithm:
      1. Find missions in kitty-specs/ accepted since `since_tag` (or since the most recent
         git tag if not specified)
      2. For each mission, read meta.json and spec.md to get title and friendly name
      3. For each mission, walk its tasks/ directory for accepted WP files and extract titles
      4. Render a markdown block grouping missions and their WPs

    No network calls. Uses git locally to determine the previous tag.
    """

# src/specify_cli/release/version.py
from typing import Literal

ReleaseChannel = Literal["alpha", "beta", "stable"]

def propose_version(current: str, channel: ReleaseChannel) -> str:
    """Compute the next version string per channel.

    Examples:
      propose_version("3.1.0a7", "alpha") == "3.1.0a8"
      propose_version("3.1.0a7", "beta")  == "3.1.0b1"
      propose_version("3.1.0a7", "stable") == "3.1.0"
      propose_version("3.1.0", "stable")   == "3.1.1"

    Bump-level rules:
      - alpha: increments alpha number (3.1.0a7 → 3.1.0a8)
      - beta: starts a fresh beta line if current is alpha (3.1.0a7 → 3.1.0b1),
        otherwise increments beta number (3.1.0b1 → 3.1.0b2)
      - stable: drops the prerelease suffix if current is alpha/beta
        (3.1.0a7 → 3.1.0); otherwise **always proposes a patch bump**
        (3.1.0 → 3.1.1)

    Stable→stable always proposes a patch bump. Minor or major bumps
    require manual editing of pyproject.toml before running release prep.
    This matches spec-kitty's actual release cadence (mostly alpha
    increments and patches); a `--bump-level` parameter would be dead
    weight 99% of the time and is intentionally omitted.
    """

# src/specify_cli/release/payload.py
from dataclasses import dataclass
from pathlib import Path
from .version import ReleaseChannel

@dataclass(frozen=True)
class ReleasePrepPayload:
    channel: ReleaseChannel
    current_version: str
    proposed_version: str
    changelog_block: str
    mission_slug_list: list[str]
    target_branch: str
    structured_inputs: dict[str, str]

def build_release_prep_payload(
    channel: ReleaseChannel,
    repo_root: Path,
) -> ReleasePrepPayload:
    """Assemble the full release-prep payload.

    Reads:
      - pyproject.toml for current version
      - kitty-specs/ for missions since previous tag
      - .git for the previous tag

    Returns: a fully-populated ReleasePrepPayload ready to render or serialize.
    Performance: ≤ 5 seconds wall-clock on a mission with up to 16 WPs (NFR-004).
    """
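The bump-level rules in the propose_version docstring are specific enough to sketch directly. The version below is one possible reading, not the shipped implementation: it assumes versions of the form X.Y.Z with an optional aN or bN suffix, and it raises ValueError for combinations the docstring does not cover (for example, requesting the alpha channel from a stable or beta version).

```python
import re
from typing import Literal, Optional

ReleaseChannel = Literal["alpha", "beta", "stable"]

# X.Y.Z with an optional alpha ("aN") or beta ("bN") suffix.
_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:(a|b)(\d+))?$")


def propose_version(current: str, channel: ReleaseChannel) -> str:
    """Compute the next version string for the requested channel."""
    match = _VERSION.match(current)
    if match is None:
        raise ValueError(f"unsupported version string: {current!r}")
    major, minor, patch = (int(g) for g in match.group(1, 2, 3))
    pre_kind: Optional[str] = match.group(4)
    pre_num = int(match.group(5)) if match.group(5) else 0
    base = f"{major}.{minor}.{patch}"

    if channel == "alpha":
        if pre_kind == "a":
            return f"{base}a{pre_num + 1}"  # 3.1.0a7 -> 3.1.0a8
    elif channel == "beta":
        if pre_kind == "a":
            return f"{base}b1"  # alpha -> fresh beta line
        if pre_kind == "b":
            return f"{base}b{pre_num + 1}"  # 3.1.0b1 -> 3.1.0b2
    elif channel == "stable":
        if pre_kind is not None:
            return base  # drop the prerelease suffix
        return f"{major}.{minor}.{patch + 1}"  # stable -> stable: patch bump
    # Assumption: combinations the docstring does not describe are rejected.
    raise ValueError(f"no rule documented for {current!r} on channel {channel!r}")
```

Rejecting undocumented combinations keeps the sketch from inventing policy; a real implementation may choose a different behavior for those cases.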

Local-only constraint (FR-014)

Every code path in WP04's package SHALL be testable without network access. Specifically:

  • build_changelog_block reads kitty-specs/ and git tag --list only.
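  • propose_version is a pure function of its current argument; the version string itself comes from pyproject.toml, which build_release_prep_payload reads.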
  • build_release_prep_payload orchestrates the above; no GitHub API, no PyPI checks.

Network-touching steps (creating the actual release PR, pushing the tag, monitoring the workflow) are out of scope per FR-023. Maintainers do those steps manually with the structured_inputs payload as a guide.

#457 close-comment scope-cut (FR-023)

When WP04 closes #457, the comment SHALL document exactly:

Automated by this mission:

  • Changelog draft (via build_changelog_block)
  • Version bump proposal (via propose_version)
  • Structured release-prep payload (structured_inputs)
  • JSON output mode for downstream automation

Still manual:

  • PR creation (use gh pr create with the changelog block)
  • Tag push (use git tag -a vX.Y.Z -m "..." && git push origin vX.Y.Z)
  • Release workflow monitoring (use gh run watch)

If #457's reporter requests automation of the still-manual steps, those SHALL be filed as a follow-up issue.

Test surface

File: tests/cli/commands/agent/test_release_prep.py

| Test | FR / NFR | Asserts |
|---|---|---|
| test_prep_command_emits_text_by_default | FR-013 | running prep produces a rich-formatted summary |
| test_prep_command_emits_json_with_flag | FR-015 | --json produces a parseable JSON document with all fields |
| test_changelog_built_from_local_artifacts_only | FR-014 | the test runs successfully with no network access (NFR-005) |
| test_payload_no_github_api_calls | FR-014, C-002 | a requests.get/urlopen mock asserts zero network calls |
| test_propose_version_alpha_increments_alpha | FR-013 | 3.1.0a7 + alpha → 3.1.0a8 |
| test_propose_version_alpha_to_beta_starts_beta1 | FR-013 | 3.1.0a7 + beta → 3.1.0b1 |
| test_propose_version_alpha_to_stable | FR-013 | 3.1.0a7 + stable → 3.1.0 |
| test_runs_within_5s_for_16_wps | NFR-004 | benchmark fails if elapsed > 5s on a synthetic 16-WP mission |
| test_close_comment_scope_cut_documented | FR-023 | the rendered output (or a separate close-comment helper) lists automated vs manual steps |

stale_assertions.md

Contract: WP01 Stale-Assertion Analyzer

Owns: FR-001, FR-002, FR-003, FR-004, FR-022 + NFR-001, NFR-002

Library entry point

Module: src/specify_cli/post_merge/stale_assertions.py

from pathlib import Path
from specify_cli.post_merge.stale_assertions import (
    run_check,
    StaleAssertionFinding,
    StaleAssertionReport,
)

def run_check(
    base_ref: str,
    head_ref: str,
    repo_root: Path,
) -> StaleAssertionReport:
    """Compare base_ref..head_ref and return likely-stale test assertions.

    Algorithm:
      1. git diff base_ref..head_ref -- '*.py' → list of changed source files + line ranges
      2. For each changed file, parse with ast and extract changed identifiers and string literals
      3. For each test file from `git ls-files 'tests/**/*.py'`, parse with ast
      4. Walk test ASTs for Constant/Name nodes referencing changed identifiers in
         assertion-bearing positions (Compare, Assert, Call(func=Attribute(attr='assert*')))
      5. Emit a StaleAssertionFinding for each match with appropriate confidence
      6. Compute findings_per_100_loc against the changed-line count

    Returns: a StaleAssertionReport with findings list, elapsed_seconds, files_scanned,
             and findings_per_100_loc populated.
    """

Re-exported from: src/specify_cli/post_merge/__init__.py:

from .stale_assertions import (
    run_check,
    StaleAssertionFinding,
    StaleAssertionReport,
)

__all__ = ["run_check", "StaleAssertionFinding", "StaleAssertionReport"]

CLI surface

Command path: spec-kitty agent tests stale-check

Module: src/specify_cli/cli/commands/agent/tests.py

import typer
from pathlib import Path
from rich.console import Console
from specify_cli.post_merge.stale_assertions import run_check

app = typer.Typer(name="tests", help="Test-related commands for AI agents", no_args_is_help=True)
console = Console()

@app.command("stale-check")
def stale_check(
    base: str = typer.Option(..., "--base", help="Base git ref for the diff"),
    head: str = typer.Option("HEAD", "--head", help="Head git ref for the diff"),
    repo_root: Path = typer.Option(Path("."), "--repo", help="Repository root"),
    json_output: bool = typer.Option(False, "--json", help="Emit JSON instead of human-readable text"),
) -> None:
    """Detect test assertions likely invalidated by source changes between two refs."""
    report = run_check(base_ref=base, head_ref=head, repo_root=repo_root.resolve())
    # Render rich text or JSON depending on json_output

Registration: src/specify_cli/cli/commands/agent/__init__.py adds:

from . import tests as tests_module
app.add_typer(tests_module.app, name="tests")

Merge runner integration

File: src/specify_cli/cli/commands/merge.py inside _run_lane_based_merge, after the FR-019 safe_commit step and before the merge summary print.

from specify_cli.post_merge.stale_assertions import run_check

# ... after _mark_wp_merged_done loop and after safe_commit (FR-019):
stale_report = run_check(
    base_ref=merge_base_sha,
    head_ref="HEAD",
    repo_root=repo_root,
)
# Append stale_report findings to the merge summary that's printed to console

Wiring contract (FR-004): the merge runner SHALL invoke run_check via direct library import, NOT by spawning the CLI subcommand as a subprocess. The CLI entry and the merge runner are two thin shims around the same library function.

Confidence assignment rules (FR-003)

| Confidence | Condition |
|---|---|
| high | Changed function/class name appears as an Attribute(attr=...) or Name(id=...) node directly inside an Assert test or assertEqual/assertTrue/etc. call |
| medium | Changed identifier appears in any Compare or Assert node anywhere in the test file |
| low | Changed string literal matches a Constant(value="...") node in an assertion-bearing position |

Forbidden: the analyzer SHALL NEVER produce a definitely_stale confidence (FR-003).

Self-monitoring (FR-022)

After every run_check call, the report's findings_per_100_loc is checked against 5.0 (the NFR-002 ceiling). If exceeded:

1. The CLI command emits a warning to stderr.

2. The merge runner emits a warning in the merge summary.

3. WP01's tests SHALL include a benchmark that fails the build if the curated benchmark exceeds the ceiling, forcing WP01 to narrow scope per FR-022.
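The ceiling check itself is simple arithmetic. The sketch below assumes findings_per_100_loc is the finding count scaled to 100 changed lines and that a diff with no changed lines yields a rate of 0.0; the helper names and the warning text are hypothetical.

```python
from typing import Optional

FP_CEILING_PER_100_LOC = 5.0  # the NFR-002 ceiling


def findings_per_100_loc(finding_count: int, changed_lines: int) -> float:
    """Scale the finding count to a rate per 100 changed lines."""
    if changed_lines <= 0:
        # Assumption: an empty diff cannot exceed the ceiling.
        return 0.0
    return finding_count * 100.0 / changed_lines


def ceiling_warning(finding_count: int, changed_lines: int) -> Optional[str]:
    """Return a warning message when the rate exceeds the ceiling, else None."""
    rate = findings_per_100_loc(finding_count, changed_lines)
    if rate > FP_CEILING_PER_100_LOC:
        return (
            f"stale-assertion findings at {rate:.1f} per 100 changed lines "
            f"exceed the ceiling of {FP_CEILING_PER_100_LOC:.1f}"
        )
    return None
```

For example, 3 findings over 200 changed lines gives 1.5 and no warning, while 12 findings over the same diff gives 6.0 and a warning that both the CLI and the merge runner could surface.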

Test surface

File: tests/post_merge/test_stale_assertions.py

| Test | FR / NFR | Asserts |
|---|---|---|
| test_renamed_function_flagged_high_confidence | FR-001, FR-003 | high-confidence finding for a renamed function reference in a test |
| test_changed_string_literal_flagged_low_confidence | FR-001, FR-002 | low-confidence finding for a changed literal that matches a Constant node |
| test_string_literal_in_comment_not_flagged | FR-002 worked example | comment-only mention of a literal does NOT produce a finding |
| test_unchanged_use_of_string_not_flagged | FR-002 worked example | new use of a literal (without modifying any existing literal) does NOT produce a finding |
| test_no_test_suite_load | FR-002 | analyzer does not import or execute any test file as code |
| test_no_definitely_stale_confidence | FR-003 | output never contains the literal "definitely_stale" |
| test_cli_subcommand_invokes_library | FR-004 | the CLI subcommand calls run_check and prints its findings |
| test_merge_runner_imports_library_directly | FR-004 | the merge runner does NOT use subprocess to invoke the CLI |
| test_runs_within_30s_on_spec_kitty_core | NFR-001 | benchmark fails the build if elapsed > 30s |
| test_fp_ceiling_under_5_per_100_loc | NFR-002, FR-022 | benchmark fails if FP rate > 5/100 LOC |
| test_fr_022_fallback_narrows_scope | FR-022 | when the FP ceiling is exceeded, the documented fallback path is exercised |