Contracts
diff_coverage_policy.md
Contract: WP03 Diff-Coverage Policy Validation
Owns: FR-010, FR-011, FR-012
Verification-first protocol
WP03 is verification-first. The order is locked:
1. FR-010 runs first: a written validation report against current ci-quality.yml behavior on a representative large PR sample. No code changes yet.
2. After the validation report, the work forks:
   - FR-011 fires if validation shows current main already satisfies the policy intent → close #455 with evidence, tighten docs/messages, no workflow change.
   - FR-012 fires if validation shows residual mismatch → adjust the workflow so only the intended critical-path surface produces hard failures.
WP03 SHALL NOT modify .github/workflows/ci-quality.yml before authoring the validation report.
Validation report (FR-010)
File: kitty-specs/068-post-merge-reliability-and-release-hardening/wp03-validation-report.md
The report contains the DiffCoverageValidationReport dataclass rendered as markdown:
```markdown
# WP03 Diff-Coverage Validation Report

**Validated at commit**: `<sha>`
**Workflow path**: `.github/workflows/ci-quality.yml`
**Sample PR**: #<id> — <description>

## Critical-path threshold (enforced)

- Threshold: <X>%
- Current behavior: hard-fails if diff coverage on critical-path files < <X>%
- Critical-path file list source: `<config-path-or-pattern>`

## Full-diff threshold (advisory)

- Threshold: <Y>%
- Current behavior: emits a separate report; never hard-fails

## Findings

- [ ] Critical-path enforce/advisory split is correctly implemented
- [ ] Hard-fail surfaces match the intended critical-path file set
- [ ] Advisory report is clearly labeled as advisory in CI output
- [ ] Large PRs that meet critical-path coverage but miss full-diff coverage pass the build

## Decision

**[ ] close_with_evidence** — current main already satisfies the policy intent. Issue #455 closed with link to this report.

**[ ] tighten_workflow** — residual mismatch found. Workflow adjusted per FR-012.

## Rationale

<one paragraph>
```
FR-011 path: close with evidence
If the validation report's decision is close_with_evidence:
1. WP03 closes issue #455 with a comment linking to:
   - This validation report
   - The current `ci-quality.yml` line ranges that implement the enforce/advisory split
   - The example large PR that passes correctly under the current policy
2. WP03 tightens documentation and CI output messages so future contributors immediately understand which surface is enforced and which is advisory:
   - Add a "Diff coverage policy" section to `docs/explanation/` (or wherever CI policy is documented)
   - Update CI step names in `ci-quality.yml` to be self-explanatory (e.g., "diff-coverage (critical-path, enforced)" vs "diff-coverage (full-diff, advisory)")
3. No workflow logic changes.
FR-012 path: tighten workflow
If the validation report's decision is tighten_workflow:
1. WP03 modifies `.github/workflows/ci-quality.yml` so that:
   - The hard-fail surface is exactly the intended critical-path file set
   - Full-diff coverage runs as advisory only
   - CI output identifies enforced vs advisory surfaces explicitly
2. WP03 adds an integration test (or a curated synthetic PR test) demonstrating that a large PR meeting critical-path coverage but missing full-diff coverage now passes.
3. WP03 closes issue #455 with a comment linking to the workflow diff and the new test.
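The enforce/advisory split that FR-012 requires can be stated as a small decision rule. The sketch below only illustrates that rule. It assumes per-file diff-coverage numbers are already available, for example from whatever diff-coverage report the workflow produces, and that the critical-path surface is expressed as glob patterns. `FileDiffCoverage`, `evaluate_diff_coverage` and the pattern format are hypothetical. The real thresholds are the `<X>`/`<Y>` values recorded in the validation report.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class FileDiffCoverage:
    """Diff coverage for one changed file (hypothetical input shape)."""
    path: str
    covered_lines: int   # changed executable lines hit by tests
    changed_lines: int   # changed executable lines in total


@dataclass(frozen=True)
class DiffCoverageVerdict:
    critical_percent: float
    full_percent: float
    hard_fail: bool          # set only by the critical-path surface
    advisory_warning: bool   # full-diff shortfall: reported, never fails the build


def _percent(files: list[FileDiffCoverage]) -> float:
    changed = sum(f.changed_lines for f in files)
    if changed == 0:
        return 100.0  # nothing measurable changed on this surface
    return 100.0 * sum(f.covered_lines for f in files) / changed


def evaluate_diff_coverage(
    files: list[FileDiffCoverage],
    critical_patterns: list[str],
    critical_threshold: float,
    full_threshold: float,
) -> DiffCoverageVerdict:
    # The critical-path surface is the subset of changed files matching any pattern.
    critical = [f for f in files if any(fnmatch(f.path, p) for p in critical_patterns)]
    critical_pct = _percent(critical)
    full_pct = _percent(files)
    return DiffCoverageVerdict(
        critical_percent=critical_pct,
        full_percent=full_pct,
        hard_fail=critical_pct < critical_threshold,
        advisory_warning=full_pct < full_threshold,
    )
```

Only `hard_fail` should map to a failing CI step. `advisory_warning` feeds the separately labeled advisory report. Under this rule, a large PR with well-covered critical-path files and a poorly covered remainder passes, which is the FR-012 scenario.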
Test surface
| Test | FR | Asserts |
|---|---|---|
| test_validation_report_authored | FR-010 | the validation report file exists and has all required sections |
| test_decision_is_recorded | FR-010 | the report has exactly one of close_with_evidence or tighten_workflow checked |
| test_validation_report_close_path_populated | FR-010, FR-011 | when decision == close_with_evidence, the report has a non-empty rationale (≥ 50 chars) AND either an empty findings list (no policy mismatches) or each finding carries an explicit "satisfied by" rationale. This is the content gate that prevents WP03 from shipping a vacuous report. |
| test_close_with_evidence_does_not_modify_workflow | FR-011 | if the decision is close_with_evidence, git diff main -- .github/workflows/ci-quality.yml is empty |
| test_tighten_workflow_passes_large_pr_sample | FR-012 | (only if FR-012 fires) a synthetic large PR meeting critical-path but missing full-diff passes |
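Both test_decision_is_recorded and test_validation_report_close_path_populated come down to parsing the report template. The sketch below assumes the report keeps the `**[x] <decision>**` checkbox lines and the `## Rationale` heading exactly as templated. The helper names `recorded_decision` and `rationale_text` are hypothetical.

```python
import re

DECISIONS = ("close_with_evidence", "tighten_workflow")
# Matches template lines that start with "**[ ] close_with_evidence**" or "**[x] ...**".
_DECISION_RE = re.compile(r"^\*\*\[(?P<mark>[ xX])\] (?P<name>\w+)\*\*", re.MULTILINE)
_HEADING_RE = re.compile(r"^## ", re.MULTILINE)


def recorded_decision(report_md: str) -> str:
    """Return the single checked decision, or raise ValueError."""
    checked = [
        m["name"]
        for m in _DECISION_RE.finditer(report_md)
        if m["mark"] in "xX" and m["name"] in DECISIONS
    ]
    if len(checked) != 1:
        raise ValueError(f"expected exactly one checked decision, found {checked}")
    return checked[0]


def rationale_text(report_md: str) -> str:
    """Return the text under '## Rationale', up to the next '## ' heading."""
    _, sep, rest = report_md.partition("## Rationale")
    if not sep:
        return ""
    next_heading = _HEADING_RE.search(rest)
    return (rest[: next_heading.start()] if next_heading else rest).strip()
```

On the close path, the content gate would then assert `recorded_decision(report) == "close_with_evidence"` and `len(rationale_text(report)) >= 50`.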
NFR-006 interaction
NFR-006 is pinned to commit 7307389a. If WP03 takes the FR-012 path and changes the threshold or include-list, NFR-006 is re-evaluated against the post-WP03 threshold rather than blocking WP03's change. This carve-out is documented in NFR-006's body in spec.md.
merge_strategy.md
Contract: WP02 Merge Strategy + Status-Events Safe Commit
Owns: FR-005, FR-006, FR-007, FR-008, FR-009, FR-019, FR-020 + NFR-003
CLI surface (FR-005, FR-006)
File: src/specify_cli/cli/commands/merge.py
The existing --strategy typer parameter is currently declared but discarded before reaching the lane-merge implementation. WP02 wires it through.
```python
from typing import Optional

import typer

from specify_cli.config import load_merge_config
from specify_cli.lanes.merge import MergeStrategy

# `app` is this module's existing Typer app, and `_run_lane_based_merge` is
# defined in this same module (src/specify_cli/cli/commands/merge.py).


@app.command()
def merge(
    feature: Optional[str] = typer.Option(None, "--feature"),
    strategy: Optional[MergeStrategy] = typer.Option(
        None,
        "--strategy",
        help="Merge strategy: merge | squash | rebase. Default: squash for mission→target.",
    ),
    resume: bool = typer.Option(False, "--resume"),
    abort: bool = typer.Option(False, "--abort"),
    dry_run: bool = typer.Option(False, "--dry-run"),
) -> None:
    """Run a feature merge with explicit strategy support."""
    # ... existing setup (unchanged) resolves `repo_root` ...

    # Resolution order: CLI flag > config > default (SQUASH)
    resolved_strategy = (
        strategy
        or load_merge_config(repo_root).strategy
        or MergeStrategy.SQUASH
    )
    _run_lane_based_merge(
        feature=feature,
        repo_root=repo_root,
        strategy=resolved_strategy,
        # ... remaining existing keyword arguments unchanged ...
    )
```
Key contract: --strategy is no longer silently discarded. The flag value flows from the CLI into _run_lane_based_merge and determines the git command sequence used for the mission→target step.
Lane→mission semantics (FR-007)
Lane→mission merges retain their existing merge-commit behavior. They are local, never hit branch protection, and are valuable as preserved lane structure on the mission branch.
Implementation: the strategy parameter passed to _run_lane_based_merge applies ONLY to the final mission→target step. The internal _merge_lane_into_mission helper continues to use `git merge --no-ff` regardless of the strategy parameter.
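To make FR-005 and FR-007 concrete, here is one possible mapping from strategy to git commands. It is a sketch under stated assumptions. `MergeStrategy` is re-declared locally as a stand-in for the real enum in `specify_cli.lanes.merge`. The helper names are hypothetical, and the real implementation may order or guard these steps differently.

```python
from enum import Enum


class MergeStrategy(str, Enum):
    """Local stand-in for specify_cli.lanes.merge.MergeStrategy (assumption)."""
    MERGE = "merge"
    SQUASH = "squash"
    REBASE = "rebase"


def lane_to_mission_commands(lane_branch: str) -> list[list[str]]:
    # FR-007: lane→mission always records a merge commit, whatever the strategy.
    return [["git", "merge", "--no-ff", lane_branch]]


def mission_to_target_commands(
    strategy: MergeStrategy, mission_branch: str, target_branch: str, message: str
) -> list[list[str]]:
    """Git argv lists for the final mission→target step (hypothetical helper)."""
    if strategy is MergeStrategy.MERGE:
        return [
            ["git", "checkout", target_branch],
            ["git", "merge", "--no-ff", "-m", message, mission_branch],
        ]
    if strategy is MergeStrategy.SQUASH:
        # A single new commit on the target, with no merge commit.
        return [
            ["git", "checkout", target_branch],
            ["git", "merge", "--squash", mission_branch],
            ["git", "commit", "-m", message],
        ]
    if strategy is MergeStrategy.REBASE:
        # Replay the mission commits onto the target, then fast-forward it.
        return [
            ["git", "rebase", target_branch, mission_branch],
            ["git", "checkout", target_branch],
            ["git", "merge", "--ff-only", mission_branch],
        ]
    raise ValueError(f"unknown merge strategy: {strategy!r}")
```

In the rebase branch, plain `git rebase` (without `--rebase-merges`) drops the lane merge commits while replaying. That is what makes the final fast-forward linear. Only the `MERGE` strategy leaves a merge commit on the target, which is the case the FR-009 parser backstops.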
Project config (FR-008)
File: .kittify/config.yaml (existing file, new merge section)
```yaml
merge:
  strategy: squash  # one of: merge | squash | rebase
```
Module: src/specify_cli/config.py (existing) gains a MergeConfig accessor:
```python
from dataclasses import dataclass
from pathlib import Path

from specify_cli.lanes.merge import MergeStrategy


@dataclass
class MergeConfig:
    strategy: MergeStrategy | None = None


def load_merge_config(repo_root: Path) -> MergeConfig:
    """Read .kittify/config.yaml and return the merge section."""
    ...
```
Validation: if merge.strategy is present but not one of the three allowed values, load_merge_config raises a startup error (not silent fallback).
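Here is a minimal sketch of the FR-008 validation rule. It assumes the YAML has already been loaded into a dict, so the sketch runs without a YAML dependency. `MergeConfigError` and `merge_config_from_mapping` are hypothetical names, and `MergeStrategy` is a local stand-in for the real enum.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Any


class MergeStrategy(str, Enum):
    """Local stand-in for specify_cli.lanes.merge.MergeStrategy (assumption)."""
    MERGE = "merge"
    SQUASH = "squash"
    REBASE = "rebase"


class MergeConfigError(ValueError):
    """Hypothetical startup error for an invalid `merge:` section."""


@dataclass
class MergeConfig:
    strategy: MergeStrategy | None = None


def merge_config_from_mapping(config: dict[str, Any]) -> MergeConfig:
    """Validate an already-parsed .kittify/config.yaml mapping."""
    section = config.get("merge") or {}
    if not isinstance(section, dict):
        raise MergeConfigError("`merge` must be a mapping")
    raw = section.get("strategy")
    if raw is None:
        # Absent key: defer to the CLI flag or the SQUASH default.
        return MergeConfig()
    try:
        return MergeConfig(strategy=MergeStrategy(raw))
    except ValueError:
        allowed = ", ".join(s.value for s in MergeStrategy)
        raise MergeConfigError(
            f"merge.strategy: {raw!r} is not one of: {allowed}"
        ) from None
```

An absent key returns `strategy=None`, so the `CLI flag > config > default` chain in `merge` still falls through to SQUASH. A present but invalid value never does.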
Push-error parser (FR-009)
Module: src/specify_cli/cli/commands/merge.py (new helper)
```python
from rich.console import Console

LINEAR_HISTORY_REJECTION_TOKENS: tuple[str, ...] = (
    "merge commits",
    "linear history",
    "fast-forward only",
    "GH006",
    "non-fast-forward",
)


def _is_linear_history_rejection(stderr: str) -> bool:
    """Return True if git push stderr indicates a linear-history rejection."""
    # Case-insensitive substring match against every known token.
    haystack = stderr.lower()
    return any(token.lower() in haystack for token in LINEAR_HISTORY_REJECTION_TOKENS)


def _emit_remediation_hint(console: Console) -> None:
    console.print(
        "\n[yellow]Push rejected by linear-history protection.[/yellow]\n"
        "Try [cyan]spec-kitty merge --strategy squash[/cyan], or set "
        "[cyan]merge.strategy: squash[/cyan] in [cyan].kittify/config.yaml[/cyan].\n"
    )
```
Fail-open rule: if stderr does not match any token, NO hint is emitted. This prevents misleading hints on unrelated push failures.
Backstop role: with squash as the default (FR-006), this parser is a backstop for users who explicitly opt into --strategy merge. It is NOT expected to fire on the default path.
Status-events safe_commit fix (FR-019)
File: src/specify_cli/cli/commands/merge.py inside _run_lane_based_merge
Insertion point: after the per-WP _mark_wp_merged_done loop and before the worktree-removal step.
```python
from specify_cli.git import safe_commit

# ... (existing _mark_wp_merged_done loop) ...

# FR-019: Persist the done events to git so they survive any subsequent
# external merge rebuild (e.g., reset+squash for protected linear-history).
safe_commit(
    repo_path=main_repo,
    files_to_commit=[
        feature_dir / "status.events.jsonl",
        feature_dir / "status.json",
    ],
    commit_message=f"chore({mission_slug}): record done transitions for merged WPs",
    allow_empty=False,
)

# ... (existing worktree-removal step) ...
```
Out of scope (per spec "Scope (preempting 'what about MergeState?')" subsection):
- `.kittify/runtime/merge/<mission_id>/state.json`: intentionally ephemeral
- The `cleanup_merge_workspace` / `clear_state` calls at the end: runtime state, not the cause of the loss
Regression test (FR-020)
File: tests/cli/commands/test_merge_status_commit.py
```python
import json
import subprocess

from specify_cli.cli.commands.merge import _run_lane_based_merge


def test_done_events_committed_to_git(synthetic_mission_repo):
    """FR-020: after _run_lane_based_merge returns, the done events for every
    merged WP are present in git history at HEAD, not just on disk."""
    repo, mission_slug, wps = synthetic_mission_repo  # fixture creates 2+ WPs

    # Run the merge end-to-end
    _run_lane_based_merge(...)

    # The proof: read status.events.jsonl from git, not from the working tree
    result = subprocess.run(
        ["git", "show", f"HEAD:kitty-specs/{mission_slug}/status.events.jsonl"],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    events = [json.loads(line) for line in result.stdout.splitlines() if line.strip()]
    done_wps = {e["wp_id"] for e in events if e["to_lane"] == "done"}
    assert done_wps == set(wps), (
        f"Expected done events for every merged WP. "
        f"Got {done_wps}, expected {set(wps)}. "
        f"This regression would mean FR-019's safe_commit step was missed."
    )
```
This test proves FR-019's contract directly: events are durably committed at the time the merge command returns. It does NOT use git reset --hard HEAD because that's a no-op for a file that's already at HEAD (the previous draft of FR-020 had this logical hole; the simpler direct assertion is mechanically correct).
Test surface
| Test | FR / NFR | Asserts |
|---|---|---|
| test_strategy_flag_flows_through | FR-005 | --strategy squash passed to CLI reaches _run_lane_based_merge |
| test_default_strategy_is_squash | FR-006 | no flag, no config → squash applied |
| test_lane_to_mission_uses_merge_commit | FR-007 | --strategy squash does NOT change lane→mission semantics |
| test_config_yaml_strategy_honored | FR-008 | merge.strategy: rebase in config produces a rebase merge |
| test_invalid_config_strategy_raises | FR-008 | merge.strategy: bogus raises a startup error, not silent fallback |
| test_push_rejection_emits_hint_for_known_tokens | FR-009 | each of the 5 token strings triggers the remediation hint |
| test_push_rejection_fails_open_for_unknown | FR-009 | unrelated stderr does NOT emit a hint |
| test_done_events_committed_to_git | FR-019, FR-020 | end-to-end FR-020 regression |
| test_protected_linear_history_succeeds_default | NFR-003 | integration test: the squash default succeeds against a target branch protected with require_linear_history = true |
recovery_extension.md
Contract: WP05 Recovery Extension + Verification + Mission Close Ledger
Owns: FR-016, FR-017, FR-018, FR-021
scan_recovery_state extension (FR-021)
File: src/specify_cli/lanes/recovery.py (existing function, lines 174-267)
Current behavior
scan_recovery_state(repo_root, mission_slug) iterates branches matching kitty/mission-{slug}* returned by _list_mission_branches. If no live branches exist (because they were merged and deleted), the function returns "nothing to recover", leaving the user stuck with a workflow they cannot unblock.
New behavior (FR-021)
```python
from pathlib import Path

from specify_cli.status.reducer import materialize

# RecoveryState is defined in this module (src/specify_cli/lanes/recovery.py).


def scan_recovery_state(
    repo_root: Path,
    mission_slug: str,
    *,
    consult_status_events: bool = True,  # NEW parameter, defaults True
) -> RecoveryState:
    """Scan for in-progress lane workspaces and orphan claims.

    When consult_status_events=True (default after FR-021), the function:

    1. Reads kitty-specs/<mission>/status.events.jsonl
    2. Materializes the lane snapshot for every WP
    3. For WPs whose lane is `done` and whose branches are absent, marks
       them as `merged_and_deleted` rather than `missing`
    4. For downstream WPs whose dependencies are all `done`, returns
       "ready to start from target branch tip"

    Backwards compatibility: callers that pass `consult_status_events=False`
    get the legacy live-branch-only behavior, which remains correct for any
    in-progress mission whose branches haven't been deleted yet.
    """
    ...
```
New RecoveryState field
RecoveryState (existing dataclass in recovery.py) gains a new field:
```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RecoveryState:
    # ... existing fields ...
    ready_to_start_from_target: list[str] = field(default_factory=list)
    """List of WP IDs whose dependencies are all `done` and whose lane
    branches are merged-and-deleted. These WPs can be started fresh from
    the target branch tip via `spec-kitty implement WP## --base main`."""
```
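Steps 3 and 4 of the FR-021 docstring, combined with the new field, amount to a pure classification over the materialized lanes. The sketch below rests on four assumptions:
- Lanes are plain strings, with `"done"` as the terminal lane.
- The caller supplies the set of WPs that still have live lane branches.
- The caller also supplies the dependency map.
- A WP that already has a live branch is in progress rather than ready.

`classify_post_merge` and `PostMergeClassification` are hypothetical helpers, not the contract's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostMergeClassification:
    merged_and_deleted: list[str]
    ready_to_start_from_target: list[str]


def classify_post_merge(
    lanes: dict[str, str],
    live_branch_wps: set[str],
    dependencies: dict[str, list[str]],
) -> PostMergeClassification:
    """Hypothetical helper: classify WPs from a materialized lane snapshot."""
    # Step 3: done + no live branch => merged_and_deleted (not "missing").
    merged_and_deleted = sorted(
        wp for wp, lane in lanes.items() if lane == "done" and wp not in live_branch_wps
    )
    gone = set(merged_and_deleted)
    # Step 4: not done, not already in progress, every dependency merged-and-deleted.
    ready = sorted(
        wp
        for wp, deps in dependencies.items()
        if lanes.get(wp) != "done"
        and wp not in live_branch_wps
        and deps
        and all(dep in gone for dep in deps)
    )
    return PostMergeClassification(merged_and_deleted, ready)
```

Requiring each dependency to be merged-and-deleted, not merely `done`, follows the field docstring. A dependency that is done but still has a live branch can be recovered through the legacy path, so it does not unlock the target-tip start.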
--base flag for spec-kitty implement (FR-021)
File: src/specify_cli/cli/commands/implement.py
```python
from typing import Optional

import typer


@app.command()
def implement(
    wp_id: str,
    base: Optional[str] = typer.Option(
        None,
        "--base",
        help="Explicit base ref for the lane workspace (default: auto-detect)",
    ),
) -> None:
    """Implement a work package, optionally from an explicit base branch."""
    # `app`, `repo_root` and the workspace helpers come from the existing module (elided).
    if base is not None:
        # Validate the ref exists locally
        validated_base = _resolve_git_ref(base, repo_root)
        # Create the lane workspace from the explicit base
        workspace = create_lane_workspace_from_base(wp_id, validated_base, repo_root)
    else:
        # Existing auto-detect path (unchanged)
        workspace = create_lane_workspace(wp_id, repo_root)
    # ... rest of implement logic unchanged ...
```
Validation:
- `--base` accepts any local git ref (branch name, tag, commit SHA, `HEAD`).
- If the ref does not resolve, the command fails with a clear error pointing at `git fetch` and `git branch -a`.
- When omitted, the existing auto-detect logic runs unchanged.
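One way to implement the `--base` check is `git rev-parse --verify --quiet <ref>^{commit}`. It prints the commit SHA when the ref resolves and exits non-zero, silently, when it does not. The sketch below is illustrative only. `resolve_git_ref` and `BaseRefError` are hypothetical stand-ins for `_resolve_git_ref` and whatever error type the CLI actually raises. The sketch also rejects a leading `-` so that user input cannot be parsed as a git option.

```python
import subprocess
from pathlib import Path


class BaseRefError(ValueError):
    """Hypothetical error for a --base value that does not resolve locally."""


def resolve_git_ref(ref: str, repo_root: Path) -> str:
    """Return the commit SHA that `ref` names in `repo_root` (sketch)."""
    if not ref or ref.startswith("-"):
        # Refuse anything git could parse as an option.
        raise BaseRefError(f"invalid --base value: {ref!r}")
    result = subprocess.run(
        # ^{commit} peels annotated tags to commits; --quiet suppresses git's error text.
        ["git", "rev-parse", "--verify", "--quiet", f"{ref}^{{commit}}"],
        cwd=repo_root,
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        raise BaseRefError(
            f"--base {ref!r} does not resolve to a local commit. "
            "Run `git fetch`, then check `git branch -a` for the ref you meant."
        )
    return result.stdout.strip()
```

Returning the resolved SHA, rather than the ref name, pins the workspace base even if the branch moves while the workspace is being created.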
Verification report (FR-016)
File: kitty-specs/068-post-merge-reliability-and-release-hardening/wp05-verification-report.md
```markdown
# WP05 Recovery Verification Report

**Authored**: <date>
**Validated against**: commit `<sha>`

## Coverage

This report accounts for every documented failure shape from issues #415 and #416, including the two pre-identified gaps from the Mission 067 Failure-Mode Evidence sections.

## Pre-identified gap 1 (#416 status-events loss)

**Failure shape**: `_run_lane_based_merge` writes `done` events to disk but never commits them. External merge rebuild discards them.
**Status**: `fixed_by_this_mission`
**Evidence**:
- Fix landed in WP02 via FR-019 (`safe_commit` call between mark-done loop and worktree-removal)
- Regression test: `tests/cli/commands/test_merge_status_commit.py::test_done_events_committed_to_git` (FR-020)
- Verified by reading `git show HEAD:kitty-specs/<mission>/status.events.jsonl` after a synthetic merge

## Pre-identified gap 2 (#415 post-merge recovery deadlock)

**Failure shape**: `scan_recovery_state` ignores merged-and-deleted dependency branches; `implement` does not accept `--base main`.
**Status**: `fixed_by_this_mission`
**Evidence**:
- `scan_recovery_state` extended (FR-021)
- `--base` flag added to `implement` (FR-021)
- Regression tests: `tests/lanes/test_recovery_post_merge.py`, `tests/cli/commands/test_implement_base_flag.py`
- Verified by reproducing the WP07-after-deps-merged scenario from #415 and observing it now succeeds

## Other failure shapes from #415/#416

| Shape | Source | Status | Evidence |
|---|---|---|---|
| (additional shapes added during verification, if any) | | | |
```
Mission close ledger (FR-018, C-005)
File: kitty-specs/068-post-merge-reliability-and-release-hardening/mission-close-ledger.md
```markdown
# Mission 068 Close Ledger

**Authored at mission close**: <date>
**Validated DoD**: every issue from the Tracked GitHub Issues table appears below.

## Primary scope (must implement)

| Issue | Decision | Reference | Notes |
|---|---|---|---|
| Priivacy-ai/spec-kitty#454 | closed_with_evidence | <PR/commit link> | WP01 stale-assertion analyzer shipped |
| Priivacy-ai/spec-kitty#456 | closed_with_evidence | <PR/commit link> | WP02 strategy wiring + squash default + push-error parser |
| Priivacy-ai/spec-kitty#455 | closed_with_evidence | <PR/commit link or wp03-validation-report.md> | WP03 validation: <close_with_evidence | tighten_workflow> |
| Priivacy-ai/spec-kitty#457 | closed_with_evidence | <PR/commit link> | WP04 release-prep CLI; FR-023 scope-cut documented |

## Verification-and-close scope

| Issue | Decision | Reference | Notes |
|---|---|---|---|
| Priivacy-ai/spec-kitty#415 | closed_with_evidence | <PR/commit link> | FR-021 fix landed (scan_recovery_state + --base) |
| Priivacy-ai/spec-kitty#416 | closed_with_evidence | <PR/commit link> | FR-019/FR-020 fix landed in WP02; verified by WP05 |

## Carve-outs filed as follow-ups

| Original concern | Follow-up issue | Notes |
|---|---|---|
| FSEvents debounce / `_worktree_removal_delay()` empirical timing | <new issue link> | Carved out per spec Assumptions section |
| Dirty classifier `git check-ignore` consultation | <new issue link> | Filed per spec Out-of-Scope; `--force` workaround documented |
```
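test_mission_close_ledger_complete (FR-018, DoD-4) requires every tracked issue to have exactly one ledger row. The sketch below assumes the ledger is read as plain markdown and that the caller supplies the tracked-issue list from the spec's Tracked GitHub Issues table. The function name `ledger_problems` and the row regex are illustrative, not part of the contract.

```python
import re
from collections import Counter

# First cell of a ledger row, e.g. "| Priivacy-ai/spec-kitty#454 | ...".
_ISSUE_CELL_RE = re.compile(r"^\|\s*(?P<issue>[\w.-]+/[\w.-]+#\d+)\s*\|", re.MULTILINE)


def ledger_problems(ledger_md: str, tracked_issues: list[str]) -> list[str]:
    """Return one message per tracked issue that lacks exactly one ledger row."""
    counts = Counter(m["issue"] for m in _ISSUE_CELL_RE.finditer(ledger_md))
    return [
        f"{issue}: expected exactly 1 ledger row, found {counts[issue]}"
        for issue in tracked_issues
        if counts[issue] != 1
    ]
```

Carve-out rows are skipped automatically because their first cell is not an `owner/repo#N` reference. An empty return value is the pass condition.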
Test surface
| Test | FR | Asserts |
|---|---|---|
| test_scan_recovery_state_finds_merged_deleted_deps | FR-021 | a synthetic mission with WP01–WP05 done-and-deleted lets WP06 be marked ready |
| test_implement_base_flag_creates_workspace_from_ref | FR-021 | spec-kitty implement WP06 --base main creates a worktree at the main branch tip |
| test_implement_base_flag_invalid_ref_fails_clearly | FR-021 | --base bogus-ref fails with a clear error pointing at remediation |
| test_post_merge_unblocking_scenario_end_to_end | FR-021, Scenario 7 | full Scenario 7: WP01–WP05 merged, WP06 starts cleanly without manual state edits |
| test_verification_report_authored_at_mission_close | FR-016 | wp05-verification-report.md exists with all required sections |
| test_mission_close_ledger_complete | FR-018, DoD-4 | every issue in the Tracked GitHub Issues table has exactly one ledger row |
NFR coverage
- NFR-005: all FR-021 tests run without network access (uses local synthetic git repos via fixture)
- NFR-006: `mypy --strict` passes on the new function signatures and dataclass field
release_prep.md
Contract: WP04 Release Prep CLI
Owns: FR-013, FR-014, FR-015, FR-023 + NFR-004
CLI surface
Command path: spec-kitty agent release prep
File: src/specify_cli/cli/commands/agent/release.py (currently a stub; WP04 populates it)
```python
"""Release packaging commands for AI agents."""
import json
from dataclasses import asdict
from pathlib import Path

import typer
from rich.console import Console

from specify_cli.release.payload import (
    ReleaseChannel,
    ReleasePrepPayload,
    build_release_prep_payload,
)

app = typer.Typer(
    name="release",
    help="Release packaging commands for AI agents",
    no_args_is_help=True,
)
console = Console()


@app.command("prep")
def prep(
    channel: ReleaseChannel = typer.Option(..., "--channel", help="Release channel: alpha | beta | stable"),
    repo_root: Path = typer.Option(Path("."), "--repo", help="Repository root"),
    json_output: bool = typer.Option(False, "--json", help="Emit JSON instead of human-readable text"),
) -> None:
    """Prepare release artifacts (changelog draft, version bump, structured inputs)."""
    payload: ReleasePrepPayload = build_release_prep_payload(
        channel=channel,
        repo_root=repo_root.resolve(),
    )
    if json_output:
        console.print_json(json.dumps(asdict(payload)))
    else:
        # Rich rendering: version diff, changelog block, mission slug list, structured inputs table
        ...
```
Internal package
Module tree: src/specify_cli/release/ (locked decision — not optional)
```text
src/specify_cli/release/
├── __init__.py
├── changelog.py   # build_changelog_block(repo_root: Path, since_tag: str | None = None) -> tuple[str, list[str]]
├── version.py     # propose_version(current: str, channel: ReleaseChannel) -> str
└── payload.py     # build_release_prep_payload(channel, repo_root) -> ReleasePrepPayload
```
The package split is committed at plan time. Three concerns (changelog, version, payload) map cleanly to three modules. The alternative of inlining the package "if it stays small enough" is rejected: it would only defer a structural decision to the WP04 implementer at code time.
Library functions
```python
# src/specify_cli/release/changelog.py
from pathlib import Path


def build_changelog_block(repo_root: Path, since_tag: str | None = None) -> tuple[str, list[str]]:
    """Build a draft changelog block from kitty-specs/ artifacts.

    Returns:
        (changelog_markdown, mission_slug_list)

    Algorithm:
        1. Find missions in kitty-specs/ accepted since `since_tag` (or since the most recent
           git tag if not specified)
        2. For each mission, read meta.json and spec.md to get title and friendly name
        3. For each mission, walk its tasks/ directory for accepted WP files and extract titles
        4. Render a markdown block grouping missions and their WPs

    No network calls. Uses git locally to determine the previous tag.
    """
    ...
```
```python
# src/specify_cli/release/version.py
from typing import Literal

ReleaseChannel = Literal["alpha", "beta", "stable"]


def propose_version(current: str, channel: ReleaseChannel) -> str:
    """Compute the next version string per channel.

    Examples:
        propose_version("3.1.0a7", "alpha") == "3.1.0a8"
        propose_version("3.1.0a7", "beta") == "3.1.0b1"
        propose_version("3.1.0a7", "stable") == "3.1.0"
        propose_version("3.1.0", "stable") == "3.1.1"

    Bump-level rules:
        - alpha: increments alpha number (3.1.0a7 → 3.1.0a8)
        - beta: starts a fresh beta line if current is alpha (3.1.0a7 → 3.1.0b1),
          otherwise increments beta number (3.1.0b1 → 3.1.0b2)
        - stable: drops the prerelease suffix if current is alpha/beta
          (3.1.0a7 → 3.1.0); otherwise **always proposes a patch bump**
          (3.1.0 → 3.1.1)

    Stable→stable always proposes a patch bump. Minor or major bumps
    require manual editing of pyproject.toml before running release prep.
    This matches spec-kitty's actual release cadence (mostly alpha
    increments and patches); a `--bump-level` parameter would be dead
    weight 99% of the time and is intentionally omitted.
    """
    ...
```
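The bump rules above are small enough to sketch directly. This sketch assumes versions are plain `MAJOR.MINOR.PATCH` with an optional `aN` or `bN` suffix, with no `rc`, `.post` or `.dev` segments. It raises on transitions the contract does not define, such as stable → alpha. The regex is an assumption, not the shipped parser.

```python
import re
from typing import Literal

ReleaseChannel = Literal["alpha", "beta", "stable"]

# Assumed shape: MAJOR.MINOR.PATCH with an optional aN / bN pre-release suffix.
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:(a|b)(\d+))?")


def propose_version(current: str, channel: ReleaseChannel) -> str:
    match = _VERSION_RE.fullmatch(current)
    if match is None:
        raise ValueError(f"unsupported version string: {current!r}")
    major, minor, patch, pre_kind, pre_num = match.groups()
    base = f"{major}.{minor}.{patch}"

    if channel == "stable":
        if pre_kind is not None:
            return base  # drop the suffix: 3.1.0a7 -> 3.1.0
        return f"{major}.{minor}.{int(patch) + 1}"  # always a patch bump: 3.1.0 -> 3.1.1
    if channel == "beta" and pre_kind == "a":
        return f"{base}b1"  # fresh beta line: 3.1.0a7 -> 3.1.0b1
    if channel == "beta" and pre_kind == "b":
        return f"{base}b{int(pre_num) + 1}"  # 3.1.0b1 -> 3.1.0b2
    if channel == "alpha" and pre_kind == "a":
        return f"{base}a{int(pre_num) + 1}"  # 3.1.0a7 -> 3.1.0a8
    # e.g. stable -> alpha or beta -> alpha: not defined by the bump rules.
    raise ValueError(f"transition {current!r} -> {channel!r} is not covered by the bump rules")
```

Raising on undefined transitions, instead of guessing (for example `3.1.1a1` for stable → alpha), keeps a maintainer in the loop for exactly the cases the docstring leaves to manual `pyproject.toml` edits.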
```python
# src/specify_cli/release/payload.py
from dataclasses import dataclass
from pathlib import Path

from .version import ReleaseChannel


@dataclass(frozen=True)
class ReleasePrepPayload:
    channel: ReleaseChannel
    current_version: str
    proposed_version: str
    changelog_block: str
    mission_slug_list: list[str]
    target_branch: str
    structured_inputs: dict[str, str]


def build_release_prep_payload(
    channel: ReleaseChannel,
    repo_root: Path,
) -> ReleasePrepPayload:
    """Assemble the full release-prep payload.

    Reads:
        - pyproject.toml for current version
        - kitty-specs/ for missions since previous tag
        - .git for the previous tag

    Returns: a fully-populated ReleasePrepPayload ready to render or serialize.

    Performance: ≤ 5 seconds wall-clock on a mission with up to 16 WPs (NFR-004).
    """
    ...
```
Local-only constraint (FR-014)
Every code path in WP04's package SHALL be testable without network access. Specifically:
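- `build_changelog_block` reads `kitty-specs/` and `git tag --list` only.
- `propose_version` reads nothing itself; it computes from the current version string, which `build_release_prep_payload` takes from `pyproject.toml`.
- `build_release_prep_payload` orchestrates the above; no GitHub API, no PyPI checks.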
Network-touching steps (creating the actual release PR, pushing the tag, monitoring the workflow) are out of scope per FR-023. Maintainers do those steps manually with the structured_inputs payload as a guide.
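The test plan describes test_payload_no_github_api_calls as mocking `requests.get`/`urlopen`. A broader alternative, swapped in here for illustration, refuses every outbound connection made through Python's `socket` module for the duration of a test. That also catches Python-level clients the mock does not name. `no_network` and `NetworkAccessError` are hypothetical names. The patch target, `socket.socket.connect`, is standard library.

```python
import contextlib
import socket
from collections.abc import Iterator
from unittest import mock


class NetworkAccessError(RuntimeError):
    """Raised when code under test tries to open an outbound connection."""


@contextlib.contextmanager
def no_network() -> Iterator[None]:
    """Refuse socket connections made through the Python socket module."""

    def _refuse(self: socket.socket, address: object) -> None:
        raise NetworkAccessError(f"network access attempted: {address!r}")

    # urllib, http.client and requests/urllib3 all end up calling
    # socket.socket.connect(), so patching the class method blocks them.
    with mock.patch.object(socket.socket, "connect", _refuse), \
            mock.patch.object(socket.socket, "connect_ex", _refuse):
        yield
```

A test would then call `build_release_prep_payload(...)` inside `with no_network():`. Because `NetworkAccessError` is not an `OSError`, client-side retry or fallback logic that catches connection errors cannot silently swallow it.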
#457 close-comment scope-cut (FR-023)
When WP04 closes #457, the comment SHALL document exactly:
Automated by this mission:
- Changelog draft (via `build_changelog_block`)
- Version bump proposal (via `propose_version`)
- Structured release-prep payload (`structured_inputs`)
- JSON output mode for downstream automation

Still manual:
- PR creation (use `gh pr create` with the changelog block)
- Tag push (use `git tag -a vX.Y.Z -m "..." && git push origin vX.Y.Z`)
- Release workflow monitoring (use `gh run watch`)
If #457's reporter requests automation of the still-manual steps, those SHALL be filed as a follow-up issue.
Test surface
File: tests/cli/commands/agent/test_release_prep.py
| Test | FR / NFR | Asserts |
|---|---|---|
| test_prep_command_emits_text_by_default | FR-013 | running prep produces a rich-formatted summary |
| test_prep_command_emits_json_with_flag | FR-015 | --json produces a parseable JSON document with all fields |
| test_changelog_built_from_local_artifacts_only | FR-014 | the test runs successfully with no network access (NFR-005) |
| test_payload_no_github_api_calls | FR-014, C-002 | a requests.get/urlopen mock asserts zero network calls |
| test_propose_version_alpha_increments_alpha | FR-013 | 3.1.0a7 + alpha → 3.1.0a8 |
| test_propose_version_alpha_to_beta_starts_beta1 | FR-013 | 3.1.0a7 + beta → 3.1.0b1 |
| test_propose_version_alpha_to_stable | FR-013 | 3.1.0a7 + stable → 3.1.0 |
| test_runs_within_5s_for_16_wps | NFR-004 | benchmark fails if elapsed > 5s on a synthetic 16-WP mission |
| test_close_comment_scope_cut_documented | FR-023 | the rendered output (or a separate close-comment helper) lists automated vs manual steps |
stale_assertions.md
Contract: WP01 Stale-Assertion Analyzer
Owns: FR-001, FR-002, FR-003, FR-004, FR-022 + NFR-001, NFR-002
Library entry point
Module: src/specify_cli/post_merge/stale_assertions.py
```python
from pathlib import Path

# StaleAssertionFinding and StaleAssertionReport are defined in this module
# (src/specify_cli/post_merge/stale_assertions.py).


def run_check(
    base_ref: str,
    head_ref: str,
    repo_root: Path,
) -> StaleAssertionReport:
    """Compare base_ref..head_ref and return likely-stale test assertions.

    Algorithm:
        1. git diff base_ref..head_ref -- '*.py' → list of changed source files + line ranges
        2. For each changed file, parse with ast and extract changed identifiers and string literals
        3. For each test file from `git ls-files 'tests/**/*.py'`, parse with ast
        4. Walk test ASTs for Constant/Name nodes referencing changed identifiers in
           assertion-bearing positions (Compare, Assert, Call(func=Attribute(attr='assert*')))
        5. Emit a StaleAssertionFinding for each match with appropriate confidence
        6. Compute findings_per_100_loc against the changed-line count

    Returns: a StaleAssertionReport with findings list, elapsed_seconds, files_scanned,
    and findings_per_100_loc populated.
    """
    ...
```
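Step 1 of the algorithm needs the changed line ranges on the head side. The sketch below assumes the diff is produced with `git diff -U0 <base_ref>..<head_ref> -- '*.py'`, so that every hunk header describes exactly the changed lines with no context lines. `changed_line_ranges` is a hypothetical helper.

```python
import re

_NEW_FILE_RE = re.compile(r"^\+\+\+ b/(?P<path>.+)$")
_HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(?P<start>\d+)(?:,(?P<count>\d+))? @@")


def changed_line_ranges(diff_text: str) -> dict[str, list[tuple[int, int]]]:
    """Map each head-side path to inclusive (first, last) changed-line ranges."""
    ranges: dict[str, list[tuple[int, int]]] = {}
    current: str | None = None
    for line in diff_text.splitlines():
        file_match = _NEW_FILE_RE.match(line)
        if file_match:
            current = file_match["path"]
            ranges.setdefault(current, [])
            continue
        if line.startswith("+++ /dev/null"):
            current = None  # deleted file: nothing changed on the head side
            continue
        hunk = _HUNK_RE.match(line)
        if hunk and current is not None:
            start = int(hunk["start"])
            # A missing count means one line; a count of 0 is a pure deletion.
            count = int(hunk["count"]) if hunk["count"] is not None else 1
            if count > 0:
                ranges[current].append((start, start + count - 1))
    return ranges
```

The changed-line count that step 6 divides by is then `sum(last - first + 1 for spans in ranges.values() for first, last in spans)`.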
Re-exported from: src/specify_cli/post_merge/__init__.py:
```python
from .stale_assertions import (
    run_check,
    StaleAssertionFinding,
    StaleAssertionReport,
)

__all__ = ["run_check", "StaleAssertionFinding", "StaleAssertionReport"]
```
CLI surface
Command path: spec-kitty agent tests stale-check
Module: src/specify_cli/cli/commands/agent/tests.py
```python
from pathlib import Path

import typer
from rich.console import Console

from specify_cli.post_merge.stale_assertions import run_check

app = typer.Typer(name="tests", help="Test-related commands for AI agents", no_args_is_help=True)
console = Console()


@app.command("stale-check")
def stale_check(
    base: str = typer.Option(..., "--base", help="Base git ref for the diff"),
    head: str = typer.Option("HEAD", "--head", help="Head git ref for the diff"),
    repo_root: Path = typer.Option(Path("."), "--repo", help="Repository root"),
    json_output: bool = typer.Option(False, "--json", help="Emit JSON instead of human-readable text"),
) -> None:
    """Detect test assertions likely invalidated by source changes between two refs."""
    report = run_check(base_ref=base, head_ref=head, repo_root=repo_root.resolve())
    # Render rich text or JSON depending on json_output
```
Registration: src/specify_cli/cli/commands/agent/__init__.py adds:
```python
from . import tests as tests_module

app.add_typer(tests_module.app, name="tests")
```
Merge runner integration
File: src/specify_cli/cli/commands/merge.py inside _run_lane_based_merge, after the FR-019 safe_commit step and before the merge summary print.
```python
from specify_cli.post_merge.stale_assertions import run_check

# ... after _mark_wp_merged_done loop and after safe_commit (FR-019):
stale_report = run_check(
    base_ref=merge_base_sha,
    head_ref="HEAD",
    repo_root=repo_root,
)
# Append stale_report findings to the merge summary that's printed to console
```
Wiring contract (FR-004): the merge runner SHALL invoke run_check via direct library import, NOT by spawning the CLI subcommand as a subprocess. The CLI entry and the merge runner are two thin shims around the same library function.
Confidence assignment rules (FR-003)
| Confidence | Condition |
|---|---|
| high | Changed function/class name appears as an Attribute(attr=...) or Name(id=...) node directly inside an Assert test or assertEqual/assertTrue/etc. call |
| medium | Changed identifier appears in any Compare or Assert node anywhere in the test file |
| low | Changed string literal matches a Constant(value="...") node in an assertion-bearing position |
Forbidden: the analyzer SHALL NEVER produce a definitely_stale confidence (FR-003).
Self-monitoring (FR-022)
After every run_check call, the report's findings_per_100_loc is checked against 5.0 (the NFR-002 ceiling). If exceeded:
1. The CLI command emits a warning to stderr.
2. The merge runner emits a warning in the merge summary.
3. WP01's tests SHALL include a benchmark that fails the build if the rate measured on the curated benchmark exceeds the ceiling, forcing WP01 to narrow scope per FR-022.
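The FR-022 check is one division and one strict comparison ("if exceeded", so exactly 5.0 passes). The sketch below assumes `findings_per_100_loc` is computed from the total finding count and the changed-line count of step 6. `warn_if_over_ceiling` is a hypothetical name, and the output stream is a parameter so the CLI and the merge runner can route the same warning differently.

```python
import sys
from typing import Optional, TextIO

NFR_002_CEILING = 5.0  # findings per 100 changed lines


def findings_per_100_loc(finding_count: int, changed_lines: int) -> float:
    if changed_lines <= 0:
        return 0.0  # no changed lines: nothing to normalise against
    return finding_count * 100.0 / changed_lines


def warn_if_over_ceiling(finding_count: int, changed_lines: int,
                         stream: Optional[TextIO] = None) -> bool:
    """Emit the FR-022 warning; return True when the ceiling is exceeded."""
    rate = findings_per_100_loc(finding_count, changed_lines)
    if rate > NFR_002_CEILING:
        print(
            f"warning: {rate:.1f} findings per 100 changed lines exceeds the "
            f"NFR-002 ceiling of {NFR_002_CEILING:.1f}",
            file=stream or sys.stderr,  # resolved at call time, so captured stderr works
        )
        return True
    return False
```

For example, 3 findings against 40 changed lines gives 7.5 per 100 LOC, which exceeds the ceiling, while 2 findings against the same diff sits exactly at 5.0 and passes.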
Test surface
File: tests/post_merge/test_stale_assertions.py
| Test | FR / NFR | Asserts |
|---|---|---|
| test_renamed_function_flagged_high_confidence | FR-001, FR-003 | high-confidence finding for a renamed function reference in a test |
| test_changed_string_literal_flagged_low_confidence | FR-001, FR-002 | low-confidence finding for a changed literal that matches a Constant node |
| test_string_literal_in_comment_not_flagged | FR-002 worked example | comment-only mention of a literal does NOT produce a finding |
| test_unchanged_use_of_string_not_flagged | FR-002 worked example | new use of a literal (without modifying any existing literal) does NOT produce a finding |
| test_no_test_suite_load | FR-002 | analyzer does not import or execute any test file as code |
| test_no_definitely_stale_confidence | FR-003 | output never contains the literal "definitely_stale" |
| test_cli_subcommand_invokes_library | FR-004 | the CLI subcommand calls run_check and prints its findings |
| test_merge_runner_imports_library_directly | FR-004 | the merge runner does NOT use subprocess to invoke the CLI |
| test_runs_within_30s_on_spec_kitty_core | NFR-001 | benchmark fails the build if elapsed > 30s |
| test_fp_ceiling_under_5_per_100_loc | NFR-002, FR-022 | benchmark fails if FP rate > 5/100 LOC |
| test_fr_022_fallback_narrows_scope | FR-022 | when the FP ceiling is exceeded, the documented fallback path is exercised |