Tasks: Post-Merge Reliability And Release Hardening

Mission: 068-post-merge-reliability-and-release-hardening
Branch: main (planning, base, and merge target are all main)
Spec: spec.md
Plan: plan.md
Generated: 2026-04-07


Branch Strategy

  • Current branch at workflow start: main
  • Planning/base branch for this mission: main
  • Final merge target for completed changes: main
  • Branch matches target: ✅ true

Per spec-kitty agent context resolve: "Current branch at workflow start: main. Planning/base branch for this feature: main. Completed changes must merge into main."

Execution worktrees are allocated per computed lane from lanes.json after finalize-tasks runs. Agents working a WP MUST enter the workspace path printed by spec-kitty implement WP##, not reconstruct paths manually.


Subtask Index

| ID | Description | WP | Parallel |
|---|---|---|---|
| T001 | Create src/specify_cli/post_merge/ package with __init__.py and stale_assertions.py skeleton (dataclasses + module docstring) | WP01 | |
| T002 | Implement source-side AST extraction: parse changed Python files from git diff base..head, walk ASTs, collect changed function/class names and changed string-literal Constant values | WP01 | |
| T003 | Implement test-side AST scan: walk every test file from git ls-files 'tests/**/*.py', parse with ast, find references to changed identifiers in assertion-bearing positions (Assert, Compare, Call(func=Attribute(attr='assert*'))) | WP01 | |
| T004 | Implement run_check(base_ref, head_ref, repo_root) -> StaleAssertionReport orchestration: call source extractor, call test scanner, assign confidence (high/medium/low), populate elapsed_seconds/files_scanned/findings_per_100_loc | WP01 | |
| T005 | Create src/specify_cli/cli/commands/agent/tests.py typer subapp with stale-check --base --head [--json] command; register the new subapp in agent/__init__.py | WP01 | |
| T006 | Author test suite at tests/post_merge/test_stale_assertions.py and tests/cli/commands/agent/test_tests_stale_check.py covering FR-001/002/003/004 worked examples, NFR-001 wall-clock benchmark, NFR-002 FP-ceiling benchmark, FR-022 fallback path | WP01 | |
| T007 | Add --json output mode to the CLI subcommand using dataclasses.asdict so downstream automation can consume the report; verify NFR-005 (no network) by mocking subprocess calls | WP01 | |
| T008 | Add MergeStrategy enum and MergeConfig dataclass + load_merge_config(repo_root) -> MergeConfig accessor in new file src/specify_cli/merge/config.py; extend .kittify/config.yaml with the merge.strategy schema; raise startup error on invalid value | WP02 | |
| T009 | Wire --strategy CLI flag through cli/commands/merge.py merge() into _run_lane_based_merge: resolve from CLI flag → config key → squash default; pass resolved strategy down into lanes/merge.py | WP02 | |
| T010 | Modify src/specify_cli/lanes/merge.py to honor the strategy parameter for the mission→target step (squash/rebase/merge); preserve existing merge-commit semantics for the lane→mission step | WP02 | |
| T011 | Add LINEAR_HISTORY_REJECTION_TOKENS tuple, _is_linear_history_rejection(stderr) parser, and _emit_remediation_hint(console) helper to cli/commands/merge.py; wire the parser into the push failure path so rejected pushes trigger the hint (fail-open on unknown messages) | WP02 | |
| T012 | Insert safe_commit call in _run_lane_based_merge immediately after the _mark_wp_merged_done loop and before any worktree-removal step (FR-019); commit status.events.jsonl and status.json for the mission feature directory using the existing safe_commit helper from specify_cli.git | WP02 | |
| T013 | Add from specify_cli.post_merge.stale_assertions import run_check to cli/commands/merge.py and call run_check(merge_base_sha, "HEAD", repo_root) after the safe_commit step; render the resulting findings as part of the merge summary block printed to console (depends on WP01) | WP02 | |
| T014 | Author test suite at tests/cli/commands/test_merge_strategy.py and tests/cli/commands/test_merge_status_commit.py covering FR-005/006/007/008/009 (strategy wiring + token list + fail-open) AND FR-019/020 (git show HEAD:kitty-specs/<mission>/status.events.jsonl returns to_lane: done for every WP after _run_lane_based_merge returns); add NFR-003 integration test against a require_linear_history = true synthetic remote | WP02 | |
| T015 | Author the WP03 validation report at kitty-specs/068-post-merge-reliability-and-release-hardening/wp03-validation-report.md: walk current .github/workflows/ci-quality.yml against the policy intent of #455, populate DiffCoverageValidationReport fields, write the rationale (≥ 50 chars), record decision as close_with_evidence OR tighten_workflow | WP03 | |
| T016 | Execute the FR-011 path only if the report's decision is close_with_evidence: close issue #455 with a comment linking to the validation report and the line ranges that implement the enforce/advisory split; tighten CI step names in ci-quality.yml so they self-document (e.g., "diff-coverage (critical-path, enforced)" vs "diff-coverage (full-diff, advisory)") | WP03 | |
| T017 | Execute the FR-012 path only if the report's decision is tighten_workflow: modify .github/workflows/ci-quality.yml so only the intended critical-path surface produces hard failures and full-diff coverage runs as advisory; add an integration/synthetic test demonstrating that a large PR meeting critical-path coverage but missing full-diff passes; close issue #455 with a comment linking to the workflow diff and the new test | WP03 | |
| T018 | Author test suite at tests/release/test_diff_coverage_policy.py covering: validation report exists with all sections, exactly one decision recorded, content gate (test_validation_report_close_path_populated: rationale ≥ 50 chars + findings have "satisfied by" rationale), no workflow modification on close path, FR-012 large-PR sample on tighten path | WP03 | |
| T019 | Create src/specify_cli/release/ package with version.py (propose_version(current, channel) -> str per the locked alpha/beta/stable + stable→stable patch rule from contracts/release_prep.md) | WP04 | |
| T020 | Implement src/specify_cli/release/changelog.py (build_changelog_block(repo_root, since_tag) -> tuple[str, list[str]]) reading from kitty-specs/ and git tag --list only; no network calls per FR-014 | WP04 | |
| T021 | Implement src/specify_cli/release/payload.py (build_release_prep_payload(channel, repo_root) -> ReleasePrepPayload orchestration + ReleasePrepPayload dataclass per data-model.md) | WP04 | |
| T022 | Populate src/specify_cli/cli/commands/agent/release.py stub: replace the placeholder typer.Typer with a real prep subcommand accepting --channel {alpha,beta,stable} [--json]; render text mode via rich, JSON mode via dataclasses.asdict; update the stale "Deep implementation in WP05" comment | WP04 | |
| T023 | Author test suite at tests/release/test_release_prep.py covering FR-013/014/015 (text + JSON modes), FR-023 #457 close-comment scope-cut helper, FR-014 zero-network-calls assertion (mock requests/urlopen), propose_version per channel including stable→stable patch rule, NFR-004 5-second benchmark on 16-WP synthetic mission | WP04 | |
| T024 | Extend scan_recovery_state in src/specify_cli/lanes/recovery.py to consult mission status events: read kitty-specs/<mission>/status.events.jsonl, materialize lane snapshots, mark merged-and-deleted WPs correctly; add RecoveryState.ready_to_start_from_target: list[str] field; add consult_status_events: bool = True keyword parameter (FR-021) | WP05 | |
| T025 | Add --base <ref> CLI flag to src/specify_cli/cli/commands/implement.py: validate ref resolves locally, create lane workspace from explicit base via git worktree add --branch <new> <path> <base>; preserve existing auto-detect path when flag omitted (FR-021) | WP05 | |
| T026 | Author test suite at tests/lanes/test_recovery_post_merge.py and tests/cli/commands/test_implement_base_flag.py covering Scenario 7 end-to-end (synthetic test mission with placeholder upstream work packages done-and-deleted and a downstream work package starting via --base main), scan_recovery_state finds merged-deleted deps, --base bogus-ref fails with clear error, post-merge unblocking integration | WP05 | |
| T027 | Author the WP05 verification report at kitty-specs/068-post-merge-reliability-and-release-hardening/wp05-verification-report.md accounting for both pre-identified gaps (#416 status-events, #415 recovery deadlock) AND any additional shapes discovered during verification; status: fixed_by_this_mission or residual_gap per row (FR-016, FR-017) | WP05 | |
| T028 | Author the Mission Close Ledger at kitty-specs/068-post-merge-reliability-and-release-hardening/mission-close-ledger.md with one row per issue in the Tracked GitHub Issues table (#454, #455, #456, #457, #415, #416) plus carve-out rows for the FSEvents debounce follow-up and the dirty-classifier follow-up; mechanically checkable per DoD-4 (FR-018, C-005) | WP05 | |

Total: 28 subtasks across 5 work packages.


Phase 1 — Setup

(No setup WPs required. The mission extends existing modules and the dev environment is already configured.)


Phase 2 — Foundational

(No foundational WPs required. Each story WP is independently implementable except for WP02's library-import dependency on WP01.)


Phase 3 — Story WPs

WP01 — Stale-Assertion Analyzer

Goal: Build the post-merge stale-assertion analyzer (#454), a stdlib ast-based tool that flags test assertions likely invalidated by merged source changes. It ships as a library plus a CLI subcommand and is wired into the merge runner via direct library import.
Priority: P1 (primary scope)
Estimated prompt size: ~480 lines (7 subtasks × ~65 lines each)
Independent test: pytest tests/post_merge tests/cli/commands/agent/test_tests_stale_check.py exits zero with NFR-001 wall-clock and NFR-002 FP-ceiling benchmarks both green.
Dependencies: none

Included subtasks:

  • □ T001 Create src/specify_cli/post_merge/ package skeleton (WP01)
  • □ T002 Implement source-side AST identifier/literal extraction (WP01)
  • □ T003 Implement test-side AST scan in assertion-bearing positions (WP01)
  • □ T004 Implement run_check() orchestration + confidence assignment (WP01)
  • □ T005 New agent tests subapp + stale-check command + register in agent/__init__.py (WP01)
  • □ T006 Test suite for FR-001/002/003/004 + NFR-001/002 + FR-022 fallback (WP01)
  • □ T007 --json output mode + NFR-005 zero-network assertion (WP01)

Implementation sketch: Create the new post_merge/ package, write stale_assertions.py top-down (dataclasses → source extractor → test scanner → orchestrator), then add the CLI shim, then the test suite, then the JSON output. Avoid all subprocess use except for git diff and git ls-files invocations needed to enumerate the diff and the test file list.

Parallel opportunities: The internal subtasks are mostly sequential within the WP because each layer builds on the previous. T006 (tests) can be drafted in parallel with T002/T003 if you write the test scaffolding first.

Risks: NFR-002's FP ceiling (≤ 5 per 100 LOC of merged change) is tight. If the AST-based heuristic exceeds it, FR-022 mandates narrowing scope (e.g., literal-string changes only). Document the narrowed scope as a new constraint row in spec.md before requesting review.

Prompt file: tasks/WP01-stale-assertion-analyzer.md


WP02 — Merge Strategy + Status-Events Safe Commit

Goal: Wire --strategy end-to-end through the merge command (#456), default to squash for the mission→target step, add the push-error remediation parser, and fix the FR-019 status-events loss bug by inserting safe_commit after the mark-done loop (#416 known residual gap). Also wires the WP01 stale-assertion analyzer call into the merge runner.
Priority: P1 (primary scope, largest sequential chain)
Estimated prompt size: ~520 lines (7 subtasks × ~75 lines each; slightly over target because the FRs all touch the same function and need cohesive guidance)
Independent test: pytest tests/cli/commands/test_merge_strategy.py tests/cli/commands/test_merge_status_commit.py exits zero. The FR-020 regression test asserts git show HEAD:kitty-specs/<mission>/status.events.jsonl contains a to_lane: done entry for every merged WP after _run_lane_based_merge returns.
Dependencies: WP01 (T013 imports run_check from specify_cli.post_merge.stale_assertions)

Included subtasks:

  • □ T008 MergeStrategy enum + MergeConfig + load_merge_config in merge/config.py + .kittify/config.yaml schema (WP02)
  • □ T009 Wire --strategy CLI flag through to _run_lane_based_merge (WP02)
  • □ T010 Honor strategy parameter in lanes/merge.py mission→target step; preserve lane→mission merge commits (WP02)
  • □ T011 Push-error parser tokens + remediation hint helper + wire to push failure path (WP02)
  • □ T012 Insert safe_commit between mark-done loop and worktree-removal (FR-019) (WP02)
  • □ T013 Import + call run_check from WP01 in the merge runner; render findings in merge summary (WP02)
  • □ T014 Test suite covering FR-005..FR-009, FR-019, FR-020, NFR-003 (WP02)

Implementation sketch: Land in dependency order. (1) merge/config.py first — pure dataclass file with no side effects. (2) Wire the CLI parameter and the resolution order in cli/commands/merge.py. (3) Push the resolved strategy down into lanes/merge.py and switch on it for the mission→target step. (4) Add the push-error parser helpers as private functions in cli/commands/merge.py. (5) Insert the safe_commit call at the documented insertion point. (6) Add the run_check import and call. (7) Tests last so they exercise the full plumbing.
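The resolution order in step (2), CLI flag, then config key, then the squash default, can be sketched as follows. The enum members mirror the squash/rebase/merge values named in T010; the resolve_strategy helper and its signature are assumptions made for illustration, not the contents of merge/config.py.

```python
from enum import Enum
from typing import Optional


class MergeStrategy(str, Enum):
    SQUASH = "squash"
    REBASE = "rebase"
    MERGE = "merge"


def resolve_strategy(cli_value: Optional[str], config_value: Optional[str]) -> MergeStrategy:
    """Hypothetical helper: first non-None source wins, squash is the fallback."""
    for raw in (cli_value, config_value):
        if raw is None:
            continue
        try:
            return MergeStrategy(raw)
        except ValueError:
            valid = ", ".join(s.value for s in MergeStrategy)
            # Invalid values fail loudly instead of silently falling through.
            raise ValueError(
                f"invalid merge strategy {raw!r}; expected one of: {valid}"
            ) from None
    return MergeStrategy.SQUASH
```

An invalid value raises instead of falling through to the next source, which matches the "raise startup error on invalid value" requirement in T008.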

Parallel opportunities: T008 is fully independent and can land first. T011 is independent of T009/T010 but lives in the same file so should be sequenced. T013 depends on WP01 landing, but can be drafted as a stub that imports cleanly.

Risks: The FR-019 fix is the most important single change in the mission. The regression test (T014) MUST use git show HEAD: directly — do NOT use git reset --hard HEAD (proves nothing per the contracts/merge_strategy.md note). Lane→mission merges MUST keep merge-commit semantics — do not collapse them under the strategy switch.
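A regression check in the spirit of T014 might read the committed file through git show and then verify the events. This is a sketch under assumptions: the helper names are hypothetical, and the wp_id field name is a guess at the event schema (only to_lane: done is stated in this document).

```python
import json
import subprocess
from pathlib import Path


def read_committed(repo_root: Path, rel_path: str) -> str:
    """Return the file content as recorded in the HEAD commit, not the working tree."""
    result = subprocess.run(
        ["git", "show", f"HEAD:{rel_path}"],
        cwd=repo_root,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def wps_missing_done(events_text: str, wp_ids: list[str]) -> list[str]:
    """List the WPs that have no `to_lane: done` event in the JSONL text."""
    done = set()
    for line in events_text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("to_lane") == "done":
            done.add(event.get("wp_id"))  # "wp_id" is an assumed field name
    return [wp for wp in wp_ids if wp not in done]
```

A test would then assert that wps_missing_done(read_committed(repo_root, path), wp_ids) is empty once _run_lane_based_merge returns. Because git show reads from the commit object, the check fails if the events were written to disk but never committed.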

Prompt file: tasks/WP02-merge-strategy-and-safe-commit.md


WP03 — Diff-Coverage Policy Validation And Closure

Goal: Validate current ci-quality.yml against the policy intent of #455. If validation shows current main already satisfies the intent, close #455 with evidence and tighten doc/CI messages (FR-011). Otherwise, modify the workflow so only critical-path coverage hard-fails and full-diff coverage stays advisory (FR-012). Verification-first: no workflow edits before the validation report exists.
Priority: P1 (primary scope)
Estimated prompt size: ~280 lines (4 subtasks × ~70 lines each)
Independent test: pytest tests/release/test_diff_coverage_policy.py exits zero AND the validation report file exists with all required sections, exactly one decision recorded, and a non-empty rationale.
Dependencies: none

Included subtasks:

  • ✅ T015 Author the WP03 validation report walking current ci-quality.yml (WP03)
  • ✅ T016 FR-011 path: close #455 with evidence + tighten CI step names (only if decision == close_with_evidence) (WP03)
  • ✅ T017 FR-012 path: modify ci-quality.yml + add large-PR test (only if decision == tighten_workflow) (WP03)
  • ✅ T018 Test suite covering report content gate, decision recording, and conditional FR-011/FR-012 paths (WP03)

Implementation sketch: Read ci-quality.yml end-to-end first. Identify which step enforces critical-path and which emits the advisory full-diff report. Run a representative large PR (or a synthetic equivalent) through it locally. Record findings in the validation report. Decide. Execute either the FR-011 path (no workflow change, comment + step rename) OR the FR-012 path (workflow change + test + comment). Then write the test suite that validates whichever path you took.

Parallel opportunities: T015 must come first. T016 and T017 are mutually exclusive. T018 can be drafted in parallel with T015 (test scaffolding) but must finalize after the decision.

Risks: Skipping straight to "make it advisory" without the validation step is forbidden (FR-010). The content gate test (test_validation_report_close_path_populated) prevents shipping a vacuous report — the rationale must be ≥ 50 characters and findings must carry "satisfied by" justifications.
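The content gate can be pictured as a small validator. The sketch below assumes the report has been parsed into a simple record; ReportDraft is a hypothetical stand-in for DiffCoverageValidationReport, whose real fields are defined elsewhere, and applying the "satisfied by" check only on the close path is an inference from the test name.

```python
from dataclasses import dataclass, field

VALID_DECISIONS = {"close_with_evidence", "tighten_workflow"}
MIN_RATIONALE_CHARS = 50


@dataclass
class ReportDraft:
    """Hypothetical stand-in for the parsed validation report."""
    decision: str
    rationale: str
    findings: list[str] = field(default_factory=list)


def content_gate_errors(report: ReportDraft) -> list[str]:
    """Return human-readable gate failures; an empty list means the gate passes."""
    errors: list[str] = []
    if report.decision not in VALID_DECISIONS:
        errors.append(f"unknown decision: {report.decision!r}")
    if len(report.rationale.strip()) < MIN_RATIONALE_CHARS:
        errors.append(f"rationale shorter than {MIN_RATIONALE_CHARS} characters")
    if report.decision == "close_with_evidence":
        # Assumption: only the close path requires "satisfied by" justifications.
        for index, finding in enumerate(report.findings):
            if "satisfied by" not in finding.lower():
                errors.append(f"finding {index} lacks a 'satisfied by' justification")
    return errors
```

Returning a list of failures rather than a boolean lets the test report every problem with a vacuous report in one run.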

Prompt file: tasks/WP03-diff-coverage-policy-validation.md


WP04 — Release-Prep CLI

Goal: Populate the existing agent/release.py stub with a real prep subcommand (#457). Build the changelog draft from local kitty-specs/ artifacts (no network), propose a version bump per channel (alpha/beta/stable with stable→stable patch rule), and emit both rich text and JSON. Document the scope-cut in the #457 close comment (automated steps vs still-manual steps).
Priority: P1 (primary scope)
Estimated prompt size: ~350 lines (5 subtasks × ~70 lines each)
Independent test: pytest tests/release/test_release_prep.py exits zero with the NFR-004 5-second benchmark green AND spec-kitty agent release prep --channel alpha --json | jq .proposed_version returns the expected next-version string.
Dependencies: none

Included subtasks:

  • ✅ T019 Create src/specify_cli/release/version.py with propose_version per locked rules (WP04)
  • ✅ T020 Implement release/changelog.py build_changelog_block reading kitty-specs/ + git tags only (WP04)
  • ✅ T021 Implement release/payload.py build_release_prep_payload orchestration + ReleasePrepPayload dataclass (WP04)
  • ✅ T022 Populate agent/release.py stub with prep subcommand (text + JSON modes); update stale comment (WP04)
  • ✅ T023 Test suite covering FR-013/014/015 + FR-023 close-comment helper + NFR-004 benchmark + zero-network assertion (WP04)

Implementation sketch: Build the package bottom-up. (1) version.py is pure functions over version strings — start here, fully tested. (2) changelog.py reads filesystem + invokes git tag --list — testable with synthetic kitty-specs/ fixture. (3) payload.py orchestrates the above. (4) agent/release.py becomes a thin typer shim. (5) Tests lock the contract. Network mocking is essential — FR-014 forbids any GitHub API call.
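The bottom-up layering can be sketched for the two pieces this document pins down: the stable→stable patch rule and JSON rendering via dataclasses.asdict. Everything else is an assumption. The payload fields other than proposed_version are hypothetical (the real shape is in data-model.md), the patch rule is read here as "increment the patch component", and the alpha and beta rules from contracts/release_prep.md are deliberately left out.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ReleasePrepPayload:
    """Illustrative payload; only proposed_version is named in this document."""
    channel: str
    current_version: str
    proposed_version: str
    changelog_block: str
    mission_slugs: list[str] = field(default_factory=list)


def bump_stable_patch(current: str) -> str:
    """Assumed reading of the stable→stable rule: X.Y.Z becomes X.Y.(Z+1)."""
    major, minor, patch = (int(part) for part in current.split("."))
    return f"{major}.{minor}.{patch + 1}"


def render_json(payload: ReleasePrepPayload) -> str:
    """Serialize the payload for --json mode using dataclasses.asdict."""
    return json.dumps(asdict(payload), indent=2, sort_keys=True)


payload = ReleasePrepPayload(
    channel="stable",
    current_version="3.1.0",
    proposed_version=bump_stable_patch("3.1.0"),
    changelog_block="(changelog draft goes here)",
    mission_slugs=["068-post-merge-reliability-and-release-hardening"],
)
print(render_json(payload))
```

Keeping the version logic as pure functions over strings is what lets version.py be fully tested before changelog.py or the typer shim exist.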

Parallel opportunities: T019/T020/T021 are independent files and can be drafted in parallel. T022 depends on all three. T023 lands last.

Risks: The agent/release.py stub is currently a registered live subapp at agent/__init__.py:20 — DO NOT delete the registration or move the file. Just populate the existing stub. The locked decision is to use the release/ package split (NOT inline into agent/release.py) — no second-guessing at code time.

Prompt file: tasks/WP04-release-prep-cli.md


WP05 — Recovery Extension + Verification + Mission Close Ledger

Goal: Fix the #415 known residual gap by extending scan_recovery_state to consult mission status events and adding --base <ref> to spec-kitty implement (FR-021). Author the WP05 verification report covering the two pre-identified gaps and any additional shapes (FR-016/017). Author the Mission Close Ledger with one row per tracked issue (FR-018, C-005). This is the WP that closes the workflow-stabilization track.
Priority: P1 (primary scope, also delivers DoD-4)
Estimated prompt size: ~390 lines (5 subtasks × ~78 lines each)
Independent test: pytest tests/lanes/test_recovery_post_merge.py tests/cli/commands/test_implement_base_flag.py exits zero AND mission-close-ledger.md contains one row for every issue in the Tracked GitHub Issues table.
Dependencies: none (verification can be authored against any branch state; FR-021 fix is independent of all other WPs)

Included subtasks:

  • □ T024 Extend scan_recovery_state to consult status events; add ready_to_start_from_target field (WP05)
  • □ T025 Add --base <ref> CLI flag to spec-kitty implement (WP05)
  • □ T026 Test suite for Scenario 7 end-to-end + recovery scanner with merged-deleted deps (WP05)
  • □ T027 Author wp05-verification-report.md accounting for both pre-identified gaps + any additional shapes (WP05)
  • □ T028 Author mission-close-ledger.md with one row per tracked issue + carve-out rows (WP05)

Implementation sketch: T024 + T025 are the code changes; do them first, then T026 tests them. T027 verification report can be drafted in parallel with the code changes since the failure shapes are already documented in spec.md Mission 067 Failure-Mode Evidence sections. T028 ledger is filled in last because some rows need merge commit / PR links from the other WPs.
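The T024 idea of materializing lane snapshots from status.events.jsonl and deriving ready_to_start_from_target can be sketched as below. The wp_id field name, the dependency map, and the readiness rule (a WP that is not done and whose dependencies are all done) are assumptions made for illustration; the real scan_recovery_state may weigh branch and worktree state as well.

```python
import json


def materialize_lanes(events_text: str) -> dict[str, str]:
    """Replay JSONL events in order; the last to_lane seen for each WP wins."""
    lanes: dict[str, str] = {}
    for line in events_text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        lanes[event["wp_id"]] = event["to_lane"]  # "wp_id" is an assumed field name
    return lanes


def ready_to_start_from_target(
    lanes: dict[str, str], deps: dict[str, list[str]]
) -> list[str]:
    """WPs that are not done but whose dependencies have all reached done."""
    ready = []
    for wp, wp_deps in deps.items():
        if lanes.get(wp) == "done":
            continue
        if wp_deps and all(lanes.get(dep) == "done" for dep in wp_deps):
            ready.append(wp)
    return sorted(ready)
```

Reading the lane from the event log rather than from branch existence is the point of the fix: a dependency whose branch was merged and deleted still shows up as done, so the downstream WP can be started from the target branch with --base.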

Parallel opportunities: T024 and T025 are independent files (recovery.py vs implement.py) and can be parallelized within the WP. T027 and T028 are markdown authoring and don't block the code changes.

Risks: WP05 owns the mechanically-checkable DoD-4. If any tracked issue is missing from the ledger, the mission cannot close. Use the spec's Tracked GitHub Issues table as the authoritative checklist.

Prompt file: tasks/WP05-recovery-and-mission-close.md


Phase 4 — Polish

(No polish WPs required. Verification, documentation tightening, and ledger authorship are folded into WP03 and WP05.)


Dependency Graph

WP01 (no deps) ────► WP02 (uses WP01's run_check library)
WP03 (no deps)  ──► (independent)
WP04 (no deps)  ──► (independent)
WP05 (no deps)  ──► (independent)

The lane planner SHOULD give WP01 → WP02 the longest sequential lane (Lane A). WP03, WP04, WP05 are independent and can run in parallel lanes.

Recommended lane allocation (lane planner will compute the canonical version from lanes.json):

  • Lane A: WP01 → WP02 (sequential because of run_check import dependency)
  • Lane B: WP03 (verification-first, low-risk)
  • Lane C: WP04 (release-prep, isolated)
  • Lane D: WP05 (recovery + ledger, isolated)

4 parallel lanes maximum, with WP01→WP02 as the longest serial chain.
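The recommended allocation can be reproduced from the dependency graph with a short sketch. This is not the lane planner (the canonical lanes come from lanes.json); it assumes that each weakly connected group of WPs becomes one lane and that WPs inside a lane run in topological order, which serializes any non-chain shape such as a diamond.

```python
from graphlib import TopologicalSorter


def plan_lanes(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group WPs into lanes by connectivity and order each lane by dependency."""
    # Undirected adjacency, used only to find connected groups.
    neighbours: dict[str, set[str]] = {wp: set() for wp in deps}
    for wp, wp_deps in deps.items():
        for dep in wp_deps:
            neighbours.setdefault(dep, set()).add(wp)
            neighbours[wp].add(dep)

    # Global dependency order: every WP appears after the WPs it depends on.
    order = list(TopologicalSorter(deps).static_order())

    lanes: list[list[str]] = []
    seen: set[str] = set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        component: set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        lanes.append([wp for wp in order if wp in component])
    return lanes


deps = {"WP01": set(), "WP02": {"WP01"}, "WP03": set(), "WP04": set(), "WP05": set()}
print(plan_lanes(deps))
```

For this mission's graph the sketch yields four lanes, with WP01 followed by WP02 in the first one, which matches the Lane A to Lane D list above.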


MVP Scope Recommendation

MVP = WP02 alone (the FR-019 status-events safe_commit fix). It is the most urgent residual gap from mission 067: until WP02 lands, every subsequent merge of a multi-WP mission, on this repo or any other repo consuming spec-kitty, will hit the loss-of-events failure mode again. Because T013 imports run_check from WP01, a WP02-only MVP has to defer T013 or land it against a stub. Everything else in 068 can ship later if needed.

For a more complete MVP, WP01 + WP02 (the analyzer plus the merge fix) delivers the two highest-impact code changes in one ship.


Coverage Summary (FR → WP)

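| FR | Owning WP | Subtask(s) |
|---|---|---|
| FR-001 | WP01 | T001, T002, T003, T004, T006 |
| FR-002 | WP01 | T002, T003, T006 |
| FR-003 | WP01 | T004, T006 |
| FR-004 | WP01 | T005, T006, T007 |
| FR-005 | WP02 | T009 |
| FR-006 | WP02 | T009 |
| FR-007 | WP02 | T010 |
| FR-008 | WP02 | T008 |
| FR-009 | WP02 | T011 |
| FR-010 | WP03 | T015 |
| FR-011 | WP03 | T015, T016 |
| FR-012 | WP03 | T015, T017 |
| FR-013 | WP04 | T019, T021, T022 |
| FR-014 | WP04 | T020, T021, T023 |
| FR-015 | WP04 | T022, T023 |
| FR-023 | WP04 | T023 |
| FR-016 | WP05 | T027 |
| FR-017 | WP05 | T024, T025, T027 |
| FR-018 | WP05 | T028 |
| FR-019 | WP02 | T012 |
| FR-020 | WP02 | T014 |
| FR-021 | WP05 | T024, T025, T026 |
| FR-022 | WP01 | T006 |
| NFR-001 | WP01 | T006 |
| NFR-002 | WP01 | T006 |
| NFR-003 | WP02 | T014 |
| NFR-004 | WP04 | T023 |
| NFR-005 | WP01, WP02, WP04 | T007, T014, T023 |
| NFR-006 | all | (charter gate) |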