Implementation Plan: Post-Merge Reliability And Release Hardening

Mission: 068-post-merge-reliability-and-release-hardening
Branch: main (planning, base, and merge target are all main)
Date: 2026-04-07
Spec: spec.md
Plan input: spec.md FR-001..FR-022 + FR-023, NFR-001..NFR-006, C-001..C-006
Validated against: fresh clone /tmp/spec-kitty-20260407-090957 at commit 7307389a1f529dae9e90279ea972609bb0b420aa


Summary

Final workflow-stabilization mission for spec-kitty core. Five work packages drive the open backlog to zero:

| WP | Scope | Owns FRs | Issues |
|---|---|---|---|
| WP01 | Post-merge stale-assertion analyzer (new src/specify_cli/post_merge/ package + new agent tests CLI group) | FR-001..FR-004, FR-022 | #454 |
| WP02 | --strategy wiring + squash default + status-events safe_commit fix (all in _run_lane_based_merge) | FR-005..FR-009, FR-019, FR-020 | #456, #416 (fix) |
| WP03 | Diff-coverage policy validation + close-or-tighten | FR-010..FR-012 | #455 |
| WP04 | Release-prep CLI populating the existing agent/release.py stub | FR-013..FR-015, FR-023 | #457 |
| WP05 | scan_recovery_state + implement --base main fix (FR-021), verification report, mission-close ledger | FR-016..FR-018, FR-021 | #415, #416 (verification) |

The technical approach is locked by the spec; this plan captures the architectural decisions made during planning interrogation, the file/module layout, the contracts between WPs, and the cross-WP sequencing constraints.


Technical Context

Language/Version: Python 3.11+ (existing spec-kitty requirement)

Primary Dependencies:

  • typer — CLI framework (existing)
  • rich — console output (existing)
  • ruamel.yaml — YAML parsing for .kittify/config.yaml (existing)
  • Stdlib ast — Python AST parsing for WP01 stale-assertion analysis (no new dependency)
  • Stdlib subprocess — git invocation (existing pattern in specify_cli.git.commit_helpers)
  • safe_commit from specify_cli.git (existing helper, re-exported from commit_helpers.py:38)
  • specify_cli.lanes.recovery.scan_recovery_state (existing, extended by FR-021)
  • specify_cli.status.emit.emit_status_transition (existing, called by _mark_wp_merged_done)
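As a rough illustration of the stdlib ast approach (not the WP01 implementation or the run_check API), the sketch below treats a "stale assertion" as an assert in a test file that references a top-level name the merge removed from a source module. That definition, the Finding type, and the helper names defined_names and stale_references are all assumptions made for this example.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding shape (the real StaleAssertionFinding is defined in data-model.md)."""

    test_file: str
    line: int
    identifier: str


def defined_names(source: str) -> set[str]:
    """Collect top-level function, class, and simple assignment names from a module."""
    names: set[str] = set()
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Assign):
            names.update(t.id for t in node.targets if isinstance(t, ast.Name))
    return names


def stale_references(old_src: str, new_src: str, test_src: str, test_file: str) -> list[Finding]:
    """Flag names used inside assert statements that the merge removed from the source."""
    removed = defined_names(old_src) - defined_names(new_src)
    findings: list[Finding] = []
    for node in ast.walk(ast.parse(test_src)):
        if not isinstance(node, ast.Assert):
            continue
        for sub in ast.walk(node.test):
            # Both bare names (legacy_parse) and attribute access (mod.legacy_parse) count.
            if isinstance(sub, ast.Name):
                name = sub.id
            elif isinstance(sub, ast.Attribute):
                name = sub.attr
            else:
                continue
            if name in removed:
                findings.append(Finding(test_file, node.lineno, name))
    return findings


old = "def legacy_parse(x):\n    return x\n\ndef parse(x):\n    return x\n"
new = "def parse(x):\n    return x\n"
tests = "def test_it():\n    assert parse(1) == 1\n    assert mod.legacy_parse(2) == 2\n"
print(stale_references(old, new, tests, "test_mod.py"))
# → [Finding(test_file='test_mod.py', line=3, identifier='legacy_parse')]
```

A production analyzer would likely need more signals (renames, string literals, import aliases) to stay under the NFR-002 false-positive ceiling; this sketch only shows the two AST passes.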

Storage: Filesystem only (no database). Mission state lives in:

  • kitty-specs/<mission>/status.events.jsonl — append-only event log (canonical lane state)
  • kitty-specs/<mission>/status.json — derived snapshot
  • .kittify/config.yaml — project configuration (merge.strategy key added by WP02)
  • .kittify/runtime/merge/<mission_id>/state.json — ephemeral runtime state (out of scope for the FR-019 fix)
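To make the storage split concrete: the event log is authoritative and status.json is rebuilt from it. The sketch below replays JSONL lines in file order so that the last event for each work package wins. The wp_id and to_lane field names are hypothetical; the canonical event shape is defined in data-model.md, and the project's own reducer lives in status/reducer.py.

```python
import json
from collections.abc import Iterable
from pathlib import Path


def replay(lines: Iterable[str]) -> dict[str, str]:
    """Fold JSONL events into the latest lane per work package (last event wins).

    Assumes, hypothetically, that each event carries "wp_id" and "to_lane".
    """
    lanes: dict[str, str] = {}
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines
        event = json.loads(raw)
        lanes[event["wp_id"]] = event["to_lane"]
    return lanes


def rebuild_snapshot(events_path: Path, snapshot_path: Path) -> None:
    """Regenerate the derived snapshot; the event log remains the source of truth."""
    with events_path.open(encoding="utf-8") as fh:
        lanes = replay(fh)
    snapshot_path.write_text(json.dumps({"lanes": lanes}, indent=2, sort_keys=True) + "\n", encoding="utf-8")
```

Because status.json is always derivable, losing or corrupting it is recoverable; losing uncommitted events is not, which is why the FR-019 fix targets the event log.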

Testing: pytest. All new tests SHALL be added to the existing pytest suite and SHALL run without network access (NFR-005).
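One way to make NFR-005 enforceable rather than aspirational is a guard that turns any outbound socket connection into an immediate test failure. The sketch below patches socket.socket.connect and connect_ex with unittest.mock for the duration of a with block; the no_network and NetworkAccessError names are hypothetical. It does not intercept DNS lookups made through socket.getaddrinfo, and third-party plugins such as pytest-socket offer a more complete version of the same guard.

```python
import socket
from collections.abc import Iterator
from contextlib import contextmanager
from unittest import mock


class NetworkAccessError(RuntimeError):
    """Raised when code under test tries to open a network connection."""


@contextmanager
def no_network() -> Iterator[None]:
    """Make every socket connect attempt fail while the block is active."""

    def _refuse(self: socket.socket, address: object) -> None:
        raise NetworkAccessError(f"network access attempted: {address!r}")

    # mock.patch.object restores the original (inherited) methods on exit.
    with (
        mock.patch.object(socket.socket, "connect", _refuse),
        mock.patch.object(socket.socket, "connect_ex", _refuse),
    ):
        yield
```

Wrapped in an autouse pytest fixture, this would apply to every new test without each test opting in.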

Target Platform: Cross-platform CLI (macOS, Linux). FSEvents-specific timing concerns are explicitly out of scope per the Assumptions section.

Project Type: Single Python package (src/specify_cli/). No web frontend, no mobile target.

Performance Goals:

  • WP01 stale-assertion analyzer: ≤ 30 seconds wall-clock on spec-kitty core (~9000+ tests) — NFR-001
  • WP04 release-prep command: ≤ 5 seconds wall-clock on a mission with up to 16 WPs — NFR-004
  • WP02 mission→target merge: 100% success against require_linear_history = true on the integration test matrix — NFR-003

Constraints:

  • mypy --strict clean (NFR-006, charter)
  • ruff clean (charter)
  • Critical-path diff coverage threshold pinned at commit 7307389a (NFR-006, with WP03 carve-out)
  • WP01 analyzer ≤ 5 false-positive findings per 100 LOC of merged change on a curated benchmark (NFR-002)
  • No GitHub API calls (C-002)
  • No re-implementation of existing recovery/merge subsystems (C-003)
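The NFR-002 ceiling is a simple normalization; for example, 12 false positives across 400 changed lines is 12 × 100 / 400 = 3.0 per 100 LOC, which passes. The helper names below are hypothetical, and how the curated benchmark counts "LOC of merged change" (for instance, added plus removed lines) is an assumption left to contracts/stale_assertions.md.

```python
def fp_per_100_loc(false_positives: int, merged_loc: int) -> float:
    """False-positive findings normalized per 100 lines of merged change."""
    if merged_loc <= 0:
        raise ValueError("merged_loc must be positive")
    return false_positives * 100 / merged_loc


def within_nfr_002(false_positives: int, merged_loc: int, ceiling: float = 5.0) -> bool:
    """True when the analyzer stays at or under the NFR-002 ceiling (at most 5 FP per 100 LOC)."""
    return fp_per_100_loc(false_positives, merged_loc) <= ceiling


print(fp_per_100_loc(12, 400))   # → 3.0
print(within_nfr_002(30, 400))   # → False (7.5 per 100 LOC)
```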

Scale/Scope: Five work packages. Eight new functional surfaces (analyzer module, agent tests CLI group, strategy wiring, safe_commit fix, scan_recovery_state extension, --base support, release-prep command, validation report). Two pre-identified residual gaps to fix, one verification report to author, and one mission-close ledger to maintain.


Charter Check

GATE STATUS: ✅ PASS (pre-Phase-0)

Charter file: .kittify/charter/charter.md (loaded via spec-kitty charter context --action plan --json)

| Charter requirement | Compliance | Notes |
|---|---|---|
| typer as CLI framework | ✅ | All new commands use typer (WP01 agent tests, WP04 agent release prep) |
| rich for console output | ✅ | Stale-assertion report and release-prep payload both use rich for human output |
| ruamel.yaml for YAML parsing | ✅ | .kittify/config.yaml merge.strategy key parsed via existing ruamel.yaml infrastructure |
| pytest with 90%+ test coverage for new code | ✅ | NFR-006 enforces critical-path coverage threshold; FR-020 and the FR-021 regression test land alongside production code |
| mypy --strict must pass | ✅ | NFR-006 |
| Integration tests for CLI commands | ✅ | FR-020 (lane merge end-to-end), FR-022 fallback test, FR-021 recovery integration test, FR-023 release-prep CLI test |
| DIRECTIVE_010 Specification Fidelity | ✅ | Plan derives from spec FRs verbatim; no deviations introduced |
| DIRECTIVE_003 Decision Documentation | ✅ | Architectural decisions captured in research.md; cross-WP sequencing captured here and in spec FR-019 |

Charter Check post-Phase-1 re-evaluation: see end of this document.


Project Structure

Documentation (this feature)

kitty-specs/068-post-merge-reliability-and-release-hardening/
├── spec.md                       # Mission spec (already authored)
├── plan.md                       # This file
├── research.md                   # Phase 0: architectural decisions and current-main analysis
├── data-model.md                 # Phase 1: dataclasses, event shapes, payload schemas
├── quickstart.md                 # Phase 1: maintainer-facing how-to for the new commands and bug fixes
├── contracts/                    # Phase 1: CLI command + library function signatures
│   ├── stale_assertions.md       # WP01 contract
│   ├── merge_strategy.md         # WP02 contract (CLI flag, config schema, library functions)
│   ├── diff_coverage_policy.md   # WP03 contract (validation report shape)
│   ├── release_prep.md           # WP04 contract (CLI command + JSON payload)
│   └── recovery_extension.md     # WP05 contract (scan_recovery_state + --base main)
├── meta.json                     # Mission identity (already authored)
├── checklists/
│   └── requirements.md           # Spec quality checklist
├── tasks/                        # Phase 2: WP files (NOT created by /spec-kitty.plan)
└── mission-close-ledger.md       # Created by WP05 at mission close (per C-005)

Source Code (repository root)

src/specify_cli/
├── post_merge/                            # NEW package (WP01)
│   ├── __init__.py                        # Re-exports run_check, StaleAssertionFinding, StaleAssertionReport
│   └── stale_assertions.py                # AST-based source identifier extraction + AST-based test scan
├── cli/commands/
│   ├── merge.py                           # MODIFIED (WP02): wire --strategy, default to squash, safe_commit fix
│   ├── implement.py                       # MODIFIED (WP05 FR-021): accept --base main flag
│   └── agent/
│       ├── __init__.py                    # MODIFIED: register new `tests` subapp
│       ├── release.py                     # POPULATED (WP04): replace stub with real `prep` command
│       └── tests.py                       # NEW (WP01): `stale-check` subcommand
├── lanes/
│   ├── merge.py                           # MODIFIED (WP02): honor strategy parameter from upper layer
│   └── recovery.py                        # MODIFIED (WP05 FR-021): scan_recovery_state consults status events
├── git/
│   └── commit_helpers.py                  # USED AS-IS (safe_commit imported by WP02)
├── status/
│   ├── emit.py                            # USED AS-IS (emit_status_transition called by mark-done loop)
│   ├── store.py                           # USED AS-IS (append_event)
│   └── reducer.py                         # USED AS-IS (materialize for WP05 verification)
└── release/                               # NEW package (WP04, locked — package split committed at plan time)
    ├── __init__.py
    ├── changelog.py                       # Build draft changelog from mission/WP artifacts
    ├── version.py                         # Version bump per channel
    └── payload.py                         # Build structured release-prep payload

.kittify/
├── config.yaml                            # MODIFIED (WP02): merge.strategy schema added
└── charter/charter.md                     # USED AS-IS

.github/workflows/
└── ci-quality.yml                         # POSSIBLY MODIFIED (WP03, only if validation finds residual gap)

tests/
├── post_merge/
│   └── test_stale_assertions.py           # NEW (WP01): FR-002, FR-022, NFR-001, NFR-002 coverage
├── cli/commands/
│   ├── test_merge_strategy.py             # NEW (WP02 FR-005..FR-009): strategy wiring + push-error parser
│   ├── test_merge_status_commit.py        # NEW (WP02 FR-019, FR-020): events committed to git
│   └── test_implement_base_flag.py        # NEW (WP05 FR-021): --base main flag
├── lanes/
│   └── test_recovery_post_merge.py        # NEW (WP05 FR-021): scan_recovery_state with merged-deleted branches
├── cli/commands/agent/
│   ├── test_release_prep.py               # NEW (WP04 FR-013..FR-015, FR-023): release-prep CLI + JSON payload
│   └── test_tests_stale_check.py          # NEW (WP01 FR-004): CLI subcommand wires through to library
└── (existing tests untouched unless WP03 changes ci-quality policy)
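test_merge_strategy.py covers a push-error parser whose authoritative token list lives in contracts/merge_strategy.md. The sketch below shows the general shape under assumed tokens: case-insensitive substring matching against git push stderr, with the linear-history check ordered first because (by assumption) a linear-history rejection also carries a generic protected-branch line. PushRejection, classify_push_error, and both token tuples are hypothetical.

```python
from enum import Enum


class PushRejection(Enum):
    LINEAR_HISTORY = "linear_history"
    PROTECTED_BRANCH = "protected_branch"
    UNKNOWN = "unknown"


# Illustrative tokens only; the authoritative list lives in contracts/merge_strategy.md.
LINEAR_HISTORY_TOKENS: tuple[str, ...] = ("must not contain merge commits", "linear history")
PROTECTED_BRANCH_TOKENS: tuple[str, ...] = ("protected branch", "gh006")


def classify_push_error(stderr: str) -> PushRejection:
    """Map `git push` stderr to a coarse rejection reason via case-insensitive substring match."""
    text = stderr.lower()
    # Checked first: by assumption, a linear-history rejection also contains a
    # generic protected-branch line, so the more specific reason must win.
    if any(token in text for token in LINEAR_HISTORY_TOKENS):
        return PushRejection.LINEAR_HISTORY
    if any(token in text for token in PROTECTED_BRANCH_TOKENS):
        return PushRejection.PROTECTED_BRANCH
    return PushRejection.UNKNOWN
```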

Structure Decision: This is a single Python project that extends an existing CLI tool. No web/mobile/multi-project structure. New code lands in:

  • src/specify_cli/post_merge/ — new package for WP01
  • src/specify_cli/cli/commands/agent/tests.py — new CLI subgroup for WP01
  • src/specify_cli/release/ — new package for WP04 (package split is locked at plan time; not inlined)
  • Existing files modified: cli/commands/merge.py (WP02), cli/commands/implement.py (WP05), cli/commands/agent/release.py (WP04), cli/commands/agent/__init__.py (WP01 + WP04 registrations), lanes/merge.py (WP02), lanes/recovery.py (WP05)

Cross-WP Sequencing & Dependencies

The lane-planning step that runs after /spec-kitty.tasks will use this dependency graph to compute parallelism. Critical sequencing constraints:

WP01 ────────────────────────► (independent, parallel-safe)
                               new files only: post_merge/, agent/tests.py
                               touches agent/__init__.py for registration

WP02 ────────────────────────► (sequential within itself)
   FR-005/006/007/008 ─► FR-009 ─► FR-019 ─► FR-020
                                    │
                                    └── all edits land in _run_lane_based_merge
                                        in cli/commands/merge.py and lanes/merge.py

WP03 ────────────────────────► (verification-first, low-risk)
   FR-010 (validation report) ─► FR-011 OR FR-012
                                    │
                                    └── only touches .github/workflows/ if FR-012 fires

WP04 ────────────────────────► (independent, parallel-safe)
   touches cli/commands/agent/release.py + agent/__init__.py registration
   reads kitty-specs/ artifacts read-only

WP05 ────────────────────────► (independent of WP02 now that FR-019/FR-020 moved to WP02)
   FR-021 ─► FR-016 (verification report) ─► FR-018 (mission close ledger)
   touches lanes/recovery.py + cli/commands/implement.py

Lane-conflict matrix (file-level)

| File | Touched by |
|---|---|
| src/specify_cli/cli/commands/merge.py | WP02 only |
| src/specify_cli/lanes/merge.py | WP02 only |
| src/specify_cli/lanes/recovery.py | WP05 only |
| src/specify_cli/cli/commands/implement.py | WP05 only |
| src/specify_cli/cli/commands/agent/__init__.py | WP01 + WP04 (both add registrations; shared edit) |
| src/specify_cli/cli/commands/agent/release.py | WP04 only |
| src/specify_cli/cli/commands/agent/tests.py | WP01 only (new file) |
| src/specify_cli/post_merge/ | WP01 only (new package) |
| src/specify_cli/release/ | WP04 only (new package) |
| .kittify/config.yaml | WP02 only |
| .github/workflows/ci-quality.yml | WP03 only (and only if FR-012 fires) |
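The shared-edit analysis in the lane-conflict matrix can be checked mechanically. The sketch below transcribes the matrix into a dictionary and inverts it to find files edited by more than one work package; TOUCHES and shared_files are hypothetical names, not the lane planner's actual interface.

```python
from collections import defaultdict

# File-level ownership transcribed from the lane-conflict matrix.
TOUCHES: dict[str, set[str]] = {
    "WP01": {"src/specify_cli/cli/commands/agent/__init__.py",
             "src/specify_cli/cli/commands/agent/tests.py",
             "src/specify_cli/post_merge/"},
    "WP02": {"src/specify_cli/cli/commands/merge.py",
             "src/specify_cli/lanes/merge.py",
             ".kittify/config.yaml"},
    "WP03": {".github/workflows/ci-quality.yml"},  # only if FR-012 fires
    "WP04": {"src/specify_cli/cli/commands/agent/__init__.py",
             "src/specify_cli/cli/commands/agent/release.py",
             "src/specify_cli/release/"},
    "WP05": {"src/specify_cli/lanes/recovery.py",
             "src/specify_cli/cli/commands/implement.py"},
}


def shared_files(touches: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return every file edited by two or more work packages, with its editors."""
    owners: defaultdict[str, set[str]] = defaultdict(set)
    for wp, files in touches.items():
        for path in files:
            owners[path].add(wp)
    return {path: wps for path, wps in owners.items() if len(wps) > 1}


print(shared_files(TOUCHES))
# → {'src/specify_cli/cli/commands/agent/__init__.py': {'WP01', 'WP04'}}  (set order may vary)
```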

Only shared edit: agent/__init__.py (WP01 + WP04 both add subapp registrations). Each WP appends a single app.add_typer(...) line at the bottom of the file, registering a different subapp name (tests for WP01, release for WP04). The two edits are semantically independent, but because both land at the same position, git's line-based three-way merge will report a textual conflict when the second lane merges. Resolution is mechanical (keep both lines) and requires no design judgment. The lane planner MAY therefore place WP01 and WP04 in separate lanes: the distinct subapp names rule out any semantic overlap, leaving only this positional one.

Recommended lane allocation (to be confirmed at /spec-kitty.tasks):

  • Lane A: WP02 (merge command — large, sequential, longest chain)
  • Lane B: WP01 (stale-assertion analyzer — new package, isolated)
  • Lane C: WP04 (release-prep — populates existing stub, isolated)
  • Lane D: WP05 (recovery + implement)
  • Lane E: WP03 (verification-first, low-risk, can run last)

This gives at most five parallel lanes, with the longest sequential chain inside Lane A (WP02's seven FRs, FR-005..FR-009 plus FR-019/FR-020, all concentrated in the _run_lane_based_merge code path). The agent/__init__.py append conflict between Lanes B and C is mechanical to resolve (keep both registration lines).

If conflict-aversion is preferred over parallelism (e.g., to avoid even a trivial append conflict), Lanes B and C can collapse into a single Lane B' that runs WP01 then WP04 sequentially. The default recommendation is full parallelism.


Phase 0 Output Pointer

Phase 0 research is delivered as kitty-specs/068-post-merge-reliability-and-release-hardening/research.md.

Phase 0 deliverables:

1. Decision log for the three planning answers (library choice, command surface, library-import wiring)
2. Current-main analysis for the existing modules WP01/WP04 will integrate with (stale_check.py, agent/release.py, commit_helpers.py)
3. Failure-mode reproduction for the FR-019 bug (recovered from session evidence and from the spec's Mission 067 Failure-Mode Evidence (A) section)
4. Failure-mode reproduction for the FR-021 bug (scan_recovery_state + --base main, from Mission 067 Failure-Mode Evidence (B))
5. Library-import wiring rationale: why the merge runner imports run_check directly rather than spawning a subprocess

No [NEEDS CLARIFICATION] markers remain after Phase 0.


Phase 1 Output Pointers

Phase 1 design artifacts:

1. data-model.md: dataclasses for StaleAssertionFinding, StaleAssertionReport, ReleasePrepPayload, MergeStrategy, MergeConfig, RecoveryVerificationEntry, MissionCloseLedgerRow, plus the canonical shape of the new done event that the safe_commit fix persists
2. contracts/stale_assertions.md: WP01 library + CLI signatures
3. contracts/merge_strategy.md: WP02 CLI flag, config schema, library function signatures, push-error parser token list
4. contracts/diff_coverage_policy.md: WP03 validation report shape
5. contracts/release_prep.md: WP04 CLI command, JSON payload, integration with existing version-bump infrastructure
6. contracts/recovery_extension.md: WP05 scan_recovery_state extension surface, --base main flag, mission-close ledger schema
7. quickstart.md: maintainer-facing walkthrough (run a synthetic merge that exercises FR-019, run release-prep, run the stale-assertion analyzer, exercise the FR-021 post-merge unblocking path)
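For the payload types listed in data-model.md, the "fully typed, no Any" constraint can look like the sketch below: a frozen dataclass with a Literal channel and a JSON renderer of the kind a --json flag might use. Every field name and the channel values are hypothetical; data-model.md and contracts/release_prep.md define the real schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Literal

Channel = Literal["stable", "alpha", "beta", "rc"]  # assumed channel names


@dataclass(frozen=True)
class ReleasePrepPayload:
    """Hypothetical payload; field names are illustrative, not the data-model.md schema."""

    mission: str
    channel: Channel
    current_version: str
    proposed_version: str
    changelog_entries: tuple[str, ...] = ()

    def to_json(self) -> str:
        # asdict() keeps the tuple; json.dumps renders it as a JSON array.
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Sorting keys keeps the JSON output byte-stable across runs, which makes snapshot-style assertions in test_release_prep.py straightforward.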

Agent context file update happens at the end of Phase 1 via the existing agent script.


Complexity Tracking

No charter violations. The spec was reviewed three times, and all "complexity" the plan inherits is justified by either a tracked GitHub issue or a reproduced Mission 067 failure mode. No complexity-tracking entries are required.

| Violation | Why Needed | Simpler Alternative Rejected Because |
|---|---|---|
| (none) | n/a | n/a |

Post-Phase-1 Charter Re-evaluation

GATE STATUS: ✅ PASS

After Phase 1 artifacts (data-model.md, contracts/*, quickstart.md) landed, the charter check was re-run against the design surface:

| Charter requirement | Phase 1 verification | Status |
|---|---|---|
| typer as CLI framework | All new CLI surfaces (agent tests stale-check, agent release prep, --strategy, --base) use typer parameters in their contract files | ✅ |
| rich for console output | StaleAssertionReport and ReleasePrepPayload both render via rich Console; merge runner reuses existing rich infrastructure | ✅ |
| ruamel.yaml for YAML parsing | MergeConfig reads .kittify/config.yaml's merge.strategy key via the existing ruamel.yaml accessor in specify_cli.config | ✅ |
| pytest with 90%+ test coverage for new code | Each contract file lists a test surface table mapping FRs to tests; FR-020 has the explicit regression test pattern; FR-021 has Scenario 7 coverage; FR-022 has its own fallback test | ✅ |
| mypy --strict must pass | All new dataclasses in data-model.md are fully typed with Literal[...], Path, list[...], tuple[...], Enum; no Any leakage | ✅ |
| Integration tests for CLI commands | test_strategy_flag_flows_through, test_done_events_committed_to_git, test_implement_base_flag_creates_workspace_from_ref, test_prep_command_emits_json_with_flag, test_cli_subcommand_invokes_library (all integration-level) | ✅ |
| DIRECTIVE_010 Specification Fidelity | Every contract file references the specific FR(s) it implements; no contract introduces behavior not in the spec | ✅ |
| DIRECTIVE_003 Decision Documentation | research.md captures all three planning decisions with rationale and rejected alternatives; cross-WP sequencing captured in plan.md and spec.md FR-019 | ✅ |

No new charter violations introduced by Phase 1 artifacts. Spec ↔ plan ↔ design alignment is verified.

NFR coverage map

| NFR | Threshold | Phase 1 verification location |
|---|---|---|
| NFR-001 | ≤ 30s wall clock | contracts/stale_assertions.md test test_runs_within_30s_on_spec_kitty_core |
| NFR-002 | ≤ 5 FP / 100 LOC | contracts/stale_assertions.md test test_fp_ceiling_under_5_per_100_loc + FR-022 fallback |
| NFR-003 | 100% success on protected linear-history matrix | contracts/merge_strategy.md test test_protected_linear_history_succeeds_default |
| NFR-004 | ≤ 5s for 16-WP missions | contracts/release_prep.md test test_runs_within_5s_for_16_wps |
| NFR-005 | 0 network calls in new tests | contracts/release_prep.md test test_payload_no_github_api_calls; charter requirement |
| NFR-006 | mypy strict + critical-path coverage at commit 7307389a | data-model.md types fully annotated; WP03 carve-out documented in contracts/diff_coverage_policy.md |