Implementation Plan: Stabilization Release: Core Bug Fixes
Branch: main | Date: 2026-04-21 | Spec: spec.md | Mission ID: 01KPQJAN4P2V4MTHRFGS7VW17M | Input: kitty-specs/stabilization-release-core-bug-fixes-01KPQJAN/spec.md
Summary
Fix four validated bugs on current main with targeted, in-place changes to the eight existing modules listed under Project Structure. No new packages introduced. Each work package (WP) ships its own regression tests. The WPs are ordered by user impact: merge invariant false-positives affect all multi-agent users immediately; Gemini shim breakage blocks Gemini integrations entirely; review-lane misstate poisons the event log; and the intake cluster hardens against edge cases users are less likely to hit during normal use.
Execution strategy: Single lane, serial execution (WP01 → WP02 → WP03 → WP04). Each WP must pass the full test suite before the next begins.
Technical Context
Language/Version: Python 3.11+
Primary Dependencies: typer (CLI), rich (console output), ruamel.yaml (YAML parsing); no new dependencies introduced
Storage: Filesystem only (YAML/JSON/JSONL/Markdown files, no database)
Testing: pytest with 90%+ coverage requirement; mypy --strict must pass; integration tests for CLI commands
Target Platform: Linux/macOS (POSIX path semantics assumed; atomic Path.replace() used for write safety)
Project Type: Single Python CLI project
Performance Goals: No latency-sensitive changes; correctness only
Constraints: Backward compatibility with existing event logs and intake workflows must be preserved; no new modules
Charter Check
Charter source: .kittify/charter/charter.md (Bootstrap mode)
| Check | Status | Notes |
|---|---|---|
| Uses typer for CLI | ✓ Pass | All CLI entry points remain in existing typer commands |
| Uses rich for console output | ✓ Pass | Error messages use existing console.print() and err_console.print() |
| pytest with 90%+ coverage | ✓ Pass | Each WP includes regression tests covering the fixed path |
| mypy --strict passes | ✓ Pass | No new type annotations required; existing strict config in force |
| Integration tests for CLI commands | ✓ Pass | WP01 and WP04 will include integration-level tests for merge and intake CLI paths |
| DIRECTIVE_003 (decision documentation) | ✓ Pass | Design decisions documented in research.md |
| DIRECTIVE_010 (spec fidelity) | ✓ Pass | Implementation anchored to spec FR/NFR/C requirements |
Post-Phase-1 re-check: All checks still pass. No charter violations.
Project Structure
Documentation (this mission)
kitty-specs/stabilization-release-core-bug-fixes-01KPQJAN/
├── spec.md # Requirements
├── plan.md # This file
├── research.md # Technical design decisions per WP
├── checklists/
│ └── requirements.md
└── tasks.md # Generated by /spec-kitty.tasks
Source Code (change targets)
src/specify_cli/
├── cli/commands/
│ ├── merge.py # WP01: post-merge invariant fix
│ └── agent/
│     └── workflow.py # WP03: review-claim lane transition fix
├── core/
│ └── execution_context.py # WP03: _is_review_claimed duplicate fix
├── shims/
│ └── generator.py # WP02: Gemini/Qwen shim format fix
├── runtime/
│ └── agent_commands.py # WP02: second shim generation path fix
├── mission_brief.py # WP04: resilient write fix
├── cli/commands/intake.py # WP04: file size cap
└── intake_sources.py # WP04: path containment + symlink exclusion
tests/
├── merge/ # WP01 regression tests land here
├── specify_cli/shims/ # WP02 regression tests (create if needed)
├── specify_cli/status/ # WP03 regression tests land here
└── specify_cli/ # WP04 regression tests land here
    ├── test_mission_brief.py
    ├── cli/commands/test_intake.py
    └── test_intake_sources.py
Implementation Strategy
WP01 — Merge Post-Merge Invariant Fix
Root cause: In src/specify_cli/cli/commands/merge.py around lines 864–884, the invariant iterates porcelain status lines and collects any path not in expected_paths as offending_lines. It does not parse the two-character porcelain status code, so ?? (untracked) lines land in offending_lines the same as M (modified tracked) lines. The error message always blames sparse-checkout regardless of the actual failure type.
Fix approach:
1. Parse the first two characters of each porcelain line.
2. Skip lines where the status code is ?? (untracked, not diverged from HEAD).
3. Continue collecting all other unexpected status codes as genuine divergence.
4. Bifurcate the error message: sparse-checkout guidance only when the failure looks like a tracked-file drop; a generic "unexpected working-tree state" message otherwise.
Test surface: tests/merge/ — add tests simulating ?? lines (must not abort) and M lines (must abort with correct message).
Risks: Narrowing too aggressively. The fix must collect any non-?? unexpected code, not just M. An A (staged new file) or D (deleted) is also unexpected and must still abort.
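The WP01 fix approach can be sketched as below. This is a minimal illustration, not the code in merge.py: the function names find_offending_lines and build_error_message are hypothetical, the input is assumed to be `git status --porcelain` (v1) text, and the "looks like a tracked-file drop" test (every offending line carries a D status) is an assumed heuristic that the spec may define differently. Renamed entries (`old -> new`) and quoted paths are not handled.

```python
def find_offending_lines(porcelain_output: str, expected_paths: set[str]) -> list[str]:
    """Return porcelain lines that indicate genuine divergence from HEAD."""
    offending: list[str] = []
    for line in porcelain_output.splitlines():
        if len(line) < 4:
            continue  # too short to be "XY <path>"
        status, path = line[:2], line[3:]
        if status == "??":
            continue  # untracked file: not a divergence from HEAD
        if path not in expected_paths:
            # Any other code (M, A, D, ...) on an unexpected path still aborts.
            offending.append(line)
    return offending


def build_error_message(offending: list[str]) -> str:
    """Pick sparse-checkout guidance or a generic message (assumed heuristic)."""
    details = "\n".join(f"  {line}" for line in offending)
    # Assumption: a failure "looks like a tracked-file drop" when every
    # offending line has D in either status column.
    looks_like_drop = bool(offending) and all("D" in line[:2] for line in offending)
    if looks_like_drop:
        return f"Tracked files are missing; check sparse-checkout settings:\n{details}"
    return f"Unexpected working-tree state after merge:\n{details}"
```

With this shape, a status listing that contains only ?? lines yields an empty list and the merge proceeds, while staged additions and deletions are still reported.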
WP02 — Gemini/Qwen Shim Generation Fix
Root cause: src/specify_cli/core/config.py (AGENT_COMMAND_CONFIG) is the authoritative format registry and already records both Gemini and Qwen as ext: toml, arg_format: {{args}}. Neither generator.py nor agent_commands.py reads this config — both always call generate_shim_content() which returns Markdown. There are two generation paths to fix.
Authoritative TOML schema: Regression baselines in tests/specify_cli/regression/_twelve_agent_baseline/gemini/specify.toml and qwen/specify.toml show the correct flat schema: description = "..." + prompt = """...""". Not [[commands]].
Fix approach:
1. Add generate_shim_content_toml() using the flat description/prompt schema.
2. Add generate_shim_content_for_agent(command, agent_key) routing function that reads AGENT_COMMAND_CONFIG for format and placeholder dispatch.
3. Update generate_all_shims() in generator.py to call the routing function and derive the filename extension from config.
4. Update _sync_agent_commands() in agent_commands.py to call generate_shim_content_for_agent() instead of generate_shim_content() directly.
Files changed: src/specify_cli/shims/generator.py, src/specify_cli/runtime/agent_commands.py
Test surface: tests/specify_cli/shims/test_generator.py — cover TOML validity (tomllib.loads()), flat schema shape, placeholder, non-regression for Claude/Codex, and baseline comparison.
Risks: Any other callers of generate_shim_content() not updated will still produce Markdown for TOML agents. Search for all call sites before closing.
WP03 — Review Lane Semantics Fix
Root cause: In src/specify_cli/cli/commands/agent/workflow.py around lines 1418–1426:
emit_status_transition(TransitionRequest(
    ...
    to_lane=Lane.IN_PROGRESS,  # ← BUG: should be Lane.IN_REVIEW
    force=True,
    review_ref="action-review-claim",
    ...
))
Additionally, around line 1344, is_review_claimed checks for to_lane == Lane.IN_PROGRESS with review_ref == "action-review-claim" to detect an already-claimed WP — this logic must also be updated.
Fix scope: workflow.py and src/specify_cli/core/execution_context.py. Both files contain _is_review_claimed logic that checks only the legacy IN_PROGRESS + review_ref shape.
Fix approach:
1. Change to_lane=Lane.IN_PROGRESS to to_lane=Lane.IN_REVIEW in the review-claim emit in workflow.py.
2. Remove force=True, since for_review → in_review is a legal transition.
3. Update is_review_claimed in workflow.py to OR the new IN_REVIEW shape with the legacy shape.
4. Update _is_review_claimed() in execution_context.py (line 163) with the same OR condition.
5. Update the lane check at execution_context.py:183 to accept IN_REVIEW as a review-claimed state.
6. Update the entry guard in workflow.py to accept {FOR_REVIEW, IN_REVIEW} + legacy.
7. Rejection path: The transition matrix allows in_review → in_progress (rejection returns to implementation). There is no in_review → for_review transition. Tests must assert to_lane == IN_PROGRESS for rejection.
Test surface: Workflow review tests. Cover: new claim → in_review; approval → approved; rejection → in_progress; historical in_progress + review_ref logs readable.
Risks: Search for action-review-claim across the entire codebase to find all detection sites. Any site not updated will remain on the legacy-only check.
WP04 — Intake Hardening Cluster
Root cause and fix for each sub-issue:
#723 (resilient write): write_mission_brief() in src/specify_cli/mission_brief.py writes brief_path first, then source_path. A crash between the two writes leaves an inconsistent state. Fix: Two-part approach — (1) at the start of every write_mission_brief() call, detect and clean any partial state (brief-without-source or source-without-brief) before writing; (2) use PID-namespaced temp files + Path.replace() for the actual write. This handles all three crash windows: pre-write, between temp writes, and between the two replace calls. True two-file atomicity is not achievable with standard filesystem operations; recovery-at-write-start achieves the spec goal (re-ingest not blocked) without that requirement.
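As an illustration of the two-part approach for #723, the sketch below cleans partial state first and then stages both files before publishing them. The four-argument signature is an assumption (the real write_mission_brief() signature is not shown in this plan), as are the temp-file naming scheme and the choice to delete whichever half of a partial pair survived.

```python
import os
from pathlib import Path


def write_mission_brief(
    brief_path: Path, source_path: Path, brief_text: str, source_text: str
) -> None:
    """Write the brief and its source record with recovery at write start."""
    # Part 1: recovery. If exactly one of the two files exists, a previous
    # write crashed partway through; remove the leftover half.
    if brief_path.exists() != source_path.exists():
        brief_path.unlink(missing_ok=True)
        source_path.unlink(missing_ok=True)

    # Part 2a: stage both payloads in PID-namespaced temp files.
    staged: list[tuple[Path, Path]] = []
    for target, text in ((brief_path, brief_text), (source_path, source_text)):
        tmp = target.with_name(f"{target.name}.{os.getpid()}.tmp")
        tmp.write_text(text, encoding="utf-8")
        staged.append((tmp, target))

    # Part 2b: publish each file with Path.replace(). Each replace is atomic
    # on POSIX, but the pair is not; part 1 covers a crash between them.
    for tmp, target in staged:
        tmp.replace(target)
```

Stale temp files left by a crashed process are not cleaned up in this sketch; a real implementation would likely need to sweep them as well.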
#722 (file size cap): intake.py calls found_path.read_text() without a size guard. Fix: Before any read_text() call on an incoming brief file, check file.stat().st_size against MAX_BRIEF_FILE_SIZE_BYTES = 5 * 1024 * 1024 (5 MB, defined as a module-level constant in intake.py). If the file exceeds the limit, print a message that states the file size in MB and the limit, then raise typer.Exit(1).
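A minimal sketch of the size guard, kept free of typer so it runs with the standard library alone. The helper read_brief_with_cap and the BriefTooLargeError exception are hypothetical; in intake.py the CLI command would presumably catch the failure, print the message, and raise typer.Exit(1).

```python
from pathlib import Path

MAX_BRIEF_FILE_SIZE_BYTES = 5 * 1024 * 1024  # 5 MB


class BriefTooLargeError(Exception):
    """Hypothetical error; the CLI layer would map it to typer.Exit(1)."""


def read_brief_with_cap(path: Path, limit: int = MAX_BRIEF_FILE_SIZE_BYTES) -> str:
    # Check the on-disk size before reading anything into memory.
    size = path.stat().st_size
    if size > limit:
        size_mb = size / (1024 * 1024)
        limit_mb = limit / (1024 * 1024)
        raise BriefTooLargeError(
            f"{path.name} is {size_mb:.2f} MB; the limit is {limit_mb:.2f} MB"
        )
    return path.read_text(encoding="utf-8")
```

The limit parameter exists only so tests can exercise the rejection path without writing a 5 MB fixture.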
#720 (path containment): scan_for_plans() in intake_sources.py builds abs_path = cwd / rel_path but never checks that abs_path.resolve() is inside cwd.resolve(). A rel_path of ../../escape/plan.md would be admitted. Fix: After resolving abs_path, assert abs_path.resolve().is_relative_to(cwd.resolve()). If not, skip silently (same behavior as PermissionError). Apply this check both to the direct file case and to the directory-expansion path.
#721 (symlink exclusion): The directory expansion in scan_for_plans() calls child.is_file() which follows symlinks, admitting symlinked targets outside the repo. Fix: Add child.is_symlink() check before child.is_file(). If the child is a symlink, skip it (log a debug message if a debug mode is present, otherwise silently skip).
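The containment check (#720) and the symlink exclusion (#721) can be sketched together as follows. The helpers is_contained and expand_directory are hypothetical names for logic that would sit inside scan_for_plans(); Path.is_relative_to() requires Python 3.9 or later, which the 3.11+ baseline satisfies.

```python
from pathlib import Path


def is_contained(cwd: Path, rel_path: str) -> bool:
    """True when cwd / rel_path resolves to a location inside cwd."""
    return (cwd / rel_path).resolve().is_relative_to(cwd.resolve())


def expand_directory(cwd: Path, directory: Path) -> list[Path]:
    """List regular files in directory, skipping symlinks and escapes."""
    root = cwd.resolve()
    results: list[Path] = []
    for child in sorted(directory.iterdir()):
        if child.is_symlink():
            continue  # checked before is_file(), which follows symlinks
        if child.is_file() and child.resolve().is_relative_to(root):
            results.append(child)
    return results
```

Checking is_symlink() first matters because is_file() reports True for a symlink whose target is a regular file, regardless of where that target lives.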
Test surface: Extend existing tests for mission_brief, intake, and intake_sources. Cover: crash simulation for the resilient write, oversized file rejection, out-of-bounds path exclusion, symlink exclusion, and happy-path regression for normal in-repo files.
Work Package Summary
| WP | Title | Files Changed | Key Requirements | Sequencing |
|---|---|---|---|---|
| WP01 | Merge Post-Merge Invariant Fix | merge.py | FR-001–004, NFR-001–004 | First (highest user impact) |
| WP02 | Gemini/Qwen Shim Generation Fix | shims/generator.py, runtime/agent_commands.py | FR-005–008, NFR-001–004 | Second (Gemini users fully blocked) |
| WP03 | Review Lane Semantics Fix | agent/workflow.py, core/execution_context.py | FR-009–012, NFR-001–004 | Third (state corruption, medium-urgency) |
| WP04 | Intake Hardening Cluster | mission_brief.py, intake.py, intake_sources.py | FR-013–017, C-003, C-005, NFR-001–004 | Fourth (defensive hardening) |
Lane assignment: Single lane (all WPs serial). Each WP must have a green test suite before the next starts.
Definition of Done (per WP)
- □ The specific bug scenario from the spec user scenarios no longer reproduces
- □ All pre-existing tests in the affected module's test file still pass
- □ At least one new regression test covers the previously-failing case
- □ pytest -q tests/ passes fully
- □ mypy --strict src/specify_cli/ passes with zero new errors
- □ No new modules, packages, or dependencies introduced