Tasks: 3.2.0a5 Tranche 1 — Release Reset & CLI Surface Cleanup

Mission ID: 01KQ7YXHA5AMZHJT3HQ8XPTZ6B (mid8 01KQ7YXH) Mission Slug: release-3-2-0a5-tranche-1-01KQ7YXH Branch contract: planning/base/merge target = release/3.2.0a5-tranche-1 (branch_matches_target = true) Spec: spec.md · Plan: plan.md · Research: research.md · Bulk-edit map: occurrence_map.yaml

Overview

Eight work packages, 39 subtasks total. Seven WPs are independent and lane-parallelizable; WP02 lands last as the CHANGELOG / release-metadata consolidator. Two WPs (WP04, WP06) are at the upper end of complexity for this tranche; the rest are focused 3–5 subtask packages. WP08 was added live during /spec-kitty.tasks when finalize-tasks rejected the mission's own DecisionPoint event — same class of "tooling that bites real workflows" bug as FR-002.

Subtask Index

IDDescriptionWPParallel
T001Swap call order in runner.py:163-164 so metadata.save() precedes _stamp_schema_version()WP01
T002Extend tests/cross_cutting/versioning/test_upgrade_version_update.py with schema_version persistence assertionWP01[D]
T003New tests/e2e/test_upgrade_post_state.py smoke covering upgrade → branch-contextWP01[D]
T004Run mypy --strict and ruff check on changed surfaces; address any driftWP01
T005Bump pyproject.toml::[project].version from 3.2.0a43.2.0a5WP02
T006Split CHANGELOG.md heading: convert [Unreleased - 3.2.0][3.2.0a5] — <date> and insert new [Unreleased] placeholder aboveWP02
T007Consolidate per-FR CHANGELOG entries under [3.2.0a5] (collected from each landed WP's PR description)WP02
T008Run tests/release/test_dogfood_command_set.py and tests/release/test_release_prep.py; update fixtures if driftedWP02
T009Verify spec-kitty --version reports 3.2.0a5 after editable reinstallWP02
T010Replace .python-version contents with 3.11WP03
T011Run mypy --strict src/specify_cli/mission_step_contracts/executor.py; triage and fix any errorsWP03
T012Add new test tests/cross_cutting/test_mypy_strict_mission_step_contracts.py invoking mypy in-process and asserting clean exitWP03[D]
T013Re-run tests/cross_cutting/ and tests/missions/ to confirm no regressions from python-version changeWP03
T014Run ruff check .python-version pyproject.toml src/specify_cli/mission_step_contracts/WP03[D]
T015Delete deprecated /spec-kitty.checklist source template AND its override copyWP04
T016Remove /spec-kitty.checklist entries from .kittify/command-skills-manifest.json and _legacy_codex_hashes.pyWP04[D]
T017Delete every deprecated checklist snapshot, regression baseline, and upgrade fixture per occurrence_map.yamlWP04
T018Update tests/specify_cli/skills/{test_registry,test_command_renderer,test_installer}.py and tests/missions/test_command_templates_canonical_path.py to drop checklist expectationsWP04
T019Add aggregate regression tests/specify_cli/test_no_checklist_surface.py (recursive grep for /spec-kitty.checklist and checklist* filenames across src/tests/docs/agent dirs)WP04[D]
T020Add artifact-preservation test tests/missions/test_specify_creates_requirements_checklist.py proving kitty-specs/<slug>/checklists/requirements.md still gets createdWP04[D]
T021Update doc references: README.md, docs/reference/{slash-commands,file-structure,supported-agents}.md per occurrence_map (REMOVE surface mentions, KEEP artifact-name mentions)WP04[D]
T022Add non-git-target detection in src/specify_cli/cli/commands/init.py near the existing git not detected branch (~line 360); print one yellow info line containing both "not a git repository" and "git init"WP05
T023Append a "next: run git init" item to the post-init quick-start summary in init.py when target is not a git repoWP05
T024Remove the /spec-kitty.checklist quick-start line at init.py:723 (FR-003 boundary owned by WP05 to keep init.py ownership single-WP)WP05
T025Add tests/specify_cli/cli/commands/test_init_non_git_message.py covering both unit assertions and CliRunner-driven smokeWP05
T026Create src/specify_cli/diagnostics/__init__.py and src/specify_cli/diagnostics/dedup.py exposing report_once, mark_invocation_succeeded, invocation_succeeded, reset_for_invocationWP06
T027Wrap Not authenticated, skipping sync callsites at sync/background.py:270 and :325 with report_once("sync.unauthenticated") gateWP06
T028Locate the token-refresh-failed logger in src/specify_cli/auth/ and wrap with report_once("auth.token_refresh_failed")WP06
T029In the agent mission create JSON-payload writer ONLY, call mark_invocation_succeeded() immediately after the final print(json.dumps(...)). Auditing other JSON-emitting commands is explicitly out of scope.WP06
T030Update atexit handlers at sync/background.py:456 and sync/runtime.py:381 to consult invocation_succeeded() and downgrade warnings on successWP06
T031Add tests/sync/test_diagnostic_dedup.py covering ContextVar gate + reset behaviorWP06[D]
T032Add tests/e2e/test_mission_create_clean_output.py covering JSON cleanup + dedup + no-red-after-successWP06[D]
T033Add tests/specify_cli/cli/test_no_visible_feature_alias.py (typer walk + --help grep + hidden=True assertion)WP07[D]
T034Add tests/e2e/test_feature_alias_smoke.py (passing --feature to one historically-accepting command behaves identically to --mission)WP07[D]
T035Add tests/specify_cli/cli/test_decision_command_shape_consistency.py (typer walk + multi-source grep + --help listing assertion)WP07[D]
T036Add an event_type-presence guard in read_events() (src/specify_cli/status/store.py:209 per-line loop) that skips events carrying a top-level event_type field (the wire-format discriminator for mission-level events), with a # Why: comment naming Decision Moment Protocol as the cooperating writer. Preserves the existing fail-loud contract for malformed lane-transition events.WP08
T037Add tests/status/test_read_events_tolerates_decision_events.py exercising mixed lane-transition + DecisionPoint event logsWP08[D]
T038Re-run this mission's finalize-tasks against the fixed reader to confirm the live regression is closed (no bypass needed)WP08
T039Run mypy --strict src/specify_cli/status/store.py and ruff check src/specify_cli/status/ tests/status/test_read_events_tolerates_decision_events.pyWP08[D]

Work Packages


WP01 — FR-002 schema_version clobber fix + regression

  • Goal: After spec-kitty upgrade --yes succeeds, spec_kitty.schema_version MUST persist in .kittify/metadata.yaml, and a subsequent spec-kitty agent mission branch-context --json MUST exit 0 (no PROJECT_MIGRATION_NEEDED block).
  • Priority: P0 — foundational; the dev-experience blocker every other WP would hit.
  • Independent test: tests/e2e/test_upgrade_post_state.py (new) drives a tmp project end-to-end.
  • Subtasks:
  • ✅ T001 Swap call order in runner.py:163-164 so metadata.save() precedes _stamp_schema_version() (WP01)
  • ✅ T002 Extend tests/cross_cutting/versioning/test_upgrade_version_update.py with schema_version persistence assertion (WP01)
  • ✅ T003 New tests/e2e/test_upgrade_post_state.py smoke covering upgrade → branch-context (WP01)
  • ✅ T004 Run mypy --strict and ruff check on changed surfaces; address any drift (WP01)
  • Implementation sketch: One-line code change (move _stamp_schema_version call below metadata.save) plus two test files. Read-modify-atomic-write in _stamp_schema_version already handles the post-save file safely.
  • Parallel opportunities: T002 and T003 can be drafted in parallel by the same agent after T001.
  • Dependencies: none.
  • Risks: minimal; covered by both unit and e2e tests.
  • Estimated prompt size: ~350 lines.
  • Prompt: tasks/WP01-fr002-schema-version-clobber-fix.md

WP02 — NFR-002 release metadata coherence (final consolidator)

  • Goal: pyproject.toml, CHANGELOG.md, .python-version, and the release-prep test fixtures all agree on the next prerelease state (3.2.0a5).
  • Priority: P0 — gates tests/release/; lands LAST so per-WP CHANGELOG entries can be consolidated.
  • Independent test: tests/release/test_dogfood_command_set.py and tests/release/test_release_prep.py pass.
  • Subtasks:
  • ✅ T005 Bump pyproject.toml::[project].version from 3.2.0a43.2.0a5 (WP02)
  • ✅ T006 Split CHANGELOG.md heading: convert [Unreleased - 3.2.0][3.2.0a5] — <date> and insert new [Unreleased] placeholder above (WP02)
  • ✅ T007 Consolidate per-FR CHANGELOG entries under [3.2.0a5] (collected from each landed WP's PR description) (WP02)
  • ✅ T008 Run tests/release/test_dogfood_command_set.py and tests/release/test_release_prep.py; update fixtures if drifted (WP02)
  • ✅ T009 Verify spec-kitty --version reports 3.2.0a5 after editable reinstall (WP02)
  • Implementation sketch: Mechanical edits + run release-prep tests + add CHANGELOG entries summarizing each WP01/WP03..WP08 fix.
  • Parallel opportunities: none — single-file consolidator.
  • Dependencies: WP01, WP03, WP04, WP05, WP06, WP07, WP08 (lands last).
  • Risks: release-prep fixtures may have hardcoded version strings that need updating beyond pyproject — covered by T008.
  • Estimated prompt size: ~250 lines.
  • Prompt: tasks/WP02-release-metadata-coherence.md

WP03 — FR-001 .python-version + restore strict mypy

  • Goal: .python-version no longer pins a higher floor than pyproject.toml::requires-python; mypy --strict is clean on mission_step_contracts/executor.py and stays clean.
  • Priority: P1 — local agent productivity.
  • Independent test: tests/cross_cutting/test_mypy_strict_mission_step_contracts.py (new).
  • Subtasks:
  • ✅ T010 Replace .python-version contents with 3.11 (WP03)
  • ✅ T011 Run mypy --strict src/specify_cli/mission_step_contracts/executor.py; triage and fix any errors (WP03)
  • ✅ T012 Add new test tests/cross_cutting/test_mypy_strict_mission_step_contracts.py invoking mypy in-process and asserting clean exit (WP03)
  • ✅ T013 Re-run tests/cross_cutting/ and tests/missions/ to confirm no regressions from python-version change (WP03)
  • ✅ T014 Run ruff check .python-version pyproject.toml src/specify_cli/mission_step_contracts/ (WP03)
  • Implementation sketch: One-byte file change for .python-version; minor type-annotation cleanups; one new test file.
  • Parallel opportunities: T012 and T014 can be drafted in parallel after T011.
  • Dependencies: none.
  • Risks: T011 may surface latent type errors that pre-existed but were never enforced; budget extra time inside the WP.
  • Decision: Decision Moment 01KQ7ZSQKT9DVH7B4GGXWS8DTW chose 3.11 floor.
  • Estimated prompt size: ~250 lines.
  • Prompt: tasks/WP03-python-version-and-strict-mypy.md

WP04 — FR-003 + FR-004 /spec-kitty.checklist bulk removal

  • Goal: Zero references to /spec-kitty.checklist across every supported agent's rendered surface; kitty-specs/<mission>/checklists/requirements.md still gets created by /spec-kitty.specify.
  • Priority: P1 — largest WP; bulk-edit gated by occurrence_map.yaml.
  • Independent test: tests/specify_cli/test_no_checklist_surface.py (new) + tests/missions/test_specify_creates_requirements_checklist.py (new).
  • Subtasks:
  • ✅ T015 Delete deprecated /spec-kitty.checklist source template AND its override copy (WP04)
  • ✅ T016 Remove /spec-kitty.checklist entries from .kittify/command-skills-manifest.json and _legacy_codex_hashes.py (WP04)
  • ✅ T017 Delete every deprecated checklist snapshot, regression baseline, and upgrade fixture per occurrence_map.yaml (WP04)
  • ✅ T018 Update tests/specify_cli/skills/{test_registry,test_command_renderer,test_installer}.py and tests/missions/test_command_templates_canonical_path.py to drop checklist expectations (WP04)
  • ✅ T019 Add aggregate regression tests/specify_cli/test_no_checklist_surface.py (WP04)
  • ✅ T020 Add artifact-preservation test tests/missions/test_specify_creates_requirements_checklist.py (WP04)
  • ✅ T021 Update doc references per occurrence_map.yaml (WP04)
  • Implementation sketch: Mechanical removal driven by occurrence_map.yaml. Implementing agent MUST load the spec-kitty-bulk-edit-classification skill before starting and verify the diff against the occurrence map before commit.
  • Parallel opportunities: T016, T019, T020, T021 can be done in parallel after T015 lands.
  • Dependencies: none.
  • Risks: missed reference creates a DIRECTIVE_035 violation; mitigated by T019's aggregate scanner. Snapshot tests in T017/T018 must be regenerated for ALL 12 slash-command agents.
  • Boundary: init.py:723 (one occurrence) is owned by WP05 (T024) to keep init.py ownership single-WP.
  • Estimated prompt size: ~500 lines.
  • Prompt: tasks/WP04-checklist-surface-bulk-removal.md

WP05 — FR-005 init non-git message (+ FR-003 init.py boundary line)

  • Goal: Running spec-kitty init in a non-git directory emits a single actionable message; the deprecated /spec-kitty.checklist quick-start line at init.py:723 is removed in the same WP that owns init.py.
  • Priority: P2 — UX polish.
  • Independent test: tests/specify_cli/cli/commands/test_init_non_git_message.py (new).
  • Subtasks:
  • ✅ T022 Add non-git-target detection in src/specify_cli/cli/commands/init.py near the existing git not detected branch; print one yellow info line containing both "not a git repository" and "git init" (WP05)
  • ✅ T023 Append a "next: run git init" item to the post-init quick-start summary in init.py when target is not a git repo (WP05)
  • ✅ T024 Remove the /spec-kitty.checklist quick-start line at init.py:723 (FR-003 boundary owned by WP05 to keep init.py ownership single-WP) (WP05)
  • ✅ T025 Add tests/specify_cli/cli/commands/test_init_non_git_message.py covering both unit assertions and CliRunner-driven smoke (WP05)
  • Implementation sketch: Single subprocess check using git rev-parse --is-inside-work-tree; small UX additions; one new test file.
  • Parallel opportunities: T025 can be drafted in parallel with T022/T023/T024 by the same agent.
  • Dependencies: none.
  • Risks: subprocess to git may fail with the binary missing; existing is_git_available() branch already handles that case — reuse it.
  • Estimated prompt size: ~250 lines.
  • Prompt: tasks/WP05-init-non-git-message.md

WP06 — FR-008 + FR-009 diagnostic dedup + atexit success-flag

  • Goal: One-per-cause diagnostic gating per CLI invocation; no red shutdown noise after a successful JSON-output command.
  • Priority: P2 — visible noise but not blocking.
  • Independent test: tests/sync/test_diagnostic_dedup.py (new) + tests/e2e/test_mission_create_clean_output.py (new).
  • Subtasks:
  • ✅ T026 Create src/specify_cli/diagnostics/__init__.py and src/specify_cli/diagnostics/dedup.py exposing report_once, mark_invocation_succeeded, invocation_succeeded, reset_for_invocation (WP06)
  • ✅ T027 Wrap Not authenticated, skipping sync callsites at sync/background.py:270 and :325 with report_once("sync.unauthenticated") gate (WP06)
  • ✅ T028 Locate the token-refresh-failed logger in src/specify_cli/auth/ and wrap with report_once("auth.token_refresh_failed") (WP06)
  • ✅ T029 In the agent mission create JSON-payload writer ONLY, call mark_invocation_succeeded() immediately after the final print(json.dumps(...)). Auditing other JSON-emitting commands is explicitly out of scope. (WP06)
  • ✅ T030 Update atexit handlers at sync/background.py:456 and sync/runtime.py:381 to consult invocation_succeeded() and downgrade warnings on success (WP06)
  • ✅ T031 Add tests/sync/test_diagnostic_dedup.py covering ContextVar gate + reset behavior (WP06)
  • ✅ T032 Add tests/e2e/test_mission_create_clean_output.py covering JSON cleanup + dedup + no-red-after-success (WP06)
  • Implementation sketch: New diagnostics package using contextvars.ContextVar for dedup + module-level boolean for success flag. Wrap two existing log sites; call mark_invocation_succeeded() from JSON-emitting command paths; consult invocation_succeeded() from atexit handlers.
  • Parallel opportunities: T031 and T032 (test-only) can run in parallel after T026 + T030 land.
  • Dependencies: none.
  • Risks: mark_invocation_succeeded() must NOT be called on failure paths; tests in T032 cover the failure path explicitly.
  • Estimated prompt size: ~500 lines.
  • Prompt: tasks/WP06-diagnostic-dedup-and-atexit.md

WP07 — FR-006 + FR-007 close-with-evidence regressions

  • Goal: Lock down "already-fixed-on-main" status of --feature hidden alias (#790) and spec-kitty agent decision command shape (#774) by adding regression tests that prevent future drift.
  • Priority: P3 — close-with-evidence per start-here.md "Done Criteria".
  • Independent test: the three new test files themselves.
  • Subtasks:
  • ✅ T033 Add tests/specify_cli/cli/test_no_visible_feature_alias.py (typer walk + --help grep + hidden=True assertion) (WP07)
  • ✅ T034 Add tests/e2e/test_feature_alias_smoke.py (passing --feature to one historically-accepting command behaves identically to --mission) (WP07)
  • ✅ T035 Add tests/specify_cli/cli/test_decision_command_shape_consistency.py (typer walk + multi-source grep + --help listing assertion) (WP07)
  • Implementation sketch: Three new test files; no production code changes. Each test introspects the typer app and grep-checks docs/snapshots/templates.
  • Parallel opportunities: All three tests are independent — agent can implement in any order.
  • Dependencies: none.
  • Risks: minimal — purely additive tests over already-working code.
  • Estimated prompt size: ~150 lines.
  • Prompt: tasks/WP07-close-with-evidence-regressions.md

WP08 — FR-010 status event reader robustness fix

  • Goal: read_events() in src/specify_cli/status/store.py MUST tolerate non-lane-transition events (e.g. DecisionPointOpened, DecisionPointResolved) in status.events.jsonl instead of raising KeyError('wp_id').
  • Priority: P0 — currently blocks finalize-tasks (and every other reader) for any mission that has used the Decision Moment Protocol. Discovered live during this very /spec-kitty.tasks run; the mission's own DecisionPoint event triggered the bug.
  • Independent test: tests/status/test_read_events_tolerates_decision_events.py (new).
  • Subtasks:
  • ✅ T036 Add an event_type-presence guard in read_events() (per-line loop) that skips events carrying a top-level event_type field, with a # Why: comment naming Decision Moment Protocol as the cooperating writer. Preserves the existing fail-loud contract for malformed lane-transition events. (WP08)
  • ✅ T037 Add tests/status/test_read_events_tolerates_decision_events.py exercising mixed lane-transition + DecisionPoint event logs (WP08)
  • ✅ T038 Re-run this mission's finalize-tasks against the fixed reader to confirm the live regression is closed (WP08)
  • ✅ T039 Run mypy --strict src/specify_cli/status/store.py and ruff check src/specify_cli/status/ tests/status/test_read_events_tolerates_decision_events.py (WP08)
  • Implementation sketch: ~5 LOC inside read_events() per-line loop + a sibling unit test. Uses the duck-type approach instead of an event-type allowlist for future-proofing.
  • Parallel opportunities: T037 and T039 can be drafted in parallel after T036.
  • Dependencies: none.
  • Risks: low — additive guard; existing lane-transition path unaffected.
  • Live evidence: this mission's own kitty-specs/release-3-2-0a5-tranche-1-01KQ7YXH/status.events.jsonl starts with a DecisionPointOpened event and finalize-tasks failed on it. After T036 lands, T038 confirms the live regression is closed.
  • Estimated prompt size: ~250 lines.
  • Prompt: tasks/WP08-status-event-reader-robustness.md

MVP Scope

MVP = WP01 + WP08 + WP02. Without WP01, every other WP's implementer hits the PROJECT_MIGRATION_NEEDED gate documented in spec.md. Without WP08, every other WP's implementer hits the finalize-tasks reader bug as soon as their mission opens any Decision Moment. Without WP02, the release-prep tests fail. Everything else is desirable but not on the critical path for "the next prerelease tag exists and works".

Parallelization

WP01, WP03, WP04, WP05, WP06, WP07, WP08 are all dependency-free and can be lane-parallelized. WP02 lands last to consolidate CHANGELOG entries.