Implementation Plan: CLI 2.x Readiness Sprint

Branch: 2.x (delivery branch) | Date: 2026-02-12 | Spec: spec.md
Input: Feature specification from kitty-specs/039-cli-2x-readiness/spec.md

Summary

Close all CLI-side readiness gaps for spec-kitty 2.x: fix the planning workflow blocker, harden the authenticated sync path with actionable diagnostics, converge global runtime resolution, test the 7-to-4 lane collapse mapping, and produce a SaaS handoff contract document. All work targets the 2.x branch, which has diverged from main by 588 commits. The sync infrastructure (13 modules in src/specify_cli/sync/) already exists on 2.x, so this sprint is "debug, fix, and harden," not greenfield.

Technical Context

Language/Version: Python 3.11+ (existing spec-kitty codebase)
Primary Dependencies: typer, rich, ruamel.yaml, httpx (sync client), pydantic (event models), spec-kitty-events (vendored)
Storage: SQLite (event queue via sync/queue.py), TOML (credentials via ~/.spec-kitty/credentials), filesystem (YAML frontmatter, JSON metadata)
Testing: pytest (93+ existing green sync/auth tests on 2.x), typer.testing.CliRunner
Target Platform: Linux, macOS, Windows 10+ (cross-platform CLI)
Project Type: Single CLI project
Performance Goals: CLI operations < 2 seconds; batch sync < 5 seconds for 1000 events
Constraints: Offline-capable (queue events when disconnected), JWT auth (username/password -> access+refresh tokens)
Scale/Scope: ~2000+ tests total, 13 sync modules, 85+ sync-specific tests passing

Constitution Check

GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.

| Gate | Status | Notes |
|------|--------|-------|
| Python 3.11+ | PASS | All code targets Python 3.11+ |
| pytest with 90%+ coverage for new code | PASS | Will maintain; 93+ sync tests already pass |
| mypy --strict | PASS | Will maintain strict typing in new/modified code |
| Cross-platform (Linux, macOS, Windows) | PASS | No platform-specific changes planned |
| Git required | PASS | All worktree features depend on Git |
| 2.x branch for SaaS features | PASS | All work targets 2.x per constitution's two-branch strategy |
| spec-kitty-events via Git pinning | PASS | Using vendored copy at src/specify_cli/spec_kitty_events/ |
| No 1.x compatibility constraints | PASS | 2.x is greenfield per constitution |
| CLI < 2 seconds | WATCH | sync now may exceed for large queues; document as expected |

No constitution violations. No complexity tracking needed.

Project Structure

Documentation (this feature)

kitty-specs/039-cli-2x-readiness/
├── spec.md               # Feature specification
├── plan.md               # This file
├── meta.json             # Feature metadata
├── research.md           # Phase 0: research findings
├── data-model.md         # Phase 1: entity model
├── quickstart.md         # Phase 1: implementation guide
├── contracts/            # Phase 1: API contracts
│   ├── batch-ingest.md   # SaaS batch endpoint contract
│   └── lane-mapping.md   # 7→4 lane collapse specification
├── checklists/
│   └── requirements.md   # Spec quality checklist
└── tasks.md              # Phase 2 output (created by /spec-kitty.tasks)

Source Code (2.x branch, repository root)

src/specify_cli/
├── sync/                      # Existing sync infrastructure (13 modules)
│   ├── __init__.py
│   ├── auth.py                # JWT auth: login, token refresh, credential storage
│   ├── background.py          # Background sync service with periodic flush
│   ├── batch.py               # Batch HTTP client → POST /api/v1/events/batch/
│   ├── client.py              # HTTP/WS client base
│   ├── clock.py               # Lamport clock persistence bridge
│   ├── config.py              # Sync config (server_url, intervals)
│   ├── emitter.py             # Event emission: status transitions → queue/sync
│   ├── events.py              # Event type definitions
│   ├── git_metadata.py        # Git metadata resolver for event enrichment
│   ├── project_identity.py    # Project UUID/slug resolution
│   ├── queue.py               # SQLite offline event queue
│   └── runtime.py             # SyncRuntime lazy singleton
├── cli/commands/
│   ├── sync.py                # VCS workspace sync CLI (separate from event sync)
│   └── agent/
│       ├── feature.py         # setup-plan command (NameError fix target)
│       └── tasks.py           # move-task, list-tasks, validate-workflow
├── core/
│   └── project_resolver.py    # resolve_template_path (global runtime fix target)
├── events/
│   ├── adapter.py             # spec_kitty_events ↔ CLI bridge
│   └── store.py               # Event storage stub
├── spec_kitty_events/         # Vendored event library (7-lane status model)
│   ├── models.py              # Event envelope (Pydantic)
│   ├── status.py              # Lane enum, StatusTransitionPayload
│   ├── lifecycle.py           # Mission lifecycle events
│   ├── storage.py             # Storage abstractions (ABC)
│   ├── clock.py               # LamportClock
│   └── conflict.py            # Concurrent event detection
└── task_helpers_shared.py     # LANES 4-tuple, ensure_lane()

tests/
├── integration/
│   ├── test_planning_workflow.py   # 5 tests (1 xfail)
│   ├── test_task_workflow.py       # 18 tests
│   └── test_sync_e2e.py           # 3 tests (existing on 2.x)
├── specify_cli/
│   ├── sync/                       # 85+ sync unit tests (existing on 2.x)
│   └── cli/commands/agent/
│       ├── test_planning_workflow.py
│       └── test_task_workflow.py
└── e2e/
    └── test_cli_smoke.py           # NEW: full workflow smoke test

Structure Decision: Existing 2.x structure. No new directories except tests/e2e/ for the smoke test and kitty-specs/039-cli-2x-readiness/contracts/ for handoff docs.

Work Packages

Dependency Graph

WP01 (setup-plan fix) ──────────────────────────────────┐
WP02 (batch error surfacing) ──┐                         │
WP03 (sync status --check) ───┤                          │
WP04 (sync diagnose) ─────────┤── WP07 (handoff doc) ──── WP09 (E2E smoke)
WP05 (sync status extend) ────┘                          │
WP06 (lane mapping tests) ────────────────────────────────┤
WP08 (global runtime) ────────────────────────────────────┘

WP01: Fix setup-plan NameError on 2.x (P0)

Owner: Any agent | Dependencies: None | Effort: Small (surgical fix)

What: Add the missing import statement (from specify_cli.mission import get_feature_mission_key) to src/specify_cli/cli/commands/agent/feature.py on the 2.x branch. The same fix was already applied on main at commit 5332408f.

Acceptance:

  • spec-kitty agent feature setup-plan completes without NameError
  • test_planning_workflow.py::test_setup_plan_in_main passes
  • test_planning_workflow.py::test_full_planning_workflow_no_worktrees xfail reason investigated and either fixed or re-documented

Files changed:

  • src/specify_cli/cli/commands/agent/feature.py (add import)

WP02: Fix batch error surfacing and diagnostics (P0)

Owner: Any agent | Dependencies: None (parallel with WP01) | Effort: Medium

What: Fix batch.py to surface the server's details field from 400 responses and per-event results[] statuses. Currently batch.py:135 only reads the top-level error field, discarding per-event failure reasons.

Changes:

  1. Parse per-event results from successful batch responses (HTTP 200 with results[])
  2. Parse details field from 400 error responses (not just error)
  3. Group failures by reason category (schema_mismatch, auth_expired, server_error, unknown)
  4. Print actionable summary: Synced: N, Duplicates: N, Failed: N (schema_mismatch: X, auth_expired: Y)
  5. Support --report <file.json> flag for large failure sets (JSON dump of per-event details)
  6. On partial success, remove synced + duplicate events from queue, retain failures with incremented retry_count
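A minimal sketch of the parsing, grouping, and summary steps, assuming the results[] shape described in this plan (event_id, status, error). The function names and the keyword-based categorization rule are hypothetical; the real server error strings may need a different classifier.

```python
from __future__ import annotations

from collections import Counter


def categorize(error: str | None) -> str:
    """Map a server error string to a reason category (keyword rule is an assumption)."""
    text = (error or "").lower()
    if "schema" in text or "validation" in text:
        return "schema_mismatch"
    if "token" in text or "expired" in text or "auth" in text:
        return "auth_expired"
    if "server" in text or "internal" in text:
        return "server_error"
    return "unknown"


def summarize_batch(results: list[dict]) -> tuple[str, list[str], list[str]]:
    """Return (summary line, event IDs safe to dequeue, event IDs to retain)."""
    synced: list[str] = []
    duplicates: list[str] = []
    failed: list[str] = []
    reasons: Counter = Counter()
    for item in results:
        status = item.get("status")
        if status == "success":
            synced.append(item["event_id"])
        elif status == "duplicate":
            duplicates.append(item["event_id"])
        else:
            # Anything that is not success/duplicate is treated as a failure.
            failed.append(item["event_id"])
            reasons[categorize(item.get("error"))] += 1
    summary = (
        f"Synced: {len(synced)}, Duplicates: {len(duplicates)}, Failed: {len(failed)}"
    )
    if reasons:
        detail = ", ".join(f"{name}: {count}" for name, count in sorted(reasons.items()))
        summary += f" ({detail})"
    return summary, synced + duplicates, failed
```

The second and third return values correspond to the partial-success rule: the first list can be removed from the queue, while the second stays queued with an incremented retry_count.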

Acceptance:

  • Existing 85+ sync tests still pass (no regressions)
  • New tests: batch response parsing for 200 with mixed results, 400 with details, partial success
  • User sees grouped error reasons, not bare count

Files changed:

  • src/specify_cli/sync/batch.py (error parsing, summary formatting)
  • src/specify_cli/sync/queue.py (selective removal of synced events)
  • tests/specify_cli/sync/test_batch.py (new test cases)

WP03: Fix sync status --check to use real token (P0)

Owner: Any agent | Dependencies: None (parallel with WP01, WP02) | Effort: Small

What: Replace the hardcoded test token in sync.py:531 (sync status --check) with the user's actual auth token from ~/.spec-kitty/credentials. The current test token probe produces misleading 403 errors that don't reflect real auth state.

Changes:

  1. Load real access token from credentials store
  2. If token is expired, attempt refresh first
  3. If no credentials exist, report "Not authenticated: run spec-kitty auth login" instead of a misleading 403
  4. Probe actual batch endpoint with real token (not a synthetic test path)
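The decision flow above could look roughly like the following. This is a sketch only: the credential field names (access_token, refresh_token, expires_at) and the injected refresh and probe callables are hypothetical stand-ins for whatever sync/auth.py and the HTTP client actually expose.

```python
from __future__ import annotations

from typing import Callable, Optional

# Paraphrases the message proposed in this plan.
NOT_AUTHENTICATED = "Not authenticated: run spec-kitty auth login"


def check_auth(
    credentials: Optional[dict],
    now: float,
    refresh: Callable[[str], dict],
    probe: Callable[[str], int],
) -> str:
    """Report auth state using the stored token rather than a test token.

    refresh(refresh_token) returns new credentials; probe(access_token)
    returns the HTTP status code from the batch endpoint. Both are assumed
    interfaces, injected so the flow can be tested without a server.
    """
    if not credentials:
        return NOT_AUTHENTICATED
    if credentials["expires_at"] <= now:
        # Expired access token: refresh before probing.
        credentials = refresh(credentials["refresh_token"])
    status = probe(credentials["access_token"])
    if status in (401, 403):
        return f"Authentication rejected by server (HTTP {status})"
    if status >= 400:
        return f"Server returned HTTP {status}"
    return "Authenticated"
```

Injecting refresh and probe keeps the check unit-testable offline, which fits the offline-capable constraint.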

Acceptance:

  • sync status --check reports accurate auth state
  • No false 403 errors from test tokens
  • Clear "not authenticated" message when no credentials stored

Files changed:

  • src/specify_cli/cli/commands/sync.py or src/specify_cli/sync/ (status check path)
  • tests/specify_cli/sync/test_sync_status.py (new/updated tests)

WP04: Add sync diagnose command (P1)

Owner: Any agent | Dependencies: WP02 (uses same error categorization) | Effort: Medium

What: Add spec-kitty sync diagnose that validates queued events locally against the event schema before attempting to send them, enabling self-service debugging.

Changes:

  1. New CLI command sync diagnose under the sync command group
  2. Read all pending events from SQLite queue
  3. Validate each event against the Pydantic Event model from spec_kitty_events.models
  4. Validate payloads against StatusTransitionPayload for WPStatusChanged events
  5. Report per-event validation results: valid count, invalid count, specific field errors
  6. Optionally validate against events.schema.json if available
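To illustrate the shape of the per-event report, here is a dependency-free sketch. It deliberately swaps the Pydantic Event model for a plain required-field table so it runs with the standard library alone; the field names in REQUIRED_FIELDS are hypothetical and do not reflect the real envelope.

```python
from __future__ import annotations

import json

# Hypothetical stand-in for the Event model: field name -> expected type.
REQUIRED_FIELDS: dict[str, type] = {
    "event_id": str,
    "event_type": str,
    "lamport_clock": int,
    "payload": dict,
}


def validate_event(raw: str) -> list[str]:
    """Return field-level problems for one queued event; empty means valid."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(event, dict):
        return ["event must be a JSON object"]
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in event:
            problems.append(f"{name}: missing (expected {expected.__name__})")
        elif not isinstance(event[name], expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {event[name]!r}"
            )
    return problems


def diagnose(queued: list[str]) -> dict:
    """Validate every queued event and collect counts plus per-event errors."""
    report: dict = {"valid": 0, "invalid": 0, "errors": {}}
    for index, raw in enumerate(queued):
        problems = validate_event(raw)
        if problems:
            report["invalid"] += 1
            report["errors"][index] = problems
        else:
            report["valid"] += 1
    return report
```

In the real command, validate_event would presumably wrap the model's own validation and translate its errors into the same "field, expected type, actual value" lines.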

Acceptance:

  • spec-kitty sync diagnose reports schema mismatches for intentionally malformed events
  • Valid events pass without false positives
  • Output is actionable (shows which field, expected type, actual value)

Files changed:

  • src/specify_cli/sync/diagnose.py (new module)
  • src/specify_cli/cli/commands/ (register command)
  • tests/specify_cli/sync/test_diagnose.py (new tests)

WP05: Extend sync status with queue health (P1)

Owner: Any agent | Dependencies: None (parallel) | Effort: Medium

What: Extend spec-kitty sync status to show queue depth, oldest event age, retry-count distribution, and top failing event types — beyond the current connection/auth-only output.

Changes:

  1. Query SQLite queue for aggregate stats (COUNT, MIN(created_at), retry_count distribution)
  2. Group pending events by event_type to show top failing types
  3. Show retry-count histogram (e.g., "0 retries: 50, 1-3 retries: 30, 4+ retries: 25")
  4. Show oldest event age in human-readable format (e.g., "oldest event: 3 hours ago")
  5. Format output with Rich tables/panels
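The aggregate queries might be sketched as below. The table name (queue) and the assumption that created_at stores integer epoch seconds are hypothetical; only the column names event_type, retry_count, and created_at come from this plan, and the actual schema in sync/queue.py should be checked first.

```python
from __future__ import annotations

import sqlite3


def human_age(seconds: int) -> str:
    """Render an age in seconds as a coarse human-readable string."""
    for unit, size in (("day", 86400), ("hour", 3600), ("minute", 60)):
        if seconds >= size:
            count = seconds // size
            return f"{count} {unit}{'s' if count != 1 else ''} ago"
    return "just now"


def queue_health(conn: sqlite3.Connection, now: int) -> dict:
    """Collect queue depth, oldest event age, retry histogram, and top failing types."""
    depth, oldest = conn.execute(
        "SELECT COUNT(*), MIN(created_at) FROM queue"
    ).fetchone()
    histogram = dict(
        conn.execute(
            """
            SELECT CASE
                     WHEN retry_count = 0 THEN '0 retries'
                     WHEN retry_count <= 3 THEN '1-3 retries'
                     ELSE '4+ retries'
                   END AS bucket,
                   COUNT(*)
            FROM queue
            GROUP BY bucket
            """
        ).fetchall()
    )
    # "Failing" is taken to mean at least one retry; that reading is an assumption.
    top_failing = conn.execute(
        "SELECT event_type, COUNT(*) AS n FROM queue WHERE retry_count > 0 "
        "GROUP BY event_type ORDER BY n DESC, event_type LIMIT 3"
    ).fetchall()
    return {
        "depth": depth,
        "oldest": human_age(now - oldest) if oldest is not None else None,
        "retry_histogram": histogram,
        "top_failing": top_failing,
    }
```

Keeping the aggregation in SQL avoids loading every queued event into memory, which matters for the large-queue case flagged in the constitution check.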

Acceptance:

  • sync status shows queue depth, oldest age, retry distribution, top event types
  • Output is clear and actionable
  • Existing sync status tests still pass

Files changed:

  • src/specify_cli/sync/queue.py (add aggregate query methods)
  • src/specify_cli/cli/commands/ (extend status output)
  • tests/specify_cli/sync/test_queue.py (new aggregate query tests)

WP06: Test and document 7→4 lane collapse mapping (P2)

Owner: Any agent | Dependencies: None (parallel) | Effort: Small

What: The 7→4 lane mapping exists at emitter.py:46 but lacks explicit tests and documentation. Add comprehensive tests and produce a specification document.

Current mapping (from emitter.py):

  • PLANNED → planned
  • CLAIMED → doing
  • IN_PROGRESS → doing
  • FOR_REVIEW → for_review
  • DONE → done
  • BLOCKED → doing
  • CANCELED → done

Changes:

  1. Add parametrized tests covering all 7 input lanes with expected 4-lane outputs
  2. Add test for unknown lane value handling (should raise ValueError)
  3. Extract mapping to a named constant/function if not already (for documentation and reuse)
  4. Write contracts/lane-mapping.md with mapping table, rationale, and edge case behavior
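As a sketch of the extracted constant and its unknown-lane behavior, the mapping listed above can be written as a lookup table. The names LANE_COLLAPSE and collapse_lane are hypothetical, and plain strings are used as keys on the assumption that the real code keys on the Lane enum from spec_kitty_events/status.py.

```python
from __future__ import annotations

# 7-lane status model -> 4-lane model, as listed in this plan.
LANE_COLLAPSE: dict[str, str] = {
    "PLANNED": "planned",
    "CLAIMED": "doing",
    "IN_PROGRESS": "doing",
    "FOR_REVIEW": "for_review",
    "DONE": "done",
    "BLOCKED": "doing",
    "CANCELED": "done",
}


def collapse_lane(lane: str) -> str:
    """Collapse a 7-lane value to its 4-lane equivalent; reject unknown lanes."""
    try:
        return LANE_COLLAPSE[lane]
    except KeyError:
        raise ValueError(f"Unknown lane: {lane!r}") from None


# The collapsed values should cover exactly the four target lanes.
assert set(LANE_COLLAPSE.values()) == {"planned", "doing", "for_review", "done"}
```

With the mapping in one table, a parametrized test can iterate over LANE_COLLAPSE.items() and a separate case can assert that an unknown value raises ValueError.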

Acceptance:

  • All 7 lanes tested with correct 4-lane output
  • Unknown lane input raises clear error
  • contracts/lane-mapping.md exists with complete mapping specification

Files changed:

  • tests/specify_cli/sync/test_lane_mapping.py (new test file)
  • src/specify_cli/sync/emitter.py (extract mapping if needed)
  • kitty-specs/039-cli-2x-readiness/contracts/lane-mapping.md (new doc)

WP07: SaaS handoff contract document (P0)

Owner: Any agent | Dependencies: WP02 (error format), WP06 (lane mapping) | Effort: Medium

What: Produce a contract document that the spec-kitty-saas team can use to validate their batch endpoint against CLI payloads, without consulting CLI source code.

Contents:

  1. Event envelope fields (all fields from Event model with types, constraints, examples)
  2. Batch request format: POST /api/v1/events/batch/, headers (Authorization: Bearer <token>, Content-Type: application/json, Content-Encoding: gzip), body {"events": [...]}
  3. Batch response format: HTTP 200 with {"results": [{"event_id": "...", "status": "success|duplicate|rejected", "error": "..."}]}, HTTP 400 with {"error": "...", "details": "..."}
  4. Authentication: JWT flow (login endpoint, token refresh endpoint, header format)
  5. Lane mapping table (from WP06)
  6. Event types and their payload schemas (WPStatusChanged, MissionStarted, etc.)
  7. Fixture data: 3-5 complete batch request examples with expected responses
  8. Required SaaS changes (from spec.md "Required SaaS Changes/Dependencies" section)
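For the fixture data, a batch request matching the format in item 2 could be assembled as follows. This is an illustrative sketch: build_batch_request is a hypothetical helper, and the event dictionaries passed in are placeholders rather than the real envelope.

```python
from __future__ import annotations

import gzip
import json

BATCH_PATH = "/api/v1/events/batch/"


def build_batch_request(events: list[dict], token: str) -> tuple[str, dict[str, str], bytes]:
    """Return (path, headers, gzip-compressed JSON body) for a batch POST."""
    body = gzip.compress(json.dumps({"events": events}).encode("utf-8"))
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Content-Encoding": "gzip",
    }
    return BATCH_PATH, headers, body


def decode_batch_body(body: bytes) -> dict:
    """Inverse of the body encoding, useful for checking fixtures on the SaaS side."""
    return json.loads(gzip.decompress(body).decode("utf-8"))
```

A fixture in the contract document can then pair the decoded request body with the expected results[] response for the same event IDs.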

Acceptance:

  • SaaS team can construct valid batch request from doc alone
  • Fixture data validates against CLI-side Pydantic models
  • All event types documented with payload schemas

Files changed:

  • kitty-specs/039-cli-2x-readiness/contracts/batch-ingest.md (new)
  • kitty-specs/039-cli-2x-readiness/contracts/lane-mapping.md (cross-ref from WP06)

WP08: Converge global runtime resolution on 2.x (P1)

Owner: Any agent | Dependencies: None (parallel) | Effort: Medium

What: 2.x has partial ~/.kittify global runtime bootstrap but still shows legacy project fallback warnings. Make resolution deterministic: global ~/.kittify is canonical after spec-kitty migrate, with clear error (not silent fallback) if migration hasn't run.

Changes:

  1. Audit current resolution chain in core/project_resolver.py on 2.x (already has home resolver logic)
  2. Ensure resolve_template_path() includes ~/.kittify/ in the chain: project .kittify/missions/{key}/templates/ → project .kittify/templates/ → ~/.kittify/missions/{key}/templates/ → ~/.kittify/templates/ → package defaults
  3. Eliminate legacy fallback warnings after migration (if ~/.kittify/ exists, no warnings)
  4. If ~/.kittify/ doesn't exist and project-level fallback is used, emit a one-time "run spec-kitty migrate for global runtime" message (not a warning flood)
  5. Ensure spec-kitty migrate is idempotent (running twice produces same state)
  6. Address credential path split: document whether ~/.spec-kitty/credentials stays or moves to ~/.kittify/credentials
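The intended resolution order (project mission templates, project templates, global mission templates, global templates, package defaults) can be sketched as a first-match search. The function names and signature here are hypothetical and are not the existing resolve_template_path() API; home and package_defaults are passed in explicitly so the order can be tested against temporary directories.

```python
from __future__ import annotations

from pathlib import Path


def candidate_dirs(
    project_root: Path, home: Path, package_defaults: Path, mission_key: str
) -> list[Path]:
    """Directories to search, highest priority first."""
    return [
        project_root / ".kittify" / "missions" / mission_key / "templates",
        project_root / ".kittify" / "templates",
        home / ".kittify" / "missions" / mission_key / "templates",
        home / ".kittify" / "templates",
        package_defaults,
    ]


def resolve_template_sketch(
    name: str,
    project_root: Path,
    home: Path,
    package_defaults: Path,
    mission_key: str,
) -> Path:
    """Return the first existing template file in the resolution chain."""
    for directory in candidate_dirs(project_root, home, package_defaults, mission_key):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"Template {name!r} not found for mission {mission_key!r}")
```

Raising FileNotFoundError when nothing matches is in line with the goal of a clear error rather than a silent fallback.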

Decision needed (captured, not blocking): Keep ~/.spec-kitty/credentials separate from ~/.kittify/ for now. Credentials are auth-specific; runtime is template-specific. Merging them is a follow-on.

Acceptance:

  • After spec-kitty migrate, zero fallback warnings during normal operation
  • Resolution chain includes ~/.kittify/ between project and package defaults
  • spec-kitty migrate is idempotent
  • Credential path decision documented

Files changed:

  • src/specify_cli/core/project_resolver.py (resolution chain)
  • src/specify_cli/cli/commands/migrate.py (idempotency, global install)
  • tests/specify_cli/core/test_project_resolver.py (new resolution tests)

WP09: End-to-end CLI smoke test (P0)

Owner: Any agent | Dependencies: WP01 (setup-plan must work), WP02 (sync must surface errors correctly) | Effort: Medium

What: Add a new E2E smoke test that exercises the full create-feature → setup-plan → implement → review sequence against a temporary repository on the 2.x branch.

Changes:

  1. Create tests/e2e/test_cli_smoke.py
  2. Fixture: create temp git repo, initialize spec-kitty, set up .kittify/
  3. Test sequence:
     a. spec-kitty agent feature create-feature "smoke-test" --json
     b. Write a minimal spec.md to the feature directory
     c. spec-kitty agent feature setup-plan --feature <slug> --json
     d. spec-kitty agent tasks finalize-tasks --feature <slug> --json (with pre-written tasks.md)
     e. spec-kitty implement WP01 (create workspace)
     f. Make a code change in workspace, commit
     g. spec-kitty agent tasks move-task WP01 --to for_review --feature <slug> --json
     h. Verify final state: WP01 in for_review lane, all artifacts exist
  4. Mark as pytest.mark.e2e for optional CI separation
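One way to keep the command sequence readable in the smoke test is to express it as data and run it through an injected runner. The sketch below covers only the CLI invocations listed above (steps a, c, d, e, g); the file-writing, commit, and verification steps are omitted, and smoke_commands and run_sequence are hypothetical helper names.

```python
from __future__ import annotations

from typing import Callable


def smoke_commands(slug: str) -> list[list[str]]:
    """CLI invocations from the planned test sequence, in order."""
    return [
        ["spec-kitty", "agent", "feature", "create-feature", "smoke-test", "--json"],
        ["spec-kitty", "agent", "feature", "setup-plan", "--feature", slug, "--json"],
        ["spec-kitty", "agent", "tasks", "finalize-tasks", "--feature", slug, "--json"],
        ["spec-kitty", "implement", "WP01"],
        [
            "spec-kitty", "agent", "tasks", "move-task", "WP01",
            "--to", "for_review", "--feature", slug, "--json",
        ],
    ]


def run_sequence(slug: str, run: Callable[[list[str]], int]) -> None:
    """Run each command via `run` (returns an exit code); stop at the first failure."""
    for argv in smoke_commands(slug):
        code = run(argv)
        if code != 0:
            raise RuntimeError(f"{' '.join(argv)} exited with code {code}")
```

In the real test, run would presumably wrap subprocess.run(argv, cwd=repo).returncode against the temporary repository, with the spec.md write and the workspace commit interleaved between the relevant commands.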

Acceptance:

  • Full sequence completes without errors
  • All intermediate artifacts verified (spec.md, plan.md, tasks/, worktree)
  • Test is self-contained (creates and cleans up its own temp repo)
  • Passes locally and in CI on 2.x

Files changed:

  • tests/e2e/test_cli_smoke.py (new file)
  • tests/e2e/__init__.py (new file)
  • pyproject.toml (add e2e pytest marker if needed)

Test Matrix

| Layer | What | Where | Count |
|-------|------|-------|-------|
| Unit | Batch error parsing, queue aggregates, lane mapping, diagnose validation | tests/specify_cli/sync/ | ~20 new |
| Unit | Template resolution with ~/.kittify | tests/specify_cli/core/ | ~5 new |
| Integration | Planning workflow (existing + fix) | tests/integration/test_planning_workflow.py | 5 existing |
| Integration | Task workflow (existing) | tests/integration/test_task_workflow.py | 18 existing |
| Integration | Sync E2E (existing) | tests/integration/test_sync_e2e.py | 3 existing |
| E2E | Full CLI smoke test | tests/e2e/test_cli_smoke.py | 1 new |
| Contract | Fixture data validates against Pydantic models | tests/contract/test_handoff_fixtures.py | ~3 new |
| Baseline | Existing sync/auth tests | 2.x test suite | 93+ existing |
| Target | Zero regressions + ~30 new tests | | ~123+ total sync-related |

Sequencing and Parallelization

Wave 1 (independent, can run in parallel):

  • WP01: Fix setup-plan NameError
  • WP02: Fix batch error surfacing
  • WP03: Fix sync status --check
  • WP05: Extend sync status queue health
  • WP06: Lane mapping tests + docs
  • WP08: Global runtime convergence

Wave 2 (depends on Wave 1 outputs):

  • WP04: Sync diagnose command (depends on WP02 error categorization)
  • WP07: SaaS handoff contract doc (depends on WP02 error format + WP06 lane mapping)

Wave 3 (integration):

  • WP09: E2E smoke test (depends on WP01 + WP02)

Rollout / Backward Compatibility

For 2.x users (pre-release alpha):

  • All changes ship as part of 2.0.0a4+ on the 2.x branch
  • No backward compat concerns within 2.x (pre-release, per constitution)
  • spec-kitty migrate is the migration path from 1.x project state

For main branch users:

  • No changes to main in this sprint
  • Main remains offline-only, sync-free
  • Post-stabilization decision: promote 2.x to mainline or keep split

Credential path:

  • ~/.spec-kitty/credentials stays for now (auth-specific)
  • ~/.kittify/ is for runtime (templates, missions)
  • Future consolidation is a follow-on decision

Risks and Mitigations

| Risk | Impact | Likelihood | Mitigation |
|------|--------|------------|------------|
| 2.x divergence makes fixes harder than expected | Schedule slip | Medium | Each WP is scoped to specific files; use git show 2.x:<file> to pre-read before branching |
| Batch ingest failures are server-side, not CLI-side | WP02 incomplete | Medium | WP07 handoff doc captures required SaaS changes; CLI-side diagnostics still valuable even if server needs fixes |
| ~/.kittify migration breaks existing 2.x alpha users | User disruption | Low | Make spec-kitty migrate idempotent; test with existing 2.x project state |
| E2E smoke test is flaky in CI | False failures | Medium | Use pytest.mark.e2e marker for optional separation; ensure test cleanup is robust |
| Lane mapping has edge cases not covered by current emitter | Sync data corruption | Low | WP06 adds parametrized tests for all 7 lanes + unknown input; explicitly test BLOCKED and CANCELED |