Spec Kitty

Work Packages: Orchestrator End-to-End Testing Suite

Inputs: Design documents from /kitty-specs/021-orchestrator-end-to-end-testing-suite/
Prerequisites: plan.md (required), spec.md (user stories), data-model.md

Tests: This feature is itself testing infrastructure, so all test files are production code.

Organization: Fine-grained subtasks (Txxx) roll up into work packages (WPxx). Each work package must be independently deliverable and testable.

Prompt Files: Each work package references a matching prompt file in /tasks/ generated by /spec-kitty.tasks.

Subtask Format: `[Txxx] [P?] Description`

[P] indicates the subtask can proceed in parallel (different files/components).
Include precise file paths or modules.
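
As an illustration of the subtask format, a minimal parser could look like the sketch below. It assumes IDs are `T` followed by three digits and that the parallel marker is a literal `[P]`; the `Subtask` type and `parse_subtask` name are hypothetical and not part of the spec.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed shape: "[T001] [P] Description", where "[P]" is optional.
_SUBTASK_RE = re.compile(r"^\[(T\d{3})\]\s+(?:(\[P\])\s+)?(.+)$")


@dataclass(frozen=True)
class Subtask:
    task_id: str
    parallel: bool
    description: str


def parse_subtask(line: str) -> Optional[Subtask]:
    """Parse one subtask line; return None when the line does not match."""
    match = _SUBTASK_RE.match(line.strip())
    if match is None:
        return None
    task_id, marker, description = match.groups()
    return Subtask(task_id, marker is not None, description)
```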

Work Package WP01: Test Utilities Foundation (Priority: P0)

Goal: Create the core test utilities module with agent availability detection.
Independent Test: AgentAvailability dataclass can detect and probe agents.
Prompt: tasks/WP01-test-utilities-foundation.md
Estimated Size: ~350 lines

Included Subtasks

✅ T001 Create src/specify_cli/orchestrator/testing/__init__.py module structure
✅ T002 Implement AgentAvailability dataclass in availability.py
✅ T003 Implement is_installed check using existing agent invoker registry
✅ T004 Implement probe() method for auth verification (lightweight API call)
✅ T005 Implement detect_all_agents() function with tier categorization

Implementation Notes

Reuse existing agent invoker detection from src/specify_cli/orchestrator/agents/__init__.py
Each agent's probe() should timeout at 10 seconds
Cache detection results at module level for session duration
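
The notes above can be sketched roughly as follows. This is not the actual implementation: the registry shape (an install check plus an optional probe per agent), the `failure_reason` field, and the `refresh` parameter are assumptions made for illustration, and tier categorization is left out. The real module would reuse the existing agent invoker registry instead.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

PROBE_TIMEOUT_SECONDS = 10.0  # per-agent limit from the notes above


@dataclass
class AgentAvailability:
    agent_id: str
    is_installed: bool
    is_authenticated: bool = False
    failure_reason: Optional[str] = None  # hypothetical field


# Hypothetical registry shape: agent_id -> (install check, optional probe).
Registry = Dict[str, Tuple[Callable[[], bool], Optional[Callable[[], bool]]]]

_cache: Optional[Dict[str, AgentAvailability]] = None  # module-level session cache


def _run_probe(probe: Callable[[], bool], timeout: float) -> Tuple[bool, Optional[str]]:
    """Run the probe in a worker thread and return (ok, failure_reason)."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return bool(pool.submit(probe).result(timeout=timeout)), None
    except FutureTimeout:
        return False, f"probe timed out after {timeout}s"
    except Exception as exc:  # a failing probe should not abort detection
        return False, f"probe failed: {exc}"
    finally:
        pool.shutdown(wait=False)  # do not block on a hung probe


def detect_all_agents(
    registry: Registry,
    timeout: float = PROBE_TIMEOUT_SECONDS,
    refresh: bool = False,
) -> Dict[str, AgentAvailability]:
    """Detect every agent once, then serve the cached result."""
    global _cache
    if _cache is not None and not refresh:
        return _cache
    results: Dict[str, AgentAvailability] = {}
    for agent_id in sorted(registry):
        check_installed, probe = registry[agent_id]
        if not check_installed():
            results[agent_id] = AgentAvailability(agent_id, False, False, "not installed")
        elif probe is None:
            # Fallback: no probe capability, so rely on the install check alone.
            results[agent_id] = AgentAvailability(agent_id, True, True)
        else:
            ok, reason = _run_probe(probe, timeout)
            results[agent_id] = AgentAvailability(agent_id, True, ok, reason)
    _cache = results
    return results
```

A timed-out probe keeps running in its worker thread until it returns; a subprocess-based probe with its own timeout would avoid that, at the cost of more plumbing.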

Parallel Opportunities

T002 (dataclass) can be written first; T003-T005 can then proceed in parallel

Dependencies

None (foundation package)

Risks & Mitigations

Agent probes might be slow → use 10s timeout per agent
Some agents may not have probe capability → graceful fallback to install-only check

Work Package WP02: Test Path Selection (Priority: P0)

Goal: Implement the test path model that selects 1-agent, 2-agent, or 3+-agent paths based on availability.
Independent Test: TestPath correctly assigns agents based on availability count.
Prompt: tasks/WP02-test-path-selection.md
Estimated Size: ~280 lines

Included Subtasks

✅ T006 Implement TestPath dataclass in paths.py
✅ T007 Implement path selection logic based on agent count
✅ T008 Implement agent assignment (implementation, review, fallback)
✅ T009 Add select_test_path() function with caching

Implementation Notes

Path selection: 1 agent → 1-agent, 2 agents → 2-agent, 3+ agents → 3+-agent
Assignment: First agent for impl, second for review, third for fallback
Cache the selected path per test session
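
A minimal sketch of the selection and assignment rules above. The field names are illustrative, and two behaviours are assumptions not stated in the notes: with a single agent, that agent also acts as reviewer, and an empty agent list raises an error. Session caching is omitted.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TestPath:
    path_type: str  # "1-agent", "2-agent", or "3+-agent"
    implementation_agent: str
    review_agent: str
    fallback_agent: Optional[str]


def select_test_path(available_agent_ids: Sequence[str]) -> TestPath:
    """Pick a path from the number of available agents."""
    agents = sorted(available_agent_ids)  # stable order regardless of detection order
    if not agents:
        raise ValueError("at least one available agent is required")
    if len(agents) == 1:
        # Assumption: a single agent reviews its own work.
        return TestPath("1-agent", agents[0], agents[0], None)
    if len(agents) == 2:
        return TestPath("2-agent", agents[0], agents[1], None)
    return TestPath("3+-agent", agents[0], agents[1], agents[2])
```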

Parallel Opportunities

T006 must complete first; T007-T009 can then proceed together

Dependencies

Depends on WP01 (needs detect_all_agents())

Risks & Mitigations

Agent order might vary → use consistent sorting by agent_id

Work Package WP03: Fixture Data Structures (Priority: P0)

Goal: Define all data structures for fixture management.
Independent Test: All dataclasses serialize/deserialize correctly to/from JSON.
Prompt: tasks/WP03-fixture-data-structures.md
Estimated Size: ~320 lines

Included Subtasks

✅ T010 Implement FixtureCheckpoint dataclass in fixtures.py
✅ T011 Implement WorktreeMetadata dataclass
✅ T012 Implement TestContext dataclass
✅ T013 Add worktrees.json schema validation with Pydantic or manual checks
✅ T014 Add state.json schema validation (OrchestrationRun compatibility)

Implementation Notes

Match data-model.md specifications exactly
Use dataclasses with optional Pydantic validation
Ensure all paths use pathlib.Path
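
To illustrate the dataclass-plus-manual-validation approach, here is a sketch of a JSON round trip for one structure. The field names are guesses for illustration only; data-model.md remains the source of truth for the actual schema.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, Optional


@dataclass
class WorktreeMetadata:
    # Field names are illustrative, not taken from data-model.md.
    wp_id: str
    branch_name: str
    relative_path: Path
    commit_hash: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        """Convert to JSON-safe values (Path becomes a POSIX string)."""
        return {
            "wp_id": self.wp_id,
            "branch_name": self.branch_name,
            "relative_path": self.relative_path.as_posix(),
            "commit_hash": self.commit_hash,
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "WorktreeMetadata":
        """Validate manually, then rebuild the dataclass."""
        for key in ("wp_id", "branch_name", "relative_path"):
            if not isinstance(data.get(key), str):
                raise ValueError(f"worktrees.json entry needs a string {key!r}")
        commit = data.get("commit_hash")
        if commit is not None and not isinstance(commit, str):
            raise ValueError("commit_hash must be a string or null")
        return cls(data["wp_id"], data["branch_name"], Path(data["relative_path"]), commit)
```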

Parallel Opportunities

T010-T012 (dataclasses) can proceed in parallel
T013-T014 (validation) depend on their respective dataclasses

Dependencies

None (can proceed in parallel with WP01-WP02)

Risks & Mitigations

Schema drift with OrchestrationRun → import from orchestrator.state directly

Work Package WP04: Fixture Loader (Priority: P1)

Goal: Implement fixture loading that restores checkpoints to usable test state.
Independent Test: Loading a checkpoint creates valid TestContext with git repo.
Prompt: tasks/WP04-fixture-loader.md
Estimated Size: ~450 lines

Included Subtasks

✅ T015 Implement copy_fixture_to_temp() - copies checkpoint to temp directory
✅ T016 Implement init_git_repo() - initializes git in temp directory
✅ T017 Implement create_worktrees_from_metadata() - recreates worktrees
✅ T018 Implement load_orchestration_state() - deserializes state.json
✅ T019 Implement load_checkpoint() - assembles full TestContext
✅ T020 Implement cleanup_test_context() - removes temp directories

Implementation Notes

Use tempfile.mkdtemp() for isolation
Git init must create main branch and initial commit
Worktree creation uses git worktree add
State loading uses existing OrchestrationRun.from_json()
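
The loader steps above might be sketched as follows, assuming `git` is on PATH. The `_git` helper, the `repo` subdirectory name, and the single-worktree `add_worktree` function are hypothetical simplifications; state loading via OrchestrationRun.from_json() is not shown.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def _git(repo: Path, *args: str) -> None:
    """Run git with a fixed identity so commits work in a clean environment."""
    subprocess.run(
        ["git", "-c", "user.name=Test", "-c", "user.email=test@example.com",
         "-c", "commit.gpgsign=false", *args],
        cwd=repo, check=True, capture_output=True,
    )


def copy_fixture_to_temp(checkpoint_dir: Path) -> Path:
    """Copy the checkpoint into <temp_root>/repo and return temp_root."""
    temp_root = Path(tempfile.mkdtemp(prefix="orchestrator_test_"))
    shutil.copytree(checkpoint_dir, temp_root / "repo")
    return temp_root


def init_git_repo(repo: Path) -> None:
    """Initialize git with a main branch and an initial commit."""
    _git(repo, "init")
    # Point HEAD at main explicitly; works on git versions without `init -b`.
    _git(repo, "symbolic-ref", "HEAD", "refs/heads/main")
    _git(repo, "add", "-A")
    _git(repo, "commit", "--allow-empty", "-m", "Initial commit")


def add_worktree(repo: Path, worktree_path: Path, branch: str) -> None:
    """Create a new branch and check it out in its own worktree."""
    worktree_path.parent.mkdir(parents=True, exist_ok=True)
    _git(repo, "worktree", "add", "-b", branch, str(worktree_path))


def cleanup_test_context(temp_root: Path) -> None:
    """Remove the temp directory; errors are ignored so teardown never fails."""
    shutil.rmtree(temp_root, ignore_errors=True)
```

Calling `cleanup_test_context()` from a `finally` block keeps temp directories from leaking when a test fails midway.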

Parallel Opportunities

T015-T018 can proceed in parallel (independent utilities)
T019 depends on all of the above

Dependencies

Depends on WP03 (needs dataclasses)

Risks & Mitigations

Temp directory cleanup failure → use atexit handlers and try/finally
Git worktree issues → validate git version compatibility

Work Package WP05: Initial Checkpoint Fixtures (Priority: P1)

Goal: Create the first set of checkpoint fixtures for testing.
Independent Test: Fixtures exist and can be loaded successfully.
Prompt: tasks/WP05-initial-checkpoint-fixtures.md
Estimated Size: ~400 lines

Included Subtasks

✅ T021 Create minimal test feature structure (spec.md, plan.md, meta.json, tasks/)
✅ T022 Create checkpoint_wp_created/ fixture (WPs in planned lane)
✅ T023 Create checkpoint_wp_implemented/ fixture (WP01 implemented)
✅ T024 Create checkpoint_review_pending/ fixture (WP01 in review)
✅ T025 Add fixture manifest in __init__.py for discovery

Implementation Notes

Directory structure: tests/fixtures/orchestrator/checkpoint_<name>/
Each checkpoint needs: state.json, feature/, worktrees.json
Minimal feature: 2 WPs (WP01, WP02) with simple tasks
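
A small check for the checkpoint layout described above could look like this. The function name is hypothetical; the required entries come from the notes.

```python
from pathlib import Path
from typing import List

REQUIRED_FILES = ("state.json", "worktrees.json")
REQUIRED_DIRS = ("feature",)


def missing_checkpoint_entries(checkpoint_dir: Path) -> List[str]:
    """Return the required entries that are absent (empty list means valid)."""
    missing = [name for name in REQUIRED_FILES if not (checkpoint_dir / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (checkpoint_dir / name).is_dir()]
    return missing
```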

Parallel Opportunities

T021 must complete first; T022-T024 can then proceed in parallel

Dependencies

Depends on WP03 (needs data structure definitions)

Risks & Mitigations

Fixture size → keep features minimal (2 WPs, no actual code)

Work Package WP06: pytest Configuration (Priority: P1)

Goal: Set up pytest fixtures and markers for orchestrator tests.
Independent Test: pytest collects tests with correct markers and fixtures inject properly.
Prompt: tasks/WP06-pytest-configuration.md
Estimated Size: ~380 lines

Included Subtasks

✅ T026 Create tests/specify_cli/orchestrator/conftest.py
✅ T027 Register custom markers in pytest.ini or conftest
✅ T028 Implement available_agents fixture (session-scoped)
✅ T029 Implement test_path fixture (derives from available_agents)
✅ T030 Implement test_context fixture (function-scoped, loads checkpoint)

Implementation Notes

Markers: orchestrator_availability, orchestrator_fixtures, orchestrator_happy_path, orchestrator_review_cycles, orchestrator_parallel, orchestrator_smoke, core_agent, extended_agent
Session-scoped fixtures cache agent detection
Function-scoped fixtures provide isolated test contexts
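
Marker registration in conftest.py can be sketched with pytest's `pytest_configure` hook and `config.addinivalue_line`. The marker names are the ones listed above; the descriptions are placeholder wording, and the fixtures themselves are not shown here.

```python
# conftest.py (sketch): register the custom markers so pytest does not
# warn about unknown marks. Descriptions are illustrative placeholders.
ORCHESTRATOR_MARKERS = {
    "orchestrator_availability": "agent availability detection tests",
    "orchestrator_fixtures": "fixture loading tests",
    "orchestrator_happy_path": "happy path orchestration tests",
    "orchestrator_review_cycles": "review cycle tests",
    "orchestrator_parallel": "parallel execution and dependency tests",
    "orchestrator_smoke": "extended agent smoke tests",
    "core_agent": "tests that need a core-tier agent",
    "extended_agent": "tests that need an extended-tier agent",
}


def pytest_configure(config):
    """pytest hook: add each marker to the `markers` ini setting."""
    for name, description in ORCHESTRATOR_MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```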

Parallel Opportunities

T026-T027 can proceed together; T028-T030 depend on them

Dependencies

Depends on WP01 (availability detection), WP02 (path selection), WP04 (fixture loading)

Risks & Mitigations

Marker conflicts → use unique prefix orchestrator_

Work Package WP07: Happy Path Tests (Priority: P1) 🎯 MVP

Goal: Implement end-to-end tests for happy path orchestration.
Independent Test: Single WP orchestration completes with correct state.
Prompt: tasks/WP07-happy-path-tests.md
Estimated Size: ~450 lines

Included Subtasks

✅ T031 Implement test: single WP orchestration end-to-end
✅ T032 Implement test: multiple parallel WPs orchestration
✅ T033 Implement test: state validation after orchestration
✅ T034 Implement test: lane status consistency (frontmatter matches state)
✅ T035 Implement test: commit verification in worktrees

Implementation Notes

Use real agents (no mocks)
Start from checkpoint_wp_created fixture
Validate OrchestrationRun state file after completion
Verify git commits exist in worktree branches
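
For the lane consistency check (T034), one possible helper compares the lane in each prompt file's frontmatter with the lane recorded in state. This sketch assumes frontmatter is delimited by `---` lines and holds a simple `lane: <value>` entry; both the format and the function names are assumptions, not the documented schema.

```python
from typing import Dict, List, Optional


def read_frontmatter_lane(markdown_text: str) -> Optional[str]:
    """Return the `lane` value from a leading frontmatter block, if any."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":  # end of frontmatter
            break
        key, sep, value = line.partition(":")
        if sep and key.strip() == "lane":
            return value.strip().strip("\"'")
    return None


def find_lane_mismatches(prompt_texts: Dict[str, str], state_lanes: Dict[str, str]) -> List[str]:
    """List WPs whose frontmatter lane differs from the lane in state."""
    mismatches = []
    for wp_id in sorted(state_lanes):
        lane = read_frontmatter_lane(prompt_texts.get(wp_id, ""))
        if lane != state_lanes[wp_id]:
            mismatches.append(f"{wp_id}: frontmatter={lane!r} state={state_lanes[wp_id]!r}")
    return mismatches
```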

Parallel Opportunities

All tests can be developed in parallel once fixtures are ready

Dependencies

Depends on WP05 (fixtures), WP06 (pytest config)

Risks & Mitigations

Long test times → mark with @pytest.mark.slow
Agent unavailability → skip or fail the test depending on the agent's tier

Work Package WP08: Review Cycle Tests (Priority: P2)

Goal: Test review rejection and re-implementation cycles.
Independent Test: WP correctly cycles through rejection and re-implementation.
Prompt: tasks/WP08-review-cycle-tests.md
Estimated Size: ~420 lines

Included Subtasks

✅ T036 Implement test: review rejection triggers re-implementation
✅ T037 Implement test: re-implementation produces new commits
✅ T038 Implement test: full cycle (reject → re-impl → re-review → approve)
✅ T039 Implement test: max review cycles exceeded marks WP failed
✅ T040 Implement test: state transition history is recorded

Implementation Notes

Start from checkpoint_review_pending fixture
Need a way to trigger review rejection (via agent prompt or fixture state)
Verify state file contains complete transition history
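
One way to check that a transition history is complete is to verify that each transition starts where the previous one ended, and to count rejections against the configured maximum. The sketch below assumes history is a list of `(from_status, to_status)` pairs; that shape and the status names are hypothetical, since the real state schema lives in OrchestrationRun.

```python
from typing import Sequence, Tuple

# Hypothetical shape: (from_status, to_status). Status names are illustrative.
Transition = Tuple[str, str]


def history_is_contiguous(history: Sequence[Transition]) -> bool:
    """True when every transition starts from the previous transition's target."""
    return all(prev[1] == nxt[0] for prev, nxt in zip(history, history[1:]))


def count_rejections(history: Sequence[Transition], rejected_status: str = "rejected") -> int:
    """Count how many transitions end in the rejected status."""
    return sum(1 for _, to_status in history if to_status == rejected_status)
```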

Parallel Opportunities

T036-T040 can proceed in parallel

Dependencies

Depends on WP05 (fixtures), WP06 (pytest config)

Risks & Mitigations

Review cycle depends on agent behavior → use fixture at review_rejected state

Work Package WP09: Parallel and Dependency Tests (Priority: P2)

Goal: Test parallel execution and dependency ordering.
Independent Test: Orchestrator respects dependency graph.
Prompt: tasks/WP09-parallel-dependency-tests.md
Estimated Size: ~480 lines

Included Subtasks

✅ T041 Implement test: independent WPs start simultaneously
✅ T042 Implement test: WP waits for dependencies before starting
✅ T043 Implement test: circular dependency detection (pre-execution fail)
✅ T044 Implement test: diamond dependency pattern execution order
✅ T045 Implement test: linear chain (WP01 → WP02 → WP03)
✅ T046 Implement test: fan-out pattern (WP01 → WP02, WP03, WP04)

Implementation Notes

Need fixtures with specific dependency patterns
Verify start times to confirm parallelization
Circular dependency should fail at validation, not execution
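
To derive the expected execution order for a dependency fixture, and to confirm that a cycle fails before anything runs, the standard library's `graphlib` (Python 3.9+) is one option. This is an independent reference helper for tests, not the orchestrator's own scheduler; `execution_waves` is a hypothetical name.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, Iterable, List


def execution_waves(deps: Dict[str, Iterable[str]]) -> List[List[str]]:
    """Group WPs into waves; every WP in a wave has all dependencies satisfied.

    `deps` maps each WP to the WPs it depends on.
    Raises graphlib.CycleError at validation time if the graph has a cycle.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # cycle detection happens here, before any "execution"
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Diamond pattern: WP01 -> WP02, WP03 -> WP04
diamond = {"WP02": ["WP01"], "WP03": ["WP01"], "WP04": ["WP02", "WP03"]}
print(execution_waves(diamond))  # [['WP01'], ['WP02', 'WP03'], ['WP04']]
```

WPs in the same wave are the ones whose start timestamps in state should overlap or be close together.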

Parallel Opportunities

All tests can be developed in parallel

Dependencies

Depends on WP05 (fixtures), WP06 (pytest config)

Risks & Mitigations

Timing verification may be flaky → compare timestamps recorded in state, not wall-clock time

Work Package WP10: Extended Agent Smoke Tests (Priority: P3)

Goal: Basic validation that extended agents can be invoked.
Independent Test: Each available extended agent creates a file.
Prompt: tasks/WP10-extended-agent-smoke-tests.md
Estimated Size: ~380 lines

Included Subtasks

✅ T047 Implement smoke test infrastructure (base class or helper)
✅ T048 Implement file touch task and verify_smoke_result() function
✅ T049 Create parametrized smoke tests for all 7 extended agents
✅ T050 Implement skip behavior for unavailable extended agents
✅ T051 Add timing validation (each test completes in <60s)

Implementation Notes

Use @pytest.mark.parametrize with agent list
Skip gracefully with pytest.skip() when agent unavailable
Verify file creation, not file contents
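
A possible shape for `verify_smoke_result()`: it checks only that the file exists and that the run stayed under the 60-second limit from T051. The signature and the list-of-problems return value are assumptions.

```python
from pathlib import Path
from typing import List

SMOKE_TIMEOUT_SECONDS = 60.0  # limit from T051


def verify_smoke_result(
    expected_file: Path,
    elapsed_seconds: float,
    timeout: float = SMOKE_TIMEOUT_SECONDS,
) -> List[str]:
    """Return a list of problems; an empty list means the smoke test passed."""
    problems = []
    if not expected_file.is_file():  # existence only, contents are not inspected
        problems.append(f"expected file was not created: {expected_file}")
    if elapsed_seconds >= timeout:
        problems.append(f"took {elapsed_seconds:.1f}s, limit is {timeout:.0f}s")
    return problems
```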

Parallel Opportunities

T047-T048 must complete first; T049-T051 can then proceed in parallel

Dependencies

Depends on WP01 (availability detection), WP06 (pytest config)

Risks & Mitigations

Extended agents may have different invocation patterns → use existing invokers

Work Package WP11: Additional Checkpoint Fixtures (Priority: P2)

Goal: Complete the remaining checkpoint fixtures.
Independent Test: All checkpoints load successfully and represent correct state.
Prompt: tasks/WP11-additional-checkpoint-fixtures.md
Estimated Size: ~300 lines

Included Subtasks

✅ T052 Create checkpoint_review_rejected/ fixture
✅ T053 Create checkpoint_review_approved/ fixture
✅ T054 Create checkpoint_wp_merged/ fixture
✅ T055 Add stale checkpoint detection (version mismatch warning)

Implementation Notes

Follow same structure as WP05 checkpoints
review_rejected needs state with rejection history
wp_merged represents post-merge state on main

Parallel Opportunities

T052-T054 can proceed in parallel

Dependencies

Depends on WP03 (dataclasses), WP05 (initial fixtures for reference)

Risks & Mitigations

Fixture maintenance → add version field for staleness check
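
The staleness check could be as small as comparing a version field and emitting a warning on mismatch. The `version` key, the constant, and the function name are hypothetical.

```python
import warnings
from typing import Any, Dict

CURRENT_FIXTURE_VERSION = 1  # hypothetical; bump when the fixture schema changes


def check_checkpoint_version(
    metadata: Dict[str, Any],
    expected: int = CURRENT_FIXTURE_VERSION,
) -> bool:
    """Warn (do not fail) when a checkpoint was built for another version."""
    found = metadata.get("version")
    if found != expected:
        warnings.warn(
            f"stale checkpoint: version {found!r}, expected {expected!r}",
            UserWarning,
            stacklevel=2,
        )
        return False
    return True
```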

Work Package WP12: Integration and Polish (Priority: P3)

Goal: Final integration, validation utilities, and polish.
Independent Test: Full test suite runs and quickstart scenarios pass.
Prompt: tasks/WP12-integration-polish.md
Estimated Size: ~320 lines

Included Subtasks

✅ T056 Update root tests/conftest.py with orchestrator marker registration
✅ T057 Add state validation utilities (validate_test_result())
✅ T058 Add test output helpers (distinguish skip vs fail messages)
✅ T059 Add timeout configuration via environment variables
✅ T060 Validate quickstart.md scenarios work as documented

Implementation Notes

Markers must be registered in root conftest for pytest collection
State validation checks: file integrity, lane consistency, git state
Timeout defaults: 300s per test, 10s for probe
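
Timeout configuration via environment variables might look like the sketch below. The defaults match the notes above; the variable names `ORCHESTRATOR_TEST_TIMEOUT` and `ORCHESTRATOR_PROBE_TIMEOUT` are placeholders, not names defined by the spec.

```python
import os
from typing import Mapping, Optional

DEFAULT_TEST_TIMEOUT_SECONDS = 300.0
DEFAULT_PROBE_TIMEOUT_SECONDS = 10.0


def read_timeout(
    env_var: str,
    default: float,
    environ: Optional[Mapping[str, str]] = None,
) -> float:
    """Read a positive timeout in seconds, falling back to the default."""
    env = os.environ if environ is None else environ
    raw = env.get(env_var)
    if raw is None or not raw.strip():
        return default
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(f"{env_var} must be a number, got {raw!r}") from None
    if value <= 0:
        raise ValueError(f"{env_var} must be positive, got {value}")
    return value


# Hypothetical variable names, for illustration only.
test_timeout = read_timeout("ORCHESTRATOR_TEST_TIMEOUT", DEFAULT_TEST_TIMEOUT_SECONDS, environ={})
probe_timeout = read_timeout("ORCHESTRATOR_PROBE_TIMEOUT", DEFAULT_PROBE_TIMEOUT_SECONDS, environ={})
```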

Parallel Opportunities

T056-T059 can proceed in parallel; T060 is final validation

Dependencies

Depends on all previous WPs (final polish)

Risks & Mitigations

Integration issues → run full suite after each WP merge

Dependency & Execution Summary

Phase 0 (Foundation):
  WP01 (Test Utilities) ──┐
  WP02 (Test Paths) ──────┼──→ WP06 (pytest Config)
  WP03 (Data Structures) ─┴──→ WP04 (Fixture Loader) ──→ WP05 (Initial Fixtures)

Phase 1 (Core Tests):
  WP06 + WP05 ──→ WP07 (Happy Path) 🎯 MVP
            ├──→ WP08 (Review Cycles)
            └──→ WP09 (Parallel/Deps)

Phase 2 (Extended):
  WP01 + WP06 ──→ WP10 (Smoke Tests)
  WP03 + WP05 ──→ WP11 (Additional Fixtures)

Phase 3 (Polish):
  All ──→ WP12 (Integration)

Parallelization Opportunities:

WP01 and WP03 can proceed simultaneously (no dependencies); WP02 can start as soon as WP01 provides detect_all_agents()
WP07, WP08, WP09 can proceed in parallel once fixtures ready
WP10, WP11 can proceed in parallel with Phase 1 tests

MVP Scope: WP01 → WP02 → WP03 → WP04 → WP05 → WP06 → WP07 (Happy Path Tests)

Subtask Index (Reference)

Subtask ID	Summary	Work Package	Priority	Parallel?
T001	Create testing module structure	WP01	P0	No
T002	Implement AgentAvailability dataclass	WP01	P0	No
T003	Implement is_installed check	WP01	P0	Yes
T004	Implement probe() auth verification	WP01	P0	Yes
T005	Implement detect_all_agents()	WP01	P0	No
T006	Implement TestPath dataclass	WP02	P0	No
T007	Implement path selection logic	WP02	P0	Yes
T008	Implement agent assignment	WP02	P0	Yes
T009	Add select_test_path() with caching	WP02	P0	Yes
T010	Implement FixtureCheckpoint dataclass	WP03	P0	Yes
T011	Implement WorktreeMetadata dataclass	WP03	P0	Yes
T012	Implement TestContext dataclass	WP03	P0	Yes
T013	Add worktrees.json validation	WP03	P0	No
T014	Add state.json validation	WP03	P0	No
T015	Implement copy_fixture_to_temp()	WP04	P1	Yes
T016	Implement init_git_repo()	WP04	P1	Yes
T017	Implement create_worktrees_from_metadata()	WP04	P1	Yes
T018	Implement load_orchestration_state()	WP04	P1	Yes
T019	Implement load_checkpoint()	WP04	P1	No
T020	Implement cleanup_test_context()	WP04	P1	No
T021	Create minimal test feature structure	WP05	P1	No
T022	Create checkpoint_wp_created fixture	WP05	P1	Yes
T023	Create checkpoint_wp_implemented fixture	WP05	P1	Yes
T024	Create checkpoint_review_pending fixture	WP05	P1	Yes
T025	Add fixture manifest	WP05	P1	No
T026	Create orchestrator conftest.py	WP06	P1	Yes
T027	Register custom markers	WP06	P1	Yes
T028	Implement available_agents fixture	WP06	P1	No
T029	Implement test_path fixture	WP06	P1	No
T030	Implement test_context fixture	WP06	P1	No
T031	Test single WP orchestration	WP07	P1	Yes
T032	Test multiple parallel WPs	WP07	P1	Yes
T033	Test state validation	WP07	P1	Yes
T034	Test lane status consistency	WP07	P1	Yes
T035	Test commit verification	WP07	P1	Yes
T036	Test review rejection flow	WP08	P2	Yes
T037	Test re-implementation	WP08	P2	Yes
T038	Test full review cycle	WP08	P2	Yes
T039	Test max review cycles	WP08	P2	Yes
T040	Test state transition history	WP08	P2	Yes
T041	Test independent WPs parallel	WP09	P2	Yes
T042	Test dependency blocking	WP09	P2	Yes
T043	Test circular dependency detection	WP09	P2	Yes
T044	Test diamond pattern	WP09	P2	Yes
T045	Test linear chain	WP09	P2	Yes
T046	Test fan-out pattern	WP09	P2	Yes
T047	Smoke test infrastructure	WP10	P3	No
T048	File touch verification	WP10	P3	No
T049	Parametrized extended agent tests	WP10	P3	Yes
T050	Skip behavior for unavailable	WP10	P3	Yes
T051	Timing validation	WP10	P3	Yes
T052	Create checkpoint_review_rejected	WP11	P2	Yes
T053	Create checkpoint_review_approved	WP11	P2	Yes
T054	Create checkpoint_wp_merged	WP11	P2	Yes
T055	Add stale checkpoint detection	WP11	P2	No
T056	Update root conftest.py	WP12	P3	Yes
T057	Add state validation utilities	WP12	P3	Yes
T058	Add test output helpers	WP12	P3	Yes
T059	Add timeout configuration	WP12	P3	Yes
T060	Validate quickstart scenarios	WP12	P3	No

Canonical Status (Generated)

WP01: done
WP02: done
WP03: done
WP04: done
WP05: done
WP06: done
WP07: done
WP08: done
WP09: done
WP10: done
WP11: done
WP12: done
