Work Packages: Orchestrator End-to-End Testing Suite
Inputs: Design documents from /kitty-specs/021-orchestrator-end-to-end-testing-suite/
Prerequisites: plan.md (required), spec.md (user stories), data-model.md
Tests: This feature is itself testing infrastructure, so all test files are production code.
Organization: Fine-grained subtasks (Txxx) roll up into work packages (WPxx). Each work package must be independently deliverable and testable.
Prompt Files: Each work package references a matching prompt file in /tasks/ generated by /spec-kitty.tasks.
Subtask Format: [Txxx] [P?] Description
- [P] indicates the subtask can proceed in parallel (different files/components).
- Include precise file paths or modules.
Work Package WP01: Test Utilities Foundation (Priority: P0)
Goal: Create the core test utilities module with agent availability detection.
Independent Test: AgentAvailability dataclass can detect and probe agents.
Prompt: tasks/WP01-test-utilities-foundation.md
Estimated Size: ~350 lines
Included Subtasks
- ✅ T001 Create src/specify_cli/orchestrator/testing/__init__.py module structure
- ✅ T002 Implement AgentAvailability dataclass in availability.py
- ✅ T003 Implement is_installed check using existing agent invoker registry
- ✅ T004 Implement probe() method for auth verification (lightweight API call)
- ✅ T005 Implement detect_all_agents() function with tier categorization
Implementation Notes
- Reuse existing agent invoker detection from src/specify_cli/orchestrator/agents/__init__.py
- Each agent's probe() should time out at 10 seconds
- Cache detection results at module level for session duration
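The probe timeout and session-level caching described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the field names, the `detect_agent()` helper, and the `--version` probe command are assumptions, and `shutil.which` stands in for the existing agent invoker registry.

```python
from __future__ import annotations

import shutil
import subprocess
from dataclasses import dataclass
from functools import lru_cache

PROBE_TIMEOUT_SECONDS = 10  # per-agent probe budget from the notes above


@dataclass(frozen=True)
class AgentAvailability:
    # Field names are illustrative; data-model.md is the authority.
    agent_id: str
    is_installed: bool
    is_authenticated: bool  # False when the probe fails or times out


def probe(command: list[str], timeout: float = PROBE_TIMEOUT_SECONDS) -> bool:
    """Run a lightweight command; a timeout, launch error, or non-zero exit is a failure."""
    try:
        result = subprocess.run(command, capture_output=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


@lru_cache(maxsize=None)
def detect_agent(agent_id: str, executable: str) -> AgentAvailability:
    """Detect one agent; lru_cache keeps the result for the rest of the session."""
    # Assumption: shutil.which replaces the real invoker registry lookup.
    installed = shutil.which(executable) is not None
    # Assumption: "--version" is a placeholder for a cheap authenticated call.
    authenticated = installed and probe([executable, "--version"])
    return AgentAvailability(agent_id, installed, authenticated)
```

When an agent is not installed, the probe is skipped entirely, which matches the install-only fallback listed under Risks & Mitigations.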
Parallel Opportunities
- T002 (dataclass) can be written first; T003-T004 can then proceed in parallel, and T005 builds on both
Dependencies
- None (foundation package)
Risks & Mitigations
- Agent probes might be slow → use 10s timeout per agent
- Some agents may not have probe capability → graceful fallback to install-only check
Work Package WP02: Test Path Selection (Priority: P0)
Goal: Implement the test path model that selects 1-agent, 2-agent, or 3+-agent paths based on availability.
Independent Test: TestPath correctly assigns agents based on availability count.
Prompt: tasks/WP02-test-path-selection.md
Estimated Size: ~280 lines
Included Subtasks
- ✅ T006 Implement TestPath dataclass in paths.py
- ✅ T007 Implement path selection logic based on agent count
- ✅ T008 Implement agent assignment (implementation, review, fallback)
- ✅ T009 Add select_test_path() function with caching
Implementation Notes
- Path selection: 1 agent → 1-agent, 2 agents → 2-agent, 3+ agents → 3+-agent
- Assignment: First agent for impl, second for review, third for fallback
- Cache the selected path per test session
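A minimal sketch of the selection and assignment rules above. The field names are assumptions, and so is the choice to let the single agent review its own work on the 1-agent path; caching is left out to keep the example short.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TestPath:
    path_type: str                  # "1-agent", "2-agent", or "3+-agent"
    implementation_agent: str
    review_agent: str
    fallback_agent: Optional[str]   # only set on the 3+-agent path


def select_test_path(available_agent_ids: list[str]) -> TestPath:
    """Pick a path from the number of available agents."""
    # Sorting by agent_id keeps assignment stable across detection orders.
    agents = sorted(available_agent_ids)
    if not agents:
        raise ValueError("at least one available agent is required")
    if len(agents) == 1:
        # Assumption: with one agent, it both implements and reviews.
        return TestPath("1-agent", agents[0], agents[0], None)
    if len(agents) == 2:
        return TestPath("2-agent", agents[0], agents[1], None)
    return TestPath("3+-agent", agents[0], agents[1], agents[2])
```

For example, `select_test_path(["agent_b", "agent_a"])` assigns `agent_a` to implementation and `agent_b` to review regardless of input order.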
Parallel Opportunities
- T006 must complete first; T007-T009 can then proceed together
Dependencies
- Depends on WP01 (needs detect_all_agents())
Risks & Mitigations
- Agent order might vary → use consistent sorting by agent_id
Work Package WP03: Fixture Data Structures (Priority: P0)
Goal: Define all data structures for fixture management.
Independent Test: All dataclasses serialize/deserialize correctly to/from JSON.
Prompt: tasks/WP03-fixture-data-structures.md
Estimated Size: ~320 lines
Included Subtasks
- ✅ T010 Implement FixtureCheckpoint dataclass in fixtures.py
- ✅ T011 Implement WorktreeMetadata dataclass
- ✅ T012 Implement TestContext dataclass
- ✅ T013 Add worktrees.json schema validation with Pydantic or manual checks
- ✅ T014 Add state.json schema validation (OrchestrationRun compatibility)
Implementation Notes
- Match data-model.md specifications exactly
- Use dataclasses with optional Pydantic validation
- Ensure all paths use pathlib.Path
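One way to combine a plain dataclass, pathlib.Path fields, and manual validation is sketched below. The field names (`wp_id`, `branch_name`, `relative_path`) are hypothetical; data-model.md defines the real ones.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any

_REQUIRED_KEYS = {"wp_id", "branch_name", "relative_path"}


@dataclass
class WorktreeMetadata:
    wp_id: str
    branch_name: str
    relative_path: Path  # kept as a Path in memory, stored as a POSIX string

    def to_dict(self) -> dict[str, Any]:
        return {
            "wp_id": self.wp_id,
            "branch_name": self.branch_name,
            "relative_path": self.relative_path.as_posix(),
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "WorktreeMetadata":
        # Manual check standing in for optional Pydantic validation.
        missing = _REQUIRED_KEYS - data.keys()
        if missing:
            raise ValueError(f"worktrees.json entry is missing keys: {sorted(missing)}")
        return cls(data["wp_id"], data["branch_name"], Path(data["relative_path"]))


# Round trip through JSON text, as the Independent Test requires.
original = WorktreeMetadata("WP01", "feature-WP01", Path(".worktrees/feature-WP01"))
restored = WorktreeMetadata.from_dict(json.loads(json.dumps(original.to_dict())))
```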
Parallel Opportunities
- T010-T012 (dataclasses) can proceed in parallel
- T013-T014 (validation) depend on their respective dataclasses
Dependencies
- None (can proceed in parallel with WP01-WP02)
Risks & Mitigations
- Schema drift with OrchestrationRun → import from orchestrator.state directly
Work Package WP04: Fixture Loader (Priority: P1)
Goal: Implement fixture loading that restores checkpoints to usable test state.
Independent Test: Loading a checkpoint creates valid TestContext with git repo.
Prompt: tasks/WP04-fixture-loader.md
Estimated Size: ~450 lines
Included Subtasks
- ✅ T015 Implement copy_fixture_to_temp() - copies checkpoint to temp directory
- ✅ T016 Implement init_git_repo() - initializes git in temp directory
- ✅ T017 Implement create_worktrees_from_metadata() - recreates worktrees
- ✅ T018 Implement load_orchestration_state() - deserializes state.json
- ✅ T019 Implement load_checkpoint() - assembles full TestContext
- ✅ T020 Implement cleanup_test_context() - removes temp directories
Implementation Notes
- Use tempfile.mkdtemp() for isolation
- Git init must create main branch and initial commit
- Worktree creation uses git worktree add
- State loading uses existing OrchestrationRun.from_json()
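The copy-and-cleanup half of the loader might look like the sketch below. It covers only the temp-directory isolation; git initialization, worktree creation, and state loading are omitted. The `temporary_checkpoint()` context manager is a hypothetical convenience wrapper, not one of the listed subtasks.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


def copy_fixture_to_temp(checkpoint_dir: Path) -> Path:
    """Copy a checkpoint into a fresh temp directory and return the copy."""
    temp_root = Path(tempfile.mkdtemp(prefix="orchestrator_test_"))
    target = temp_root / checkpoint_dir.name
    shutil.copytree(checkpoint_dir, target)
    return target


@contextmanager
def temporary_checkpoint(checkpoint_dir: Path):
    """Yield an isolated working copy and always remove it afterwards."""
    working_copy = copy_fixture_to_temp(checkpoint_dir)
    try:
        yield working_copy
    finally:
        # Remove the whole mkdtemp root, even if the test body raised.
        shutil.rmtree(working_copy.parent, ignore_errors=True)
```

Because the original fixture is only ever read, a failing test cannot corrupt the checked-in checkpoint.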
Parallel Opportunities
- T015-T018 can proceed in parallel (independent utilities)
- T019 depends on all of the above
Dependencies
- Depends on WP03 (needs dataclasses)
Risks & Mitigations
- Temp directory cleanup failure → use atexit handlers and try/finally
- Git worktree issues → validate git version compatibility
Work Package WP05: Initial Checkpoint Fixtures (Priority: P1)
Goal: Create the first set of checkpoint fixtures for testing.
Independent Test: Fixtures exist and can be loaded successfully.
Prompt: tasks/WP05-initial-checkpoint-fixtures.md
Estimated Size: ~400 lines
Included Subtasks
- ✅ T021 Create minimal test feature structure (spec.md, plan.md, meta.json, tasks/)
- ✅ T022 Create checkpoint_wp_created/ fixture (WPs in planned lane)
- ✅ T023 Create checkpoint_wp_implemented/ fixture (WP01 implemented)
- ✅ T024 Create checkpoint_review_pending/ fixture (WP01 in review)
- ✅ T025 Add fixture manifest in __init__.py for discovery
Implementation Notes
- Directory structure: tests/fixtures/orchestrator/checkpoint_<name>/
- Each checkpoint needs: state.json, feature/, worktrees.json
- Minimal feature: 2 WPs (WP01, WP02) with simple tasks
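Given that layout, fixture discovery can be a directory scan that keeps only complete checkpoints. This is a sketch of one possible manifest helper; the function name `discover_checkpoints()` is an assumption.

```python
from pathlib import Path

# Every checkpoint directory must contain these entries (from the notes above).
REQUIRED_ENTRIES = ("state.json", "worktrees.json", "feature")
PREFIX = "checkpoint_"


def discover_checkpoints(fixtures_root: Path) -> dict:
    """Map checkpoint name (without the prefix) to its directory."""
    checkpoints = {}
    for candidate in sorted(fixtures_root.glob(PREFIX + "*")):
        complete = all((candidate / entry).exists() for entry in REQUIRED_ENTRIES)
        if candidate.is_dir() and complete:
            checkpoints[candidate.name[len(PREFIX):]] = candidate
    return checkpoints
```

Incomplete directories are silently skipped here; a stricter variant could raise instead so that a broken fixture fails loudly.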
Parallel Opportunities
- T021 must complete first; T022-T024 can then proceed in parallel
Dependencies
- Depends on WP03 (needs data structure definitions)
Risks & Mitigations
- Fixture size → keep features minimal (2 WPs, no actual code)
Work Package WP06: pytest Configuration (Priority: P1)
Goal: Set up pytest fixtures and markers for orchestrator tests.
Independent Test: pytest collects tests with correct markers and fixtures inject properly.
Prompt: tasks/WP06-pytest-configuration.md
Estimated Size: ~380 lines
Included Subtasks
- ✅ T026 Create tests/specify_cli/orchestrator/conftest.py
- ✅ T027 Register custom markers in pytest.ini or conftest
- ✅ T028 Implement available_agents fixture (session-scoped)
- ✅ T029 Implement test_path fixture (derives from available_agents)
- ✅ T030 Implement test_context fixture (function-scoped, loads checkpoint)
Implementation Notes
- Markers: orchestrator_availability, orchestrator_fixtures, orchestrator_happy_path, orchestrator_review_cycles, orchestrator_parallel, orchestrator_smoke, core_agent, extended_agent
- Session-scoped fixtures cache agent detection
- Function-scoped fixtures provide isolated test contexts
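Marker registration in a conftest.py can be sketched with pytest's `pytest_configure` hook and `config.addinivalue_line`. The marker names come from the list above; the descriptions are placeholder wording. The fixtures themselves are not shown.

```python
# Marker name -> description. Descriptions are illustrative wording only.
ORCHESTRATOR_MARKERS = {
    "orchestrator_availability": "agent availability detection tests",
    "orchestrator_fixtures": "fixture loading and validation tests",
    "orchestrator_happy_path": "end-to-end happy path tests",
    "orchestrator_review_cycles": "review rejection and re-implementation tests",
    "orchestrator_parallel": "parallel execution and dependency tests",
    "orchestrator_smoke": "extended agent smoke tests",
    "core_agent": "tests that require a core-tier agent",
    "extended_agent": "tests that require an extended-tier agent",
}


def pytest_configure(config):
    """pytest hook: register each marker so strict marker checks accept it."""
    for name, description in ORCHESTRATOR_MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

Registered markers can then be selected on the command line, for example `pytest -m orchestrator_smoke`.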
Parallel Opportunities
- T026-T027 can proceed together; T028-T030 depend on them
Dependencies
- Depends on WP01 (availability detection), WP02 (path selection), WP04 (fixture loading)
Risks & Mitigations
- Marker conflicts → use unique prefix orchestrator_
Work Package WP07: Happy Path Tests (Priority: P1) 🎯 MVP
Goal: Implement end-to-end tests for happy path orchestration.
Independent Test: Single WP orchestration completes with correct state.
Prompt: tasks/WP07-happy-path-tests.md
Estimated Size: ~450 lines
Included Subtasks
- ✅ T031 Implement test: single WP orchestration end-to-end
- ✅ T032 Implement test: multiple parallel WPs orchestration
- ✅ T033 Implement test: state validation after orchestration
- ✅ T034 Implement test: lane status consistency (frontmatter matches state)
- ✅ T035 Implement test: commit verification in worktrees
Implementation Notes
- Use real agents (no mocks)
- Start from checkpoint_wp_created fixture
- Validate OrchestrationRun state file after completion
- Verify git commits exist in worktree branches
Parallel Opportunities
- All tests can be developed in parallel once fixtures are ready
Dependencies
- Depends on WP05 (fixtures), WP06 (pytest config)
Risks & Mitigations
- Long test times → mark with @pytest.mark.slow
- Agent unavailability → proper skip/fail based on tier
Work Package WP08: Review Cycle Tests (Priority: P2)
Goal: Test review rejection and re-implementation cycles.
Independent Test: WP correctly cycles through rejection and re-implementation.
Prompt: tasks/WP08-review-cycle-tests.md
Estimated Size: ~420 lines
Included Subtasks
- ✅ T036 Implement test: review rejection triggers re-implementation
- ✅ T037 Implement test: re-implementation produces new commits
- ✅ T038 Implement test: full cycle (reject → re-impl → re-review → approve)
- ✅ T039 Implement test: max review cycles exceeded marks WP failed
- ✅ T040 Implement test: state transition history is recorded
Implementation Notes
- Start from checkpoint_review_pending fixture
- Need a way to trigger review rejection (agent prompt or fixture state)
- Verify state file contains complete transition history
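Checking that the recorded history contains the full cycle can be reduced to an ordered-subsequence test. The sketch below assumes the history is available as a flat list of state labels; both that shape and the label strings are hypothetical, since the real format comes from OrchestrationRun.

```python
from typing import Iterable, Sequence


def contains_in_order(history: Sequence[str], expected: Iterable[str]) -> bool:
    """True if every expected label appears in history, in the given order."""
    remaining = iter(history)
    # `step in remaining` consumes the iterator, so later steps must come later.
    return all(step in remaining for step in expected)


def count_rejections(history: Sequence[str], rejected_label: str = "rejected") -> int:
    """Number of review rejections recorded, e.g. to compare against a max-cycles limit."""
    return sum(1 for label in history if label == rejected_label)


# Hypothetical history for a reject -> re-impl -> re-review -> approve cycle.
history = ["implementation", "review", "rejected", "implementation", "review", "approved"]
full_cycle = ["rejected", "implementation", "review", "approved"]
```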
Parallel Opportunities
- T036-T040 can proceed in parallel
Dependencies
- Depends on WP05 (fixtures), WP06 (pytest config)
Risks & Mitigations
- Review cycle depends on agent behavior → use fixture at review_rejected state
Work Package WP09: Parallel and Dependency Tests (Priority: P2)
Goal: Test parallel execution and dependency ordering.
Independent Test: Orchestrator respects dependency graph.
Prompt: tasks/WP09-parallel-dependency-tests.md
Estimated Size: ~480 lines
Included Subtasks
- ✅ T041 Implement test: independent WPs start simultaneously
- ✅ T042 Implement test: WP waits for dependencies before starting
- ✅ T043 Implement test: circular dependency detection (pre-execution fail)
- ✅ T044 Implement test: diamond dependency pattern execution order
- ✅ T045 Implement test: linear chain (WP01 → WP02 → WP03)
- ✅ T046 Implement test: fan-out pattern (WP01 → WP02, WP03, WP04)
Implementation Notes
- Need fixtures with specific dependency patterns
- Verify start times to confirm parallelization
- Circular dependency should fail at validation, not execution
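The expected ordering for the diamond, linear, and fan-out fixtures can be derived with the standard library's `graphlib.TopologicalSorter` (Python 3.9+), which also raises `CycleError` before anything runs. This is a reference sketch for computing test expectations, not the orchestrator's own scheduler; `execution_waves()` is a hypothetical helper name.

```python
from graphlib import CycleError, TopologicalSorter


def execution_waves(dependencies):
    """Group WPs into waves; every WP in a wave may start at the same time.

    `dependencies` maps each WP id to the set of WP ids it depends on.
    Raises CycleError during validation if the graph contains a cycle.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # cycle detection happens here, before any "execution"
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


diamond = {"WP01": set(), "WP02": {"WP01"}, "WP03": {"WP01"}, "WP04": {"WP02", "WP03"}}
circular = {"WP01": {"WP02"}, "WP02": {"WP01"}}
```

For the diamond graph the waves are WP01, then WP02 and WP03 together, then WP04, so a test can assert that WP02 and WP03 have overlapping start timestamps in the state file and that WP04 starts after both finish.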
Parallel Opportunities
- All tests can be developed in parallel
Dependencies
- Depends on WP05 (fixtures), WP06 (pytest config)
Risks & Mitigations
- Timing verification flaky → use timestamps from state, not wall clock
Work Package WP10: Extended Agent Smoke Tests (Priority: P3)
Goal: Basic validation that extended agents can be invoked.
Independent Test: Each available extended agent creates a file.
Prompt: tasks/WP10-extended-agent-smoke-tests.md
Estimated Size: ~380 lines
Included Subtasks
- ✅ T047 Implement smoke test infrastructure (base class or helper)
- ✅ T048 Implement file touch task and verify_smoke_result() function
- ✅ T049 Create parametrized smoke tests for all 7 extended agents
- ✅ T050 Implement skip behavior for unavailable extended agents
- ✅ T051 Add timing validation (each test completes in <60s)
Implementation Notes
- Use @pytest.mark.parametrize with agent list
- Skip gracefully with pytest.skip() when agent unavailable
- Verify file creation, not file contents
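A possible shape for verify_smoke_result() is sketched below. The signature and the list-of-problems return value are assumptions; the only behavior taken from this plan is that the file's existence is checked (not its contents) and that a run should finish in under 60 seconds.

```python
from pathlib import Path

MAX_SMOKE_SECONDS = 60.0  # each smoke test should complete in under 60s


def verify_smoke_result(expected_file: Path, elapsed_seconds: float) -> list:
    """Return a list of problems; an empty list means the smoke test passed."""
    problems = []
    # Only existence is checked; file contents are deliberately ignored.
    if not expected_file.is_file():
        problems.append(f"agent did not create {expected_file}")
    if elapsed_seconds >= MAX_SMOKE_SECONDS:
        problems.append(
            f"took {elapsed_seconds:.1f}s, limit is {MAX_SMOKE_SECONDS:.0f}s"
        )
    return problems
```

A parametrized test could then assert `verify_smoke_result(path, elapsed) == []`, which reports every failed check at once instead of stopping at the first.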
Parallel Opportunities
- T047-T048 must complete first; T049-T051 then in parallel
Dependencies
- Depends on WP01 (availability detection), WP06 (pytest config)
Risks & Mitigations
- Extended agents may have different invocation patterns → use existing invokers
Work Package WP11: Additional Checkpoint Fixtures (Priority: P2)
Goal: Complete the remaining checkpoint fixtures.
Independent Test: All checkpoints load successfully and represent correct state.
Prompt: tasks/WP11-additional-checkpoint-fixtures.md
Estimated Size: ~300 lines
Included Subtasks
- ✅ T052 Create checkpoint_review_rejected/ fixture
- ✅ T053 Create checkpoint_review_approved/ fixture
- ✅ T054 Create checkpoint_wp_merged/ fixture
- ✅ T055 Add stale checkpoint detection (version mismatch warning)
Implementation Notes
- Follow same structure as WP05 checkpoints
- review_rejected needs state with rejection history
- wp_merged represents post-merge state on main
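Stale checkpoint detection can be as small as comparing a version field and emitting a warning. In this sketch the `fixture_version` key, the `CURRENT_FIXTURE_VERSION` constant, and the function name are all hypothetical.

```python
import json
import warnings
from pathlib import Path

CURRENT_FIXTURE_VERSION = 2  # hypothetical; bump when the state format changes


def is_checkpoint_current(state_file: Path) -> bool:
    """Warn and return False when a checkpoint was built for another version."""
    state = json.loads(state_file.read_text(encoding="utf-8"))
    found = state.get("fixture_version")  # None when the field is absent
    if found != CURRENT_FIXTURE_VERSION:
        warnings.warn(
            f"{state_file} has fixture_version={found!r}, "
            f"expected {CURRENT_FIXTURE_VERSION}; regenerate the checkpoint",
            stacklevel=2,
        )
        return False
    return True
```

A warning rather than an exception keeps old fixtures loadable while still surfacing the mismatch in pytest's warnings summary.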
Parallel Opportunities
- T052-T054 can proceed in parallel
Dependencies
- Depends on WP03 (dataclasses), WP05 (initial fixtures for reference)
Risks & Mitigations
- Fixture maintenance → add version field for staleness check
Work Package WP12: Integration and Polish (Priority: P3)
Goal: Final integration, validation utilities, and polish.
Independent Test: Full test suite runs and quickstart scenarios pass.
Prompt: tasks/WP12-integration-polish.md
Estimated Size: ~320 lines
Included Subtasks
- ✅ T056 Update root tests/conftest.py with orchestrator marker registration
- ✅ T057 Add state validation utilities (validate_test_result())
- ✅ T058 Add test output helpers (distinguish skip vs fail messages)
- ✅ T059 Add timeout configuration via environment variables
- ✅ T060 Validate quickstart.md scenarios work as documented
Implementation Notes
- Markers must be registered in root conftest for pytest collection
- State validation checks: file integrity, lane consistency, git state
- Timeout defaults: 300s per test, 10s for probe
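Environment-driven timeouts with the defaults above could be read as in the sketch below. The variable names `ORCHESTRATOR_TEST_TIMEOUT` and `ORCHESTRATOR_PROBE_TIMEOUT` are assumptions, as is rejecting non-positive values.

```python
import os

DEFAULT_TEST_TIMEOUT_SECONDS = 300.0   # per test
DEFAULT_PROBE_TIMEOUT_SECONDS = 10.0   # per agent probe


def read_timeout(variable: str, default: float) -> float:
    """Read a timeout in seconds from the environment, falling back to default."""
    raw = os.environ.get(variable)
    if raw is None or not raw.strip():
        return default
    value = float(raw)  # raises ValueError for non-numeric input
    if value <= 0:
        raise ValueError(f"{variable} must be positive, got {raw!r}")
    return value


def load_timeouts() -> dict:
    # Hypothetical variable names; adjust to whatever the suite documents.
    return {
        "test": read_timeout("ORCHESTRATOR_TEST_TIMEOUT", DEFAULT_TEST_TIMEOUT_SECONDS),
        "probe": read_timeout("ORCHESTRATOR_PROBE_TIMEOUT", DEFAULT_PROBE_TIMEOUT_SECONDS),
    }
```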
Parallel Opportunities
- T056-T059 can proceed in parallel; T060 is final validation
Dependencies
- Depends on all previous WPs (final polish)
Risks & Mitigations
- Integration issues → run full suite after each WP merge
Dependency & Execution Summary
Phase 0 (Foundation):
  WP01 (Test Utilities) ──┐
  WP02 (Test Paths) ──────┼──→ WP06 (pytest Config)
  WP03 (Data Structures) ─┴──→ WP04 (Fixture Loader) ──→ WP05 (Initial Fixtures)
Phase 1 (Core Tests):
  WP06 + WP05 ──→ WP07 (Happy Path) 🎯 MVP
              ├──→ WP08 (Review Cycles)
              └──→ WP09 (Parallel/Deps)
Phase 2 (Extended):
  WP01 + WP06 ──→ WP10 (Smoke Tests)
  WP03 + WP05 ──→ WP11 (Additional Fixtures)
Phase 3 (Polish):
  All ──→ WP12 (Integration)
Parallelization Opportunities:
- WP01 and WP03 can proceed simultaneously (no dependencies); WP02 can start as soon as WP01 provides detect_all_agents()
- WP07, WP08, WP09 can proceed in parallel once fixtures are ready
- WP10, WP11 can proceed in parallel with Phase 1 tests
MVP Scope: WP01 → WP02 → WP03 → WP04 → WP05 → WP06 → WP07 (Happy Path Tests)
Subtask Index (Reference)
| Subtask ID | Summary | Work Package | Priority | Parallel? |
|---|---|---|---|---|
| T001 | Create testing module structure | WP01 | P0 | No |
| T002 | Implement AgentAvailability dataclass | WP01 | P0 | No |
| T003 | Implement is_installed check | WP01 | P0 | Yes |
| T004 | Implement probe() auth verification | WP01 | P0 | Yes |
| T005 | Implement detect_all_agents() | WP01 | P0 | No |
| T006 | Implement TestPath dataclass | WP02 | P0 | No |
| T007 | Implement path selection logic | WP02 | P0 | Yes |
| T008 | Implement agent assignment | WP02 | P0 | Yes |
| T009 | Add select_test_path() with caching | WP02 | P0 | Yes |
| T010 | Implement FixtureCheckpoint dataclass | WP03 | P0 | Yes |
| T011 | Implement WorktreeMetadata dataclass | WP03 | P0 | Yes |
| T012 | Implement TestContext dataclass | WP03 | P0 | Yes |
| T013 | Add worktrees.json validation | WP03 | P0 | No |
| T014 | Add state.json validation | WP03 | P0 | No |
| T015 | Implement copy_fixture_to_temp() | WP04 | P1 | Yes |
| T016 | Implement init_git_repo() | WP04 | P1 | Yes |
| T017 | Implement create_worktrees_from_metadata() | WP04 | P1 | Yes |
| T018 | Implement load_orchestration_state() | WP04 | P1 | Yes |
| T019 | Implement load_checkpoint() | WP04 | P1 | No |
| T020 | Implement cleanup_test_context() | WP04 | P1 | No |
| T021 | Create minimal test feature structure | WP05 | P1 | No |
| T022 | Create checkpoint_wp_created fixture | WP05 | P1 | Yes |
| T023 | Create checkpoint_wp_implemented fixture | WP05 | P1 | Yes |
| T024 | Create checkpoint_review_pending fixture | WP05 | P1 | Yes |
| T025 | Add fixture manifest | WP05 | P1 | No |
| T026 | Create orchestrator conftest.py | WP06 | P1 | Yes |
| T027 | Register custom markers | WP06 | P1 | Yes |
| T028 | Implement available_agents fixture | WP06 | P1 | No |
| T029 | Implement test_path fixture | WP06 | P1 | No |
| T030 | Implement test_context fixture | WP06 | P1 | No |
| T031 | Test single WP orchestration | WP07 | P1 | Yes |
| T032 | Test multiple parallel WPs | WP07 | P1 | Yes |
| T033 | Test state validation | WP07 | P1 | Yes |
| T034 | Test lane status consistency | WP07 | P1 | Yes |
| T035 | Test commit verification | WP07 | P1 | Yes |
| T036 | Test review rejection flow | WP08 | P2 | Yes |
| T037 | Test re-implementation | WP08 | P2 | Yes |
| T038 | Test full review cycle | WP08 | P2 | Yes |
| T039 | Test max review cycles | WP08 | P2 | Yes |
| T040 | Test state transition history | WP08 | P2 | Yes |
| T041 | Test independent WPs parallel | WP09 | P2 | Yes |
| T042 | Test dependency blocking | WP09 | P2 | Yes |
| T043 | Test circular dependency detection | WP09 | P2 | Yes |
| T044 | Test diamond pattern | WP09 | P2 | Yes |
| T045 | Test linear chain | WP09 | P2 | Yes |
| T046 | Test fan-out pattern | WP09 | P2 | Yes |
| T047 | Smoke test infrastructure | WP10 | P3 | No |
| T048 | File touch verification | WP10 | P3 | No |
| T049 | Parametrized extended agent tests | WP10 | P3 | Yes |
| T050 | Skip behavior for unavailable | WP10 | P3 | Yes |
| T051 | Timing validation | WP10 | P3 | Yes |
| T052 | Create checkpoint_review_rejected | WP11 | P2 | Yes |
| T053 | Create checkpoint_review_approved | WP11 | P2 | Yes |
| T054 | Create checkpoint_wp_merged | WP11 | P2 | Yes |
| T055 | Add stale checkpoint detection | WP11 | P2 | No |
| T056 | Update root conftest.py | WP12 | P3 | Yes |
| T057 | Add state validation utilities | WP12 | P3 | Yes |
| T058 | Add test output helpers | WP12 | P3 | Yes |
| T059 | Add timeout configuration | WP12 | P3 | Yes |
| T060 | Validate quickstart scenarios | WP12 | P3 | No |
<!-- status-model:start -->
Canonical Status (Generated)
<!-- status-model:end -->
- WP01: done
- WP02: done
- WP03: done
- WP04: done
- WP05: done
- WP06: done
- WP07: done
- WP08: done
- WP09: done
- WP10: done
- WP11: done
- WP12: done