Work Packages: Orchestrator End-to-End Testing Suite

Inputs: Design documents from /kitty-specs/021-orchestrator-end-to-end-testing-suite/
Prerequisites: plan.md (required), spec.md (user stories), data-model.md

Tests: This feature is itself testing infrastructure, so all test files are production code.

Organization: Fine-grained subtasks (Txxx) roll up into work packages (WPxx). Each work package must be independently deliverable and testable.

Prompt Files: Each work package references a matching prompt file in /tasks/ generated by /spec-kitty.tasks.

Subtask Format: [Txxx] [P?] Description

  • [P] indicates the subtask can proceed in parallel (different files/components).
  • Include precise file paths or modules.

Work Package WP01: Test Utilities Foundation (Priority: P0)

Goal: Create the core test utilities module with agent availability detection.
Independent Test: AgentAvailability dataclass can detect and probe agents.
Prompt: tasks/WP01-test-utilities-foundation.md
Estimated Size: ~350 lines

Included Subtasks

  • ✅ T001 Create src/specify_cli/orchestrator/testing/__init__.py module structure
  • ✅ T002 Implement AgentAvailability dataclass in availability.py
  • ✅ T003 Implement is_installed check using existing agent invoker registry
  • ✅ T004 Implement probe() method for auth verification (lightweight API call)
  • ✅ T005 Implement detect_all_agents() function with tier categorization

Implementation Notes

  • Reuse existing agent invoker detection from src/specify_cli/orchestrator/agents/__init__.py
  • Each agent's probe() should timeout at 10 seconds
  • Cache detection results at module level for session duration
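
The notes above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the AgentAvailability fields, the detect_agent() helper, and the use of shutil.which() in place of the existing agent invoker registry are all assumptions made so that the example is self-contained.

```python
from __future__ import annotations

import shutil
import subprocess
from dataclasses import dataclass

PROBE_TIMEOUT_SECONDS = 10.0  # per-agent limit from the notes above


@dataclass(frozen=True)
class AgentAvailability:
    # Hypothetical fields; data-model.md defines the real ones.
    agent_id: str
    is_installed: bool
    is_authenticated: bool | None  # None means "no probe available"


def probe(command: list[str], timeout: float = PROBE_TIMEOUT_SECONDS) -> bool:
    """Run a lightweight command; a timeout, launch error, or non-zero exit is a failure."""
    try:
        result = subprocess.run(command, capture_output=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


# Module-level cache, so detection runs once per agent per session.
_detection_cache: dict[str, AgentAvailability] = {}


def detect_agent(
    agent_id: str,
    executable: str,
    probe_command: list[str] | None = None,
) -> AgentAvailability:
    """Detect one agent; repeated calls return the cached result without re-probing."""
    if agent_id in _detection_cache:
        return _detection_cache[agent_id]
    # Stand-in for the invoker registry lookup (assumption).
    installed = shutil.which(executable) is not None
    if not installed:
        authenticated = False
    elif probe_command is None:
        authenticated = None  # graceful fallback: install-only check
    else:
        authenticated = probe(probe_command)
    result = AgentAvailability(agent_id, installed, authenticated)
    _detection_cache[agent_id] = result
    return result
```

Returning None for agents without a probe keeps "installed but unverified" distinguishable from "probe failed", which may matter when deciding between skipping and failing a test.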

Parallel Opportunities

  • T002 (dataclass) can be written first; T003-T004 can then proceed in parallel, and T005 follows once both exist

Dependencies

  • None (foundation package)

Risks & Mitigations

  • Agent probes might be slow → use 10s timeout per agent
  • Some agents may not have probe capability → graceful fallback to install-only check

Work Package WP02: Test Path Selection (Priority: P0)

Goal: Implement the test path model that selects 1-agent, 2-agent, or 3+-agent paths based on availability.
Independent Test: TestPath correctly assigns agents based on availability count.
Prompt: tasks/WP02-test-path-selection.md
Estimated Size: ~280 lines

Included Subtasks

  • ✅ T006 Implement TestPath dataclass in paths.py
  • ✅ T007 Implement path selection logic based on agent count
  • ✅ T008 Implement agent assignment (implementation, review, fallback)
  • ✅ T009 Add select_test_path() function with caching

Implementation Notes

  • Path selection: 1 agent → 1-agent, 2 agents → 2-agent, 3+ agents → 3+-agent
  • Assignment: First agent for impl, second for review, third for fallback
  • Cache the selected path per test session
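
The selection and assignment rules above could look something like the sketch below. The TestPath field names are assumptions, as is the choice to let a single agent act as both implementer and reviewer on the 1-agent path; session caching is left out for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TestPath:
    # Hypothetical fields; data-model.md defines the real ones.
    path_type: str  # "1-agent", "2-agent", or "3+-agent"
    implementation_agent: str
    review_agent: str
    fallback_agent: str | None


def select_test_path(available_agent_ids: list[str]) -> TestPath:
    """Pick a path from the number of available agents."""
    # Sort by agent_id so the assignment does not depend on detection order.
    agents = sorted(available_agent_ids)
    if not agents:
        raise ValueError("no agents available; cannot select a test path")
    if len(agents) == 1:
        # Assumption: the only agent both implements and reviews.
        return TestPath("1-agent", agents[0], agents[0], None)
    if len(agents) == 2:
        return TestPath("2-agent", agents[0], agents[1], None)
    return TestPath("3+-agent", agents[0], agents[1], agents[2])
```

Sorting before assignment is what makes the result reproducible across runs, which is the mitigation this work package lists for varying agent order.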

Parallel Opportunities

  • T006 must complete first; T007-T009 can then proceed together

Dependencies

  • Depends on WP01 (needs detect_all_agents())

Risks & Mitigations

  • Agent order might vary → use consistent sorting by agent_id

Work Package WP03: Fixture Data Structures (Priority: P0)

Goal: Define all data structures for fixture management.
Independent Test: All dataclasses serialize/deserialize correctly to/from JSON.
Prompt: tasks/WP03-fixture-data-structures.md
Estimated Size: ~320 lines

Included Subtasks

  • ✅ T010 Implement FixtureCheckpoint dataclass in fixtures.py
  • ✅ T011 Implement WorktreeMetadata dataclass
  • ✅ T012 Implement TestContext dataclass
  • ✅ T013 Add worktrees.json schema validation with Pydantic or manual checks
  • ✅ T014 Add state.json schema validation (OrchestrationRun compatibility)

Implementation Notes

  • Match data-model.md specifications exactly
  • Use dataclasses with optional Pydantic validation
  • Ensure all paths use pathlib.Path
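
As an illustration of the notes above, here is one way a dataclass holding a pathlib.Path could round-trip through JSON with manual validation. The field names (wp_id, branch_name, relative_path) are placeholders chosen for this sketch; data-model.md remains the authority for the real schema.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class WorktreeMetadata:
    # Hypothetical fields for illustration only.
    wp_id: str
    branch_name: str
    relative_path: Path

    def to_dict(self) -> dict[str, str]:
        """Convert to JSON-safe values; Path becomes a POSIX-style string."""
        return {
            "wp_id": self.wp_id,
            "branch_name": self.branch_name,
            "relative_path": self.relative_path.as_posix(),
        }

    @classmethod
    def from_dict(cls, data: dict) -> "WorktreeMetadata":
        """Manual validation: reject entries with missing keys."""
        missing = {"wp_id", "branch_name", "relative_path"} - set(data)
        if missing:
            raise ValueError(f"worktree entry is missing keys: {sorted(missing)}")
        return cls(data["wp_id"], data["branch_name"], Path(data["relative_path"]))


def dumps_worktrees(items: list[WorktreeMetadata]) -> str:
    return json.dumps([item.to_dict() for item in items], indent=2)


def loads_worktrees(text: str) -> list[WorktreeMetadata]:
    return [WorktreeMetadata.from_dict(entry) for entry in json.loads(text)]
```

Storing paths as POSIX strings keeps the fixture files identical across platforms, while the in-memory objects still expose pathlib.Path.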

Parallel Opportunities

  • T010-T012 (dataclasses) can proceed in parallel
  • T013-T014 (validation) depend on their respective dataclasses

Dependencies

  • None (can proceed in parallel with WP01-WP02)

Risks & Mitigations

  • Schema drift with OrchestrationRun → import from orchestrator.state directly

Work Package WP04: Fixture Loader (Priority: P1)

Goal: Implement fixture loading that restores checkpoints to usable test state.
Independent Test: Loading a checkpoint creates valid TestContext with git repo.
Prompt: tasks/WP04-fixture-loader.md
Estimated Size: ~450 lines

Included Subtasks

  • ✅ T015 Implement copy_fixture_to_temp() - copies checkpoint to temp directory
  • ✅ T016 Implement init_git_repo() - initializes git in temp directory
  • ✅ T017 Implement create_worktrees_from_metadata() - recreates worktrees
  • ✅ T018 Implement load_orchestration_state() - deserializes state.json
  • ✅ T019 Implement load_checkpoint() - assembles full TestContext
  • ✅ T020 Implement cleanup_test_context() - removes temp directories

Implementation Notes

  • Use tempfile.mkdtemp() for isolation
  • Git init must create main branch and initial commit
  • Worktree creation uses git worktree add
  • State loading uses existing OrchestrationRun.from_json()
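
The loader steps above might be sketched as follows. The function names come from the subtask list, but their signatures, the temp directory prefix, the commit identity, and the worktree location are assumptions; state loading via OrchestrationRun.from_json() is omitted because it depends on the existing orchestrator package.

```python
from __future__ import annotations

import shutil
import subprocess
import tempfile
from pathlib import Path


def copy_fixture_to_temp(checkpoint_dir: Path) -> Path:
    """Copy a checkpoint into a fresh temp directory and return the copy."""
    temp_root = Path(tempfile.mkdtemp(prefix="orchestrator-test-"))
    target = temp_root / checkpoint_dir.name
    shutil.copytree(checkpoint_dir, target)
    return target


def _git(repo: Path, *args: str) -> None:
    subprocess.run(["git", *args], cwd=repo, check=True, capture_output=True)


def init_git_repo(repo: Path) -> None:
    """Initialise git with a main branch and an initial commit."""
    _git(repo, "init")
    # Point HEAD at main explicitly so the result does not depend on git defaults.
    _git(repo, "symbolic-ref", "HEAD", "refs/heads/main")
    _git(repo, "add", "--all")
    _git(
        repo,
        "-c", "user.name=Test Fixture",        # placeholder identity
        "-c", "user.email=fixture@example.com",
        "-c", "commit.gpgsign=false",
        "commit", "--allow-empty", "-m", "Initial commit",
    )


def create_worktree(repo: Path, branch: str, path: Path) -> None:
    """Recreate one worktree on a new branch."""
    _git(repo, "worktree", "add", "-b", branch, str(path))


def cleanup_test_context(repo: Path) -> None:
    """Remove the temp directory created by copy_fixture_to_temp()."""
    shutil.rmtree(repo.parent, ignore_errors=True)
```

A caller would typically wrap the test body in try/finally and call cleanup_test_context() in the finally clause, so the temp directory is removed even when an assertion fails.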

Parallel Opportunities

  • T015-T018 can proceed in parallel (independent utilities)
  • T019 depends on all of the above

Dependencies

  • Depends on WP03 (needs dataclasses)

Risks & Mitigations

  • Temp directory cleanup failure → use atexit handlers and try/finally
  • Git worktree issues → validate git version compatibility

Work Package WP05: Initial Checkpoint Fixtures (Priority: P1)

Goal: Create the first set of checkpoint fixtures for testing.
Independent Test: Fixtures exist and can be loaded successfully.
Prompt: tasks/WP05-initial-checkpoint-fixtures.md
Estimated Size: ~400 lines

Included Subtasks

  • ✅ T021 Create minimal test feature structure (spec.md, plan.md, meta.json, tasks/)
  • ✅ T022 Create checkpoint_wp_created/ fixture (WPs in planned lane)
  • ✅ T023 Create checkpoint_wp_implemented/ fixture (WP01 implemented)
  • ✅ T024 Create checkpoint_review_pending/ fixture (WP01 in review)
  • ✅ T025 Add fixture manifest in __init__.py for discovery

Implementation Notes

  • Directory structure: tests/fixtures/orchestrator/checkpoint_<name>/
  • Each checkpoint needs: state.json, feature/, worktrees.json
  • Minimal feature: 2 WPs (WP01, WP02) with simple tasks
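
A small check of the layout described above could look like this. The helper names are hypothetical, and the sketch only verifies that the required entries exist, not that their contents are valid.

```python
from __future__ import annotations

from pathlib import Path

REQUIRED_FILES = ("state.json", "worktrees.json")
REQUIRED_DIRS = ("feature",)


def missing_checkpoint_entries(checkpoint_dir: Path) -> list[str]:
    """Return the required entries that are absent from one checkpoint directory."""
    missing = [name for name in REQUIRED_FILES if not (checkpoint_dir / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (checkpoint_dir / name).is_dir()]
    return missing


def discover_checkpoints(fixtures_root: Path) -> dict[str, Path]:
    """Map checkpoint directory names to paths, e.g. for a fixture manifest."""
    return {
        path.name: path
        for path in sorted(fixtures_root.glob("checkpoint_*"))
        if path.is_dir()
    }
```

Running missing_checkpoint_entries() over every discovered checkpoint gives a quick structural test that is independent of the loader.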

Parallel Opportunities

  • T021 must complete first; T022-T024 can then proceed in parallel

Dependencies

  • Depends on WP03 (needs data structure definitions)

Risks & Mitigations

  • Fixture size → keep features minimal (2 WPs, no actual code)

Work Package WP06: pytest Configuration (Priority: P1)

Goal: Set up pytest fixtures and markers for orchestrator tests.
Independent Test: pytest collects tests with correct markers and fixtures inject properly.
Prompt: tasks/WP06-pytest-configuration.md
Estimated Size: ~380 lines

Included Subtasks

  • ✅ T026 Create tests/specify_cli/orchestrator/conftest.py
  • ✅ T027 Register custom markers in pytest.ini or conftest
  • ✅ T028 Implement available_agents fixture (session-scoped)
  • ✅ T029 Implement test_path fixture (derives from available_agents)
  • ✅ T030 Implement test_context fixture (function-scoped, loads checkpoint)

Implementation Notes

  • Markers: orchestrator_availability, orchestrator_fixtures, orchestrator_happy_path, orchestrator_review_cycles, orchestrator_parallel, orchestrator_smoke, core_agent, extended_agent
  • Session-scoped fixtures cache agent detection
  • Function-scoped fixtures provide isolated test contexts
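
Marker registration for the list above can be done from a conftest.py through pytest's pytest_configure hook, roughly as sketched here. The marker names are taken from the notes; the one-line descriptions are placeholder wording, not text from the project.

```python
# Sketch of the marker-registration part of a conftest.py.
# Descriptions are illustrative placeholders.
ORCHESTRATOR_MARKERS = {
    "orchestrator_availability": "agent availability detection tests",
    "orchestrator_fixtures": "checkpoint fixture loading tests",
    "orchestrator_happy_path": "end-to-end happy path tests",
    "orchestrator_review_cycles": "review rejection and re-implementation tests",
    "orchestrator_parallel": "parallel execution and dependency tests",
    "orchestrator_smoke": "extended agent smoke tests",
    "core_agent": "tests that require a core-tier agent",
    "extended_agent": "tests that require an extended-tier agent",
}


def pytest_configure(config):
    """pytest hook: register each custom marker so it is known to pytest."""
    for name, description in ORCHESTRATOR_MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

With the markers registered, a run can be narrowed with an expression such as `pytest -m orchestrator_smoke`, and `--strict-markers` will reject misspelled marker names.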

Parallel Opportunities

  • T026-T027 can proceed together; T028-T030 depend on them

Dependencies

  • Depends on WP01 (availability detection), WP02 (path selection), WP04 (fixture loading)

Risks & Mitigations

  • Marker conflicts → use unique prefix orchestrator_

Work Package WP07: Happy Path Tests (Priority: P1) 🎯 MVP

Goal: Implement end-to-end tests for happy path orchestration.
Independent Test: Single WP orchestration completes with correct state.
Prompt: tasks/WP07-happy-path-tests.md
Estimated Size: ~450 lines

Included Subtasks

  • ✅ T031 Implement test: single WP orchestration end-to-end
  • ✅ T032 Implement test: multiple parallel WPs orchestration
  • ✅ T033 Implement test: state validation after orchestration
  • ✅ T034 Implement test: lane status consistency (frontmatter matches state)
  • ✅ T035 Implement test: commit verification in worktrees

Implementation Notes

  • Use real agents (no mocks)
  • Start from checkpoint_wp_created fixture
  • Validate OrchestrationRun state file after completion
  • Verify git commits exist in worktree branches

Parallel Opportunities

  • All tests can be developed in parallel once fixtures are ready

Dependencies

  • Depends on WP05 (fixtures), WP06 (pytest config)

Risks & Mitigations

  • Long test times → mark with @pytest.mark.slow
  • Agent unavailability → proper skip/fail based on tier

Work Package WP08: Review Cycle Tests (Priority: P2)

Goal: Test review rejection and re-implementation cycles.
Independent Test: WP correctly cycles through rejection and re-implementation.
Prompt: tasks/WP08-review-cycle-tests.md
Estimated Size: ~420 lines

Included Subtasks

  • ✅ T036 Implement test: review rejection triggers re-implementation
  • ✅ T037 Implement test: re-implementation produces new commits
  • ✅ T038 Implement test: full cycle (reject → re-impl → re-review → approve)
  • ✅ T039 Implement test: max review cycles exceeded marks WP failed
  • ✅ T040 Implement test: state transition history is recorded

Implementation Notes

  • Start from checkpoint_review_pending fixture
  • Needs a way to trigger review rejection (agent prompt or fixture state)
  • Verify state file contains complete transition history

Parallel Opportunities

  • T036-T040 can proceed in parallel

Dependencies

  • Depends on WP05 (fixtures), WP06 (pytest config)

Risks & Mitigations

  • Review cycle depends on agent behavior → use fixture at review_rejected state

Work Package WP09: Parallel and Dependency Tests (Priority: P2)

Goal: Test parallel execution and dependency ordering.
Independent Test: Orchestrator respects dependency graph.
Prompt: tasks/WP09-parallel-dependency-tests.md
Estimated Size: ~480 lines

Included Subtasks

  • ✅ T041 Implement test: independent WPs start simultaneously
  • ✅ T042 Implement test: WP waits for dependencies before starting
  • ✅ T043 Implement test: circular dependency detection (pre-execution fail)
  • ✅ T044 Implement test: diamond dependency pattern execution order
  • ✅ T045 Implement test: linear chain (WP01 → WP02 → WP03)
  • ✅ T046 Implement test: fan-out pattern (WP01 → WP02, WP03, WP04)

Implementation Notes

  • Need fixtures with specific dependency patterns
  • Verify start times to confirm parallelization
  • Circular dependency should fail at validation, not execution
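
To make the notes above concrete, the sketch below uses the standard library's graphlib (Python 3.9+) to reject circular dependencies before anything runs, and compares timestamps recorded in state to confirm that no WP started before its dependencies finished. The function names and the shape of the timestamp mappings are assumptions, and this is not necessarily how the orchestrator itself validates its graph.

```python
from __future__ import annotations

from datetime import datetime
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one valid order, or raise ValueError if the graph has a cycle."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that form the detected cycle.
        raise ValueError(f"circular dependency detected: {exc.args[1]}") from exc


def dependency_violations(
    dependencies: dict[str, set[str]],
    started_at: dict[str, datetime],
    completed_at: dict[str, datetime],
) -> list[tuple[str, str]]:
    """List (wp, dependency) pairs where the WP started before the dependency completed."""
    return [
        (wp, dep)
        for wp, deps in dependencies.items()
        for dep in sorted(deps)
        if started_at[wp] < completed_at[dep]
    ]
```

For the diamond pattern, for example, `{"WP01": set(), "WP02": {"WP01"}, "WP03": {"WP01"}, "WP04": {"WP02", "WP03"}}` yields an order that begins with WP01 and ends with WP04, while the relative order of WP02 and WP03 is unconstrained.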

Parallel Opportunities

  • All tests can be developed in parallel

Dependencies

  • Depends on WP05 (fixtures), WP06 (pytest config)

Risks & Mitigations

  • Timing verification may be flaky → use timestamps from state, not wall clock

Work Package WP10: Extended Agent Smoke Tests (Priority: P3)

Goal: Basic validation that extended agents can be invoked.
Independent Test: Each available extended agent creates a file.
Prompt: tasks/WP10-extended-agent-smoke-tests.md
Estimated Size: ~380 lines

Included Subtasks

  • ✅ T047 Implement smoke test infrastructure (base class or helper)
  • ✅ T048 Implement file touch task and verify_smoke_result() function
  • ✅ T049 Create parametrized smoke tests for all 7 extended agents
  • ✅ T050 Implement skip behavior for unavailable extended agents
  • ✅ T051 Add timing validation (each test completes in <60s)

Implementation Notes

  • Use @pytest.mark.parametrize with agent list
  • Skip gracefully with pytest.skip() when agent unavailable
  • Verify file creation, not file contents
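
A result check along the lines of the notes above might look like the following. verify_smoke_result() is named in T048, but its signature here is a guess; the sketch checks only that the file exists and that the run stayed under the 60-second limit from T051.

```python
from __future__ import annotations

from pathlib import Path

SMOKE_TIME_LIMIT_SECONDS = 60.0  # limit taken from T051


def verify_smoke_result(expected_file: Path, elapsed_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means the smoke test passed."""
    problems = []
    # Only existence is checked; file contents are deliberately ignored.
    if not expected_file.is_file():
        problems.append(f"expected file was not created: {expected_file}")
    if elapsed_seconds >= SMOKE_TIME_LIMIT_SECONDS:
        problems.append(
            f"run took {elapsed_seconds:.1f}s, limit is {SMOKE_TIME_LIMIT_SECONDS:.0f}s"
        )
    return problems
```

Returning a list of problems rather than a boolean lets a parametrized test report both a missing file and a timing overrun for the same agent in one failure message.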

Parallel Opportunities

  • T047-T048 must complete first; T049-T051 can then proceed in parallel

Dependencies

  • Depends on WP01 (availability detection), WP06 (pytest config)

Risks & Mitigations

  • Extended agents may have different invocation patterns → use existing invokers

Work Package WP11: Additional Checkpoint Fixtures (Priority: P2)

Goal: Complete the remaining checkpoint fixtures.
Independent Test: All checkpoints load successfully and represent correct state.
Prompt: tasks/WP11-additional-checkpoint-fixtures.md
Estimated Size: ~300 lines

Included Subtasks

  • ✅ T052 Create checkpoint_review_rejected/ fixture
  • ✅ T053 Create checkpoint_review_approved/ fixture
  • ✅ T054 Create checkpoint_wp_merged/ fixture
  • ✅ T055 Add stale checkpoint detection (version mismatch warning)

Implementation Notes

  • Follow same structure as WP05 checkpoints
  • review_rejected needs state with rejection history
  • wp_merged represents post-merge state on main
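
Stale checkpoint detection (T055) could be approximated as below. Where the version is stored is not specified in this document, so the sketch assumes a top-level "version" key in state.json and a hypothetical CURRENT_FIXTURE_VERSION constant; a mismatch produces a warning rather than an error.

```python
from __future__ import annotations

import json
import warnings
from pathlib import Path

CURRENT_FIXTURE_VERSION = 1  # hypothetical; the real value would live with the fixtures


def check_checkpoint_version(
    state_file: Path, expected: int = CURRENT_FIXTURE_VERSION
) -> bool:
    """Return True when the checkpoint is current; warn and return False when stale."""
    data = json.loads(state_file.read_text(encoding="utf-8"))
    found = data.get("version")  # assumed location of the version field
    if found != expected:
        warnings.warn(
            f"{state_file} has fixture version {found!r}, expected {expected!r}; "
            "the checkpoint may be stale",
            stacklevel=2,
        )
        return False
    return True
```

Warning instead of failing lets older fixtures keep loading while still flagging them for regeneration.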

Parallel Opportunities

  • T052-T054 can proceed in parallel

Dependencies

  • Depends on WP03 (dataclasses), WP05 (initial fixtures for reference)

Risks & Mitigations

  • Fixture maintenance → add version field for staleness check

Work Package WP12: Integration and Polish (Priority: P3)

Goal: Final integration, validation utilities, and polish.
Independent Test: Full test suite runs and quickstart scenarios pass.
Prompt: tasks/WP12-integration-polish.md
Estimated Size: ~320 lines

Included Subtasks

  • ✅ T056 Update root tests/conftest.py with orchestrator marker registration
  • ✅ T057 Add state validation utilities (validate_test_result())
  • ✅ T058 Add test output helpers (distinguish skip vs fail messages)
  • ✅ T059 Add timeout configuration via environment variables
  • ✅ T060 Validate quickstart.md scenarios work as documented

Implementation Notes

  • Markers must be registered in root conftest for pytest collection
  • State validation checks: file integrity, lane consistency, git state
  • Timeout defaults: 300s per test, 10s for probe
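
Timeout configuration through environment variables (T059) might be read as in the sketch below. The defaults match the notes above, but the variable names ORCHESTRATOR_TEST_TIMEOUT and ORCHESTRATOR_PROBE_TIMEOUT are invented for this example and are not confirmed by the project.

```python
from __future__ import annotations

import os
from typing import Mapping

DEFAULT_TEST_TIMEOUT_SECONDS = 300.0
DEFAULT_PROBE_TIMEOUT_SECONDS = 10.0


def timeout_from_env(
    name: str, default: float, environ: Mapping[str, str] | None = None
) -> float:
    """Read a positive timeout in seconds from the environment, else use the default."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None or not raw.strip():
        return default
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(f"{name} must be a number of seconds, got {raw!r}") from None
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {value}")
    return value


def load_timeouts(environ: Mapping[str, str] | None = None) -> dict[str, float]:
    # Hypothetical variable names, used here only for illustration.
    return {
        "test": timeout_from_env("ORCHESTRATOR_TEST_TIMEOUT", DEFAULT_TEST_TIMEOUT_SECONDS, environ),
        "probe": timeout_from_env("ORCHESTRATOR_PROBE_TIMEOUT", DEFAULT_PROBE_TIMEOUT_SECONDS, environ),
    }
```

Passing the environment as a mapping keeps the helper easy to test without mutating os.environ.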

Parallel Opportunities

  • T056-T059 can proceed in parallel; T060 is final validation

Dependencies

  • Depends on all previous WPs (final polish)

Risks & Mitigations

  • Integration issues → run full suite after each WP merge

Dependency & Execution Summary

Phase 0 (Foundation):
  WP01 (Test Utilities) ──→ WP02 (Test Paths) ───────────┐
  WP03 (Data Structures) ─┬─→ WP04 (Fixture Loader) ─────┴──→ WP06 (pytest Config)
                          └─→ WP05 (Initial Fixtures)

Phase 1 (Core Tests):
  WP06 + WP05 ──→ WP07 (Happy Path) 🎯 MVP
            ├──→ WP08 (Review Cycles)
            └──→ WP09 (Parallel/Deps)

Phase 2 (Extended):
  WP01 + WP06 ──→ WP10 (Smoke Tests)
  WP03 + WP05 ──→ WP11 (Additional Fixtures)

Phase 3 (Polish):
  All ──→ WP12 (Integration)

Parallelization Opportunities:

  • WP01 and WP03 can proceed simultaneously (no dependencies); WP02 can start once WP01 is done
  • WP07, WP08, WP09 can proceed in parallel once fixtures ready
  • WP10, WP11 can proceed in parallel with Phase 1 tests

MVP Scope: WP01 → WP02 → WP03 → WP04 → WP05 → WP06 → WP07 (Happy Path Tests)


Subtask Index (Reference)

| Subtask ID | Summary | Work Package | Priority | Parallel? |
| --- | --- | --- | --- | --- |
| T001 | Create testing module structure | WP01 | P0 | No |
| T002 | Implement AgentAvailability dataclass | WP01 | P0 | No |
| T003 | Implement is_installed check | WP01 | P0 | Yes |
| T004 | Implement probe() auth verification | WP01 | P0 | Yes |
| T005 | Implement detect_all_agents() | WP01 | P0 | No |
| T006 | Implement TestPath dataclass | WP02 | P0 | No |
| T007 | Implement path selection logic | WP02 | P0 | Yes |
| T008 | Implement agent assignment | WP02 | P0 | Yes |
| T009 | Add select_test_path() with caching | WP02 | P0 | Yes |
| T010 | Implement FixtureCheckpoint dataclass | WP03 | P0 | Yes |
| T011 | Implement WorktreeMetadata dataclass | WP03 | P0 | Yes |
| T012 | Implement TestContext dataclass | WP03 | P0 | Yes |
| T013 | Add worktrees.json validation | WP03 | P0 | No |
| T014 | Add state.json validation | WP03 | P0 | No |
| T015 | Implement copy_fixture_to_temp() | WP04 | P1 | Yes |
| T016 | Implement init_git_repo() | WP04 | P1 | Yes |
| T017 | Implement create_worktrees_from_metadata() | WP04 | P1 | Yes |
| T018 | Implement load_orchestration_state() | WP04 | P1 | Yes |
| T019 | Implement load_checkpoint() | WP04 | P1 | No |
| T020 | Implement cleanup_test_context() | WP04 | P1 | No |
| T021 | Create minimal test feature structure | WP05 | P1 | No |
| T022 | Create checkpoint_wp_created fixture | WP05 | P1 | Yes |
| T023 | Create checkpoint_wp_implemented fixture | WP05 | P1 | Yes |
| T024 | Create checkpoint_review_pending fixture | WP05 | P1 | Yes |
| T025 | Add fixture manifest | WP05 | P1 | No |
| T026 | Create orchestrator conftest.py | WP06 | P1 | Yes |
| T027 | Register custom markers | WP06 | P1 | Yes |
| T028 | Implement available_agents fixture | WP06 | P1 | No |
| T029 | Implement test_path fixture | WP06 | P1 | No |
| T030 | Implement test_context fixture | WP06 | P1 | No |
| T031 | Test single WP orchestration | WP07 | P1 | Yes |
| T032 | Test multiple parallel WPs | WP07 | P1 | Yes |
| T033 | Test state validation | WP07 | P1 | Yes |
| T034 | Test lane status consistency | WP07 | P1 | Yes |
| T035 | Test commit verification | WP07 | P1 | Yes |
| T036 | Test review rejection flow | WP08 | P2 | Yes |
| T037 | Test re-implementation | WP08 | P2 | Yes |
| T038 | Test full review cycle | WP08 | P2 | Yes |
| T039 | Test max review cycles | WP08 | P2 | Yes |
| T040 | Test state transition history | WP08 | P2 | Yes |
| T041 | Test independent WPs parallel | WP09 | P2 | Yes |
| T042 | Test dependency blocking | WP09 | P2 | Yes |
| T043 | Test circular dependency detection | WP09 | P2 | Yes |
| T044 | Test diamond pattern | WP09 | P2 | Yes |
| T045 | Test linear chain | WP09 | P2 | Yes |
| T046 | Test fan-out pattern | WP09 | P2 | Yes |
| T047 | Smoke test infrastructure | WP10 | P3 | No |
| T048 | File touch verification | WP10 | P3 | No |
| T049 | Parametrized extended agent tests | WP10 | P3 | Yes |
| T050 | Skip behavior for unavailable | WP10 | P3 | Yes |
| T051 | Timing validation | WP10 | P3 | Yes |
| T052 | Create checkpoint_review_rejected | WP11 | P2 | Yes |
| T053 | Create checkpoint_review_approved | WP11 | P2 | Yes |
| T054 | Create checkpoint_wp_merged | WP11 | P2 | Yes |
| T055 | Add stale checkpoint detection | WP11 | P2 | No |
| T056 | Update root conftest.py | WP12 | P3 | Yes |
| T057 | Add state validation utilities | WP12 | P3 | Yes |
| T058 | Add test output helpers | WP12 | P3 | Yes |
| T059 | Add timeout configuration | WP12 | P3 | Yes |
| T060 | Validate quickstart scenarios | WP12 | P3 | No |

<!-- status-model:start -->

Canonical Status (Generated)

<!-- status-model:end -->

  • WP01: done
  • WP02: done
  • WP03: done
  • WP04: done
  • WP05: done
  • WP06: done
  • WP07: done
  • WP08: done
  • WP09: done
  • WP10: done
  • WP11: done
  • WP12: done