Implementation Plan: Autonomous Multi-Agent Orchestrator
Branch: 020-autonomous-multi-agent-orchestrator | Date: 2026-01-18 | Spec: spec.md
Input: Feature specification from /kitty-specs/020-autonomous-multi-agent-orchestrator/spec.md
Summary
Build a Python-based orchestrator that executes spec-kitty features autonomously by:
1. Reading WP dependency graphs from task frontmatter
2. Spawning AI agents in parallel for independent WPs
3. Assigning different agents for implementation vs. review
4. Handling failures with fallback strategies
5. Persisting state for resume after interruption
Research from feature 019 validated that 9 of 12 agents have CLI support suitable for orchestration.
Technical Context
- Language/Version: Python 3.11+ (existing spec-kitty codebase)
- Primary Dependencies: asyncio (stdlib), existing spec-kitty modules (workspace, lane management)
- Storage: Local files only: JSON state (.kittify/orchestration-state.json) and YAML configuration (.kittify/agents.yaml)
- Testing: pytest with async support
- Target Platform: macOS, Linux (CLI environments)
- Project Type: Single project (extends existing spec-kitty CLI)
- Performance Goals: Support up to 10 concurrent agent processes
- Constraints: No external database; state must survive process termination
- Scale/Scope: Features with up to 20 WPs, 9 supported agents
Constitution Check
Constitution file not found - skipping constitution validation.
Project Structure
Documentation (this feature)
kitty-specs/020-autonomous-multi-agent-orchestrator/
├── spec.md # Feature specification
├── plan.md # This file
├── research.md # References feature 019 research
├── data-model.md # Entity schemas
├── quickstart.md # Usage guide
└── tasks/ # Work package prompts (generated by /spec-kitty.tasks)
Source Code (repository root)
src/specify_cli/
├── orchestrator/  # NEW: Orchestrator package
│   ├── __init__.py
│   ├── scheduler.py  # Dependency resolution, WP assignment
│   ├── executor.py  # Agent process management
│   ├── monitor.py  # Completion detection, failure handling
│   ├── state.py  # State persistence and resume
│   ├── config.py  # agents.yaml parsing
│   └── agents/  # Agent-specific invokers
│       ├── __init__.py
│       ├── base.py  # AgentInvoker protocol
│       ├── claude.py  # Claude Code invoker
│       ├── codex.py  # OpenAI Codex invoker
│       ├── copilot.py  # GitHub Copilot invoker
│       ├── gemini.py  # Google Gemini invoker
│       ├── qwen.py  # Qwen Code invoker
│       ├── opencode.py  # OpenCode invoker
│       ├── kilocode.py  # Kilocode invoker
│       ├── augment.py  # Augment Code invoker
│       └── cursor.py  # Cursor invoker (with timeout wrapper)
├── cli/commands/
│   └── orchestrate.py  # NEW: CLI command entry point
└── core/
    └── dependency_graph.py  # EXISTING: Reuse for WP dependencies
tests/specify_cli/orchestrator/
├── test_scheduler.py
├── test_executor.py
├── test_monitor.py
├── test_state.py
├── test_config.py
└── agents/
    └── test_invokers.py
Structure Decision: Extends existing spec-kitty codebase with new orchestrator/ package. Agent invokers are modular to support different CLI patterns per agent.
Architecture
Component Diagram
┌─────────────────────────────────────────────────────────────────────┐
│                           CLI Entry Point                           │
│                      (spec-kitty orchestrate)                       │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                            Orchestrator                             │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐           │
│  │  Scheduler   │───▶│   Executor   │───▶│   Monitor    │           │
│  │              │    │              │    │              │           │
│  │ - Read deps  │    │ - Spawn proc │    │ - Exit codes │           │
│  │ - Assign WPs │    │ - Pipe stdin │    │ - JSON parse │           │
│  │ - Select agt │    │ - Timeouts   │    │ - Retry/fail │           │
│  └──────────────┘    └──────────────┘    └──────────────┘           │
│         │                   │                   │                   │
│         └───────────────────┼───────────────────┘                   │
│                             ▼                                       │
│                    ┌──────────────────┐                             │
│                    │  State Manager   │                             │
│                    │                  │                             │
│                    │  orchestration-  │                             │
│                    │  state.json      │                             │
│                    └──────────────────┘                             │
│                             │                                       │
│         ┌───────────────────┼───────────────────┐                   │
│         ▼                   ▼                   ▼                   │
│   ┌────────────┐      ┌────────────┐      ┌────────────┐            │
│   │   Claude   │      │   Codex    │      │  OpenCode  │  ...       │
│   │  Invoker   │      │  Invoker   │      │  Invoker   │            │
│   └────────────┘      └────────────┘      └────────────┘            │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
Data Flow
1. Initialization
- Load `agents.yaml` config (or use defaults)
- Detect installed agents via `which` checks
- Read feature's `tasks/*.md` for WP dependencies
- Build dependency graph
2. Scheduling Loop
- Find WPs with all dependencies satisfied
- Assign agents based on role (implementation/review) and priority
- Respect concurrency limits
3. Execution
- Create worktree for WP via existing `spec-kitty implement` infrastructure
- Spawn agent process with appropriate CLI flags
- Pipe WP prompt file to stdin
- Capture stdout/stderr to log file
4. Monitoring
- Poll for process completion
- Check exit code (0 = success)
- Parse JSON output if available
- Update WP lane status
5. Failure Handling
- On failure: check retry count
- If retries remaining: retry with same agent
- If retries exhausted: apply fallback strategy
- If all agents fail: pause and alert user
6. State Persistence
- After each WP completion: write state to JSON
- On resume: load state, skip completed WPs
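The scheduling step (2) and the failure-handling step (5) above can be sketched as two small pure functions. This is a minimal illustration under stated assumptions, not the orchestrator's actual scheduler: the names `ready_wps` and `next_action`, the dict-of-sets graph shape, and the `max_retries` parameter are all hypothetical, and the real implementation would reuse core/dependency_graph.py rather than a hand-built dict.

```python
def ready_wps(
    deps: dict[str, set[str]],
    done: set[str],
    running: set[str],
) -> list[str]:
    """Return WPs that are not started and whose dependencies are all done.

    `deps` maps each WP id to the set of WP ids it depends on
    (hypothetical shape, chosen for the example).
    """
    return sorted(
        wp
        for wp, needs in deps.items()
        if wp not in done and wp not in running and needs <= done
    )


def next_action(retries_used: int, max_retries: int, untried_agents: list[str]) -> str:
    """Decide what to do after a failed run, mirroring step 5."""
    if retries_used < max_retries:
        return "retry"      # retries remaining: same agent again
    if untried_agents:
        return "fallback"   # retries exhausted: move to another agent
    return "pause"          # every agent failed: stop and alert the user


if __name__ == "__main__":
    graph = {
        "WP01": set(),
        "WP02": {"WP01"},
        "WP03": {"WP01"},
        "WP04": {"WP02", "WP03"},
    }
    print(ready_wps(graph, done={"WP01"}, running=set()))
```

Keeping both decisions free of I/O makes them easy to cover in the scheduler unit tests; WP02 and WP03 become ready together once WP01 is done, which is where parallel execution comes from.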
Agent Invocation Patterns
Based on feature 019 research:
| Agent | Command | Task Input | Headless Flag | JSON Output |
|---|---|---|---|---|
| Claude Code | claude | stdin | -p | --output-format json |
| OpenAI Codex | codex exec | stdin or arg | - | --json |
| GitHub Copilot | copilot | arg | -p | --silent |
| Google Gemini | gemini | stdin | -p | --output-format json |
| Qwen Code | qwen | stdin | -p | --output-format json |
| OpenCode | opencode run | stdin or -f | (default) | --format json |
| Kilocode | kilocode | arg | -a | -j |
| Augment Code | auggie | arg | --acp | (exit code only) |
| Cursor | cursor agent | arg | -p | --output-format json |
Cursor Special Handling: Wrap the Cursor invocation in timeout 300 (a 300-second limit), because the Cursor CLI can hang instead of exiting.
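As a rough sketch of how the table could be turned into data, the stdin-driven rows can be stored as base command lists, with the Cursor wrapper expressed as a prefix. The flags below are copied from the table and have not been re-verified against current CLI releases; `STDIN_AGENT_COMMANDS` and `with_timeout` are hypothetical names. How argument-driven agents receive the prompt is deliberately left out, since the table does not specify the argument position.

```python
# Base commands for agents that read the task from stdin.
# Assumption: flags are taken verbatim from the table above.
STDIN_AGENT_COMMANDS: dict[str, list[str]] = {
    "claude": ["claude", "-p", "--output-format", "json"],
    "gemini": ["gemini", "-p", "--output-format", "json"],
    "qwen": ["qwen", "-p", "--output-format", "json"],
}


def with_timeout(argv: list[str], seconds: int = 300) -> list[str]:
    """Prefix a command with the external `timeout` utility."""
    return ["timeout", str(seconds), *argv]


if __name__ == "__main__":
    # Cursor base command from the table, wrapped with the 300-second limit.
    print(with_timeout(["cursor", "agent", "-p", "--output-format", "json"]))
```

One caveat worth checking: the `timeout` utility may not be present on a stock macOS install, so an asyncio-level timeout in the executor might be the more portable way to enforce the same limit.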
Key Design Decisions
D1: Async vs Threading
Decision: Use asyncio with subprocess for parallel execution.
Rationale:
- Native Python async is simpler than threading
- `asyncio.create_subprocess_exec` handles process spawning cleanly
- Easier to implement concurrency limits with semaphores
- Better error handling with structured concurrency
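A minimal sketch of what D1 implies for the executor: spawn the agent with `asyncio.create_subprocess_exec`, pipe the prompt to stdin, and enforce a timeout. The function name `run_agent`, its return shape, and the `-1` timeout sentinel are assumptions for illustration; the demo uses the Python interpreter as a stand-in for a real agent CLI.

```python
import asyncio
import sys


async def run_agent(argv: list[str], prompt: str, timeout_s: float) -> tuple[int, str]:
    """Run argv, send prompt on stdin, return (exit_code, combined output).

    Returns (-1, "") if the process exceeds timeout_s (hypothetical convention).
    """
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # merge stderr into stdout
    )
    try:
        out, _ = await asyncio.wait_for(
            proc.communicate(prompt.encode()), timeout=timeout_s
        )
    except asyncio.TimeoutError:
        proc.kill()        # runaway process: terminate it
        await proc.wait()  # and reap it so no zombie is left behind
        return -1, ""
    return proc.returncode, out.decode()


if __name__ == "__main__":
    # Stand-in "agent" that echoes its stdin in upper case.
    fake_agent = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
    print(asyncio.run(run_agent(fake_agent, "implement WP01", timeout_s=10)))
```

In the real executor the output would presumably be streamed to a log file rather than buffered in memory, as step 3 of the data flow describes.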
D2: State Persistence Format
Decision: Single JSON file (.kittify/orchestration-state.json)
Rationale:
- Human-readable for debugging
- Git-friendly (can be committed for visibility)
- Atomic writes prevent corruption
- Simple schema, no migration needed
D3: Agent Selection Strategy
Decision: Priority-based with role filtering
Rationale:
- User assigns priorities per agent in config
- Implementation role: use highest-priority implementation-capable agent
- Review role: use highest-priority review-capable agent that is different from implementation agent
- Fallback: next in priority order
D4: Concurrency Model
Decision: Semaphore-based with per-agent limits
Rationale:
- Global semaphore limits total concurrent processes
- Per-agent semaphores respect individual agent limits
- Prevents overloading any single agent's rate limits
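To make D4 concrete, here is one possible shape for a two-level limiter built from `asyncio.Semaphore`. The class name `ConcurrencyLimiter`, the `slot` method, and the limits in the demo are hypothetical. Acquiring the per-agent semaphore before the global one is a design choice of this sketch, so that a task waiting for a busy agent does not occupy a global slot.

```python
import asyncio
from contextlib import asynccontextmanager


class ConcurrencyLimiter:
    """Global cap on running processes plus a separate cap per agent."""

    def __init__(self, global_limit: int, per_agent: dict[str, int]) -> None:
        self._global = asyncio.Semaphore(global_limit)
        self._per_agent = {
            name: asyncio.Semaphore(limit) for name, limit in per_agent.items()
        }

    @asynccontextmanager
    async def slot(self, agent: str):
        # Per-agent first, then global: waiting on a saturated agent
        # should not block unrelated agents from using global capacity.
        async with self._per_agent[agent]:
            async with self._global:
                yield


async def demo() -> dict[str, int]:
    """Run 8 fake jobs and record the peak concurrency that was observed."""
    limiter = ConcurrencyLimiter(global_limit=3, per_agent={"claude": 2, "codex": 2})
    active = {"total": 0, "claude": 0, "codex": 0}
    peak = dict(active)

    async def job(agent: str) -> None:
        async with limiter.slot(agent):
            for key in ("total", agent):
                active[key] += 1
                peak[key] = max(peak[key], active[key])
            await asyncio.sleep(0.01)  # stand-in for the agent process
            for key in ("total", agent):
                active[key] -= 1

    await asyncio.gather(*(job(a) for a in ["claude"] * 4 + ["codex"] * 4))
    return peak


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The demo never exceeds three jobs in total or two per agent, which matches the intent of combining a global semaphore with per-agent ones.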
Integration Points
Existing Spec-Kitty Modules
| Module | Purpose | How Used |
|---|---|---|
| core/dependency_graph.py | WP dependency resolution | Reuse directly |
| merge/state.py | State persistence patterns | Follow same patterns |
| cli/commands/implement.py | Worktree creation | Call programmatically |
| agent_utils/status.py | Lane management | Use for status updates |
New CLI Commands
# Start orchestration
spec-kitty orchestrate --feature 020-my-feature
# Resume interrupted orchestration
spec-kitty orchestrate --resume
# Check status
spec-kitty orchestrate --status
# Abort orchestration
spec-kitty orchestrate --abort
Testing Strategy
Unit Tests
- Scheduler: dependency resolution, agent assignment
- State: persistence, resume, concurrent writes
- Config: parsing, validation, defaults
Integration Tests
- Mock agent processes (fast exit, specific codes)
- Full orchestration of 3-WP feature
- Resume after simulated interruption
- Fallback strategy execution
Manual Testing
- Real agent execution with test feature
- Cross-agent review verification
- Parallel execution timing validation
Risks and Mitigations
| Risk | Mitigation |
|---|---|
| Agent CLI changes | Version-check agents, warn on mismatch |
| State corruption | Atomic writes, backup before modify |
| Runaway processes | Per-process timeout, cleanup on abort |
| Git conflicts in worktrees | Use existing worktree isolation model |
Phase 0 Research
Status: Complete (leverages feature 019 research)
All technical decisions are informed by the comprehensive research from feature 019:
- Agent CLI capabilities fully documented
- Invocation patterns validated
- Architecture recommendation provided
- Data model schemas designed
No additional research required.
Phase 1 Artifacts
- `data-model.md` - Entity schemas for orchestration
- `quickstart.md` - User guide for configuration and usage
- Agent context updated in CLAUDE.md