Implementation Plan: Autonomous Multi-Agent Orchestrator

Branch: 020-autonomous-multi-agent-orchestrator | Date: 2026-01-18 | Spec: spec.md
Input: Feature specification from /kitty-specs/020-autonomous-multi-agent-orchestrator/spec.md

Summary

Build a Python-based orchestrator that executes spec-kitty features autonomously by:

1. Reading WP dependency graphs from task frontmatter
2. Spawning AI agents in parallel for independent WPs
3. Assigning different agents for implementation vs. review
4. Handling failures with fallback strategies
5. Persisting state for resume after interruption

Research from feature 019 validated that 9 of 12 agents have CLI support suitable for orchestration.

Technical Context

Language/Version: Python 3.11+ (existing spec-kitty codebase)
Primary Dependencies: asyncio (stdlib), existing spec-kitty modules (workspace, lane management)
Storage: JSON state file (.kittify/orchestration-state.json) and YAML config file (.kittify/agents.yaml)
Testing: pytest with async support
Target Platform: macOS, Linux (CLI environments)
Project Type: Single project (extends existing spec-kitty CLI)
Performance Goals: Support up to 10 concurrent agent processes
Constraints: No external database; state must survive process termination
Scale/Scope: Features with up to 20 WPs, 9 supported agents

Constitution Check

Constitution file not found - skipping constitution validation.

Project Structure

Documentation (this feature)

kitty-specs/020-autonomous-multi-agent-orchestrator/
├── spec.md              # Feature specification
├── plan.md              # This file
├── research.md          # References feature 019 research
├── data-model.md        # Entity schemas
├── quickstart.md        # Usage guide
└── tasks/               # Work package prompts (generated by /spec-kitty.tasks)

Source Code (repository root)

src/specify_cli/
├── orchestrator/                # NEW: Orchestrator package
│   ├── __init__.py
│   ├── scheduler.py             # Dependency resolution, WP assignment
│   ├── executor.py              # Agent process management
│   ├── monitor.py               # Completion detection, failure handling
│   ├── state.py                 # State persistence and resume
│   ├── config.py                # agents.yaml parsing
│   └── agents/                  # Agent-specific invokers
│       ├── __init__.py
│       ├── base.py              # AgentInvoker protocol
│       ├── claude.py            # Claude Code invoker
│       ├── codex.py             # OpenAI Codex invoker
│       ├── copilot.py           # GitHub Copilot invoker
│       ├── gemini.py            # Google Gemini invoker
│       ├── qwen.py              # Qwen Code invoker
│       ├── opencode.py          # OpenCode invoker
│       ├── kilocode.py          # Kilocode invoker
│       ├── augment.py           # Augment Code invoker
│       └── cursor.py            # Cursor invoker (with timeout wrapper)
├── cli/commands/
│   └── orchestrate.py           # NEW: CLI command entry point
└── core/
    └── dependency_graph.py      # EXISTING: Reuse for WP dependencies

tests/specify_cli/orchestrator/
├── test_scheduler.py
├── test_executor.py
├── test_monitor.py
├── test_state.py
├── test_config.py
└── agents/
    └── test_invokers.py

Structure Decision: Extends existing spec-kitty codebase with new orchestrator/ package. Agent invokers are modular to support different CLI patterns per agent.

Architecture

Component Diagram

┌─────────────────────────────────────────────────────────────────────┐
│                           CLI Entry Point                            │
│                         (spec-kitty orchestrate)                     │
└─────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────┐
│                            Orchestrator                              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐          │
│  │   Scheduler  │───▶│   Executor   │───▶│   Monitor    │          │
│  │              │    │              │    │              │          │
│  │ - Read deps  │    │ - Spawn proc │    │ - Exit codes │          │
│  │ - Assign WPs │    │ - Pipe stdin │    │ - JSON parse │          │
│  │ - Select agt │    │ - Timeouts   │    │ - Retry/fail │          │
│  └──────────────┘    └──────────────┘    └──────────────┘          │
│         │                   │                   │                   │
│         └───────────────────┼───────────────────┘                   │
│                             ▼                                        │
│                   ┌──────────────────┐                              │
│                   │   State Manager  │                              │
│                   │                  │                              │
│                   │ orchestration-   │                              │
│                   │ state.json       │                              │
│                   └──────────────────┘                              │
│                             │                                        │
│         ┌───────────────────┼───────────────────┐                   │
│         ▼                   ▼                   ▼                   │
│  ┌────────────┐     ┌────────────┐     ┌────────────┐              │
│  │  Claude    │     │  Codex     │     │  OpenCode  │   ...        │
│  │  Invoker   │     │  Invoker   │     │  Invoker   │              │
│  └────────────┘     └────────────┘     └────────────┘              │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Data Flow

1. Initialization

Load agents.yaml config (or use defaults)
Detect installed agents via which checks
Read feature's tasks/*.md for WP dependencies
Build dependency graph
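
The agent detection step above could look roughly like the following. This is a minimal sketch, not the planned implementation: the executable names are taken from the Agent Invocation Patterns table in this plan, while the `AGENT_EXECUTABLES` mapping, the agent ids, and the `detect_installed_agents` function are hypothetical names. The `which` parameter exists only so the lookup can be replaced in tests.

```python
import shutil
from typing import Callable, Optional

# Executable names come from the Agent Invocation Patterns table in this plan.
# The dictionary keys (agent ids) are hypothetical and mirror the invoker module names.
AGENT_EXECUTABLES = {
    "claude": "claude",
    "codex": "codex",
    "copilot": "copilot",
    "gemini": "gemini",
    "qwen": "qwen",
    "opencode": "opencode",
    "kilocode": "kilocode",
    "augment": "auggie",
    "cursor": "cursor",
}


def detect_installed_agents(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> dict[str, str]:
    """Return {agent_id: resolved_path} for every agent whose executable is on PATH."""
    found: dict[str, str] = {}
    for agent_id, executable in AGENT_EXECUTABLES.items():
        path = which(executable)  # shutil.which returns None when nothing is found
        if path is not None:
            found[agent_id] = path
    return found


if __name__ == "__main__":
    for agent_id, path in detect_installed_agents().items():
        print(f"{agent_id}: {path}")
```

`shutil.which` is the stdlib equivalent of a shell `which` call and avoids spawning a subprocess for each check.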

2. Scheduling Loop

Find WPs with all dependencies satisfied
Assign agents based on role (implementation/review) and priority
Respect concurrency limits
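
One way to picture a single pass of the scheduling loop is sketched below. It is a stand-in under stated assumptions: the plan reuses core/dependency_graph.py for the real dependency handling, whose API is not shown here, so the plain `dict` graph and the `ready_work_packages` function are hypothetical.

```python
def ready_work_packages(
    deps: dict[str, set[str]],
    completed: set[str],
    in_progress: set[str],
    max_concurrent: int,
) -> list[str]:
    """Return the WPs that may start now, limited by the free concurrency slots.

    `deps` maps each WP id to the set of WP ids it depends on (hypothetical shape).
    """
    free_slots = max(0, max_concurrent - len(in_progress))
    ready = sorted(
        wp
        for wp, needs in deps.items()
        if wp not in completed
        and wp not in in_progress
        and needs <= completed  # every dependency is already done
    )
    return ready[:free_slots]


if __name__ == "__main__":
    graph = {"WP01": set(), "WP02": {"WP01"}, "WP03": {"WP01"}, "WP04": {"WP02", "WP03"}}
    print(ready_work_packages(graph, completed={"WP01"}, in_progress=set(), max_concurrent=10))
```

With WP01 complete, WP02 and WP03 become eligible together, which is the case where parallel agents pay off; WP04 stays blocked until both finish.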

3. Execution

Create worktree for WP via existing spec-kitty implement infrastructure
Spawn agent process with appropriate CLI flags
Pipe WP prompt file to stdin
Capture stdout/stderr to log file
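
The spawn, pipe, and capture steps above can be sketched with `asyncio.create_subprocess_exec`, which design decision D1 names. The `run_agent` function, its parameters, and the log file name are assumptions made for illustration; worktree creation is left out because it goes through the existing implement infrastructure.

```python
import asyncio
import sys
import tempfile
from pathlib import Path


async def run_agent(cmd: list[str], prompt: str, log_path: Path, timeout_s: float) -> int:
    """Run one agent process, feeding `prompt` on stdin and logging all output."""
    with log_path.open("wb") as log:
        proc = await asyncio.create_subprocess_exec(
            *cmd,
            stdin=asyncio.subprocess.PIPE,
            stdout=log,                        # stdout goes straight to the log file
            stderr=asyncio.subprocess.STDOUT,  # stderr is merged into the same file
        )
        try:
            # communicate() writes the prompt, closes stdin, and waits for exit.
            await asyncio.wait_for(proc.communicate(prompt.encode()), timeout=timeout_s)
        except asyncio.TimeoutError:
            proc.kill()        # runaway process: terminate it ...
            await proc.wait()  # ... and reap it before re-raising
            raise
    return proc.returncode


if __name__ == "__main__":
    # Stand-in "agent": the Python interpreter echoing its stdin in upper case.
    fake_agent = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    with tempfile.TemporaryDirectory() as tmp:
        log_file = Path(tmp) / "WP01.log"
        exit_code = asyncio.run(run_agent(fake_agent, "implement WP01", log_file, timeout_s=30))
        print(exit_code, log_file.read_text())
```

Agents that take the task as a command-line argument rather than on stdin would pass an empty prompt here and put the task text in `cmd` instead.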

4. Monitoring

Poll for process completion
Check exit code (0 = success)
Parse JSON output if available
Update WP lane status
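
The exit-code and JSON checks above might be combined as in this sketch. `AgentResult` and `interpret_result` are hypothetical names, and the shape of each agent's JSON output is not specified in this plan, so the payload is kept opaque. The lane update is omitted because it goes through the existing status module.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class AgentResult:
    success: bool           # True only when the process exited with code 0
    payload: Optional[Any]  # parsed JSON output, or None if the output was not JSON


def interpret_result(returncode: int, output: str) -> AgentResult:
    """Classify a finished agent run from its exit code and captured output."""
    payload = None
    try:
        payload = json.loads(output)
    except json.JSONDecodeError:
        pass  # JSON is optional: some agents report via exit code only
    return AgentResult(success=(returncode == 0), payload=payload)


if __name__ == "__main__":
    print(interpret_result(0, '{"ok": true}'))
    print(interpret_result(1, "Traceback ..."))
```

Success is decided by the exit code alone, so an agent that emits no JSON can still be marked complete.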

5. Failure Handling

On failure: check retry count
If retries remain: retry with same agent
If retries exhausted: apply fallback strategy
If all agents fail: pause and alert user
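
The retry-then-fallback sequence above can be sketched as two nested loops. This assumes the fallback strategy is "next agent in priority order", as described under D3; other strategies are not covered. `run_with_fallback`, `AllAgentsFailed`, and the `attempt` callback are hypothetical names.

```python
from typing import Callable


class AllAgentsFailed(Exception):
    """Raised when every candidate agent has exhausted its retries."""


def run_with_fallback(
    wp_id: str,
    agents_by_priority: list[str],
    attempt: Callable[[str, str], bool],
    max_retries: int = 2,
) -> str:
    """Try each agent in priority order; return the id of the agent that succeeded.

    `attempt(agent_id, wp_id)` runs the WP once and returns True on success.
    """
    for agent_id in agents_by_priority:
        # One initial attempt plus `max_retries` retries with the same agent.
        for _ in range(1 + max_retries):
            if attempt(agent_id, wp_id):
                return agent_id
    # All agents failed: the caller is expected to pause and alert the user.
    raise AllAgentsFailed(wp_id)


if __name__ == "__main__":
    outcomes = iter([False, False, False, True])  # first agent fails three times
    winner = run_with_fallback("WP01", ["claude", "codex"], lambda a, w: next(outcomes))
    print(winner)
```

Raising an exception for the all-agents-failed case keeps the pause-and-alert decision in the orchestrator rather than inside the retry helper.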

6. State Persistence

After each WP completion: write state to JSON
On resume: load state, skip completed WPs

Agent Invocation Patterns

Based on feature 019 research:

Agent	Command	Task Input	Headless Flag	JSON Output
Claude Code	`claude`	stdin	`-p`	`--output-format json`
OpenAI Codex	`codex exec`	stdin or arg	`-`	`--json`
GitHub Copilot	`copilot`	arg	`-p`	`--silent`
Google Gemini	`gemini`	stdin	`-p`	`--output-format json`
Qwen Code	`qwen`	stdin	`-p`	`--output-format json`
OpenCode	`opencode run`	stdin or `-f`	(default)	`--format json`
Kilocode	`kilocode`	arg	`-a`	`-j`
Augment Code	`auggie`	arg	`--acp`	(exit code only)
Cursor	`cursor agent`	arg	`-p`	`--output-format json`

Cursor Special Handling: Requires a `timeout 300` wrapper because the CLI can hang.
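
A few rows of the table could be encoded as data and turned into an argument list as sketched below. The flags are copied from the table; everything else is an assumption, in particular the `InvocationPattern` type, the `build_argv` function, and the order in which flags and the prompt argument are placed, which would need checking against each CLI. Only three agents are shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InvocationPattern:
    command: tuple[str, ...]
    headless_flag: tuple[str, ...]
    json_flags: tuple[str, ...]
    prompt_via_stdin: bool
    timeout_wrapper_s: Optional[int] = None  # only Cursor needs this per the plan


# Values copied from the table above; the dictionary keys are hypothetical ids.
PATTERNS = {
    "claude": InvocationPattern(("claude",), ("-p",), ("--output-format", "json"), True),
    "kilocode": InvocationPattern(("kilocode",), ("-a",), ("-j",), False),
    "cursor": InvocationPattern(
        ("cursor", "agent"), ("-p",), ("--output-format", "json"), False, 300
    ),
}


def build_argv(agent_id: str, prompt: str) -> list[str]:
    """Build the argument list for one agent run (argument order is an assumption)."""
    pattern = PATTERNS[agent_id]
    argv = [*pattern.command, *pattern.headless_flag, *pattern.json_flags]
    if not pattern.prompt_via_stdin:
        argv.append(prompt)  # "arg" agents receive the task on the command line
    if pattern.timeout_wrapper_s is not None:
        argv = ["timeout", str(pattern.timeout_wrapper_s), *argv]
    return argv


if __name__ == "__main__":
    print(build_argv("cursor", "implement WP01"))
```

Note that `timeout` is a GNU coreutils command and is not installed by default on macOS, so a Python-level timeout around the subprocess may be a more portable way to get the same effect.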

Key Design Decisions

D1: Async vs Threading

Decision: Use asyncio with subprocess for parallel execution.

Rationale:

Native Python async is simpler than threading
asyncio.create_subprocess_exec handles process spawning cleanly
Easier to implement concurrency limits with semaphores
Better error handling with structured concurrency

D2: State Persistence Format

Decision: Single JSON file (.kittify/orchestration-state.json)

Rationale:

Human-readable for debugging
Git-friendly (can be committed for visibility)
Atomic writes prevent corruption
Simple schema, no migration needed
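
The atomic-write point above is commonly implemented by writing to a temporary file in the same directory and renaming it over the target. The sketch below assumes that approach; `save_state`, `load_state`, and the `completed` key are hypothetical, and the real schema belongs in data-model.md.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state(state: dict, path: Path) -> None:
    """Write `state` as JSON so readers never observe a partially written file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must live in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp, indent=2, sort_keys=True)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_name, path)  # atomically swaps the new file into place
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


def load_state(path: Path) -> dict:
    """Load saved state, or return an empty state when starting fresh."""
    if not path.exists():
        return {"completed": []}  # hypothetical minimal schema
    return json.loads(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        state_file = Path(tmp_dir) / ".kittify" / "orchestration-state.json"
        save_state({"completed": ["WP01"]}, state_file)
        print(load_state(state_file))
```

If the process is killed mid-write, the previous state file is still intact, which is what the resume path depends on.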

D3: Agent Selection Strategy

Decision: Priority-based with role filtering

Rationale:

User assigns priorities per agent in config
Implementation role: use highest-priority implementation-capable agent
Review role: use highest-priority review-capable agent that differs from the implementation agent
Fallback: next in priority order
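
The selection rules above reduce to one filter and one minimum, as this sketch shows. `AgentConfig`, `select_agent`, and the role strings are hypothetical, and the convention that a lower number means higher priority is an assumption, since the agents.yaml format is not defined in this plan.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AgentConfig:
    agent_id: str
    priority: int          # assumption: lower number = higher priority
    roles: frozenset[str]  # e.g. {"implementation", "review"}


def select_agent(
    agents: Iterable[AgentConfig],
    role: str,
    exclude: frozenset[str] = frozenset(),
) -> Optional[str]:
    """Return the highest-priority agent that can fill `role`, skipping `exclude`."""
    candidates = [a for a in agents if role in a.roles and a.agent_id not in exclude]
    if not candidates:
        return None
    return min(candidates, key=lambda a: a.priority).agent_id


if __name__ == "__main__":
    both = frozenset({"implementation", "review"})
    configured = [
        AgentConfig("claude", 1, both),
        AgentConfig("codex", 2, both),
        AgentConfig("gemini", 3, frozenset({"review"})),
    ]
    implementer = select_agent(configured, "implementation")
    reviewer = select_agent(configured, "review", exclude=frozenset({implementer}))
    print(implementer, reviewer)
```

The same `exclude` parameter covers both rules: excluding the implementer yields a different reviewer, and excluding agents that already failed yields the next one in priority order.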

D4: Concurrency Model

Decision: Semaphore-based with per-agent limits

Rationale:

Global semaphore limits total concurrent processes
Per-agent semaphores respect individual agent limits
Prevents overloading any single agent's rate limits

Integration Points

Existing Spec-Kitty Modules

Module	Purpose	How Used
`core/dependency_graph.py`	WP dependency resolution	Reuse directly
`merge/state.py`	State persistence patterns	Follow same patterns
`cli/commands/implement.py`	Worktree creation	Call programmatically
`agent_utils/status.py`	Lane management	Use for status updates

New CLI Commands

# Start orchestration
spec-kitty orchestrate --feature 020-my-feature

# Resume interrupted orchestration
spec-kitty orchestrate --resume

# Check status
spec-kitty orchestrate --status

# Abort orchestration
spec-kitty orchestrate --abort

Testing Strategy

Unit Tests

Scheduler: dependency resolution, agent assignment
State: persistence, resume, concurrent writes
Config: parsing, validation, defaults

Integration Tests

Mock agent processes (fast exit, specific codes)
Full orchestration of 3-WP feature
Resume after simulated interruption
Fallback strategy execution

Manual Testing

Real agent execution with test feature
Cross-agent review verification
Parallel execution timing validation

Risks and Mitigations

Risk	Mitigation
Agent CLI changes	Version-check agents, warn on mismatch
State corruption	Atomic writes, backup before modify
Runaway processes	Per-process timeout, cleanup on abort
Git conflicts in worktrees	Use existing worktree isolation model

Phase 0 Research

Status: Complete (leverages feature 019 research)

All technical decisions are informed by the comprehensive research from feature 019:

Agent CLI capabilities fully documented
Invocation patterns validated
Architecture recommendation provided
Data model schemas designed

No additional research required.

Phase 1 Artifacts

data-model.md - Entity schemas for orchestration
quickstart.md - User guide for configuration and usage
Agent context updated in CLAUDE.md