Implementation Plan: Autonomous Multi-Agent Orchestrator

Branch: 020-autonomous-multi-agent-orchestrator | Date: 2026-01-18 | Spec: spec.md
Input: Feature specification from /kitty-specs/020-autonomous-multi-agent-orchestrator/spec.md

Summary

Build a Python-based orchestrator that executes spec-kitty features autonomously by:

  1. Reading WP dependency graphs from task frontmatter
  2. Spawning AI agents in parallel for independent WPs
  3. Assigning different agents for implementation vs. review
  4. Handling failures with fallback strategies
  5. Persisting state for resume after interruption

Research from feature 019 validated that 9 of 12 agents have CLI support suitable for orchestration.

Technical Context

  • Language/Version: Python 3.11+ (existing spec-kitty codebase)
  • Primary Dependencies: asyncio (stdlib), existing spec-kitty modules (workspace, lane management)
  • Storage: JSON state file (.kittify/orchestration-state.json) and YAML agent configuration (.kittify/agents.yaml)
  • Testing: pytest with async support
  • Target Platform: macOS, Linux (CLI environments)
  • Project Type: Single project (extends existing spec-kitty CLI)
  • Performance Goals: Support up to 10 concurrent agent processes
  • Constraints: No external database; state must survive process termination
  • Scale/Scope: Features with up to 20 WPs, 9 supported agents

Constitution Check

Constitution file not found - skipping constitution validation.

Project Structure

Documentation (this feature)

kitty-specs/020-autonomous-multi-agent-orchestrator/
├── spec.md              # Feature specification
├── plan.md              # This file
├── research.md          # References feature 019 research
├── data-model.md        # Entity schemas
├── quickstart.md        # Usage guide
└── tasks/               # Work package prompts (generated by /spec-kitty.tasks)

Source Code (repository root)

src/specify_cli/
├── orchestrator/                # NEW: Orchestrator package
│   ├── __init__.py
│   ├── scheduler.py             # Dependency resolution, WP assignment
│   ├── executor.py              # Agent process management
│   ├── monitor.py               # Completion detection, failure handling
│   ├── state.py                 # State persistence and resume
│   ├── config.py                # agents.yaml parsing
│   └── agents/                  # Agent-specific invokers
│       ├── __init__.py
│       ├── base.py              # AgentInvoker protocol
│       ├── claude.py            # Claude Code invoker
│       ├── codex.py             # OpenAI Codex invoker
│       ├── copilot.py           # GitHub Copilot invoker
│       ├── gemini.py            # Google Gemini invoker
│       ├── qwen.py              # Qwen Code invoker
│       ├── opencode.py          # OpenCode invoker
│       ├── kilocode.py          # Kilocode invoker
│       ├── augment.py           # Augment Code invoker
│       └── cursor.py            # Cursor invoker (with timeout wrapper)
├── cli/commands/
│   └── orchestrate.py           # NEW: CLI command entry point
└── core/
    └── dependency_graph.py      # EXISTING: Reuse for WP dependencies

tests/specify_cli/orchestrator/
├── test_scheduler.py
├── test_executor.py
├── test_monitor.py
├── test_state.py
├── test_config.py
└── agents/
    └── test_invokers.py

Structure Decision: Extends existing spec-kitty codebase with new orchestrator/ package. Agent invokers are modular to support different CLI patterns per agent.
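As a rough illustration of what "modular" could mean here, the sketch below shows one possible shape for the AgentInvoker protocol in agents/base.py. The member names (name, prompt_via_stdin, build_command) are hypothetical and not taken from the spec; only the Claude Code command and flags come from the invocation table in this plan.

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class AgentInvoker(Protocol):
    """Hypothetical interface that each agent-specific invoker would satisfy."""

    name: str
    prompt_via_stdin: bool  # True if the WP prompt is piped to stdin

    def build_command(self, prompt_path: str = "") -> List[str]:
        """Return the argv used to start the agent in headless mode."""
        ...


class ClaudeInvoker:
    """Sketch of a Claude Code invoker using the flags listed in this plan."""

    name = "claude"
    prompt_via_stdin = True

    def build_command(self, prompt_path: str = "") -> List[str]:
        # The prompt arrives on stdin, so prompt_path is not part of argv.
        return ["claude", "-p", "--output-format", "json"]
```

With a protocol like this, the executor would only depend on build_command and prompt_via_stdin, and each agent's CLI quirks would stay inside its own module.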

Architecture

Component Diagram

┌─────────────────────────────────────────────────────────────────────┐
│                           CLI Entry Point                            │
│                         (spec-kitty orchestrate)                     │
└─────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────┐
│                            Orchestrator                              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐          │
│  │   Scheduler  │───▶│   Executor   │───▶│   Monitor    │          │
│  │              │    │              │    │              │          │
│  │ - Read deps  │    │ - Spawn proc │    │ - Exit codes │          │
│  │ - Assign WPs │    │ - Pipe stdin │    │ - JSON parse │          │
│  │ - Select agt │    │ - Timeouts   │    │ - Retry/fail │          │
│  └──────────────┘    └──────────────┘    └──────────────┘          │
│         │                   │                   │                   │
│         └───────────────────┼───────────────────┘                   │
│                             ▼                                        │
│                   ┌──────────────────┐                              │
│                   │   State Manager  │                              │
│                   │                  │                              │
│                   │ orchestration-   │                              │
│                   │ state.json       │                              │
│                   └──────────────────┘                              │
│                             │                                        │
│         ┌───────────────────┼───────────────────┐                   │
│         ▼                   ▼                   ▼                   │
│  ┌────────────┐     ┌────────────┐     ┌────────────┐              │
│  │  Claude    │     │  Codex     │     │  OpenCode  │   ...        │
│  │  Invoker   │     │  Invoker   │     │  Invoker   │              │
│  └────────────┘     └────────────┘     └────────────┘              │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Data Flow

1. Initialization

  • Load agents.yaml config (or use defaults)
  • Detect installed agents by checking that each agent's executable is on PATH (which)
  • Read feature's tasks/*.md for WP dependencies
  • Build dependency graph
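The agent detection step could be sketched as follows. The executable names are taken from the invocation table in this plan; the AGENT_EXECUTABLES mapping, the agent ids, and the function name are illustrative assumptions. shutil.which is the standard-library equivalent of a shell which check.

```python
import shutil

# Agent id -> executable name. The ids are hypothetical; the executable
# names follow the "Command" column of the invocation table in this plan.
AGENT_EXECUTABLES = {
    "claude": "claude",
    "codex": "codex",
    "copilot": "copilot",
    "gemini": "gemini",
    "qwen": "qwen",
    "opencode": "opencode",
    "kilocode": "kilocode",
    "augment": "auggie",
    "cursor": "cursor",
}


def detect_installed_agents(executables=AGENT_EXECUTABLES, which=shutil.which):
    """Return the agent ids whose executable is found on PATH."""
    return [agent for agent, exe in executables.items() if which(exe) is not None]
```

The which parameter is injectable so that tests can simulate different installed agents without touching the real PATH.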

2. Scheduling Loop

  • Find WPs with all dependencies satisfied
  • Assign agents based on role (implementation/review) and priority
  • Respect concurrency limits
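One pass of the scheduling loop might look like the sketch below. It uses a plain dict for the dependency graph to stay self-contained; the real implementation is expected to reuse core/dependency_graph.py, whose API is not shown in this plan. The function and parameter names are assumptions.

```python
def ready_work_packages(dependencies, completed, running, max_concurrent):
    """Return WP ids that can start now, limited by free concurrency slots.

    dependencies maps each WP id to the WP ids it depends on.
    """
    done = set(completed)
    free_slots = max(0, max_concurrent - len(running))
    ready = [
        wp
        for wp, deps in sorted(dependencies.items())
        if wp not in done and wp not in running and set(deps) <= done
    ]
    return ready[:free_slots]
```

For a diamond-shaped graph where WP02 and WP03 depend on WP01, and WP04 depends on both, the first pass yields only WP01; once WP01 completes, WP02 and WP03 become eligible to run in parallel.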

3. Execution

  • Create worktree for WP via existing spec-kitty implement infrastructure
  • Spawn agent process with appropriate CLI flags
  • Pass the WP prompt file to the agent via stdin or as an argument, depending on the agent (see Agent Invocation Patterns)
  • Capture stdout/stderr to log file
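For agents that accept the prompt on stdin, the spawn step could be sketched as below. The function name and parameters are hypothetical; worktree creation is omitted, and the command is whatever argv the agent's invoker produces.

```python
import asyncio
from pathlib import Path


async def run_agent(command, prompt_path, log_path):
    """Run command, feed the prompt file on stdin, log output, return exit code."""
    prompt = Path(prompt_path).read_bytes()
    with open(log_path, "wb") as log:
        proc = await asyncio.create_subprocess_exec(
            *command,
            stdin=asyncio.subprocess.PIPE,
            stdout=log,                       # child writes directly to the log file
            stderr=asyncio.subprocess.STDOUT,  # merge stderr into the same log
        )
        # communicate() writes the prompt, closes stdin, and waits for exit.
        await proc.communicate(input=prompt)
    return proc.returncode
```

Writing output straight to a file avoids buffering a long agent transcript in memory; the monitor can read the log after the process exits.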

4. Monitoring

  • Poll for process completion
  • Check exit code (0 = success)
  • Parse JSON output if available
  • Update WP lane status
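The completion check can be reduced to a small pure function, sketched here under the assumption that success is decided by the exit code alone and that JSON output is optional extra detail. The function name is hypothetical, and the lane update itself is left to the existing status utilities.

```python
import json


def classify_result(exit_code, output_text):
    """Return (succeeded, payload); payload is None when output is not JSON."""
    try:
        payload = json.loads(output_text)
    except (json.JSONDecodeError, TypeError):
        # Agents without JSON output (or with mixed output) still count;
        # only the exit code determines success.
        payload = None
    return exit_code == 0, payload
```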

5. Failure Handling

  • On failure: check retry count
  • If retries remain: retry with the same agent
  • If retries exhausted: apply fallback strategy
  • If all agents fail: pause and alert user
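The retry-then-fallback sequence above might be sketched as a nested loop. It assumes the fallback strategy is "next agent in priority order" (as in D3) and that run_once is a hypothetical coroutine returning True on success; the retry count default is also an assumption.

```python
import asyncio


async def run_with_fallback(wp_id, agents, run_once, max_retries=2):
    """Try each agent in priority order; return the agent that succeeded or None."""
    for agent in agents:
        # One initial attempt plus max_retries retries with the same agent.
        for _attempt in range(1 + max_retries):
            if await run_once(wp_id, agent):
                return agent
    # Every agent failed: the caller should pause orchestration and alert the user.
    return None
```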

6. State Persistence

  • After each WP completion: write state to JSON
  • On resume: load state, skip completed WPs
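Resuming could be as simple as filtering out WPs already recorded as completed. The state layout used here (a "work_packages" mapping with a "status" field) is a placeholder assumption; the actual schema is defined in data-model.md.

```python
import json
from pathlib import Path


def pending_work_packages(state_path, all_wp_ids):
    """Return the WP ids that still need to run, in their original order."""
    path = Path(state_path)
    if not path.exists():
        return list(all_wp_ids)  # fresh run: nothing completed yet
    state = json.loads(path.read_text(encoding="utf-8"))
    completed = {
        wp
        for wp, info in state.get("work_packages", {}).items()
        if info.get("status") == "completed"
    }
    return [wp for wp in all_wp_ids if wp not in completed]
```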

Agent Invocation Patterns

Based on feature 019 research:

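| Agent          | Command      | Task Input   | Headless Flag | JSON Output          |
|----------------|--------------|--------------|---------------|----------------------|
| Claude Code    | claude       | stdin        | -p            | --output-format json |
| OpenAI Codex   | codex exec   | stdin or arg | -             | --json               |
| GitHub Copilot | copilot      | arg          | -p            | --silent             |
| Google Gemini  | gemini       | stdin        | -p            | --output-format json |
| Qwen Code      | qwen         | stdin        | -p            | --output-format json |
| OpenCode       | opencode run | stdin or -f  | (default)     | --format json        |
| Kilocode       | kilocode     | arg          | -a            | -j                   |
| Augment Code   | auggie       | arg          | --acp         | (exit code only)     |
| Cursor         | cursor agent | arg          | -p            | --output-format json |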

Cursor Special Handling: Wrap Cursor invocations in a 300-second limit (timeout 300), because the Cursor CLI can hang instead of exiting.
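The plan names the timeout shell command as the wrapper. As an alternative that does not depend on that command being installed, the same limit could be enforced in-process with asyncio.wait_for, as in this sketch (function name and return convention are assumptions):

```python
import asyncio


async def run_with_timeout(command, timeout_seconds=300):
    """Return the exit code, or None if the process was killed on timeout."""
    proc = await asyncio.create_subprocess_exec(*command)
    try:
        return await asyncio.wait_for(proc.wait(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()  # reap the killed process so it does not linger
        return None
```

Returning None on timeout lets the monitor distinguish a hang from an ordinary non-zero exit and count it as a failed attempt.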

Key Design Decisions

D1: Async vs Threading

Decision: Use asyncio with subprocess for parallel execution.

Rationale:

  • Native Python async is simpler than threading
  • asyncio.create_subprocess_exec handles process spawning cleanly
  • Easier to implement concurrency limits with semaphores
  • Better error handling with structured concurrency (asyncio.TaskGroup, available from Python 3.11)

D2: State Persistence Format

Decision: Single JSON file (.kittify/orchestration-state.json)

Rationale:

  • Human-readable for debugging
  • Git-friendly (can be committed for visibility)
  • Atomic writes (write to a temporary file, then rename over the target) prevent corruption if the process is killed mid-write
  • Simple schema, no migration needed
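A common way to get atomic writes is to write a temporary file in the same directory and then rename it over the target. The sketch below assumes that approach; the function name is hypothetical and the state dict is whatever the data model defines.

```python
import json
import os
import tempfile
from pathlib import Path


def write_state_atomically(state, state_path):
    """Write state as JSON so readers see either the old or the new file."""
    path = Path(state_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Same directory as the target, so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp, indent=2, sort_keys=True)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk before renaming
        os.replace(tmp_name, path)  # atomic replacement of the target
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

If the process dies before os.replace, the previous state file is left untouched, which is what the resume path relies on.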

D3: Agent Selection Strategy

Decision: Priority-based with role filtering

Rationale:

  • User assigns priorities per agent in config
  • Implementation role: use highest-priority implementation-capable agent
  • Review role: use highest-priority review-capable agent that is different from implementation agent
  • Fallback: next in priority order
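The selection rule can be sketched as a filter followed by a minimum over priority. The AgentConfig fields, the role names as strings, and the convention that a lower number means higher priority are all assumptions; the actual agents.yaml schema may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    name: str
    priority: int      # assumption: lower number = higher priority
    roles: frozenset   # e.g. {"implementation", "review"}


def select_agent(agents, role, exclude=()):
    """Return the highest-priority agent name for role, skipping excluded names."""
    candidates = [a for a in agents if role in a.roles and a.name not in exclude]
    if not candidates:
        return None
    return min(candidates, key=lambda a: a.priority).name
```

For review, the caller passes the implementing agent in exclude so the reviewer always differs; for fallback, it passes the agents that have already failed.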

D4: Concurrency Model

Decision: Semaphore-based with per-agent limits

Rationale:

  • Global semaphore limits total concurrent processes
  • Per-agent semaphores respect individual agent limits
  • Prevents overloading any single agent's rate limits
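A minimal sketch of the two-level limit, assuming asyncio.Semaphore for both levels (the class and method names are hypothetical):

```python
import asyncio
from contextlib import asynccontextmanager


class ConcurrencyLimiter:
    """Global cap on running processes plus a separate cap per agent."""

    def __init__(self, global_limit, per_agent_limits):
        self._global = asyncio.Semaphore(global_limit)
        self._per_agent = {
            agent: asyncio.Semaphore(limit)
            for agent, limit in per_agent_limits.items()
        }

    @asynccontextmanager
    async def slot(self, agent):
        # Take the per-agent slot first so that tasks queued behind one busy
        # agent do not hold a global slot while they wait.
        async with self._per_agent[agent]:
            async with self._global:
                yield
```

An executor would wrap each agent run in "async with limiter.slot(agent_name):" so both limits are released automatically, including when the run raises.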

Integration Points

Existing Spec-Kitty Modules

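| Module                    | Purpose                    | How Used               |
|---------------------------|----------------------------|------------------------|
| core/dependency_graph.py  | WP dependency resolution   | Reuse directly         |
| merge/state.py            | State persistence patterns | Follow same patterns   |
| cli/commands/implement.py | Worktree creation          | Call programmatically  |
| agent_utils/status.py     | Lane management            | Use for status updates |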

New CLI Commands

# Start orchestration
spec-kitty orchestrate --feature 020-my-feature

# Resume interrupted orchestration
spec-kitty orchestrate --resume

# Check status
spec-kitty orchestrate --status

# Abort orchestration
spec-kitty orchestrate --abort

Testing Strategy

Unit Tests

  • Scheduler: dependency resolution, agent assignment
  • State: persistence, resume, concurrent writes
  • Config: parsing, validation, defaults

Integration Tests

  • Mock agent processes (fast exit, specific codes)
  • Full orchestration of 3-WP feature
  • Resume after simulated interruption
  • Fallback strategy execution

Manual Testing

  • Real agent execution with test feature
  • Cross-agent review verification
  • Parallel execution timing validation

Risks and Mitigations

| Risk                       | Mitigation                             |
|----------------------------|----------------------------------------|
| Agent CLI changes          | Version-check agents, warn on mismatch |
| State corruption           | Atomic writes, backup before modify    |
| Runaway processes          | Per-process timeout, cleanup on abort  |
| Git conflicts in worktrees | Use existing worktree isolation model  |

Phase 0 Research

Status: Complete (leverages feature 019 research)

All technical decisions are informed by the comprehensive research from feature 019:

  • Agent CLI capabilities fully documented
  • Invocation patterns validated
  • Architecture recommendation provided
  • Data model schemas designed

No additional research required.

Phase 1 Artifacts

  • data-model.md - Entity schemas for orchestration
  • quickstart.md - User guide for configuration and usage
  • Agent context updated in CLAUDE.md