Specification: Local Custom Mission Loader
- Mission ID: 01KQ2VNJFYFT4371K45VMR8GPD
- Mission Slug: local-custom-mission-loader-01KQ2VNJ
- Tracker: Implements #505 (Phase 6 / WP6.5). Parent epic: #468. Umbrella: #461. Distribution ADR (#516) stays open.
- Status: Draft (Specify phase)
Purpose
TL;DR
Let teams author and run their own local mission definitions as first-class peers to built-in missions through the existing runtime and composition pipeline.
Stakeholder Context
Spec Kitty currently ships only built-in missions (software-dev, research, documentation, plan). Teams whose workflows do not match those built-ins have no first-class extension point: they cannot keep their own per-project mission YAML alongside kitty-specs/ and run it through spec-kitty next. Issue #505 closes that gap with a v1 local loader: project-authored mission definitions are discovered from .kittify/missions/ and existing override paths, validated structurally, and dispatched through the same internal runtime + StepContractExecutor + ProfileInvocationExecutor pipeline that built-ins already use. Distribution remains local; SaaS registry, mission install, and cross-team sharing stay deferred under #516. Retrospective execution, along with retrospective.yaml writing, synthesizer handoff, HiC/autonomous gating, and summary UI, is out of scope; this tranche only requires a structural retrospective marker so future tranches (#506–#511) can attach behavior to it without breaking compatibility.
Background and Current State
The runtime is already CLI-internal under src/specify_cli/next/_internal_runtime/ (PR #796) and the software-dev mission is rewritten onto profile-invocation composition (PR #795, stabilized by PR #797). The retired spec-kitty-runtime PyPI package is no longer a production dependency, and the preflight fix for #798 (commit cedb77ff in this workspace) removed the last function-scoped imports of it from runtime_bridge.py. Discovery already supports an established precedence chain (explicit / env / project override / project legacy / project config / user global / built-in) along with hooks like SPEC_KITTY_MISSION_PATHS, .kittify/config.yaml mission_packs, and mission-pack.yaml. This mission consumes those existing surfaces; it does not introduce a parallel loader.
User Scenarios and Testing
Primary Actor
A project lead or platform engineer ("operator") who maintains a Spec Kitty-using project and wants their team to follow a custom workflow that is not one of the four built-in missions.
Primary Scenario — Author and Run a Local Custom Mission
1. The operator authors .kittify/missions/erp-integration/mission.yaml describing seven steps: query-erp, lookup-provider, ask-user, create-js, refactor-function, write-report, retrospective. Each agent-executed step declares a profile (or an explicit action / contract binding); the ask-user step is marked as a human/input gate; the final step has the recognized retrospective marker.
2. The operator runs spec-kitty mission run erp-integration --mission erp-q3-rollout from the project root. The CLI resolves the erp-integration mission key through the existing discovery precedence, validates the YAML, scaffolds (or attaches to) the tracked mission erp-q3-rollout under kitty-specs/, and starts the runtime against the discovered template.
3. As the runtime advances, agent-executed steps dispatch through StepContractExecutor / ProfileInvocationExecutor and produce paired started / completed|failed invocation records that record the contract action.
4. The ask-user step pauses the runtime via the existing decision_required path; the operator answers, and the runtime resumes.
5. After the last step before the retrospective marker, the runtime treats the marker as a structural step (no execution side effect this tranche).
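
The scenario above can be pictured with a minimal sketch of the parsed ERP definition and how each step would be treated. This is an illustration under stated assumptions, not the loader's actual data model: the dict layout follows the field names this spec mentions (mission.key, mission.name, mission.version, steps[], id, kind, agent_profile), while the display name, version value, profile names (researcher, implementer, writer), and the helper functions classify and walk are hypothetical.

```python
from typing import Any

# Hypothetical parsed form of .kittify/missions/erp-integration/mission.yaml.
# Profile names, display name, and version value are illustrative assumptions.
ERP_MISSION: dict[str, Any] = {
    "mission": {"key": "erp-integration", "name": "ERP Integration", "version": "1"},
    "steps": [
        {"id": "query-erp", "kind": "composed", "agent_profile": "researcher"},
        {"id": "lookup-provider", "kind": "composed", "agent_profile": "researcher"},
        {"id": "ask-user", "kind": "decision_required"},
        {"id": "create-js", "kind": "composed", "agent_profile": "implementer"},
        {"id": "refactor-function", "kind": "composed", "agent_profile": "implementer"},
        {"id": "write-report", "kind": "composed", "agent_profile": "writer"},
        {"id": "retrospective"},
    ],
}


def classify(step: dict[str, Any]) -> str:
    """Return how the runtime would treat a step in this sketch."""
    if step.get("id") == "retrospective":
        return "structural"  # validated only; no execution in this tranche
    if step.get("kind") == "decision_required":
        return "decision_required"  # pauses for operator input
    return "composed"  # dispatched through the executor pipeline


def walk(mission: dict[str, Any]) -> list[tuple[str, str]]:
    """Pair every step id with its handling, in declaration order."""
    return [(step["id"], classify(step)) for step in mission["steps"]]
```

Calling walk(ERP_MISSION) yields seven (step id, handling) pairs, with ask-user as the only decision-required step and retrospective as the only structural one.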
Exception Path — Missing Retrospective Marker
The operator publishes .kittify/missions/legacy-flow/mission.yaml without a recognizable retrospective marker. The first command that loads the definition (spec-kitty mission run legacy-flow --mission whatever) exits non-zero with a clear, structured error naming the mission key, the missing marker, and the file path. No tracked mission is started.
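
A rough sketch of this check follows. It assumes the marker is either id: retrospective on the last step or a retrospective: true flag on any step (the exact spelling is a planning decision), and it assumes an error payload with code, mission_key, paths, and message fields. The function names and the payload layout are hypothetical; only the MISSION_RETROSPECTIVE_MISSING code name comes from this spec.

```python
from typing import Any, Optional


def has_retrospective_marker(steps: list[dict[str, Any]]) -> bool:
    """Recognize the marker under the two assumed spellings."""
    if not steps:
        return False
    if any(step.get("retrospective") is True for step in steps):
        return True
    return steps[-1].get("id") == "retrospective"


def check_retrospective(definition: dict[str, Any], path: str) -> Optional[dict[str, Any]]:
    """Return a structured error, or None when the marker is present."""
    if has_retrospective_marker(definition.get("steps") or []):
        return None
    key = (definition.get("mission") or {}).get("key")
    return {
        "code": "MISSION_RETROSPECTIVE_MISSING",
        "mission_key": key,
        "paths": [path],
        "message": f"mission {key!r} declares no retrospective marker ({path})",
    }
```

In this sketch the caller would exit non-zero and skip scaffolding the tracked mission whenever check_retrospective returns a payload.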
Exception Path — Ambiguous Mission Definition
Two layers of the discovery precedence (e.g., a project-override and a built-in) both declare the same mission key. The loader rejects the run with a structured MISSION_KEY_AMBIGUOUS error listing every source path, so the operator can decide which copy is canonical.
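
One way to picture the ambiguity check is to group discovered definitions by mission key and flag any key that appears in more than one precedence layer. The sketch below assumes discovery can be summarized as (layer, mission key, path) tuples; that tuple shape, the layer labels, the find_ambiguous helper, and the builtin: path notation in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Any


def find_ambiguous(sources: list[tuple[str, str, str]]) -> list[dict[str, Any]]:
    """Report every mission key declared by more than one layer.

    Each source is a (layer, mission_key, path) tuple; this shape is an
    assumption made for illustration.
    """
    by_key: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for layer, key, path in sources:
        by_key[key].append((layer, path))

    errors: list[dict[str, Any]] = []
    for key, hits in by_key.items():
        layers = {layer for layer, _ in hits}
        if len(layers) > 1:
            errors.append(
                {
                    "code": "MISSION_KEY_AMBIGUOUS",
                    "mission_key": key,
                    "paths": [path for _, path in hits],  # every source path
                }
            )
    return errors
```

For example, a software-dev definition found both at .kittify/overrides/missions/software-dev/mission.yaml and in the built-in layer (written here as builtin:software-dev) would produce one error listing both sources.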
Edge Cases
- Malformed YAML, unknown top-level keys, missing required runtime fields (mission.key, mission.name, steps[]).
- A composed step missing both an agent_profile field and an explicit action / contract binding.
- A mission key resolved through SPEC_KITTY_MISSION_PATHS whose definition shadows a built-in name (must be flagged, not silently overridden).
- A mission-pack manifest that exposes a custom mission via the existing pack discovery hook.
- Backward compatibility: starting a built-in mission (software-dev, research, documentation, plan) must behave identically to current behavior with no new validation rejections.
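
The first edge case (unknown top-level keys and missing required fields) could be checked on an already-parsed document roughly as follows. This is only a sketch: the allowed top-level keys are inferred from the template shape this spec describes, the required-field list mirrors the one named in the edge case, and the helper names and message wording are hypothetical.

```python
from typing import Any

# Assumed from the template shape described in this spec.
ALLOWED_TOP_LEVEL = {"mission", "steps", "audit_steps"}
REQUIRED_FIELDS = ("mission.key", "mission.name", "steps")


def lookup(doc: Any, dotted: str) -> Any:
    """Follow a dotted path through nested dicts; None when absent."""
    node = doc
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def structural_problems(doc: dict[str, Any]) -> list[str]:
    """List unknown top-level keys first, then missing required fields."""
    problems = [
        f"unknown top-level key: {key}"
        for key in sorted(doc)
        if key not in ALLOWED_TOP_LEVEL
    ]
    for field in REQUIRED_FIELDS:
        value = lookup(doc, field)
        if value is None or value == "" or value == []:
            problems.append(f"missing required field: {field}")
    return problems
```

An empty steps list is treated as missing here, on the assumption that a mission with no steps cannot run; the real validator may draw that line differently.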
Domain Language
| Term | Canonical meaning in this spec |
|---|---|
| Mission key | The reusable identifier of a custom mission definition (e.g., erp-integration). Resolved through discovery precedence. |
| Mission slug | The identifier of a tracked mission run under kitty-specs/<slug>/ (e.g., erp-q3-rollout-01KQ…). |
| Custom mission | A non-built-in mission definition discovered through the local loader. |
| Built-in mission | One of the four currently bundled missions: software-dev, research, documentation, plan. |
| Composed step | An agent-executed step routed through StepContractExecutor and ProfileInvocationExecutor. |
| Decision-required step | A human/input gate routed through the internal runtime's existing decision_required path. |
| Retrospective marker | A structural marker (e.g., id: retrospective) on a final step that this tranche only validates; no execution semantics yet. |
Avoid "mission name" as a synonym for either mission key or mission slug; the terms are not interchangeable.
Functional Requirements
| ID | Requirement | Status |
|---|---|---|
| FR-001 | Provide an operator-facing CLI surface to start and run a custom mission definition. The v1 contract is spec-kitty mission run <mission-key> --mission <mission-slug> [--json], where <mission-key> selects the reusable definition and <mission-slug> identifies the tracked mission under kitty-specs/. | Locked |
| FR-002 | Resolve <mission-key> through the existing internal runtime discovery precedence: explicit / environment / project override / project legacy / project config / user global / built-in. No new precedence introduced. | Locked |
| FR-003 | Load custom mission definitions from .kittify/missions/<key>/, .kittify/overrides/missions/<key>/, and any existing mission-pack discovery hooks (e.g., mission-pack.yaml, .kittify/config.yaml mission_packs, SPEC_KITTY_MISSION_PATHS) without adding a parallel loader. | Locked |
| FR-004 | Reject invalid custom mission definitions at load time with structured, actionable errors covering: malformed YAML, missing required runtime fields (mission.key, mission.name, mission.version, steps[]), unresolved mission key, ambiguous / shadowed definitions, and missing retrospective marker. Each error includes the file path(s), the mission key (when known), and a stable error code suitable for tooling. | Locked |
| FR-005 | A custom mission definition that does not declare a structural retrospective step or marker MUST be rejected before the runtime starts. The retrospective step is not executed in this tranche. | Locked |
| FR-006 | Agent-executed custom steps MUST dispatch through StepContractExecutor and ProfileInvocationExecutor, preserving invocation trail records (paired started + completed/failed), action_hint, mode-of-work, DRG context resolution, and glossary chokepoint behavior. | Locked |
| FR-007 | Human/input decision steps MUST use the existing internal runtime decision_required path (snapshot pending_decisions, DecisionInputRequested event), not a synthetic profile invocation. | Locked |
| FR-008 | Each composed custom step MUST resolve a profile through an explicit per-step agent_profile (alias accepted: agent-profile) field on the runtime step, or through an explicit action / contract binding. The software-dev-specific _ACTION_PROFILE_DEFAULTS table MUST NOT be expanded as a generic fallback for arbitrary custom missions. | Locked |
| FR-009 | A reference custom mission representing the operator's "ERP" example (query-erp → lookup-provider → ask-user → create-js → refactor-function → write-report → retrospective) MUST be authorable as local YAML and exercisable end-to-end through the runtime in tests, including a decision_required step and composed profile-invocation steps. | Locked |
| FR-010 | Existing built-in missions (software-dev, research, documentation, plan) MUST keep their current end-to-end behavior. The new loader must not alter validation, dispatch, or invocation semantics for built-ins. | Locked |
| FR-011 | When a custom mission key shadows a built-in mission key, the loader MUST surface a structured warning or error (per FR-004 ambiguity rules) rather than silently overriding the built-in. The exact severity (warn vs. reject) is a planning-phase decision but the behavior MUST be deterministic and documented. | Open (planning resolves warn-vs-reject) |
| FR-012 | The loader MUST be reachable from the existing spec-kitty next advancement path for a tracked mission whose definition resolves to a custom mission, so the runtime advances composed and decision-required steps identically to the mission run entry point. | Locked |
| FR-013 | Validation errors MUST be representable in --json output (where applicable) so external tooling can consume them, while still printing a human-readable summary on the default text channel. | Locked |
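
FR-008 can be illustrated with a small resolution sketch: a composed step resolves through its explicit agent_profile (or the agent-profile alias), otherwise through an explicit binding, and otherwise fails rather than falling back to software-dev defaults. The contract field name used for the explicit binding, the ProfileResolutionError class, and the resolve_dispatch helper are hypothetical; the spec does not fix those names.

```python
from typing import Any


class ProfileResolutionError(ValueError):
    """Raised when a composed step names neither a profile nor a binding."""


def resolve_dispatch(step: dict[str, Any]) -> tuple[str, str]:
    """Return ("profile", name) or ("contract", binding) for a composed step.

    There is deliberately no fallback table: an unresolved step is an error.
    """
    profile = step.get("agent_profile") or step.get("agent-profile")
    if profile:
        return ("profile", profile)
    binding = step.get("contract")  # hypothetical field name for the binding
    if binding:
        return ("contract", binding)
    raise ProfileResolutionError(
        f"step {step.get('id')!r} declares neither agent_profile nor an explicit binding"
    )
```

Raising instead of guessing keeps the failure visible at load time, which matches the requirement that _ACTION_PROFILE_DEFAULTS not become a generic fallback.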
Non-Functional Requirements
| ID | Requirement | Threshold | Status |
|---|---|---|---|
| NFR-001 | Loading and validating a single custom mission definition (≤ 50 steps) plus discovering all built-ins MUST complete fast enough to be invisible to operators on local hardware. | < 250 ms p95 on a Mac/Linux dev machine; benchmarked in tests against the ERP fixture. | Locked |
| NFR-002 | Every error path defined under FR-004 MUST emit an error code from a closed enumeration (e.g., MISSION_KEY_AMBIGUOUS, MISSION_RETROSPECTIVE_MISSING, MISSION_YAML_MALFORMED) and a stable JSON shape. | 100% of FR-004 cases covered by named error codes; new codes documented in docs/reference/missions.md. | Locked |
| NFR-003 | Test coverage of the new loader, validation, and runtime dispatch surface MUST meet the project's "90%+ test coverage for new code" charter standard. | ≥ 90% line coverage on additions to src/specify_cli/next/_internal_runtime/discovery.py, the loader, and the validators; reported via pytest --cov in CI. | Locked |
| NFR-004 | The reference ERP fixture suite MUST run in under 10 seconds locally so it is a practical inner-loop test. | < 10 s wall clock on the same hardware as NFR-001. | Locked |
| NFR-005 | All new code MUST type-check clean under mypy --strict per the charter. | Zero mypy --strict errors on new / changed modules. | Locked |
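
NFR-002's closed enumeration and stable JSON shape might look like the sketch below. Only the three code names are taken from this spec; the MissionErrorCode and MissionLoadError names, the payload fields, and the choice of sorted keys for stability are assumptions.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MissionErrorCode(str, Enum):
    """Closed set of loader error codes (only the three named in this spec)."""

    MISSION_KEY_AMBIGUOUS = "MISSION_KEY_AMBIGUOUS"
    MISSION_RETROSPECTIVE_MISSING = "MISSION_RETROSPECTIVE_MISSING"
    MISSION_YAML_MALFORMED = "MISSION_YAML_MALFORMED"


@dataclass(frozen=True)
class MissionLoadError:
    code: MissionErrorCode
    message: str
    paths: tuple[str, ...]
    mission_key: Optional[str] = None  # unknown when the YAML cannot be parsed

    def to_json(self) -> str:
        """Serialize with sorted keys so the output shape stays stable."""
        payload = {
            "code": self.code.value,
            "message": self.message,
            "mission_key": self.mission_key,
            "paths": list(self.paths),
        }
        return json.dumps(payload, sort_keys=True)
```

Because the codes live in an Enum, constructing MissionErrorCode from an unlisted string raises ValueError, which gives tests a simple way to pin the enumeration as closed.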
Constraints
| ID | Constraint | Status |
|---|---|---|
| C-001 | No SaaS mission registry, spec-kitty mission install, or cross-team distribution work in this tranche. Distribution stays under #516. | Locked |
| C-002 | No production import or runtime dependency on the retired spec-kitty-runtime PyPI package. The architectural boundary test in tests/architectural/test_shared_package_boundary.py MUST stay green. | Locked |
| C-003 | No legacy DAG fall-through reintroduction for composition-backed actions. PR #797's invariants stay in force. | Locked |
| C-004 | Invocation JSONL MUST NOT be written outside ProfileInvocationExecutor / InvocationWriter. The loader / runtime path is a consumer, not a writer. | Locked |
| C-005 | Retrospective execution, retrospective.yaml writing, synthesizer handoff, HiC / autonomous gating, and summary UI are out of scope. Marker validation only. | Locked |
| C-006 | Architecture boundary preserved: host LLM / harness owns reading and generation; Spec Kitty owns routing, governance context assembly, validation, trail writing, provenance, DRG checks, staging / promotion, and additive propagation. | Locked |
| C-007 | No new top-level CLI command groups beyond what the v1 contract requires. spec-kitty mission run may be a new subcommand on the existing mission group; no other new groups. | Locked |
| C-008 | Tests use real filesystem fixtures under tmp_path; no monkey-patching of the loader past well-defined seams. (Charter: "Integration tests for CLI commands.") | Locked |
Success Criteria
1. Operators can run a project-authored custom mission end-to-end (spec-kitty mission run → spec-kitty next) with the same observable behavior as a built-in mission. Verifiable via the ERP fixture's full runtime walk in tests.
2. Operators authoring a custom mission without a retrospective marker see a single clear, structured error within one second of running the load command, naming the missing marker and the file path. Verifiable via a focused unit test against the validator.
3. Two layers exposing the same mission key produce a deterministic, named ambiguity error (or warning, per FR-011) listing every source path. No silent override. Verifiable via discovery-precedence tests.
4. All existing built-in missions continue passing their current test suites unchanged. Verifiable via tests/specify_cli/next/test_runtime_bridge_composition.py and the parity / coverage suites.
5. The spec_kitty_runtime import boundary remains clean: zero production imports under src/. Verifiable via tests/architectural/test_shared_package_boundary.py.
6. Validation errors are consumable by external tooling via --json output with stable error codes. Verifiable via JSON-schema-pinned tests.
Key Entities
- Custom mission definition file: a YAML document under .kittify/missions/<key>/mission.yaml (or override path / pack manifest entry) whose top-level shape mirrors the existing internal runtime template (mission.key, mission.name, mission.version, steps[], optional audit_steps[]).
- Runtime step record (per-step within steps[]): at minimum id, kind (composed / decision_required), and either agent_profile (alias agent-profile) or an explicit action / contract binding. The retrospective step is recognized structurally (e.g., id: retrospective).
- Discovery context: the existing internal runtime DiscoveryContext (chain of explicit / env / project override / legacy / config / user global / built-in sources), unchanged in shape.
- Mission step contract: existing MissionStepContract records (mission: <mission-key>, action: <step-id> or explicit binding) loaded through src/doctrine/mission_step_contracts/repository.py. Each composed custom step expects a matching contract.
- Tracked mission record: the kitty-specs/<mission-slug>/ directory created (or attached to) when spec-kitty mission run is invoked. Identity (mission_id ULID, mid8, slug, branch contract) follows the existing identity model documented in CLAUDE.md.
Assumptions
- Discovery precedence (explicit / env / project override / project legacy / project config / user global / built-in) is already implemented in _internal_runtime/discovery.py; the loader extends it without reordering.
- Mission-pack discovery (mission-pack.yaml, .kittify/config.yaml mission_packs) is already wired through the same module. If any of these are not actually wired, the planning phase will scope a minimal addition rather than design a parallel mechanism.
- The runtime template shape (mission.key, mission.name, mission.version, steps[], optional audit_steps[]) is the right authoring surface for v1. Any divergence is a planning-phase decision.
- "Structural retrospective marker" means id: retrospective on the last step (or a retrospective: true flag on a step). Exact spelling is locked in plan; the validator only requires recognizability.
Out of Scope (defer-only)
- Any execution semantics for the retrospective step (deferred to #506 / #507 / #508 / #509 / #510 / #511).
- SaaS mission registry, spec-kitty mission install, cross-team distribution (deferred to #516).
- Cross-mission summary UI or HiC/autonomous lifecycle gating.
- Phase 7 work (#469).
- Adding new mission types beyond the loader's discovery; no new built-ins ship in this tranche.
- Changes to the retired spec-kitty-runtime PyPI package; that package stays retired.
Dependencies
- Internal: _internal_runtime/discovery.py, _internal_runtime/engine.py, _internal_runtime/planner.py, _internal_runtime/schema.py, mission_step_contracts/executor.py (StepContractExecutor), ProfileInvocationExecutor, runtime_bridge.py, cli/commands/mission.py, cli/commands/mission_type.py, doctrine/mission_step_contracts/repository.py.
- External: spec_kitty_events (PyPI; payload models), spec_kitty_tracker (PyPI; per existing usage). No new external deps anticipated; planning will confirm.
- Preflight: #798 fix (cedb77ff locally on main) must remain in place. Architectural boundary test must stay green.