Specification — Research Mission Composition Rewrite v2
- Mission ID: 01KQ4QVVZ4DC6CXA1XCZZAQ8AG
- Mission slug: research-mission-composition-rewrite-v2-01KQ4QVV
- Mission type: software-dev
- Target branch: main
- Created: 2026-04-26
- Source: Reroll of Phase 6 GitHub issue #504 after the v1 attempt failed external review
- Baseline commit: e056f39870343c31300959099d6955f1c8ed48e3 on origin/main
- Prior attempt evidence: git tag attempt/research-composition-mission-100-broken at commit d10af600 (local-only, never pushed)
Purpose
TL;DR
Deliver a runnable, composition-backed research mission. Operators must be able to create a fresh research mission and advance it via spec-kitty next. Each composed action must resolve real governance context from the validated DRG, and missing-artifact scenarios must produce structured guard failures, not silent passes.
Context
PRs #795 / #797 / #799 already moved software-dev onto the Phase 6 composition substrate and added the local custom mission loader. Issue #504 promised the same migration for the built-in research mission. The v1 attempt at #504 (preserved at tag attempt/research-composition-mission-100-broken) authored 5 step contracts, 5 doctrine bundles, 5 profile-default entries, 1 dispatch-table entry, and 3 test files — but it left the runtime gap unbridged. External review found four blocking defects:
1. Research has no MissionTemplate.steps and no mission.key, so get_or_start_run('demo-research', repo, 'research') raises MissionRuntimeError: Mission 'research' not found. The mission cannot start.
2. The new action doctrine bundles exist on disk but the validated DRG has no action:research/* nodes; resolve_context() returns empty artifact_urns. Right-sized governance context is not delivered.
3. _check_composed_action_guard() handles only software-dev action names; research actions fall through with no failures, so missing-artifact runs silently succeed on the composed path.
4. The v1 integration walk called _dispatch_via_composition() directly because it could not enter the real runtime. It cannot prove end-to-end runnability or guard parity.
This mission closes those four defects. The contracts, doctrine bundles, profile defaults, and dispatch entry are re-authored on top of the corrected substrate, but the value-add is the four corrections.
User Scenarios & Testing
Primary actor
A spec-kitty operator (human or agent harness) who runs spec-kitty agent mission create demo-research --mission-type research and then drives the mission via spec-kitty next in a clean checkout.
Acceptance Scenarios
Scenario 1 — Fresh research mission starts and advances via composition
- Given a clean spec-kitty checkout with no prior research missions
- When the operator runs spec-kitty agent mission create demo-research --mission-type research --json, parses the result, then runs spec-kitty next --agent <name> --mission <handle>
- Then the runtime returns a next-step decision without raising MissionRuntimeError
- And the step is dispatched via StepContractExecutor (not the legacy DAG)
- And the operator's invocation trail under ~/.kittify/invocations/ contains a paired started + done (or failed) lifecycle for the action
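
The paired-lifecycle assertion in Scenario 1 can be sketched as a small check over trail records. This is a minimal illustration only: the record shape (an `invocation_id` and a `status` field) and the function name `unpaired_invocations` are assumptions made for the example, since this spec does not define the on-disk trail format.

```python
from collections import defaultdict

# Terminal statuses named in Scenario 1 and FR-012.
TERMINAL = {"done", "failed"}


def unpaired_invocations(records):
    """Return ids of invocations that lack exactly one 'started' and one terminal record.

    `records` is a list of dicts with hypothetical keys 'invocation_id' and 'status'.
    """
    statuses_by_id = defaultdict(list)
    for record in records:
        statuses_by_id[record["invocation_id"]].append(record["status"])

    unpaired = []
    for invocation_id, statuses in statuses_by_id.items():
        started = statuses.count("started")
        terminal = sum(1 for status in statuses if status in TERMINAL)
        if started != 1 or terminal != 1:
            unpaired.append(invocation_id)
    return sorted(unpaired)
```

A real-runtime walk could load whatever records the runtime wrote and assert that this function returns an empty list after each step.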
Scenario 2 — Each research action resolves real DRG context
- Given the merged code and a freshly created research mission
- When the runtime resolves governance context for any of the five research actions
- Then load_validated_graph(repo).get_node('action:research/<action>') is truthy for each of scoping, methodology, gathering, synthesis, output
- And resolve_context(graph, 'action:research/<action>', depth=...) returns non-empty artifact_urns
- And the action-scoped doctrine surfaced to the host LLM is the bundle authored under src/doctrine/missions/research/actions/<action>/, not the empty default
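
To make the failure mode behind Scenario 2 concrete, the following toy model walks a graph outward from an action node and collects the URNs it reaches. It is not the real resolve_context(): the adjacency-dict representation, the function name `resolve_artifact_urns`, and the breadth-first strategy are all assumptions chosen for illustration.

```python
from collections import deque

RESEARCH_ACTIONS = ("scoping", "methodology", "gathering", "synthesis", "output")


def resolve_artifact_urns(edges, start_urn, depth):
    """Collect every URN reachable from `start_urn` within `depth` hops.

    `edges` is a hypothetical adjacency dict: URN -> list of target URNs.
    A start URN with no entry in `edges` yields an empty list.
    """
    seen = {start_urn}
    queue = deque([(start_urn, 0)])
    found = []
    while queue:
        urn, distance = queue.popleft()
        if distance == depth:
            continue  # do not expand past the requested depth
        for target in edges.get(urn, ()):
            if target not in seen:
                seen.add(target)
                found.append(target)
                queue.append((target, distance + 1))
    return found


def actions_without_context(edges, depth):
    """Name the research actions whose node resolves to no artifacts."""
    return [
        action
        for action in RESEARCH_ACTIONS
        if not resolve_artifact_urns(edges, f"action:research/{action}", depth)
    ]
```

In this model a doctrine bundle that exists on disk but has no node in `edges` resolves to an empty list, which is the v1 defect that FR-004 and FR-005 rule out.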
Scenario 3 — Missing artifacts produce structured guard failures
- Given an empty research feature directory (no spec.md, no plan.md, no source-register.csv, no findings.md)
- When the runtime attempts to advance via composition for any of the five research actions
- Then _check_composed_action_guard() returns a non-empty failure list naming the missing artifact (or the failing predicate from mission.yaml)
- And _dispatch_via_composition() propagates the failure as a structured error with no run-state advancement
- And the legacy DAG path is NOT invoked as a fallback (PR #797 invariant preserved)
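
The propagation rule in Scenario 3 can be sketched as follows. This is a simplified stand-in, not the body of _dispatch_via_composition(): `RunState`, `DispatchResult`, and `dispatch_composed` are hypothetical names, and the guard is passed in as a plain callable.

```python
from dataclasses import dataclass, field


@dataclass
class RunState:
    """Hypothetical run state: only tracks how many steps have advanced."""
    step_index: int = 0


@dataclass
class DispatchResult:
    """Hypothetical structured result carrying guard failures, if any."""
    ok: bool
    failures: list = field(default_factory=list)


def dispatch_composed(state, action, guard):
    """Run the guard for `action`; advance `state` only when it reports no failures."""
    failures = guard(action)
    if failures:
        # Propagate the failures as data. State is left untouched and there is
        # deliberately no fallback branch to any other execution path.
        return DispatchResult(ok=False, failures=list(failures))
    state.step_index += 1
    return DispatchResult(ok=True)
```

The point of the shape is that a non-empty failure list is the only signal needed: the caller sees `ok=False` with the named failures, and `state.step_index` is unchanged.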
Scenario 4 — Real-runtime integration walk passes
- Given the test suite at HEAD on main
- When an operator runs uv run pytest tests/integration/test_research_runtime_walk.py -v
- Then at least one test in that file calls get_or_start_run (or its programmatic equivalent) and drives a research mission through every advancing action via the live runtime
- And that test does NOT mock _dispatch_via_composition, StepContractExecutor.execute, or any frozen-template loader
- And the test asserts paired lifecycle records, action_hint correctness, and structured guard failure on missing artifacts
Scenario 5 — Software-dev and custom-mission paths preserved
- Given the existing software-dev composition test suite, the custom-mission walk test, and the runtime bridge composition test on origin/main
- When they run after this mission lands
- Then they pass with their test files unchanged; the only permitted edits are import-path adjustments forced by new module placement, if any
Scenario 6 — Operator dogfood smoke matches the integration walk
- Given the merged code on a clean repo
- When an operator follows the quickstart's "drive a real research mission" section
- Then the same outcomes Scenario 1 asserts are observable interactively, and the trail records under ~/.kittify/invocations/ show the research-native action names (not profile-default verbs)
Edge cases
- A research action whose contract resolves successfully but whose composed step raises an exception inside StepContractExecutor.execute: the invocation lifecycle must close as failed, run state must not advance, and the legacy DAG must not be invoked as fallback.
- A doctrine bundle that exists on disk but is not referenced by the DRG: resolve_context() must return empty for that action, and a contract referencing it must surface a structured error pointing at the missing graph node, not silently succeed.
- Two consecutive composed research actions sharing the same profile but different action_hints: each invocation must record its own action_hint and action-scoped doctrine context.
- A future runtime change that adds a sixth research action without a corresponding entry in _check_composed_action_guard: the guard must fail closed (return a structured "no guard registered for (research, X)" error), not silently pass.
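
The fail-closed behavior in the last edge case can be sketched with a lookup table keyed by (mission, action). This is an illustration under stated assumptions: the `GUARDS` registry, the `_require` helper, and `check_guard` are hypothetical names, and the pairing of scoping with spec.md and methodology with plan.md is assumed from the order of preconditions in FR-008 rather than stated by this spec.

```python
from pathlib import Path
from typing import Callable, Dict, List, Tuple

Guard = Callable[[Path], List[str]]


def _require(filename: str) -> Guard:
    """Build a guard that fails when `filename` is absent from the feature directory."""
    def guard(feature_dir: Path) -> List[str]:
        if (feature_dir / filename).is_file():
            return []
        return [f"missing artifact: {filename}"]
    return guard


# Hypothetical registry; only two entries are shown and the pairing is assumed.
GUARDS: Dict[Tuple[str, str], Guard] = {
    ("research", "scoping"): _require("spec.md"),
    ("research", "methodology"): _require("plan.md"),
}


def check_guard(mission: str, action: str, feature_dir: Path) -> List[str]:
    """Return guard failures; an unregistered pair is itself a failure."""
    guard = GUARDS.get((mission, action))
    if guard is None:
        # Fail closed: never treat "no guard known" as "guard passed".
        return [f"no guard registered for ({mission}, {action})"]
    return guard(feature_dir)
```

Compared with an if/elif chain that falls through to an empty list, the explicit `None` branch makes the silent-pass path unreachable for any action added later.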
Domain Language (canonical terms)
| Term | Meaning | Avoid as synonym |
|---|---|---|
| MissionTemplate | The Pydantic schema at src/specify_cli/next/_internal_runtime/schema.py:445 that the runtime engine consumes; requires mission.key, steps: list[PromptStep], optional audit_steps. | "mission spec", "mission file" |
| Composition substrate | StepContractExecutor + ProfileInvocationExecutor + the _should_dispatch_via_composition fast path. | "v2 path", "new runtime" |
| Validated DRG | The graph returned by charter._drg_helpers.load_validated_graph(repo). The shipped portion lives at <doctrine_root>/graph.yaml; project overlays at .kittify/doctrine/graph.yaml. | "DRG", "doctrine graph" |
| Action node | A DRG node with URN of the form action:<mission>/<action>. Carries VOCABULARY/SCOPE/etc. edges that resolve_context() walks to populate artifact_urns. | "doctrine entry", "action index entry" |
| Composed-action guard | The function _check_composed_action_guard() in runtime_bridge.py that fires after composition to verify expected artifacts/events; returns a non-empty failure list to block run-state advancement. | "post-action validator", "guard" |
| Real-runtime walk | An integration test that calls get_or_start_run (or decide_next_via_runtime end-to-end) without mocking _dispatch_via_composition, StepContractExecutor.execute, frozen-template loaders, or the DRG. | "integration test" |
| Dogfood smoke | A documented operator-facing sequence (quickstart) that creates a real research mission and advances it. The mission-review skill must execute this before issuing PASS. | "smoke test", "dry run" |
Requirements
Functional Requirements
| ID | Requirement | Status | Notes |
|---|---|---|---|
| FR-001 | A fresh research mission MUST start via get_or_start_run(slug, repo, 'research') from a clean repo without raising MissionRuntimeError. | Required | Closes v1 P0 finding. |
| FR-002 | The runtime MUST advance at least one composed step in a fresh research mission via spec-kitty next without falling through to the legacy DAG. | Required | Closes v1 P0 finding. |
| FR-003 | The research MissionTemplate MUST declare mission.key: research, an explicit non-empty steps: list[PromptStep], and any audit_steps required by the schema. | Required | Inferred from schema.py:445. |
| FR-004 | For every research action in _COMPOSED_ACTIONS_BY_MISSION["research"], load_validated_graph(repo).get_node(f'action:research/{action}') MUST return a truthy node. | Required | Closes v1 P1 DRG finding. |
| FR-005 | For every research action, resolve_context(graph, f'action:research/{action}', depth=...) MUST return non-empty artifact_urns. | Required | Closes v1 P1 DRG finding. |
| FR-006 | The action-scoped doctrine bundle (under src/doctrine/missions/research/actions/<action>/) authored for each research action MUST be reachable via the DRG resolution path used by composition (not just via MissionTemplateRepository.get_action_guidelines). | Required | Closes v1 P1 DRG finding. |
| FR-007 | _check_composed_action_guard() MUST handle each of the five research actions with parity to existing software-dev guards. | Required | Closes v1 P1 guard finding. |
| FR-008 | When a research-action precondition is unmet (missing spec.md, missing plan.md, fewer than 3 source_documented events, missing findings.md, missing publication_approved gate), the guard MUST return a non-empty structured failure list naming the missing artifact or predicate. | Required | Closes v1 P1 guard finding. |
| FR-009 | _dispatch_via_composition() MUST propagate guard failures as structured errors with no run-state advancement and no legacy-DAG fallback. | Required | Inherited PR #797 invariant; preserved for research. |
| FR-010 | The MissionTemplate for research MUST satisfy the same loader path that software-dev uses today (load_mission_template → discovery tier walk). | Required | No bespoke loader for research. |
| FR-011 | Every composed research action invocation MUST record action_hint == contract.action. | Required | Inherited from executor.py:173; preserved for research. |
| FR-012 | Every profile invocation opened for a research action MUST be closed with a paired terminal record (done or failed) before the step returns. | Required | Inherited; preserved for research. |
| FR-013 | The integration test that proves SC-001 / SC-002 / SC-003 MUST drive the real runtime via get_or_start_run (or decide_next_via_runtime) and MUST NOT mock _dispatch_via_composition, StepContractExecutor.execute, frozen-template loaders, or the DRG. | Required | Closes v1 P1 bypass-test finding. |
| FR-014 | Existing software-dev composition behavior, custom mission loader behavior, and runtime bridge behavior MUST remain unchanged for inputs that already passed at the baseline commit. | Required | Regression contract. |
| FR-015 | The 5 step contracts under src/doctrine/mission_step_contracts/shipped/research-{action}.step-contract.yaml, the 5 doctrine bundles under src/doctrine/missions/research/actions/<action>/, the 5 entries in _ACTION_PROFILE_DEFAULTS, and the "research" entry in _COMPOSED_ACTIONS_BY_MISSION MUST exist (re-authored on top of the corrected substrate, not carried forward from the v1 attempt). | Required | Wholesale replacement, not patch. |
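
As a rough illustration of what FR-003 requires, the check below inspects an already-parsed template document for the two properties whose absence caused the v1 start failure. It is a stand-in, not the Pydantic MissionTemplate schema: the nested `mission: {key: ...}` layout is inferred from the dotted name mission.key, and `template_problems` is a hypothetical helper.

```python
def template_problems(doc):
    """Return human-readable problems for a parsed template dict.

    Checks only the two FR-003 properties: a mission key and a non-empty steps list.
    """
    problems = []

    mission = doc.get("mission") or {}
    if not mission.get("key"):
        problems.append("mission.key is missing")

    steps = doc.get("steps")
    if not isinstance(steps, list) or not steps:
        problems.append("steps must be a non-empty list")

    return problems
```

A unit test for the new research template could assert that this kind of check returns no problems, independently of whether the full runtime can start the mission.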
Non-Functional Requirements
| ID | Requirement | Threshold | Status |
|---|---|---|---|
| NFR-001 | Test coverage: a real-runtime integration test MUST exist for research alongside refreshed unit tests for each new map entry, contract, and doctrine surface. | At least one real-runtime walk (no mocks of the listed surfaces) plus parametrized unit tests for: contract loading, profile defaults, DRG node existence, doctrine bundle resolution, guard parity. | Required |
| NFR-002 | Existing test suites that protect the substrate MUST stay green. | 100% pass on tests/specify_cli/mission_step_contracts/, tests/specify_cli/next/test_runtime_bridge_composition.py, tests/integration/test_custom_mission_runtime_walk.py, tests/integration/test_mission_run_command.py. | Required |
| NFR-003 | mypy --strict MUST report zero new errors on changed files. | Zero new findings. Pre-existing baseline errors are not regressed. | Required |
| NFR-004 | ruff check MUST report zero new findings on changed files. | Zero new findings. | Required |
| NFR-005 | Mission-review verdict of PASS MUST require the dogfood smoke (Scenario 6) to succeed before the verdict is issued. The mission-review skill must record the smoke output as evidence. | Hard gate. PASS verdicts that omit smoke evidence are invalid. | Required |
| NFR-006 | Trail records for composed research actions MUST be operator-readable: each contains action name, profile name, and lifecycle status. | All trail records contain these three fields without internal-only identifiers. | Required |
Constraints
| ID | Constraint | Rationale | Status |
|---|---|---|---|
| C-001 | Spec Kitty MUST NOT call host LLMs or generate research findings. Research content (reading, reasoning, citation drafting, synthesis prose) is owned by the host harness. | Trust boundary preserved from v1 spec. | Required |
| C-002 | The composition chokepoint for research MUST remain StepContractExecutor. The runtime bridge MUST NOT call ProfileInvocationExecutor directly for research actions. | Inherited PR #797 architectural invariant. | Required |
| C-003 | _ACTION_PROFILE_DEFAULTS additions MUST be limited to built-in research actions. No generalization to wildcard keys or arbitrary custom missions. | Preserves PR #799 custom-loader contract. | Required |
| C-004 | Out of scope: documentation mission composition (#502), retrospective work (#506-#511), low-priority loader hygiene (#801), spec-kitty explain (#534), SaaS / tracker / sync, spec_kitty_events and spec_kitty_tracker package surfaces. | Phase 6 sequencing; package-boundary discipline. | Required |
| C-005 | The mission MUST build on PR #795 / #797 / #799 invariants. It MUST NOT re-open already-closed Phase 6 review findings. | Treat past invariants as regression risks, not open bugs. | Required |
| C-006 | If the existing legacy mission.yaml (state machine with states/transitions/guards) at src/specify_cli/missions/research/mission.yaml and src/doctrine/missions/research/mission.yaml is replaced rather than coexisting with the new MissionTemplate, the mission MUST document why and prove backward compatibility for any consumer of the legacy file. | Plan-time decision; do not implicitly orphan downstream consumers. | Required |
| C-007 | Real-runtime tests MUST NOT use unittest.mock.patch against _dispatch_via_composition, StepContractExecutor.execute, ProfileInvocationExecutor.invoke, frozen-template loaders, load_validated_graph, or resolve_context. | The point of FR-013 is to prove the live path; mocking those defeats it. | Required |
| C-008 | The mission-review skill invocation that issues the final PASS verdict MUST include explicit dogfood smoke evidence in its report. Reports without smoke evidence are downgraded to UNVERIFIED. | NFR-005 is the consequent. | Required |
Success Criteria
| ID | Outcome | Measure |
|---|---|---|
| SC-001 | A fresh research mission can be created and advanced. | From a clean checkout: spec-kitty agent mission create demo-research --mission-type research --json succeeds; subsequent spec-kitty next --agent <name> --mission <handle> returns a next-step decision without MissionRuntimeError. |
| SC-002 | Each research action has a real DRG node with non-empty resolved context. | For each of the 5 research actions, load_validated_graph(repo).get_node(f'action:research/{action}') is truthy and resolve_context(...).artifact_urns is non-empty. |
| SC-003 | Missing artifacts produce structured guard failures. | _check_composed_action_guard returns a non-empty failure list naming the missing artifact for each of the 5 actions on an empty feature directory; _dispatch_via_composition propagates the failure with no run-state advancement. |
| SC-004 | Real-runtime test passes without bypassing composition surfaces. | tests/integration/test_research_runtime_walk.py passes; grep confirms the file does not patch _dispatch_via_composition, StepContractExecutor.execute, frozen-template loaders, load_validated_graph, or resolve_context. |
| SC-005 | No regression. | All four regression suites (mission_step_contracts/, runtime_bridge_composition, custom_mission_walk, mission_run_command) pass on the merged commit. |
| SC-006 | Mission-review PASS verdict carries dogfood smoke evidence. | Mission-review report includes a "dogfood smoke" section with command output proving SC-001 from a clean repo. Without that section, the verdict is UNVERIFIED. |
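
The grep check in SC-004 could also be expressed as a small scanner, sketched below. It is a heuristic, not a complete detector: it only recognizes patch(...) and patch.object(...) call text, it would miss other mechanisms such as pytest's monkeypatch, and the frozen-template loaders are omitted because this spec does not name them. The function name `forbidden_patches` is hypothetical; the forbidden names come from C-007.

```python
import re

# Surfaces that C-007 forbids patching in real-runtime tests.
FORBIDDEN = (
    "_dispatch_via_composition",
    "StepContractExecutor.execute",
    "ProfileInvocationExecutor.invoke",
    "load_validated_graph",
    "resolve_context",
)

# Matches patch(...), mock.patch(...), and patch.object(...) up to the first ')'.
PATCH_CALL = re.compile(r"\b(?:mock\.)?patch(?:\.object)?\s*\(([^)]*)\)", re.DOTALL)


def forbidden_patches(source):
    """Return the sorted forbidden names that appear inside patch call arguments."""
    hits = set()
    for match in PATCH_CALL.finditer(source):
        args = match.group(1)
        for name in FORBIDDEN:
            # patch.object(Cls, "method") splits a dotted name into two arguments,
            # so require every dotted part to appear somewhere in the arguments.
            if all(part in args for part in name.split(".")):
                hits.add(name)
    return sorted(hits)
```

Running this over the text of tests/integration/test_research_runtime_walk.py and asserting an empty result would give the mission-review step a repeatable version of the grep evidence.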
Key Entities
- MissionTemplate (research): the Pydantic-validated runtime template that the engine consumes when mission_type='research'. Its mission.key is research and its steps list defines the concrete sequence of PromptStep objects the engine walks. New artifact in this mission.
- Action node (research/X): a DRG node whose URN is action:research/X for X in {scoping, methodology, gathering, synthesis, output}. Carries the same edge shape as software-dev action nodes (VOCABULARY, SCOPE, etc.). New artifact in this mission.
- Composed-action guard (research branch): the new conditional branches inside _check_composed_action_guard() that handle research actions and emit structured failures on unmet preconditions. New code path in this mission.
- Real-runtime walk: tests/integration/test_research_runtime_walk.py, rewritten to call get_or_start_run and assert end-to-end without mocking any composition surface. Replacement artifact in this mission.
- Dogfood smoke: a hard-gated quickstart sequence that an operator (or the mission-review skill) executes to prove SC-001 against the merged code on a clean repo.
Assumptions
These will be re-confirmed in /spec-kitty.plan against the actual code; any contradicted by the audit must be resolved before tasks.
1. MissionTemplate (Pydantic) is loaded from a YAML file via load_mission_template_file() (src/specify_cli/next/_internal_runtime/schema.py:548-576). The discovery tier walk (load_mission_template, src/specify_cli/next/_internal_runtime/discovery.py:294-313) maps mission_type='research' to a YAML file on disk. The new template lands at one of those discovered locations.
2. The validated DRG (<doctrine_root>/graph.yaml plus .kittify/doctrine/graph.yaml overlay) is the consumer for action:research/* nodes. The shipped graph is hand-authored or migrated; there is no extractor that would automatically populate research nodes from the action doctrine bundles.
3. _check_composed_action_guard() (src/specify_cli/next/runtime_bridge.py:444-520) is the right surface to extend; on unrecognized (mission, action) pairs it returns an empty failure list (silent pass), which is the v1 P1 finding.
4. Software-dev's MissionTemplate exists somewhere on the discovery path with steps: list[PromptStep]; mirroring it for research is the minimum sufficient change for runnability.
5. The five research action verbs (scoping, methodology, gathering, synthesis, output) and their profile defaults (researcher-robbie x4, reviewer-renata for output) remain the right choices; the reroll preserves these from the v1 plan.
Dependencies
- Landed: PR #795 (software-dev composition), PR #797 (composition stabilization), PR #799 (local custom mission loader).
- Not blocked by: #502, #506-#511, #801. These remain downstream Phase 6 tranches.
- External: none. Self-contained inside Priivacy-ai/spec-kitty.
- Local artifacts of the v1 attempt are preserved at git tag attempt/research-composition-mission-100-broken for reference; the reroll does not import them but may copy verbatim from them where the v1 work was correct (contracts, doctrine prose, profile defaults).
Out of Scope
- Documentation mission composition rewrite (#502).
- Retrospective contract / lifecycle work (#506-#511).
- Local-loader hygiene (#801) unless directly co-located with reroll changes.
- SaaS / tracker / sync behavior.
- spec-kitty explain (#534).
- spec_kitty_events, spec_kitty_tracker, or any external package-boundary surfaces.
- Host-LLM-side research authorship.
- Any framework-level redesign of the runtime engine to natively support state-machine missions (the existing legacy mission.yaml may coexist with the new MissionTemplate as long as runnability and guard parity are met; replacing it wholesale is acceptable but optional).
Open Questions
To be resolved during /spec-kitty.plan:
1. Coexistence vs replacement: does the new MissionTemplate replace the legacy src/specify_cli/missions/research/mission.yaml and its src/doctrine/missions/research/mission.yaml counterpart, or do both coexist? If they coexist, what is the read precedence in load_mission_template?
2. DRG authoring: are action:research/* nodes added to the shipped <doctrine_root>/graph.yaml, or to the project overlay, or via a calibration step that reads action bundles? Plan-time audit must answer.
3. Guard semantics: do the research guard branches in _check_composed_action_guard() enforce mission.yaml's declarative predicates (artifact_exists, event_count, gate_passed) directly, or do they re-implement the same checks against the feature directory? Plan-time decision.
4. PromptStep shape per action: each of the 5 research actions needs at least one PromptStep. What agent_profile and contract_ref (if any) does each step bind to, and how does the existing software-dev pattern translate? Plan-time decision.
5. v1 preserved artifacts: which v1 artifacts can be copied verbatim from the attempt/research-composition-mission-100-broken tag (e.g. step contract YAML files, doctrine bundles), and which must be re-authored against the corrected substrate? Plan-time decision.
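
For open question 3, the "enforce declarative predicates directly" option might look roughly like the evaluator below. It illustrates one of the two options and is not a recommendation. Only the predicate kind names (artifact_exists, event_count, gate_passed) come from this spec; the dict keys (`kind`, `path`, `event`, `min`, `gate`), the function names, and the way events and gates are passed in are assumptions, since the mission.yaml predicate syntax is not reproduced here.

```python
from pathlib import Path
from typing import Optional


def evaluate(predicate, feature_dir, events, gates) -> Optional[str]:
    """Return a failure message for one predicate, or None when it holds.

    `events` is a list of event names seen so far; `gates` is a set of passed gate names.
    """
    kind = predicate["kind"]
    if kind == "artifact_exists":
        name = predicate["path"]
        return None if (Path(feature_dir) / name).is_file() else f"missing artifact: {name}"
    if kind == "event_count":
        event = predicate["event"]
        seen = events.count(event)
        need = predicate["min"]
        return None if seen >= need else f"event_count({event}) is {seen}, need >= {need}"
    if kind == "gate_passed":
        gate = predicate["gate"]
        return None if gate in gates else f"gate not passed: {gate}"
    # Unknown kinds fail closed rather than being skipped.
    return f"unknown predicate kind: {kind}"


def guard_failures(predicates, feature_dir, events, gates):
    """Evaluate every predicate and return the list of failure messages."""
    results = (evaluate(p, feature_dir, events, gates) for p in predicates)
    return [message for message in results if message is not None]
```

With this shape the FR-008 preconditions become data, for example an event_count predicate on source_documented with a minimum of 3, and each failure message names the predicate that blocked advancement.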
Definition of Done
- All FR-### items have at least one explicit test or assertion proving them.
- All NFR-### items have a measurement or threshold check.
- All C-### items are observable in the diff or in test code.
- Every Open Question is resolved in plan.md with code-grounded evidence.
- All 6 Acceptance Scenarios pass against the merged code.
- The mission-review skill invocation includes dogfood smoke evidence in its report; without it, the verdict is UNVERIFIED, not PASS.
- The regression suites named in NFR-002 and SC-005 pass without test edits beyond the import-path adjustments Scenario 5 allows.