Checklists
requirements.md
Specification Quality Checklist: Research Mission Composition Rewrite v2
Purpose: Validate specification completeness and quality before proceeding to planning
Created: 2026-04-26
Feature: spec.md
Content Quality
- ✅ Implementation-level identifiers (`MissionTemplate`, `_check_composed_action_guard`, `load_validated_graph`, `resolve_context`) appear deliberately. The audience is spec-kitty contributors, and the v1 external review identified these as the broken surfaces that must be fixed. Using them as domain terms here is precision, not over-specification.
- ✅ Focused on operator-observable outcomes (mission starts, advances via composition, guards block missing artifacts, mission-review carries dogfood smoke evidence).
- ✅ Audience is appropriate: spec-kitty contributors and reviewers. Domain Language section makes terms explicit.
- ✅ All mandatory sections completed: Purpose, User Scenarios & Testing, Requirements (FR / NFR / C), Success Criteria, Key Entities, Assumptions, Dependencies, Out of Scope, Open Questions, Definition of Done.
Requirement Completeness
- ✅ No `[NEEDS CLARIFICATION]` markers remain. Plan-time questions are listed in Open Questions with explicit instructions to resolve them at plan time.
- ✅ Requirements are testable and unambiguous. Each FR cites either an observable runtime invariant or a code surface; each NFR cites a measurable threshold or hard gate.
- ✅ Requirement types are separated: Functional (FR-001..FR-015), Non-Functional (NFR-001..NFR-006), Constraints (C-001..C-008).
- ✅ IDs are unique.
- ✅ All requirement rows include a non-empty Status value.
- ✅ Non-functional requirements include measurable thresholds (test pass-rate, mypy/ruff zero-finding, hard gate on dogfood smoke).
- ✅ Success criteria are measurable (SC-001: `get_or_start_run` succeeds; SC-002: `get_node` truthy and `artifact_urns` non-empty for all 5 actions; SC-003: guard returns a non-empty failure on an empty dir; SC-004: `grep` confirms no mocks of the named surfaces; SC-005: 4 regression suites pass; SC-006: mission-review carries smoke evidence).
- ✅ Success criteria are technology-agnostic at the outcome level: the operator can start and advance a mission, guards fire structured errors, and mission-review carries evidence.
- ✅ All acceptance scenarios are defined: six Given/When/Then scenarios covering runnability, DRG resolution, guard parity, real-runtime test, no-regression, dogfood smoke.
- ✅ Edge cases are identified: composed-step exception, doctrine bundle without DRG node, two consecutive actions sharing a profile, future sixth research action.
- ✅ Scope is clearly bounded by Out of Scope and constraints C-004 / C-005 / C-007.
- ✅ Dependencies and assumptions identified.
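The ID checks in this list (unique IDs, separated FR/NFR/C ranges) are mechanical enough to script. The following is a hypothetical helper, not part of spec-kitty: it assumes the requirement IDs have already been extracted from spec.md into a list of strings, and it takes the expected ranges FR-001..FR-015, NFR-001..NFR-006, and C-001..C-008 from this checklist.

```python
import re
from collections import Counter

# Expected ranges as stated in this checklist: FR-001..FR-015, NFR-001..NFR-006, C-001..C-008.
EXPECTED_COUNTS = {"FR": 15, "NFR": 6, "C": 8}
ID_PATTERN = re.compile(r"^(FR|NFR|C)-(\d{3})$")


def check_requirement_ids(ids: list[str]) -> list[str]:
    """Return a list of problems; an empty list means every ID is valid."""
    problems: list[str] = []
    # Uniqueness: any ID that appears more than once is reported with its count.
    for req_id, count in Counter(ids).items():
        if count > 1:
            problems.append(f"duplicate ID {req_id} ({count}x)")
    # Collect the numeric part of each well-formed ID, grouped by type prefix.
    seen: dict[str, set[int]] = {prefix: set() for prefix in EXPECTED_COUNTS}
    for req_id in ids:
        match = ID_PATTERN.match(req_id)
        if match is None:
            problems.append(f"malformed ID {req_id!r}")
            continue
        seen[match.group(1)].add(int(match.group(2)))
    # Completeness: each range must run contiguously from 001 with no extras.
    for prefix, last in EXPECTED_COUNTS.items():
        expected = set(range(1, last + 1))
        problems += [f"missing {prefix}-{n:03d}" for n in sorted(expected - seen[prefix])]
        problems += [f"unexpected {prefix}-{n:03d}" for n in sorted(seen[prefix] - expected)]
    return problems


if __name__ == "__main__":
    ids = (
        [f"FR-{n:03d}" for n in range(1, 16)]
        + [f"NFR-{n:03d}" for n in range(1, 7)]
        + [f"C-{n:03d}" for n in range(1, 9)]
    )
    print(check_requirement_ids(ids))
```

Returning a list of problems rather than raising on the first one keeps the report complete, which matches how a checklist reviewer would want to see every gap at once.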
Feature Readiness
- ✅ All functional requirements have clear acceptance criteria — Acceptance Scenarios 1–6 and Success Criteria SC-001..SC-006 map back to FR-001..FR-015.
- ✅ User scenarios cover primary flows (full research run via composition) and edge cases (guard failures, exception during composition, missing DRG node).
- ✅ Feature meets measurable outcomes defined in Success Criteria.
- ✅ No implementation details leak past the intended audience. Implementation choices on coexistence-vs-replacement, DRG authoring approach, and PromptStep binding live in the plan, not the spec.
Plan-Time Decisions Logged (Open Questions)
These are intentionally deferred to /spec-kitty.plan:
- □ Resolved at plan time: Coexistence vs replacement of legacy `mission.yaml`.
- □ Resolved at plan time: DRG authoring approach (shipped graph vs overlay vs calibration).
- □ Resolved at plan time: Guard semantics (delegate to `mission.yaml` predicates vs re-implement against the feature dir).
- □ Resolved at plan time: `PromptStep` shape per research action.
- □ Resolved at plan time: Which v1 artifacts (from the `attempt/research-composition-mission-100-broken` tag) are copied verbatim vs re-authored.
Lessons baked in from the v1 attempt
- Mission-review must include dogfood smoke (NFR-005, C-008, SC-006). Without smoke evidence in the mission-review report, the verdict is UNVERIFIED, not PASS. Codified as a constraint, not a recommendation.
- Plan-time audit must probe runnability and DRG resolution, not just file shape. FR-004 / FR-005 / FR-006 explicitly require DRG node existence and non-empty `artifact_urns`, not just `MissionTemplateRepository.get_action_guidelines` returning content.
- Composed-action guard surface is mission-keyed. FR-007 / FR-008 explicitly require `_check_composed_action_guard()` to handle research actions with parity to software-dev. An edge case explicitly covers a future sixth action: guards must fail closed.
- Real-runtime test is non-negotiable. C-007 prohibits mocks against the listed surfaces; FR-013 requires the live path. The "PARTIAL" status that v1 assigned to FR-007/FR-008 is not available in v2.
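The fail-closed rule for composed-action guards can be illustrated with a small, self-contained sketch. This is not `_check_composed_action_guard()` itself: the function name `check_action_guard`, the `REQUIRED_ARTIFACTS` table, and every mission, action, and file name in it are hypothetical placeholders. The sketch shows two properties these lessons call for. A guard run against an empty feature directory returns a non-empty list of structured failures (compare SC-003), and an action with no registered guard entry fails instead of passing.

```python
import tempfile
from pathlib import Path

# HYPOTHETICAL table: mission -> action -> artifacts that must exist in the
# feature directory before the action may advance. The mission, action, and
# file names are placeholders, not spec-kitty's real research actions.
REQUIRED_ARTIFACTS: dict[str, dict[str, list[str]]] = {
    "research": {
        "scope": ["spec.md"],
        "synthesize": ["spec.md", "findings.md"],
    },
}


def check_action_guard(mission: str, action: str, feature_dir: Path) -> list[str]:
    """Return structured failure messages; an empty list means the guard passes.

    Unknown missions and actions produce a failure instead of a silent pass,
    so a newly added action without a guard entry fails closed.
    """
    actions = REQUIRED_ARTIFACTS.get(mission)
    if actions is None:
        return [f"no guard registered for mission {mission!r}"]
    required = actions.get(action)
    if required is None:
        return [f"no guard registered for action {action!r} in mission {mission!r}"]
    # Report every missing artifact, not just the first, so errors stay structured.
    return [
        f"missing artifact {name!r} for {mission}/{action}"
        for name in required
        if not (feature_dir / name).is_file()
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        empty_dir = Path(tmp)
        print(check_action_guard("research", "scope", empty_dir))
        print(check_action_guard("research", "sixth_action", empty_dir))
```

The key design choice is that a lookup miss returns a failure rather than an empty list; a guard that returned `[]` for unknown actions would silently let a future sixth action advance.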
Notes
- Spec adheres to charter directives `DIRECTIVE_003` (decision documentation: Open Questions are explicit) and `DIRECTIVE_010` (specification fidelity: Success Criteria fix observable runtime invariants).
- `change_mode: feature_addition` is correct: this work introduces new identifiers (MissionTemplate file, action graph nodes, new guard branches, new test) rather than renaming existing ones.
- Premortem: the top failure modes are (a) authoring a `MissionTemplate` that loads but skips composition because its steps don't bind to step contracts, (b) adding action nodes to the graph but missing the edges that `resolve_context` walks, (c) guard branches that delegate to `mission.yaml` predicates without preserving structured-error wording, and (d) an integration test that calls `get_or_start_run` but mocks the runtime engine internally. Each is covered by an FR/NFR.
- Plan-Time Decision items are intentionally unchecked; they unblock at `/spec-kitty.plan`.
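SC-004, C-007, and premortem item (d) all come down to a mechanical check that no test mocks the protected runtime surfaces. SC-004 phrases it in terms of `grep`; the following is a Python equivalent under stated assumptions. The helper names `find_protected_mocks` and `scan_tests` are hypothetical, and `PROTECTED_SURFACES` is assumed from the identifiers named in this checklist rather than copied from C-007. It only recognizes `patch`-style calls from `unittest.mock` (including `mocker.patch`), so `monkeypatch.setattr` or hand-written fakes still need manual review.

```python
import re
from pathlib import Path

# ASSUMPTION: the surfaces C-007 protects. The authoritative list lives in
# spec.md; these are the callables this checklist mentions.
PROTECTED_SURFACES = frozenset({
    "get_or_start_run",
    "load_validated_graph",
    "resolve_context",
    "_check_composed_action_guard",
})

# Captures the first quoted argument of patch(...), patch.object(...), and
# mocker.patch(...). The \b keeps names like dispatch(...) from matching.
# monkeypatch.setattr and hand-rolled fakes are NOT detected.
MOCK_CALL = re.compile(r"""\bpatch(?:\.object)?\(\s*[^)]*?["']([\w.]+)["']""")


def find_protected_mocks(source: str, filename: str = "<string>") -> list[str]:
    """Return 'file:line: mocks target' for every patch of a protected surface."""
    hits: list[str] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for target in MOCK_CALL.findall(line):
            # Compare only the final dotted segment, e.g. pkg.mod.resolve_context.
            if target.rsplit(".", 1)[-1] in PROTECTED_SURFACES:
                hits.append(f"{filename}:{lineno}: mocks {target}")
    return hits


def scan_tests(root: Path) -> list[str]:
    """Scan every test_*.py file under root; a non-empty result should fail CI."""
    hits: list[str] = []
    for path in sorted(root.rglob("test_*.py")):
        hits += find_protected_mocks(path.read_text(encoding="utf-8"), str(path))
    return hits
```

Wiring `scan_tests(Path("tests"))` into a CI step that fails on a non-empty result would turn the checklist item into a hard gate; the `tests` directory is an assumption about the repository layout.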