Implementation Plan: Phase 6 Composition Stabilization
Mission: phase6-composition-stabilization-01KQ2JAS Spec: spec.md Created: 2026-04-25 Branch contract: current=main, planning_base=main, merge_target=main, branch_matches_target=true
Engineering Alignment
Narrow bug-fix tranche on three seams of the just-landed software-dev composition path. Implementation is constrained by start-here.md, the spec, and the current code shape:
- Composer stays a composer.
StepContractExecutordoes not run shell, does not run an LLM, does not pretend command steps ran (C-005). ProfileInvocationExecutoris the single invocation primitive for governance context, trail writing, and glossary checks (C-006).- Local-first trail.
.kittify/events/profile-invocations/<id>.jsonlis canonical; no remote sync added (C-007). - No second mission runner. Composition advancement reuses the primitives the legacy DAG path already uses (NFR-005).
Decisionshape is stable. Public return shape fromruntime_bridge.decide_next_via_runtime(...)is unchanged (FR-005).mission-runtime.yamlfiles stay frozen unless a regression test proves no other correct implementation (FR-004).
Summary
Three bug fixes that together restore single-dispatch semantics, paired invocation lifecycles, and contract-action recording for the live software-dev composition path:
1. #786 — runtime_bridge.py stops falling through to runtime_next_step(...) after a successful composition dispatch; instead, a new _advance_run_state_after_composition(...) helper performs run-state, lane, and prompt progression and a Decision is returned directly. 2. #794 — ProfileInvocationExecutor.invoke(...) gains a keyword-only action_hint: str | None = None parameter that, when truthy alongside profile_hint, replaces the role-default-verb derivation. StepContractExecutor.execute(...) passes action_hint=selected_contract.action. 3. #793 — StepContractExecutor.execute(...) wraps each composed step in try/except/else and calls complete_invocation(payload.invocation_id, outcome="done"|"failed") to close every invocation it starts.
Technical Context
Language/Version: Python 3.13 (charter, uv run --python 3.13 ...) Primary Dependencies: existing only — typer, rich, ruamel.yaml, pytest, mypy --strict Storage: filesystem only — .kittify/events/profile-invocations/*.jsonl (existing local-first trail) Testing: pytest with the focused command from start-here.md; ≥90% coverage for new code (charter) Target Platform: spec-kitty CLI (cross-platform Python) Project Type: single project (CLI library) Performance Goals: no perceptible change; bug-fix tranche Constraints: see C-001 through C-011 in spec.md Scale/Scope: 3 source files touched, 4 test files extended; ~3 work packages
NEEDS CLARIFICATION
None. All open points were resolved against the source (see research.md).
Charter Check
GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.
Charter loaded for plan (bootstrap mode):
- DIRECTIVE_003 (Decision Documentation) — applied. Each material design decision is recorded here with rationale (see "Implementation Approach" sections per issue) and in
research.md. - DIRECTIVE_010 (Specification Fidelity) — applied. Every plan section maps to FR/NFR/C IDs in
spec.md. No deviations from the spec. - Tactic: requirements-validation-workflow — applied via FR→test mapping below.
- Tactic: premortem-risk-identification — applied (see "Risks & Pre-mortems").
- Charter policy summary:
pytest≥90% coverage for new code,mypy --strictclean,ruffclean — gated as NFR-002/NFR-003/NFR-004. - Charter Check status: PASS. No conflicts with charter; no Complexity Tracking violations.
Phase 0 — Research Summary
Full notes in research.md. Key resolved decisions:
| Decision | Choice | Rationale |
|---|---|---|
complete_invocation(...) outcome value for composed steps | "done" on success, "failed" on exception | Matches existing Literal["done", "failed", "abandoned"] set in invocation/record.py:34; "abandoned" is reserved for user-initiated cancellation. The "completion does not imply host LLM did the requested generation" semantic is captured in tests/comments per start-here.md. |
| Where to break the legacy fall-through | In decide_next_via_runtime(...) after _dispatch_via_composition(...) returns | Composition success returns a Decision directly; non-composed actions still fall through to runtime_next_step(...). Minimizes blast radius into the legacy path. |
| How to advance run-state without re-running legacy DAG action | New _advance_run_state_after_composition(...) helper in runtime_bridge.py reusing the same primitives runtime_next_step(...) uses for state/lane/prompt progression | Brief explicitly suggests this; satisfies NFR-005. |
Where to add action_hint parameter | `ProfileInvocationExecutor.invoke(..., *, action_hint: str | None = None)` (keyword-only) |
Empty-string action_hint semantics | Treat as None (legacy fallback) | EDGE-005 in spec; conservative. |
StepContractExecutor try/except shape | Wrap each invoke()+per-step body in try/except/else; close with "done" on success, "failed" on exception, then re-raise | Closes lifecycle on both paths; preserves error propagation through the existing Decision error shape. |
| Frozen files | mission-runtime.yaml (both copies), tasks.step-contract.yaml, record.py, writer.py, modes.py | FR-004 holds; the existing fixed tasks guard is keyed by legacy_step_id and does not need to change. |
Phase 1 — Design Artifacts
- data-model.md — entities and invariants for invocation lifecycle pairing, action-hint widening, and the single-dispatch invariant.
- contracts/invocation_executor_invoke.md —
ProfileInvocationExecutor.invoke(...)contract. - contracts/runtime_bridge_dispatch.md — runtime-bridge dispatch contract.
- contracts/step_contract_executor_lifecycle.md —
StepContractExecutor.execute(...)lifecycle contract. - quickstart.md — operator quickstart: how to verify the three behaviors locally.
Project Structure
Documentation (this feature)
kitty-specs/phase6-composition-stabilization-01KQ2JAS/
├── plan.md
├── research.md
├── data-model.md
├── quickstart.md
├── contracts/
│ ├── invocation_executor_invoke.md
│ ├── runtime_bridge_dispatch.md
│ └── step_contract_executor_lifecycle.md
├── checklists/
│ └── requirements.md
└── tasks/ # populated by /spec-kitty.tasks
Source Code (repository root)
src/specify_cli/
├── next/
│ └── runtime_bridge.py # (touched) FR-001, FR-002, FR-005
├── mission_step_contracts/
│ └── executor.py # (touched) FR-008, FR-014
├── invocation/
│ ├── executor.py # (touched) FR-009..FR-013
│ ├── writer.py # (read-only)
│ ├── record.py # (read-only)
│ └── modes.py # (read-only)
└── ... rest of tree unchanged ...
tests/specify_cli/
├── next/
│ └── test_runtime_bridge_composition.py # (extended)
├── mission_step_contracts/
│ └── test_software_dev_composition.py # (extended)
└── invocation/
├── test_invocation_e2e.py # (extended)
└── test_writer.py # (touched only if writer surface changes — it does not)
Structure Decision: Single Python project. Existing layout is preserved; no new packages or modules.
Implementation Approach (per issue)
#786 — Stop legacy DAG fall-through (FR-001 / FR-002 / FR-003 / FR-004 / FR-005)
Where: src/specify_cli/next/runtime_bridge.py.
Current behavior: _dispatch_via_composition(...) returns None on success (line 485); decide_next_via_runtime(...) then falls through to runtime_next_step(...) (lines 986–991), which advances run state AND executes the legacy DAG dispatch handler — effectively double-dispatching the action.
Fix:
1. Make _dispatch_via_composition(...) return a Decision (not None) on the success path. The new Decision carries the same shape the legacy path produces, constructed from a new helper _advance_run_state_after_composition(...). 2. The new helper reuses the same primitives runtime_next_step(...) uses internally for: emitting lane status events (via the same SyncRuntimeEventEmitter), recording mission-state advancement, and computing the next public step. It does not call any legacy DAG action handler. 3. Update decide_next_via_runtime(...) so that when _dispatch_via_composition(...) returns a non-None Decision, that Decision is returned immediately without falling through to runtime_next_step(...). 4. Preserve the fixed tasks guard semantics exactly (FR-003): the guard already runs in _check_composed_action_guard(...) AFTER composition runs but BEFORE this advancement helper. Keep that ordering. 5. Public Decision shape and the legacy non-composed code path are unchanged (FR-005).
Tests (extend tests/specify_cli/next/test_runtime_bridge_composition.py):
- Negative condition (FR-015): For each composed action (
specify,plan,tasks,implement,review), patch the legacy DAG dispatch entry point and assert it is not called when_dispatch_via_composition(...)succeeds. - Run-state still advances (FR-002): Assert lane events are emitted and the returned
Decisionreflects progression to the next public step. Decisionshape unchanged (FR-005): Snapshot theDecisionfield set against a reference dict.tasksguard semantics preserved (FR-003): existingtasks_outline/tasks_packages/tasks_finalizeguard tests must continue to pass without modification.- Non-composed actions regression (EDGE-002): Assert legacy
runtime_next_step(...)is still called for an action that is NOT in the composition allow-list. - Composition success but advancement raises (EDGE-003): Patch the new helper to raise; assert the raised error is surfaced through the existing
Decisionerror shape and that legacy dispatch is not entered as a fallback.
#794 — Preserve contract action with profile_hint (FR-009 – FR-014)
Where: src/specify_cli/invocation/executor.py (signature change), src/specify_cli/mission_step_contracts/executor.py (call site update).
Fix in invocation/executor.py:
1. Extend invoke(...) signature with a keyword-only action_hint: ``python def invoke( self, request_text: str, profile_hint: str | None = None, actor: str = "unknown", mode_of_work: ModeOfWork | None = None, *, action_hint: str | None = None, ) -> InvocationPayload: ` 2. Inside the if profile_hint is not None: branch, replace the unconditional derivation with: `python if action_hint: action = action_hint else: action = self._derive_action_from_request(request_text, profile.role) ` 3. The router-backed branch is unchanged. 4. The InvocationRecord(action=action, ...) shape is unchanged — only the value supplied to action changes when action_hint is set. 5. Governance context assembly already reads from the record's action`, so FR-013 follows for free.
Fix in mission_step_contracts/executor.py:
6. In StepContractExecutor.execute(...), pass action_hint=selected_contract.action to every invoke(...) call (FR-014).
Tests:
- Hint preserved (FR-010, Scenario D): parametrized over
{specify, plan, tasks, implement, review}; assert started JSONLaction == <key>and payload exposes the same. - Legacy fallback (FR-011, Scenario E):
invoke(profile_hint=...)withoutaction_hintkeeps the derived role-default verb. - Empty-string
action_hint(EDGE-005):invoke(profile_hint=..., action_hint="")falls back to derivation. - No regression for non-
profile_hintcallers (FR-012): existingadvise/ask/dotests must pass unchanged. - End-to-end (FR-014): composed
software-dev/specifyaction recordsaction="specify", not the role-default verb.
#793 — Close invocation lifecycles (FR-006 / FR-007 / FR-008 / EDGE-001 / EDGE-004)
Where: src/specify_cli/mission_step_contracts/executor.py.
Fix:
1. In StepContractExecutor.execute(...), wrap each composed step's invoke(...) + post-invoke body in try/except/else: ``python payload = self._invocation_executor.invoke( ..., action_hint=selected_contract.action, ) try: # existing per-step body ... except Exception: self._invocation_executor.complete_invocation( payload.invocation_id, outcome="failed", ) raise else: self._invocation_executor.complete_invocation( payload.invocation_id, outcome="done", ) ` 2. Use only complete_invocation(...) — no direct JSONL writes (FR-008). 3. One-line code comment at the close site captures the "completion does not imply host LLM did the requested generation" semantic. 4. EDGE-004 (multiple composed steps): the try/except is per-step, so each invocation pairs independently. 5. EDGE-001 (mid-step exception): failed close runs before raise; the runtime bridge surfaces the error via the existing Decision` error shape.
Tests:
- Success closes (FR-006, Scenario B): every JSONL file produced by a composed action has paired
started+completed(outcome="done"). - Failure closes (FR-007, Scenario C): per-step body patched to raise; JSONL has
started+failed; original exception still propagates. - No orphan started-only files: per-action sweep asserts paired counts.
- Multi-step pairing (EDGE-004): composed action with ≥2 invocations; every invocation paired independently.
- Lifecycle uses public API only (FR-008): monkey-patch
complete_invocationandInvocationWriter.write_*; assert onlycomplete_invocationis reached from the executor.
Test Strategy
Coverage Map (FR → tests)
| FR | Test file(s) | Test name (proposed) |
|---|---|---|
| FR-001, FR-015 | test_runtime_bridge_composition.py | test_composition_success_skips_legacy_dispatch[<action>] (parametrized) |
| FR-002 | test_runtime_bridge_composition.py | test_composition_success_advances_run_state_and_lane_events |
| FR-003 | test_runtime_bridge_composition.py | existing test_tasks__guard_ (must pass unchanged) |
| FR-004 | implicit — diff inspection + ruff/mypy | |
| FR-005 | test_runtime_bridge_composition.py | test_decision_shape_unchanged_for_composed_action |
| FR-006, FR-016 | test_invocation_e2e.py, test_software_dev_composition.py | test_composed_action_pairs_started_with_completed |
| FR-007, EDGE-001 | test_invocation_e2e.py | test_composed_step_failure_writes_failed_completion |
| FR-008 | test_software_dev_composition.py | test_executor_uses_complete_invocation_api_only |
| FR-009, FR-010 | test_invocation_e2e.py | test_invoke_with_action_hint_and_profile_hint_records_hint[<action>] |
| FR-011, EDGE-005 | test_invocation_e2e.py | test_invoke_profile_hint_only_falls_back_to_derived_action, test_invoke_empty_action_hint_falls_back |
| FR-012 | existing advise/ask/do tests (must pass unchanged) | |
| FR-013 | test_software_dev_composition.py | test_governance_context_uses_contract_action_when_hint_supplied |
| FR-014 | test_software_dev_composition.py | test_step_contract_executor_passes_action_hint |
| FR-017 | covered above |
Verification commands
uv sync --python 3.13 --extra test
uv run --python 3.13 --extra test python -m pytest \
tests/specify_cli/next/test_runtime_bridge_composition.py \
tests/specify_cli/mission_step_contracts/test_software_dev_composition.py \
tests/specify_cli/invocation/test_invocation_e2e.py \
tests/specify_cli/invocation/test_writer.py \
-q
uv run --python 3.13 python -m ruff check \
src/specify_cli/next/runtime_bridge.py \
src/specify_cli/mission_step_contracts/executor.py \
src/specify_cli/invocation/executor.py \
tests/specify_cli/next/test_runtime_bridge_composition.py \
tests/specify_cli/mission_step_contracts/test_software_dev_composition.py \
tests/specify_cli/invocation/test_invocation_e2e.py
uv run --python 3.13 python -m mypy --strict \
src/specify_cli/next/runtime_bridge.py \
src/specify_cli/mission_step_contracts/executor.py \
src/specify_cli/invocation/executor.py
Risks & Pre-mortems
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Composition advancement helper accidentally re-emits a duplicate lane event already emitted by the composer. | Medium | Medium | Pre-implementation: enumerate which events the composer / guard already emit; test asserts each event fires exactly once per action. |
complete_invocation("done") semantic looks wrong to a future reader who reads it as "host LLM finished generation". | Medium | Low | One-line code comment at the close site referencing this plan + spec; explicit test name test_composed_action_outcome_is_done_even_though_composition_does_not_run_llm. |
Adding action_hint to invoke() collides with a downstream caller passing positional args. | Low | Low | Keyword-only (* separator); existing positional callers untouched; mypy --strict catches drift. |
tasks guard ordering changes when we restructure dispatch flow. | Medium | High | Treat _check_composed_action_guard(...) call site as fixed; new advancement helper appended after it, never before; existing tasks_* guard tests must pass without modification. |
| "No direct JSONL writes from executor" assertion is brittle if implemented as code-grep. | Low | Low | Use a behavioral monkey-patch test that verifies only complete_invocation is the close path. |
Composition fails AFTER per-step body but BEFORE the else branch (e.g. exception in else itself). | Very low | Medium | The else branch contains only a single complete_invocation(..., "done") call; if it raises, that is the existing writer/IO failure mode. We do not nest a second try/except. |
_derive_action_from_request behavior changes silently because we added the guard. | Low | Low | Only triggered when action_hint is truthy; legacy path is byte-identical. Snapshot test on the legacy path. |
Migration / Compatibility
- Backwards compatible. The
action_hintkwarg defaults toNone; every existing caller continues to work unchanged. - Trail format unchanged. No new JSONL fields, no new outcome values, no new file naming.
- No
mission-runtime.yamledits. Packaged runtime collapse to the publictasksstep (landed in PR #795) is preserved. - No
.kittify/charter/edits. - No CLI surface changes.
Out of Scope (re-affirmed from spec)
#505custom mission loader (C-009).#787charter context compact mode regression.- Research / documentation mission rewrites (
#504,#502). - Retrospective work (
#506–#511), Phase 7 (#469),spec-kitty explain, Phase 4/5 reopen. - Edits to
spec-kitty-events,spec-kitty-runtime,src/spec_kitty_events/,.kittify/charter/.
Complexity Tracking
No Charter Check violations. Section intentionally empty.
Branch Contract (re-stated)
- Current branch at plan start:
main - Planning / base branch:
main - Final merge target:
main branch_matches_target:true- Summary: "Current branch at workflow start: main. Planning/base branch for this feature: main. Completed changes must merge into main."
Next Step
Run /spec-kitty.tasks to materialize the WP outline. Expected split:
- WP01 —
#786runtime bridge single-dispatch +_advance_run_state_after_compositionhelper (must land first; blocker for#505). - WP02 —
#794action_hintkwarg onProfileInvocationExecutor.invoke(...)+StepContractExecutorcall-site update (lands before WP03 so completion records carry correct action). - WP03 —
#793invocation lifecycle close inStepContractExecutor.execute(...).
Dependency graph: WP01 independent; WP02 independent; WP03 depends on WP02.