Implementation Plan: Mission Retrospective Learning Loop

Mission ID: 01KQ6YEGT4YBZ3GZF7X680KQ3V (mid8: 01KQ6YEG) Mission slug: mission-retrospective-learning-loop-01KQ6YEG Branch contract: planning base main → final merge main (matches target ✅) Date: 2026-04-27 Spec: ./spec.md Checklists: ./checklists/requirements.md, ./checklists/release-gate.md Issues covered: #468 (epic), #507, #506, #508, #509, #511, #510


Summary

Spec Kitty already governs how missions run. This tranche makes it learn from every mission it runs.

The implementation introduces a single new package, src/specify_cli/retrospective/, that owns the governance contract: retrospective.yaml schema, atomic writer, mode detection, the eight retrospective events, the lifecycle gate, and the cross-mission summary read model. Doctrine, DRG, and glossary mutation stay where they belong — behind a synthesizer hook in the doctrine area (src/specify_cli/doctrine/synthesizer/) that the retrospective package calls when staged proposals are approved. The lifecycle gate is implemented once in specify_cli.retrospective.gate and consulted thinly from specify_cli.next (the canonical control loop) and from any status transition surface that needs mission-level mode policy. Operators invoke the retrospective lifecycle via spec-kitty next (built-in missions reach the retrospect action through a lifecycle terminus hook; custom missions reach the same action through their existing required retrospective marker step). They read findings via spec-kitty retrospect summary. They apply staged proposals via spec-kitty agent retrospect synthesize.

Eight new event names are defined locally in specify_cli.retrospective.events for this tranche, with an upstream PR opened in parallel against the external spec_kitty_events PyPI package. Once the upstream release lands, this tranche switches imports and removes the local module. No prompt-builder filtering is added; per-mission action surface calibration outcomes are expressed strictly as DRG edge changes in src/doctrine/graph.yaml and project-local graph overlays.


Technical Context

Language/Version: Python 3.11+ (existing spec-kitty codebase).

Primary Dependencies (all already adopted):

  • pydantic v2 — schema for retrospective.yaml, proposals, events.
  • ruamel.yaml — round-trip-safe YAML writer.
  • typer — CLI surface for spec-kitty retrospect summary and spec-kitty agent retrospect synthesize.
  • rich — operator-facing report rendering.
  • pytest — testing, including real-runtime integration tests.
  • mypy --strict — type checking, charter requirement.
  • spec_kitty_events (external PyPI) — eventual home of the eight retrospective events; consumed via the public boundary once upstream release lands.

Storage: filesystem only.

  • Per-mission durable: .kittify/missions/<mission_id>/retrospective.yaml.
  • Per-mission events: appended to the existing kitty-specs/<slug>/status.events.jsonl event log used by other lifecycle events (FR-018).
  • Project-local doctrine/DRG/glossary surfaces remain where they already live (src/doctrine/graph.yaml, project overlays under .kittify/).

Testing:

  • Unit tests for schema validation, writer round-trip, mode detection, gate decisions, summary reduction.
  • Real-runtime integration tests that drive a fixture mission through autonomous and HiC terminus paths end-to-end (FR-033). No private-helper-only acceptance.
  • Architectural test asserting no prompt-builder filtering call sites are added (C-011).
  • Property/fuzz test for malformed retrospective.yaml corpus tolerance (NFR-004).

Target Platform: macOS / Linux developer environments and CI; same surface as the rest of the spec-kitty CLI.

Project Type: single — Python CLI package with embedded subsystems.

Performance Goals (from spec NFRs, restated for measurement environment):

  • NFR-001: schema validation of a typical (≤200 findings) retrospective.yaml < 200 ms on a developer laptop (warm interpreter, SSD).
  • NFR-003: cross-mission summary on a 200-mission corpus < 5 s on a developer laptop.
  • NFR-007: gate adds < 500 ms to mission completion when retrospective.completed is already present.

Constraints (mapped from spec):

  • C-006: no imports from the retired spec_kitty_runtime package; only the CLI-internal runtime under src/specify_cli/next/_internal_runtime/ is in scope.
  • C-007: use canonical typed Lane, WPState, status emit/reduce primitives.
  • C-011: calibration outcomes are DRG edges only — no prompt-builder filtering.
  • C-013: charter override is sovereign in mode detection.
  • C-014: canonical path is keyed by mission_id (ULID), not mission_number.

Scale/Scope:

  • Single project at a time; per-project mission corpus expected up to ~200 missions for the lifetime of this tranche's design assumptions.
  • Eight new event names; ~1 new top-level CLI command (retrospect) with two subcommands; ~1 new agent subcommand (retrospect synthesize); 1 new package; 1 new module under doctrine/synthesizer/.

Charter Check

Gate: must pass before Phase 0 research. Re-checked after Phase 1 design.

Charter source: .kittify/charter/charter.md. Action doctrine for plan includes:

Charter itemConformanceEvidence
typer / rich / ruamel.yaml / pytest / mypy strict / 90%+ coverage✅ PassAll listed deps already adopted; this plan adds no new toolchain. NFR-009/NFR-010 explicitly bind us to ≥90% coverage and mypy --strict.
DIRECTIVE_003 (Decision Documentation Requirement)✅ PassThis plan documents Q1–Q4 architecture decisions inline with rationale; an ADR will be drafted under architecture/2.x/adr/ for the gate-ownership decision (Q1-C).
DIRECTIVE_010 (Specification Fidelity Requirement)✅ PassAll FR/NFR/C IDs from spec.md are mapped to concrete artifacts in §Phase 1 / §Module Plan below. Any deviation will be captured in research.md decision records.
Action tactic: premortem-risk-identificationApplied§Risk Premortem section below enumerates failure modes and mitigations.
Action tactic: problem-decompositionApplied§Module Plan decomposes the tranche into seven independent sub-problems already aligned with the suggested WP shape in start-here.md.
Action tactic: ADR drafting workflowAppliedOne ADR planned for the gate-ownership decision (Q1-C), justifying the new shared module instead of reusing existing seams.

Re-check after Phase 1: ✅ no new violations introduced. No Complexity Tracking entries required.


Architecture Decisions (locked from planning interrogation)

These decisions answer the four planning questions and govern Phase 0/Phase 1.

AD-001 — Lifecycle gate is a single shared module (Q1-C)

The retrospective gate logic lives in src/specify_cli/retrospective/gate.py. It exposes a small typed API (e.g., is_completion_allowed(mission_id, mode, …) returning a structured GateDecision). The two callers stay thin:

  • src/specify_cli/next/ calls gate.is_completion_allowed(...) immediately before signaling mission completion.
  • Any status transition surface that ever needs to refuse mission-level completion calls the same function.

Mission-level policy does not live inside WP-level status code (specify_cli.status.transitions). That module already encodes per-WP transition guards; mission-level mode policy would be the wrong layering.

Why not (A): WP-level transition guards aren't the right place for mission-mode policy. Why not (B): Burying the gate in next would require status-transition callers to reach into runtime internals. Why (C): Keeps the gate as one source of truth without lying to either existing seam. ADR will be drafted.

AD-002 — Retrospective package layout (Q2-C)

src/specify_cli/retrospective/
├── __init__.py            # Public API surface
├── schema.py              # Pydantic models for retrospective.yaml + findings + proposals
├── writer.py              # Atomic round-trip-safe YAML writer (NFR-002)
├── reader.py              # Schema-validating reader; structured-error on malformed
├── mode.py                # Mode detection: charter > flag > env > parent process (FR-016)
├── events.py              # Eight event Pydantic models, factory, names; LOCAL until upstream lands (Q4-C)
├── gate.py                # Lifecycle gate (AD-001)
├── lifecycle.py           # Terminus-hook integration point invoked by `next`
├── summary.py             # Cross-mission summary reducer + report renderer
└── cli.py                 # `spec-kitty retrospect summary` typer surface

Doctrine/DRG/glossary mutation stays in:

src/specify_cli/doctrine/
└── synthesizer/
    ├── __init__.py
    ├── apply.py            # Applies accepted proposals (auto + staged-then-approved)
    ├── conflict.py         # Conflict detection between staged proposals (FR-023)
    └── provenance.py       # Provenance metadata writer (FR-022)

The retrospective package depends on doctrine.synthesizer only at the point of retrospect synthesize. The schema/writer/gate/summary do not depend on doctrine internals.

AD-003 — CLI surfaces (Q3-C)

Backed by specify_cli.retrospective.cli. Emits both human-readable Rich output and a structured JSON artifact (FR-025).

Backed by a new module under src/specify_cli/cli/commands/agent_retrospect.py that wires through doctrine.synthesizer.apply. The default is --dry-run; --apply is opt-in to make staged-application explicit (FR-021).

  • Operator-facing read (top-level): spec-kitty retrospect summary [--project <path>] [--json] [--limit N].
  • Agent-facing mutation (under agent): spec-kitty agent retrospect synthesize --mission <handle> [--apply | --dry-run].

The retrospect action / retrospective-facilitator profile (FR-001, FR-002) are DRG artifacts, not CLI commands; they are surfaced via the existing profile/action lookup and consumed through spec-kitty next. Operators who want to invoke the retrospective explicitly can run spec-kitty next --agent <name> --mission <handle> and the runtime drives the action through the same DRG context as any other action.

AD-004 — Event-package boundary (Q4-C)

The eight retrospective events (FR-017) are first defined locally in specify_cli.retrospective.events so this tranche stays self-contained. In parallel, an upstream PR is opened against spec_kitty_events to add the same eight events with the same names and payload shapes. Once the upstream release lands:

1. specify_cli.retrospective.events switches its imports to from spec_kitty_events import retrospective as ev. 2. The local Pydantic models are deleted. 3. pyproject.toml bumps the spec_kitty_events version pin.

A boundary test (under tests/architectural/test_shared_package_boundary.py) is added that asserts: once the upstream release is consumed, no Pydantic model for retrospective events lives outside spec_kitty_events.*. Until then, the test is skipped with a clear "pending upstream release" reason and a TODO comment naming the upstream issue.

This tranche's acceptance does not require the upstream release to land. It requires the local-then-upstream migration path to be documented in research.md and held to in code.


Project Structure

Documentation (this feature)

kitty-specs/mission-retrospective-learning-loop-01KQ6YEG/
├── spec.md                  # /spec-kitty.specify output (already committed)
├── meta.json                # Mission identity (already committed)
├── plan.md                  # This file
├── research.md              # Phase 0 output (this command)
├── data-model.md            # Phase 1 output (this command)
├── quickstart.md            # Phase 1 output (this command)
├── contracts/               # Phase 1 output (this command)
│   ├── retrospective_yaml_v1.md   # retrospective.yaml schema contract
│   ├── retrospective_events_v1.md # Eight event names + payload contracts
│   ├── gate_api.md                # Lifecycle gate Python API contract
│   ├── synthesizer_hook.md        # doctrine.synthesizer hook contract
│   └── cli_surfaces.md            # CLI surfaces contract
├── checklists/
│   ├── requirements.md            # /specify self-validation (already committed)
│   └── release-gate.md            # /checklist release-gate (already committed)
├── tasks.md                 # Phase 2 output (/spec-kitty.tasks command — NOT created here)
└── tasks/                   # Phase 2 output (/spec-kitty.tasks command — NOT created here)

Source code (repository root)

New code in this tranche, all under spec-kitty/src/:

src/specify_cli/
├── retrospective/                              # NEW package (AD-002)
│   ├── __init__.py
│   ├── schema.py
│   ├── writer.py
│   ├── reader.py
│   ├── mode.py
│   ├── events.py
│   ├── gate.py
│   ├── lifecycle.py
│   ├── summary.py
│   └── cli.py
├── doctrine/                                   # Existing area; NEW synthesizer subpackage
│   └── synthesizer/                            # NEW subpackage (AD-002)
│       ├── __init__.py
│       ├── apply.py
│       ├── conflict.py
│       └── provenance.py
├── cli/commands/
│   └── agent_retrospect.py                     # NEW: `spec-kitty agent retrospect synthesize`
├── next/
│   └── _internal_runtime/
│       └── retrospective_hook.py               # NEW thin caller into retrospective.lifecycle/gate
└── status/
    └── (no changes; gate is consulted via specify_cli.retrospective.gate)
src/doctrine/
└── graph.yaml                                  # Existing; calibration tranche edits edges here

.kittify/
└── missions/<mission_id>/
    └── retrospective.yaml                      # NEW canonical durable retrospective record
tests/
├── retrospective/
│   ├── test_schema_roundtrip.py
│   ├── test_writer_atomicity.py
│   ├── test_mode_detection.py
│   ├── test_gate_decision.py
│   ├── test_summary_tolerance.py
│   ├── test_events_shapes.py
│   └── test_cli_surfaces.py
├── doctrine/synthesizer/
│   ├── test_apply.py
│   ├── test_conflict_failclosed.py
│   └── test_provenance.py
├── integration/retrospective/
│   ├── test_autonomous_terminus_e2e.py         # FR-033 acceptance
│   ├── test_hic_terminus_e2e.py                # FR-033 acceptance
│   └── test_silent_skip_blocked.py             # Negative case (FR-012)
└── architectural/
    └── test_no_prompt_filtering_added.py       # C-011 enforcement
architecture/2.x/adr/
└── 2026-04-27-1-retrospective-gate-shared-module.md  # AD-001 ADR

docs/
├── retrospective-learning-loop.md              # Operator overview
└── migration/
    └── retrospective-events-upstream.md        # AD-004 cutover runbook

Structure decision: single-project Python CLI; the retrospective package owns the governance contract and the synthesizer subpackage owns doctrine/DRG/glossary mutation. No worktrees during planning; planning artifacts are committed on main.


Module Plan (decomposition for /spec-kitty.tasks)

This decomposition is non-binding for plan but maps cleanly to the suggested WP shape in start-here.md and to the spec's FR/NFR/C IDs. Tasks will refine boundaries.

Sub-problemSpec coverageSurfaces touched
1. Profile + action + DRG contractFR-001, FR-002, FR-003, FR-004src/doctrine/graph.yaml, profile/action artifacts
2. Schema + writer + reader + eventsFR-005, FR-006, FR-007, FR-008, FR-009, FR-017, FR-018, NFR-001, NFR-002, NFR-005retrospective/{schema,writer,reader,events}.py
3. Mode detection + lifecycle gateFR-011, FR-012, FR-013, FR-014, FR-015, FR-016, NFR-007, NFR-008, C-013retrospective/{mode,gate,lifecycle}.py, thin caller in next/_internal_runtime/retrospective_hook.py
4. Synthesizer hook + provenanceFR-019, FR-020, FR-021, FR-022, FR-023, FR-024, NFR-006, C-012doctrine/synthesizer/{apply,conflict,provenance}.py, cli/commands/agent_retrospect.py
5. Cross-mission summaryFR-025, FR-026, FR-027, NFR-003, NFR-004retrospective/{summary,cli}.py
6. Action-surface calibration reports + DRG edge changesFR-030, FR-031, FR-032, C-011src/doctrine/graph.yaml, calibration report markdown under architecture/calibration/ (one per mission)
7. Real-runtime integration gate + dogfood across built-in and ERP custom missionsFR-028, FR-029, FR-033, NFR-009, NFR-010, C-001 through C-010tests/integration/retrospective/, tests/architectural/, fixture missions

Risk Premortem (charter tactic: premortem-risk-identification)

Failure modes considered and mitigation chosen now (vs. deferred to plan-time addenda):

RiskSeverityMitigation in plan
Drift between event names and payloadsHighContracts file contracts/retrospective_events_v1.md pins names and payload field minimums before implementation; schema tests validate.
Synthesizer staleness (staged proposal applied against moved doctrine)MediumSynthesizer reads target doctrine state at apply time and re-runs conflict detection (FR-023); a staleness check refuses apply if the cited evidence event ids are no longer reachable.
Calibration churn (fixing §4.5.1 for one step regresses another)MediumEach calibration report includes per-step before/after evidence; calibration tests are per-mission, not aggregate.
Mode misattribution (CI looks autonomous when operator intended HiC)MediumSource signal is recorded in the retrospective record and in mission events (FR-016, FR-018); operator can audit via spec-kitty retrospect summary --json.
Privacy of evidence referencesMediumEvidence event ids treated as opaque references; schema disallows embedding event payload content; documented in contracts/retrospective_yaml_v1.md.
Upstream spec_kitty_events release slipsLowLocal module compiles cleanly; cutover is mechanical; cutover runbook lives in docs/migration/retrospective-events-upstream.md. Acceptance does not block on upstream.
Staged-proposal queue grows unboundedLowRetrospective record stores proposals in-line with status; there is no separate queue store. Cross-mission summary surfaces stale proposals so they are visible.

These risks are also reflected in research.md as decision records.


Phase 0: Research

See ./research.md. Phase 0 closes every research task created from spec ambiguities and from the architecture decisions above. After research closes, no [NEEDS CLARIFICATION] markers remain in plan.md or any Phase 1 artifact.

Research tasks executed:

  • R-001 Mode-detection precedence: charter > flag > env > parent — formal lookup order, observable signals, and audit trail format.
  • R-002 Canonical retrospective path resolution rules under the post-083 mission identity model (mission_id-keyed).
  • R-003 Atomic-write strategy on POSIX filesystems for retrospective.yaml; tempfile + os.replace semantics; behavior on partial writes.
  • R-004 Append-only event log integration: how retrospective events join status.events.jsonl without breaking the existing reducer.
  • R-005 Action-surface inequality (architecture §4.5.1) — concrete predicate suitable for calibration tests.
  • R-006 Conflict predicates between paired proposal kinds (e.g., add_edge(E) vs. remove_edge(E), update_glossary_term(T) vs. add_glossary_term(T)).
  • R-007 Upstream spec_kitty_events boundary test pattern (skipped-until-released).
  • R-008 Cross-mission summary reduction strategy: streaming, tolerance to malformed records (NFR-004), 200-mission corpus performance target.

Phase 1: Design & Contracts

Phase 1 outputs derive entirely from confirmed spec + research. Files produced:

  • ./data-model.md — entities, fields, validation rules, state transitions for retrospective record, finding, proposal, mode, gate decision.
  • ./contracts/retrospective_yaml_v1.mdretrospective.yaml schema contract (required vs. optional fields, required provenance, status enum, proposal types).
  • ./contracts/retrospective_events_v1.md — eight event names and payload contracts (FR-017).
  • ./contracts/gate_api.md — lifecycle gate Python API contract (AD-001).
  • ./contracts/synthesizer_hook.md — doctrine.synthesizer hook contract; provenance fields; conflict failclosed shape.
  • ./contracts/cli_surfaces.mdspec-kitty retrospect summary and spec-kitty agent retrospect synthesize contracts (input flags, exit codes, output shape).
  • ./quickstart.md — operator quickstart: how a HiC run looks, how an autonomous run looks, how to skip with audit trail, how to read summary, how to apply staged proposals.

Charter re-check after Phase 1: ✅ no new violations.


Bulk-Edit Detection

Not applicable. This tranche adds new identifiers, files, and DRG edges; it does not rename an existing string across many files. meta.json does not get change_mode: bulk_edit. No occurrence_map.yaml is required.

If calibration (sub-problem 6) ends up renaming an existing DRG edge identifier, that single rename is local to src/doctrine/graph.yaml and would not constitute a bulk edit.


Acceptance Mapping

Every spec acceptance gate maps to a concrete artifact below. (Plan-time map; tasks will refine.)

Acceptance Gate (spec)Plan artifact / module
1. profile + action exist in DRGsrc/doctrine/graph.yaml (sub-problem 1)
2. schema validates and round-tripsretrospective/schema.py + retrospective/writer.py + tests
3. canonical durable pathretrospective/writer.py writes to .kittify/missions/<mission_id>/retrospective.yaml
4. autonomous blocks until completedretrospective/gate.py + next/_internal_runtime/retrospective_hook.py
5. HiC offers + permits explicit skipretrospective/lifecycle.py + gate
6. silent auto-run impossible (HiC)gate refuses non-explicit invocations in HiC; integration test
7. silent skip impossible (autonomous)gate; integration test
8. synthesized changes from finding setdoctrine/synthesizer/apply.py + cli/commands/agent_retrospect.py
9. provenance references sourcedoctrine/synthesizer/provenance.py
10. later mission sees updated contextcovered by integration test that runs a follow-up mission against post-synthesis project state
11. summary handles rich/brief/skipped/missing/malformedretrospective/summary.py + tolerance tests
12. calibration reports for 4 missionsarchitecture/calibration/<mission>.md × 4
13. calibration changes are DRG-onlyarchitectural test enforces no new prompt-filter call sites (C-011)
14. existing built-in mission tests passregression-only; no removal of existing tests
15. existing custom-mission loader tests passregression-only; no change to loader contract; FR-029 keeps marker requirement
16. real-runtime integration tests drive lifecycletests/integration/retrospective/*

Complexity Tracking

No charter-violation justifications required. This tranche reuses the project's existing dependency set, adds one new package and one new subpackage, and introduces no new third-party tools.


Open Items Carried Into /spec-kitty.tasks

These are intentional plan-time hand-offs (not unresolved clarifications):

  • Per-proposal-type payload field minimums beyond the contract baseline. The contract pins required envelope fields per proposal kind; sub-fields like "what does synthesize_directive's body shape look like" can be refined when the first WP implements it. Captured in research.md R-006 with the rule: payload schema must be explicit before the WP merging it lands.
  • Calibration report template format (markdown column shape). One representative template will be drafted in the calibration WP and cloned across the four in-scope missions.

Branch Contract Restated (final)

  • Current branch at plan completion: main.
  • Planning/base branch for this feature: main.
  • Final merge target: main.
  • branch_matches_target: ✅ true.

Plan complete after Phase 1 outputs land. Do not run /spec-kitty.tasks automatically. The user invokes it explicitly when ready to break this plan into work packages.