Implementation Plan: Phase 4 Closeout — Host-Surface Breadth and Trail Follow-On
Mission: phase-4-closeout-host-surfaces-and-trail-01KPWA5X
Mission ID: 01KPWA5X6617T5TVX4C7S6TMYB
Branch: main (current) | Date: 2026-04-23 | Spec: spec.md
Input: Feature specification from /Users/robert/spec-kitty-dev/charter-apr-23/spec-kitty/kitty-specs/phase-4-closeout-host-surfaces-and-trail-01KPWA5X/spec.md
Baseline commit: eb32cf0a8118856de9a59eec2635ddda0b956edf on origin/main (Phase 4 core + 3.2.0a5 closeout slice landed)
Branch Contract
- Current branch at plan start: main
- Planning / base branch: main
- Final merge target: main
- branch_matches_target: true
- Completed changes merge back into main. No branch changes during planning.
Summary
Close out the remaining Phase 4 follow-on work as one combined mission with two strictly-ordered tranches:
- Tranche A: #496 host-surface breadth tail. Bring the remaining 13 slash-command + 2 Agent Skills host surfaces to parity with the landed advise/ask/do governance-injection contract, fix shipped misalignments (dashboard user-visible Feature → Mission Run wording, CLI / README / doc wording sweep), and promote a durable host-surface parity matrix to docs/host-surface-parity.md.
- Tranche B: #701 trail follow-on. Add four narrow, additive capabilities on top of the shipped trail contract: append-only correlation links (commit_link, artifact_link) on the invocation JSONL, runtime derivation of mode_of_work from the CLI entry command, mode-aware enforcement at the Tier 2 promotion boundary, and a typed SaaS read-model policy module. Tier 2 evidence stays local-only in 3.2.x, documented decisively in docs/trail-model.md. #534 / spec-kitty explain remains deferred.
Execution is A → B strict. The smallest next chunk is Tranche A's dashboard wording fix (scoped to src/specify_cli/dashboard/templates/index.html, .../dashboard/static/dashboard/dashboard.js, and .../dashboard/diagnostics.py), preceded by the host-surface inventory matrix that drives the rest of Tranche A.
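The runtime derivation of mode_of_work named in Tranche B can be pictured as a plain lookup keyed on the CLI entry command. The sketch below is illustrative only: the enum values come from this plan, but the three command-to-mode pairs, the whitespace/case normalisation, and the ValueError on unknown commands are assumptions, not the D2 decision recorded in research.md.

```python
from enum import Enum


class ModeOfWork(str, Enum):
    """The four modes listed in this plan's data-model section."""

    ADVISORY = "advisory"
    TASK_EXECUTION = "task_execution"
    MISSION_STEP = "mission_step"
    QUERY = "query"


# ASSUMPTION: placeholder pairs for illustration. The real, exhaustive table
# is whatever decision D2 specifies; these entries are not taken from it.
_ENTRY_COMMAND_MODES: dict[str, ModeOfWork] = {
    "advise": ModeOfWork.ADVISORY,
    "do": ModeOfWork.TASK_EXECUTION,
    "invocations list": ModeOfWork.QUERY,
}


def derive_mode(entry_command: str) -> ModeOfWork:
    """Map a CLI entry command to a mode, independent of any routed action."""
    key = " ".join(entry_command.split()).lower()
    try:
        return _ENTRY_COMMAND_MODES[key]
    except KeyError:
        # ASSUMPTION: unknown commands fail loudly instead of defaulting.
        raise ValueError(
            f"no mode_of_work mapping for entry command {entry_command!r}"
        ) from None
```

Keying on the entry command alone keeps the result deterministic, which is the property the mode-enforcement risk row in this plan relies on.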
Technical Context
- Language/Version: Python 3.11+ (existing spec-kitty requirement).
- Primary Dependencies: Existing toolchain only: typer (CLI), rich (console), ruamel.yaml (YAML frontmatter), python-ulid (invocation IDs), pytest (tests), mypy --strict (types). No new third-party dependency is added.
- Storage: Filesystem-only. Append-only JSONL at .kittify/events/profile-invocations/<invocation_id>.jsonl; existing evidence at .kittify/evidence/<invocation_id>/; existing invocation index at .kittify/events/invocation-index.jsonl; new documentation at docs/trail-model.md (updated in place) and docs/host-surface-parity.md (new).
- Testing: pytest with ≥ 90% line coverage on new code (NFR-004). Mutation-aware coverage is encouraged for new invocation modules but not required as a gate. Headless run via PWHEADLESS=1 pytest tests/.
- Target Platform: Linux + macOS developer environments (unchanged).
- Project Type: Single-project Python CLI (unchanged; no frontend/backend split).
- Performance Goals: NFR-001: P95 started-event write ≤ 5 ms, no new blocking I/O before started write. NFR-002: spec-kitty invocations list --json P95 ≤ 200 ms at 10,000 JSONL files.
- Constraints: Local-first invocation trail (C-002); additive propagation only (C-003); append-only JSONL (C-004); host-LLM ownership unchanged (C-001); no new top-level CLI command (C-008); no broad Feature rename (C-007); #534 stays deferred (C-005); no new third-party dependency (C-009).
- Scale/Scope: ~15 host surfaces × 3 parity dimensions for Tranche A; 4 new/extended invocation modules + 1 policy module + 1 CLI flag extension + ~6 documentation updates for Tranche B. Estimated ~12 work packages total (5 Tranche A + 7 Tranche B), formalised by /spec-kitty.tasks.
Charter Check
GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.
Charter present at .kittify/charter/charter.md. Action doctrine loaded for plan (bootstrap mode, first-load).
| Gate | Source | Evaluation | Status |
|---|---|---|---|
| Host-LLM ownership immutable | C-001 + architecture policy | Plan introduces no LLM invocation inside Spec Kitty; all reading/generation stays host-side. | PASS |
| Local-first trail invariant | C-002 + _propagate_one sync-gate | Plan keeps Tier 1 writes unconditional; SaaS policy is read-side only, never blocks Tier 1. | PASS |
| Additive propagation | C-003 + emitter contract | New events (artifact_link, commit_link) are additive new lines; new fields (mode_of_work) are additive on existing event shape. | PASS |
| Append-only JSONL | C-004 + InvocationWriter | write_started stays exclusive-create; new append_correlation_link uses append mode; no existing line mutated. | PASS |
| #534 stays deferred | C-005 | Plan contains no spec-kitty explain design or scaffolding. | PASS |
| No Phase 5 glossary rework | C-006 + mission 094 landed | Plan does not touch chokepoint, DRG, or glossary term registry. | PASS |
| No broad Feature rename | C-007 | Scope is the 3 dashboard files only; backend identifiers preserved per FR-004. | PASS |
| No new top-level CLI command | C-008 | Correlation ships as flags on existing profile-invocation complete; no new verb. | PASS |
| Tooling conformance | C-009 | Plan uses typer/rich/ruamel.yaml/pytest/mypy only. | PASS |
| DIRECTIVE_003 decision capture | Charter plan doctrine | Four ADRs defined (correlation, mode derivation, SaaS policy shape, Tier 2 resolution) — see decisions/ plan in Phase 1. | PASS |
| DIRECTIVE_010 spec fidelity | Charter plan doctrine | Plan traces each FR/NFR/C back to a concrete implementation target; deviations explicitly called out below (none). | PASS |
Tactic application notes:
- Problem decomposition: tranche split + WP sketch (Section "Project Structure" below) decomposes the mission into 12 independently tractable WPs.
- Premortem risk identification: applied in research.md (Failure Modes section).
- ADR drafting workflow: applied for the four material decisions.
- Requirements validation workflow: every FR is traced to an evidence/acceptance surface in research.md and contracts/.
Charter Check: PASS — no violations, no Complexity Tracking entries needed.
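The append-only gate in the table above separates an exclusive-create write for the started event from append-mode writes for correlation links. A minimal sketch of that split follows, using free functions and plain dicts. The real code lives on InvocationWriter, so the signatures here, and the read_events helper, are hypothetical.

```python
import json
from pathlib import Path


def write_started(path: Path, record: dict) -> None:
    """Create the invocation file; mode "x" raises FileExistsError if it exists."""
    with path.open("x", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def append_correlation_link(path: Path, link: dict) -> None:
    """Append one correlation event as a new line; earlier lines are untouched."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(link) + "\n")


def read_events(path: Path) -> list[dict]:
    """Hypothetical reader: parse every non-empty JSONL line in file order."""
    text = path.read_text(encoding="utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Note that mode "a" would also create a missing file, so a real writer would presumably confirm the started line exists before appending; that check is left out of this sketch.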
Project Structure
Documentation (this feature)
kitty-specs/phase-4-closeout-host-surfaces-and-trail-01KPWA5X/
├── plan.md # This file
├── spec.md # Mission specification
├── research.md # Phase 0 output — decisions D1..D10 with rationale
├── data-model.md # Phase 1 output — InvocationRecord, correlation events, ModeOfWork, ProjectionRule, inventory schema
├── quickstart.md # Phase 1 output — operator walkthrough of new capabilities
├── contracts/
│ ├── profile-invocation-complete.md # CLI contract for extended `complete` with --commit / --artifact
│ ├── projection-policy.md # Python module contract for projection_policy.py
│ └── host-surface-inventory.md # Inventory matrix schema + example row
├── decisions/
│ ├── ADR-001-correlation-contract.md # D1 decision record
│ ├── ADR-002-mode-derivation.md # D2 decision record
│ ├── ADR-003-projection-policy.md # D4 decision record
│ └── ADR-004-tier2-saas-deferral.md # D5 decision record
├── artifacts/
│ └── host-surface-inventory.md # Tranche A living matrix (promoted to docs/ at closeout)
├── checklists/
│ └── requirements.md # Spec quality checklist (already complete)
├── tasks/ # Populated by /spec-kitty.tasks (not this command)
└── status.events.jsonl # Status event log (runtime-managed)
Source Code (repository root)
Work touches only files already committed to main; there is no new package directory.
src/specify_cli/
├── invocation/
│ ├── executor.py # MODIFIED (Tranche B): mode derivation on started event; read mode on complete_invocation for enforcement
│ ├── writer.py # MODIFIED (Tranche B): append_correlation_link() — new append method for artifact_link/commit_link events
│ ├── record.py # MODIFIED (Tranche B): InvocationRecord gains optional `mode_of_work` field (null-tolerant)
│ ├── errors.py # MODIFIED (Tranche B): add InvalidModeForEvidenceError
│ ├── propagator.py # MODIFIED (Tranche B): consult PROJECTION_POLICY before envelope build
│ ├── projection_policy.py # NEW (Tranche B): typed module — ModeOfWork + ProjectionRule + POLICY_TABLE + resolve_projection()
│ └── modes.py # NEW (Tranche B): ModeOfWork enum + derive_mode(entry_command) helper
├── dashboard/
│ ├── templates/index.html # MODIFIED (Tranche A): user-visible Feature → Mission Run wording (5 strings)
│ ├── static/dashboard/dashboard.js # MODIFIED (Tranche A): user-visible wording + "Unknown feature" fallback (~6 strings)
│ ├── static/dashboard/dashboard.css # UNCHANGED — class names .feature-selector etc. stay per FR-004
│ └── diagnostics.py # MODIFIED (Tranche A): "no feature context" → "no mission context"
└── cli/
└── commands/
└── profile_invocation.py # MODIFIED (Tranche B): add --commit and --artifact flags to `complete` subcommand
docs/
├── trail-model.md # MODIFIED: new subsections — "Mode of Work (runtime-enforced)", "Correlation Links", "SaaS Read-Model Policy", "Tier 2 SaaS Projection — deferred"
└── host-surface-parity.md # NEW (Tranche A closeout): promoted matrix
src/doctrine/skills/
├── spec-kitty-runtime-next/SKILL.md # VERIFY PARITY (Tranche A) — already has Standalone Invocations section
└── (13 slash-command agent skill-pack mirrors per CLAUDE.md AGENT_DIRS)
# TOUCHED per agent config — VERIFY PARITY (Tranche A)
.agents/skills/
└── spec-kitty.advise/SKILL.md # VERIFY PARITY (Tranche A) — already has Governance context injection section
tests/specify_cli/invocation/
├── test_invocation_e2e.py # EXTENDED (Tranche B): add correlation-link write/read; mode enforcement; projection policy
├── test_projection_policy.py # NEW (Tranche B): parameterised table test
├── test_modes.py # NEW (Tranche B): derive_mode() mapping test
└── test_correlation.py # NEW (Tranche B): append_correlation_link + path-normalisation
tests/specify_cli/dashboard/
└── test_dashboard_wording.py # NEW (Tranche A): asserts user-visible mission wording in template/js/diagnostics
Structure Decision: Single-project Python CLI (existing layout). All new runtime code lands in src/specify_cli/invocation/ next to existing modules; dashboard wording changes are scoped to three files; docs gain one new operator-facing file plus additions to trail-model.md. No restructuring.
Work Package Sketch (finalised by /spec-kitty.tasks)
Tranche A (strict predecessor of Tranche B):
1. WP01: Host-surface inventory matrix (builds the parity matrix at artifacts/host-surface-inventory.md; determines A3–A5 scope).
2. WP02: Dashboard user-visible wording fix (index.html, dashboard.js, diagnostics.py), the smallest-next chunk once WP01 confirms scope.
3. WP03: Rendering-contract consistency sweep (CLI help strings for advise/ask/do/complete/list, README governance section, cross-doc wording).
4. WP04: Skill-pack parity rollout to the host surfaces WP01 flags as non-parity. Each surface either receives in-surface parity content or an explicit pointer, per FR-002 / FR-006.
5. WP05: Promote artifacts/host-surface-inventory.md to docs/host-surface-parity.md; close #496.
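WP01's matrix feeds WP04 by flagging the surfaces that are not yet at parity. The sketch below models one inventory row with the fields this plan lists for HostSurfaceInventoryRow; the at_parity and non_parity helpers are hypothetical, and they assume the three boolean columns are the three parity dimensions.

```python
from dataclasses import dataclass
from typing import Iterable, Literal


@dataclass(frozen=True)
class HostSurfaceInventoryRow:
    """One row of the host-surface inventory matrix (fields from this plan)."""

    surface_key: str
    directory: str
    has_advise_guidance: bool
    has_governance_injection: bool
    has_completion_guidance: bool
    guidance_style: Literal["inline", "pointer"]
    notes: str = ""

    def at_parity(self) -> bool:
        # ASSUMPTION: parity means all three guidance dimensions are present.
        return (
            self.has_advise_guidance
            and self.has_governance_injection
            and self.has_completion_guidance
        )


def non_parity(rows: Iterable[HostSurfaceInventoryRow]) -> list[str]:
    """Hypothetical helper: surface keys that still need WP04 attention."""
    return [row.surface_key for row in rows if not row.at_parity()]
```

With two placeholder rows, one complete and one missing completion guidance, non_parity returns only the second row's key.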
Tranche B (starts only after WP05 is approved):
6. WP06: Mode derivation: add src/specify_cli/invocation/modes.py; record mode_of_work on started event; backfill null-tolerance on read paths.
7. WP07: Correlation contract: InvocationWriter.append_correlation_link(); extend profile-invocation complete with repeatable --artifact and singular --commit; artifact-ref normalisation to repo-relative when under checkout, absolute fallback otherwise.
8. WP08: Mode enforcement at promotion: InvalidModeForEvidenceError; complete_invocation reads started-event mode and rejects --evidence on advisory/query.
9. WP09: SaaS read-model policy: src/specify_cli/invocation/projection_policy.py as a typed module; wire _propagate_one through resolve_projection(); docs/trail-model.md policy table.
10. WP10: Tier 2 SaaS projection stays local-only: update docs/trail-model.md with the Tier 2 SaaS projection deferral subsection and reasoning; no code change (D5 decision).
11. WP11: Migration note + CHANGELOG entry (FR-013).
12. WP12: Tracker hygiene execution: close #496 at WP05 merge; close #701, update #466, cross-link #534 to #499/#759 at mission merge (FR-014). CLI-level executor for this WP is a scripted checklist run by the release owner; no code change.
Task dependencies are straightforward (A→B strict; within each tranche roughly linear except WP03 can run in parallel with WP04 once WP01 is complete). /spec-kitty.tasks will finalise lanes.
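WP08's promotion check can be reduced to a small guard: evidence is rejected when the started event recorded an advisory or query mode. The sketch below uses the error name from this plan; the function name, its signature, and the choice to let pre-mission records (mode None) through are assumptions.

```python
from __future__ import annotations

from enum import Enum


class ModeOfWork(str, Enum):
    ADVISORY = "advisory"
    TASK_EXECUTION = "task_execution"
    MISSION_STEP = "mission_step"
    QUERY = "query"


class InvalidModeForEvidenceError(Exception):
    """Raised when evidence promotion is attempted in a non-promotable mode."""


# Per this plan, --evidence is rejected on advisory and query invocations.
_EVIDENCE_BLOCKED = frozenset({ModeOfWork.ADVISORY, ModeOfWork.QUERY})


def check_evidence_allowed(mode: ModeOfWork | None, evidence: str | None) -> None:
    """Hypothetical guard run before promoting evidence on completion."""
    if evidence is None:
        return  # nothing to promote, so no mode restriction applies
    if mode is None:
        # ASSUMPTION: records written before mode_of_work existed are not blocked.
        return
    if mode in _EVIDENCE_BLOCKED:
        raise InvalidModeForEvidenceError(
            f"evidence cannot be attached to a {mode.value} invocation"
        )
```

Reading the mode from the started event, rather than re-deriving it at completion time, keeps the decision tied to what was recorded when the invocation began.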
Complexity Tracking
Fill ONLY if Charter Check has violations that must be justified
No Charter Check violations. No complexity entries required.
Phase 0 — Outline & Research
Goal: Resolve every design unknown identified during the planning interrogation, grounded in the live main baseline and the spec.
Research methodology:
- Inspect landed code at eb32cf0a for invocation/, dashboard/, propagator sync-gate behaviour, and complete_invocation promotion path.
- Cross-reference GitHub issues #496, #701, #534, #461, #466 for acceptance language.
- Cross-reference CLAUDE.md AGENT_DIRS and the per-agent directory conventions for Tranche A host-surface inventory.
- Apply the premortem tactic (from plan action doctrine) to surface failure modes for each of the 10 design decisions.
Output: research.md — full D1–D10 resolution with rationale, alternatives considered, rejected options, failure modes, and acceptance-surface mapping per FR.
Every [NEEDS CLARIFICATION] slot is resolved in research.md. None remains open.
Phase 1 — Design & Contracts
Prerequisite: research.md complete.
Design artifacts to generate:
1. data-model.md: typed data model for the new/extended entities.
   - InvocationRecord (extended): new optional mode_of_work: ModeOfWork | None field on started events; unchanged completed schema; null-tolerant on read for pre-mission records.
   - ModeOfWork enum: advisory, task_execution, mission_step, query. Exhaustive derivation mapping from CLI entry command.
   - Correlation events: {event: "artifact_link", invocation_id, kind, ref, at} and {event: "commit_link", invocation_id, sha, at}. Ref-normalisation rule: repo-relative when the resolved absolute path is under the repo root, absolute fallback otherwise.
   - ProjectionRule dataclass: {project: bool, include_request_text: bool, include_evidence_ref: bool}.
   - POLICY_TABLE: dict[tuple[ModeOfWork, EventKind], ProjectionRule], the backing table for the lookup helper.
   - HostSurfaceInventoryRow: {surface_key, directory, has_advise_guidance, has_governance_injection, has_completion_guidance, guidance_style: "inline" | "pointer", notes}.
2. contracts/:
   - profile-invocation-complete.md: CLI contract for the extended complete subcommand (flag semantics, exit codes, error shapes, examples). Specifies that --artifact is repeatable and --commit is singular. Specifies ref-normalisation: canonicalise to repo-relative when the resolved path is under repo_root; otherwise record the absolute path as-provided.
   - projection-policy.md: Python module contract for src/specify_cli/invocation/projection_policy.py: ModeOfWork + EventKind + ProjectionRule + POLICY_TABLE + resolve_projection(mode, event) -> ProjectionRule. Import surface, typed signatures, invariants.
   - host-surface-inventory.md: inventory-matrix schema + one worked example row so Tranche A WP01 can be executed mechanically.
3. quickstart.md: operator walkthrough that exercises, end-to-end: (a) invoking spec-kitty do "…" → reading mode_of_work in the JSONL → closing with --artifact and --commit → listing the same invocation with its correlation links; (b) attempting Tier 2 promotion on an advisory invocation and seeing the rejection; (c) reading the SaaS projection policy table from docs/trail-model.md; (d) confirming propagation-errors.jsonl stays empty with sync disabled.
4. decisions/: four ADRs, each ~40–80 lines, using the adr-drafting-workflow tactic format (context, decision, rationale, alternatives considered, consequences, revisit trigger):
   - ADR-001-correlation-contract.md (D1)
   - ADR-002-mode-derivation.md (D2)
   - ADR-003-projection-policy.md (D4)
   - ADR-004-tier2-saas-deferral.md (D5)
Charter re-check after Phase 1: re-evaluated below (same 11 gates), still PASS after artifacts are authored. Re-check is recorded in the footer of this plan at report time.
Post-Phase-1 Charter Re-check
| Gate | Status after design |
|---|---|
| Host-LLM ownership immutable | PASS — no LLM calls added in any design artifact. |
| Local-first trail invariant | PASS — resolve_projection() is read-side; propagator still short-circuits on routing.effective_sync_enabled before policy lookup. |
| Additive propagation | PASS — mode_of_work is additive-null; correlation events are additive new lines; POLICY_TABLE does not remove or reshape existing fields. |
| Append-only JSONL | PASS — append_correlation_link uses append; write_started / write_completed unchanged. |
| #534 stays deferred | PASS: no explain surface in any artifact. |
| No Phase 5 glossary rework | PASS — glossary modules untouched. |
| No broad Feature rename | PASS: design limits wording change to the three identified dashboard files. |
| No new top-level CLI command | PASS — correlation ships as flags, not a new verb. |
| Tooling conformance | PASS — typer/rich/ruamel.yaml/pytest/mypy only. |
| DIRECTIVE_003 decision capture | PASS — four ADRs drafted in decisions/. |
| DIRECTIVE_010 spec fidelity | PASS — every FR/NFR/C maps to a concrete artifact surface in data-model.md or contracts/. |
Risks and Mitigations (from Premortem Tactic)
Applied to each design decision; full narrative in research.md Section "Failure Modes and Premortem".
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Dashboard wording bleed beyond the three files | Low | Medium | WP01 inventory must list every user-visible Feature string; WP02 tests assert on specific template/JS/diagnostics strings; plan commits wording grep in the test. |
| Mode enforcement false-positive (legitimate task_execution misclassified as advisory) | Low | Medium | D2 derivation is from CLI entry command, not routed action, so it is deterministic and documented. Parameterised test (WP06) covers every entry command. |
| Correlation event confuses old readers | Low | Low | Readers already skip unknown event values (see glossary_checked precedent in writer.py); contract documents that artifact_link/commit_link are additive. |
| resolve_projection() bug silently drops Tier 1 propagation | Low | High | NFR-007 test asserts zero errors with sync disabled; contract test asserts default behaviour for (task_execution, started) projects exactly as today. |
| Ref-normalisation breaks for symlinked repos or worktrees | Medium | Low | Use Path.resolve().is_relative_to(repo_root.resolve()) pattern (already used in complete_invocation for evidence ref); WP07 tests include worktree + symlink cases. |
| Tracker hygiene forgotten at merge | Medium | Low | WP12 is a dedicated WP and is part of Definition of Done (FR-014); checklist in quickstart.md operator section. |
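The ref-normalisation mitigation in the table relies on resolving both paths before comparing them, so that symlinks and worktree indirection do not defeat the containment check. A minimal sketch of that rule follows. The function name is hypothetical, and treating a relative ref as relative to the repo root is an assumption made to keep the example self-contained.

```python
from pathlib import Path


def normalise_ref(ref: str, repo_root: Path) -> str:
    """Return a repo-relative POSIX path when ref is under repo_root.

    Otherwise fall back to the absolute path as provided.
    """
    root = repo_root.resolve()
    candidate = Path(ref)
    if not candidate.is_absolute():
        # ASSUMPTION: relative refs are interpreted against the repo root.
        candidate = root / candidate
    resolved = candidate.resolve()  # follows symlinks; the path need not exist
    if resolved.is_relative_to(root):  # available since Python 3.9
        return resolved.relative_to(root).as_posix()
    return str(candidate)
```

Resolving both sides matters on systems where the temporary or checkout directory is itself reached through a symlink; comparing an unresolved ref against a resolved root would wrongly classify in-repo files as external.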
Artifact Index
- Plan: kitty-specs/phase-4-closeout-host-surfaces-and-trail-01KPWA5X/plan.md
- Spec: kitty-specs/phase-4-closeout-host-surfaces-and-trail-01KPWA5X/spec.md
- Research: kitty-specs/phase-4-closeout-host-surfaces-and-trail-01KPWA5X/research.md
- Data model: kitty-specs/phase-4-closeout-host-surfaces-and-trail-01KPWA5X/data-model.md
- Quickstart: kitty-specs/phase-4-closeout-host-surfaces-and-trail-01KPWA5X/quickstart.md
- Contracts: kitty-specs/phase-4-closeout-host-surfaces-and-trail-01KPWA5X/contracts/
- Decision records: kitty-specs/phase-4-closeout-host-surfaces-and-trail-01KPWA5X/decisions/
- Checklists: kitty-specs/phase-4-closeout-host-surfaces-and-trail-01KPWA5X/checklists/requirements.md
Branch contract (restated): current main, planning base main, merge target main, branch_matches_target=true.