Tasks: Phase 4 Closeout — Host-Surface Breadth and Trail Follow-On

Mission: phase-4-closeout-host-surfaces-and-trail-01KPWA5X Spec: spec.md Plan: plan.md Baseline commit: eb32cf0a on origin/main (2026-04-23)

Branch Contract

  • Current branch at tasks start: main
  • Planning / base branch: main
  • Final merge target: main
  • branch_matches_target: true

Execution Summary

Tranche A (host-surface breadth, 5 WPs) ships first. Tranche B (trail follow-on, 4 WPs) begins only after Tranche A is approved.

Smallest next chunk to build first: WP01 (inventory matrix), followed immediately by WP02 (dashboard wording fix) which can run in parallel with WP03 and WP04.


Subtask Index

Reference table only — status is tracked via per-WP checkboxes below.

IDDescriptionWPParallel
T001Scaffold host-surface inventory matrix fileWP01
T002Audit slash-command surfaces (13) for advise/ask/do parityWP01[D]
T003Audit Agent Skills surfaces (Codex, Vibe)WP01[D]
T004Populate inventory rows + parity_status + notesWP01
T005Replace user-visible Feature strings in dashboard/templates/index.htmlWP02
T006Replace user-visible Feature strings in dashboard/static/dashboard/dashboard.jsWP02
T007Replace "no feature context" in dashboard/diagnostics.pyWP02
T008Write wording + backend-preservation snapshot testWP02
T009Live dashboard visual verificationWP02
T010Audit + update README.md governance layer subsectionWP03
T011Terminology consistency pass on .agents/skills/spec-kitty.advise/SKILL.mdWP03[D]
T012Terminology consistency pass on src/doctrine/skills/spec-kitty-runtime-next/SKILL.mdWP03[D]
T013Snapshot test asserting README governance subsection structureWP03
T014Link-target regression test for canonical skill-pack pointersWP03
T015Establish the parity-file pattern (inline vs pointer) for each non-parity surfaceWP04
T016Ship parity content for surface group 1 (copilot, gemini, cursor, qwen)WP04[D]
T017Ship parity content for surface group 2 (opencode, windsurf, kilocode)WP04[D]
T018Ship parity content for surface group 3 (auggie, roo, q, kiro, agent)WP04[D]
T019Update inventory matrix to reflect post-rollout parity statusWP04
T020Promote inventory to docs/host-surface-parity.md with preambleWP05
T021Add link from docs/trail-model.md to the promoted matrixWP05
T022Verify link from README governance section to promoted matrixWP05
T023Add parity-coverage test tests/specify_cli/docs/test_host_surface_inventory.pyWP05
T024Mark #496 as delivered in the tracker-hygiene checklist for WP09WP05
T025Wire merge-ready signal: all Tranche A tests greenWP05
T026Create modes.py with ModeOfWork + derive_mode() + unit testsWP06
T027Extend InvocationRecord with optional mode_of_work; executor threads kwargWP06
T028Add append_correlation_link() + shared normalise_ref() to writer.pyWP06
T029Add InvalidModeForEvidenceError; enforce in complete_invocationWP06
T030CLI wiring for advise/ask/do/complete (advise.py, do_cmd.py) — mode + correlation flagsWP06
T031CLI wiring for query + mission-step paths (profiles_cmd.py, invocations_cmd.py, next_cmd.py)WP06
T032Integration tests: e2e + correlation + enforcement + backwards compatWP06
T033Create projection_policy.py with EventKind, ProjectionRule, POLICY_TABLE, resolve_projection()WP07
T034Modify _propagate_one to consult resolve_projection; gate envelope field inclusionWP07
T035Unit tests for all 16 policy rows + null-mode fallbackWP07
T036Integration tests: propagator under mocked client; each (mode, event) pairWP07
T037NFR-007 / SC-008 assertion: propagation-errors.jsonl empty under sync-disabledWP07
T038Golden-path regression: existing task_execution / mission_step timeline behaviour preservedWP07
T039Add "Mode of Work (runtime-enforced)" subsection to docs/trail-model.mdWP08
T040Add "Correlation Links" subsection to docs/trail-model.mdWP08
T041Add "SaaS Read-Model Policy" subsection + full table to docs/trail-model.mdWP08
T042Add "Tier 2 SaaS Projection — Deferred" subsection to docs/trail-model.mdWP08
T043Update CHANGELOG.md unreleased section with Tranche A + Tranche B summaries + migration noteWP08
T044Add doc-presence test tests/specify_cli/docs/test_trail_model_doc.pyWP08
T045Prepare #496 close comment + close on Tranche A deliveryWP09
T046Prepare #701 close comment + close on mission mergeWP09
T047Update #466 (Phase 4 tracker) — Phase 4 follow-on shippedWP09
T048Cross-link #534 to #499 / #759 as Phase 5 glossary-foundation unblockerWP09
T049Verify #461 umbrella roadmap left openWP09
T050Retitle #496 to reflect delivered scope if neededWP09
T051Document completed hygiene actions in the PR descriptionWP09

Totals: 9 work packages, 51 subtasks. Average size ~5.7 subtasks / WP. No WP exceeds 7 subtasks.


Tranche A — Host-Surface Breadth (#496)

WP01 — Host-Surface Inventory Matrix

Goal: Produce the authoritative parity matrix across all 15 supported host surfaces.

Priority: Tranche A foundation.

Independent test: Running the WP01 deliverable produces a complete matrix file; every AGENT_DIRS key has exactly one row.

Included subtasks:

  • ✅ T001 Scaffold host-surface inventory matrix file
  • ✅ T002 Audit slash-command surfaces (13) for advise/ask/do parity
  • ✅ T003 Audit Agent Skills surfaces (Codex, Vibe)
  • ✅ T004 Populate inventory rows + parity_status + notes

Dependencies: none. Execution mode: planning_artifact. Owned files: kitty-specs/phase-4-closeout-host-surfaces-and-trail-01KPWA5X/artifacts/host-surface-inventory.md. Estimated prompt size: ~300 lines. Prompt: tasks/WP01-host-surface-inventory.md.

WP02 — Dashboard User-Visible Wording Fix

Goal: Replace user-visible Feature strings in the three dashboard files with Mission Run vocabulary. Backend identifiers stay unchanged per FR-004.

Priority: Tranche A — the smallest code-touching chunk.

Independent test: Open the dashboard in a browser; no user-visible Feature string remains on the mission selector, current-mission header, breadcrumbs, or empty state.

Included subtasks:

  • ✅ T005 Replace user-visible Feature strings in dashboard/templates/index.html
  • ✅ T006 Replace user-visible Feature strings in dashboard/static/dashboard/dashboard.js
  • ✅ T007 Replace "no feature context" in dashboard/diagnostics.py
  • ✅ T008 Write wording + backend-preservation snapshot test
  • ✅ T009 Live dashboard visual verification

Dependencies: WP01 (inventory captures this surface). Execution mode: code_change. Owned files: src/specify_cli/dashboard/templates/index.html, src/specify_cli/dashboard/static/dashboard/dashboard.js, src/specify_cli/dashboard/diagnostics.py, tests/specify_cli/dashboard/test_dashboard_wording.py. Estimated prompt size: ~400 lines. Prompt: tasks/WP02-dashboard-wording-fix.md.

WP03 — README + Canonical Skills Terminology Sweep

Goal: Add a Governance layer subsection to README.md; audit + correct vocabulary in the two canonical skill packs (.agents/skills/spec-kitty.advise/SKILL.md and src/doctrine/skills/spec-kitty-runtime-next/SKILL.md) for consistency with the Phase 4 runtime vocabulary and with WP02's Mission Run rename.

Priority: Tranche A — parallel with WP02 and WP04.

Independent test: README renders a Governance layer subsection with links to docs/trail-model.md and docs/host-surface-parity.md; no stale Feature terminology in either canonical skill pack where the concept is a Mission Run.

Included subtasks:

  • ✅ T010 Audit + update README.md governance layer subsection
  • ✅ T011 Terminology consistency pass on .agents/skills/spec-kitty.advise/SKILL.md
  • ✅ T012 Terminology consistency pass on src/doctrine/skills/spec-kitty-runtime-next/SKILL.md
  • ✅ T013 Snapshot test asserting README governance subsection structure
  • ✅ T014 Link-target regression test for canonical skill-pack pointers

Dependencies: WP01 (inventory identifies terminology gaps). Execution mode: code_change. Owned files: README.md, .agents/skills/spec-kitty.advise/SKILL.md, src/doctrine/skills/spec-kitty-runtime-next/SKILL.md, tests/specify_cli/docs/test_readme_governance.py. Estimated prompt size: ~350 lines. Prompt: tasks/WP03-readme-and-canonical-skills.md.

WP04 — Skill-Pack Parity Rollout to Remaining Agent Surfaces

Goal: Bring the 12 non-canonical host surfaces (all slash-command agents except claude which is covered by the runtime-next doctrine skill) to parity with the advise/ask/do governance-injection contract. Either inline content or a pointer per-surface, documented in the inventory.

Priority: Tranche A — parallel with WP02 and WP03.

Independent test: Every non-canonical surface contains either inline parity content or an explicit pointer file; the inventory matrix shows parity_status=at_parity for all 15 surfaces after WP04 closes.

Included subtasks:

  • ✅ T015 Establish the parity-file pattern (inline vs pointer) for each non-parity surface
  • ✅ T016 Ship parity content for surface group 1 (copilot, gemini, cursor, qwen)
  • ✅ T017 Ship parity content for surface group 2 (opencode, windsurf, kilocode)
  • ✅ T018 Ship parity content for surface group 3 (auggie, roo, q, kiro, agent)
  • ✅ T019 Update inventory matrix to reflect post-rollout parity status

Dependencies: WP01 (inventory scope drives the rollout). Execution mode: code_change. Owned files: .github/prompts/spec-kitty-standalone.md, .gemini/commands/spec-kitty-standalone.md, .cursor/commands/spec-kitty-standalone.md, .qwen/commands/spec-kitty-standalone.md, .opencode/command/spec-kitty-standalone.md, .windsurf/workflows/spec-kitty-standalone.md, .kilocode/workflows/spec-kitty-standalone.md, .augment/commands/spec-kitty-standalone.md, .roo/commands/spec-kitty-standalone.md, .amazonq/prompts/spec-kitty-standalone.md, .kiro/prompts/spec-kitty-standalone.md, .agent/workflows/spec-kitty-standalone.md. Estimated prompt size: ~450 lines. Prompt: tasks/WP04-skill-pack-parity-rollout.md.

WP05 — Inventory Promotion + Tranche A Closeout

Goal: Promote the living matrix to docs/host-surface-parity.md, add links from docs/trail-model.md and README, ship the coverage test, and mark Tranche A ready for merge.

Priority: Tranche A closeout — must run after WP02, WP03, WP04 all merge.

Independent test: docs/host-surface-parity.md exists with every AGENT_DIRS surface listed; tests/specify_cli/docs/test_host_surface_inventory.py passes green; the promoted doc is linked from docs/trail-model.md and README.

Included subtasks:

  • ✅ T020 Promote inventory to docs/host-surface-parity.md with preamble
  • ✅ T021 Add link from docs/trail-model.md to the promoted matrix
  • ✅ T022 Verify link from README governance section to promoted matrix
  • ✅ T023 Add parity-coverage test tests/specify_cli/docs/test_host_surface_inventory.py
  • ✅ T024 Mark #496 as delivered in the tracker-hygiene checklist for WP09
  • ✅ T025 Wire merge-ready signal: all Tranche A tests green

Dependencies: WP02, WP03, WP04. Execution mode: code_change. Owned files: docs/host-surface-parity.md, tests/specify_cli/docs/test_host_surface_inventory.py. Estimated prompt size: ~400 lines. Prompt: tasks/WP05-inventory-promotion-tranche-a-closeout.md.


Tranche B — Trail Follow-On (#701)

WP06 — Trail Enrichment: Mode Derivation + Correlation + Enforcement

Goal: Implement three coupled changes inside the invocation runtime: (1) runtime derivation of mode_of_work from the CLI entry command, recorded on the started event; (2) append-only correlation events (artifact_link, commit_link) on the invocation JSONL, driven by new flags on profile-invocation complete; (3) mode-aware enforcement that rejects Tier 2 evidence promotion for advisory / query invocations with a typed error.

Priority: Tranche B foundation.

Independent test: End-to-end — open each of advise/ask/do, verify the started event carries the expected mode_of_work; close a task_execution invocation with --artifact a --artifact b --commit sha, verify the two correlation events + one commit_link append in order; attempt --evidence on an advisory invocation and verify InvalidModeForEvidenceError is raised before any append. All local-first invariants preserved.

Included subtasks:

  • ✅ T026 Create modes.py with ModeOfWork + derive_mode() + unit tests
  • ✅ T027 Extend InvocationRecord with optional mode_of_work; executor threads kwarg
  • ✅ T028 Add append_correlation_link() + shared normalise_ref() to writer.py
  • ✅ T029 Add InvalidModeForEvidenceError; enforce in complete_invocation
  • ✅ T030 CLI wiring for advise/ask/do/complete (advise.py, do_cmd.py) — mode + correlation flags
  • ✅ T031 CLI wiring for query + mission-step paths (profiles_cmd.py, invocations_cmd.py, next_cmd.py)
  • ✅ T032 Integration tests: e2e + correlation + enforcement + backwards compat

Dependencies: WP05 (Tranche A must close before Tranche B begins). Execution mode: code_change. Owned files: src/specify_cli/invocation/modes.py, src/specify_cli/invocation/record.py, src/specify_cli/invocation/writer.py, src/specify_cli/invocation/executor.py, src/specify_cli/invocation/errors.py, src/specify_cli/cli/commands/advise.py, src/specify_cli/cli/commands/do_cmd.py, src/specify_cli/cli/commands/next_cmd.py, src/specify_cli/cli/commands/profiles_cmd.py, src/specify_cli/cli/commands/invocations_cmd.py, tests/specify_cli/invocation/test_modes.py, tests/specify_cli/invocation/test_correlation.py, tests/specify_cli/invocation/test_invocation_e2e.py. Estimated prompt size: ~650 lines. Prompt: tasks/WP06-trail-enrichment.md.

WP07 — SaaS Read-Model Policy

Goal: Implement the typed projection_policy.py module + POLICY_TABLE + resolve_projection(); wire _propagate_one through the policy lookup; ensure the local-first invariant is preserved and existing dashboard behaviour for task_execution / mission_step events is unchanged.

Priority: Tranche B.

Independent test: Unit tests cover all 16 (mode, event) rows; integration tests exercise _propagate_one under each pair with a mocked connected client; golden-path tests assert task_execution/started and mission_step/completed behaviour is unchanged from 3.2.0a5.

Included subtasks:

  • ✅ T033 Create projection_policy.py with EventKind, ProjectionRule, POLICY_TABLE, resolve_projection()
  • ✅ T034 Modify _propagate_one to consult resolve_projection; gate envelope field inclusion
  • ✅ T035 Unit tests for all 16 policy rows + null-mode fallback
  • ✅ T036 Integration tests: propagator under mocked client; each (mode, event) pair
  • ✅ T037 NFR-007 / SC-008 assertion: propagation-errors.jsonl empty under sync-disabled
  • ✅ T038 Golden-path regression: existing task_execution / mission_step timeline behaviour preserved

Dependencies: WP06. Execution mode: code_change. Owned files: src/specify_cli/invocation/projection_policy.py, src/specify_cli/invocation/propagator.py, tests/specify_cli/invocation/test_projection_policy.py, tests/specify_cli/invocation/test_propagator_policy.py. Estimated prompt size: ~500 lines. Prompt: tasks/WP07-saas-read-model-policy.md.

WP08 — Post-Tranche-B Operator Docs + CHANGELOG

Goal: Add four new subsections to docs/trail-model.md (Mode of Work, Correlation Links, SaaS Read-Model Policy table, Tier 2 SaaS Projection Deferral) and the Tranche A + Tranche B unreleased CHANGELOG entry with migration notes.

Priority: Tranche B.

Independent test: Doc-presence test asserts every new subsection heading is present in docs/trail-model.md; CHANGELOG unreleased section includes both tranches + migration note.

Included subtasks:

  • ✅ T039 Add "Mode of Work (runtime-enforced)" subsection to docs/trail-model.md
  • ✅ T040 Add "Correlation Links" subsection to docs/trail-model.md
  • ✅ T041 Add "SaaS Read-Model Policy" subsection + full table to docs/trail-model.md
  • ✅ T042 Add "Tier 2 SaaS Projection — Deferred" subsection to docs/trail-model.md
  • ✅ T043 Update CHANGELOG.md unreleased section with Tranche A + Tranche B summaries + migration note
  • ✅ T044 Add doc-presence test tests/specify_cli/docs/test_trail_model_doc.py

Dependencies: WP06, WP07. Execution mode: code_change. Owned files: docs/trail-model.md, CHANGELOG.md, tests/specify_cli/docs/test_trail_model_doc.py. Estimated prompt size: ~400 lines. Prompt: tasks/WP08-post-tranche-b-docs.md.

WP09 — Tracker Hygiene

Goal: Close and update GitHub issues to reflect delivered scope. Close #496 on Tranche A merge, close #701 on mission merge, update #466 (Phase 4 tracker), cross-link #534 to its Phase 5 unblocker, leave #461 open.

Priority: Tranche B — final WP before mission close.

Independent test: gh issue view confirms the expected state for each of #496, #701, #466, #534, #461 after the closing agent runs the hygiene checklist.

Included subtasks:

  • ✅ T045 Prepare #496 close comment + close on Tranche A delivery
  • ✅ T046 Prepare #701 close comment + close on mission merge
  • ✅ T047 Update #466 (Phase 4 tracker) — Phase 4 follow-on shipped
  • ✅ T048 Cross-link #534 to #499 / #759 as Phase 5 glossary-foundation unblocker
  • ✅ T049 Verify #461 umbrella roadmap left open
  • ✅ T050 Retitle #496 to reflect delivered scope if needed
  • ✅ T051 Document completed hygiene actions in the PR description

Dependencies: WP08. Execution mode: planning_artifact (GitHub tracker work only — no code changes). Owned files: kitty-specs/phase-4-closeout-host-surfaces-and-trail-01KPWA5X/artifacts/tracker-hygiene.md (checklist artifact recording what was done). Estimated prompt size: ~300 lines. Prompt: tasks/WP09-tracker-hygiene.md.


Parallelization Highlights

Tranche A:
  WP01 (inventory)
    ├── WP02 dashboard wording   ─┐
    ├── WP03 README + skills     ─┤─→ WP05 promotion
    └── WP04 skill-pack rollout  ─┘
        (WP02, WP03, WP04 may all run in parallel after WP01 merges)

Tranche B:
  WP05 (Tranche A closeout)
    └── WP06 trail enrichment
        └── WP07 SaaS policy
            └── WP08 docs + CHANGELOG
                └── WP09 tracker hygiene

Three Tranche A WPs (WP02, WP03, WP04) can run simultaneously once WP01 lands, giving meaningful parallelism. Tranche B is strictly sequential (four WPs) because each builds on the previous: policy needs mode + events; docs describe both; hygiene closes the package.

Dependencies Summary

WPDepends on
WP01(none)
WP02WP01
WP03WP01
WP04WP01
WP05WP02, WP03, WP04
WP06WP05
WP07WP06
WP08WP06, WP07
WP09WP08

MVP Scope Recommendation

The mission is a closeout, not an MVP build. However, if scope must be reduced, Tranche A alone (WP01 → WP05) is shippable on its own as the #496 closure; Tranche B can follow in a subsequent release. Do not cut mid-tranche.

Requirement Coverage

  • FR-001: WP01, WP05
  • FR-002: WP04
  • FR-003: WP02
  • FR-004: WP02
  • FR-005: WP03
  • FR-006: WP04
  • FR-007: WP06
  • FR-008: WP06
  • FR-009: WP06
  • FR-010: WP07
  • FR-011: WP08
  • FR-012: WP06, WP07
  • FR-013: WP08
  • FR-014: WP09