Quickstart: Phase 4 Closeout — Operator Walkthrough

Mission: phase-4-closeout-host-surfaces-and-trail-01KPWA5X
Audience: operators and reviewers verifying Tranche A + Tranche B behaviour after merge.

This walkthrough exercises every user-visible behaviour this mission ships. Sections 1 to 9 each name the requirements (FR, NFR) and success criteria (SC) they validate.

Prerequisites

Clean checkout of the branch that merged this mission.
Python environment active (source .venv/bin/activate or equivalent).
No SaaS auth token set (we will toggle auth on in one optional step only).
spec-kitty installed from this checkout.

1. Host-surface parity (Tranche A) — FR-001, FR-002, SC-001

1.1 Read the promoted matrix

less docs/host-surface-parity.md

Expected:

One row per supported host surface (15 total: 13 slash-command + 2 Agent Skills).
Every row has a parity_status of at_parity, partial, or missing.
Every row that is not at_parity has a notes entry explaining the gap and the remediation plan.
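If you would rather not eyeball all fifteen rows, the checks above can be scripted. The sketch below is not part of spec-kitty. It assumes the matrix is a Markdown pipe table whose header row contains columns literally named parity_status and notes. The function names are hypothetical.

```python
from pathlib import Path

VALID_STATUSES = {"at_parity", "partial", "missing"}


def _cells(line: str) -> list[str]:
    """Split one Markdown table row into trimmed cell values."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_table(markdown: str) -> list[dict[str, str]]:
    """Return the rows of the first pipe table in *markdown* as dicts keyed by header."""
    table: list[str] = []
    for line in markdown.splitlines():
        if line.strip().startswith("|"):
            table.append(line)
        elif table:
            break  # the first table has ended
    if len(table) < 3:  # header + separator + at least one data row
        return []
    header = _cells(table[0])
    # table[1] is the |---|---| separator row; data rows follow it.
    return [dict(zip(header, _cells(row))) for row in table[2:]]


def check_matrix(rows: list[dict[str, str]], expected_count: int = 15) -> list[str]:
    """Return human-readable problems; an empty list means the checks in 1.1 pass."""
    problems = []
    if len(rows) != expected_count:
        problems.append(f"expected {expected_count} rows, found {len(rows)}")
    for i, row in enumerate(rows, start=1):
        status = row.get("parity_status", "").strip("`")  # tolerate `code` styling
        if status not in VALID_STATUSES:
            problems.append(f"row {i}: unexpected parity_status {status!r}")
        elif status != "at_parity" and not row.get("notes"):
            problems.append(f"row {i}: {status} row has no notes")
    return problems


def check_matrix_file(path: str = "docs/host-surface-parity.md") -> list[str]:
    """Convenience wrapper: parse and check the promoted matrix file."""
    return check_matrix(parse_table(Path(path).read_text(encoding="utf-8")))
```

Call check_matrix_file() from the repository root. An empty list means every check in 1.1 passes.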

1.2 Spot-check one surface for inline parity content

Open one surface that the matrix marks at_parity with guidance_style=inline (e.g. .agents/skills/spec-kitty.advise/SKILL.md) and confirm it has all three sections:

"Governance context injection" (how to inject governance_context_text).
"Discover profiles / Get governance context" (advise/ask/do usage).
"Close the record" (how to call profile-invocation complete).

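The inline check in 1.2 amounts to looking for three section titles. The sketch below assumes the headings in SKILL.md use the same wording as the list above, and matches them case-insensitively. missing_sections and check_surface are hypothetical helpers.

```python
from pathlib import Path

# Section titles as listed in this walkthrough; matched case-insensitively
# because the real headings may differ in casing.
REQUIRED_SECTIONS = (
    "Governance context injection",
    "Discover profiles / Get governance context",
    "Close the record",
)


def missing_sections(text: str) -> list[str]:
    """Return the required section titles that do not appear in *text*."""
    lowered = text.lower()
    return [title for title in REQUIRED_SECTIONS if title.lower() not in lowered]


def check_surface(path: str) -> list[str]:
    """Read a host-surface file and report which inline sections are missing."""
    return missing_sections(Path(path).read_text(encoding="utf-8"))
```

For example, check_surface(".agents/skills/spec-kitty.advise/SKILL.md") should return an empty list for an inline surface at parity.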
1.3 Spot-check one surface for pointer parity

Open one surface marked at_parity with guidance_style=pointer. Confirm it contains an explicit pointer to the canonical skill pack (e.g. "For the canonical advise/ask/do contract, see .agents/skills/spec-kitty.advise/SKILL.md.").

2. Dashboard wording (Tranche A) — FR-003, FR-004, SC-002

2.1 Start the dashboard

spec-kitty dashboard

2.2 Verify user-visible wording

Open the dashboard in a browser. Confirm:

Mission selector label reads Mission Run:, not Feature:.
Mission header reads Mission Run: <name>, not Feature: <name>.
"Mission Run Overview" heading appears where "Feature Overview" used to.
"Mission Run Analysis" heading appears where "Feature Analysis" used to.
Empty state reads "Create your first mission…" and "…run /spec-kitty.specify to create your first mission run".
Unknown-mission fallback label reads "Unknown mission", not "Unknown feature".
Diagnostics page: the active-mission diagnostic row reads "no mission context" when no mission is selected.

2.3 Verify backend identifiers are preserved (FR-004, C-007)

From DevTools → Elements panel:

The selector <select> still has id feature-select (unchanged).
The selector container still has id feature-selector-container.
CSS class .feature-selector still applies.

From DevTools → Application → Cookies:

lastFeature cookie is still set on selection (no cookie migration).

From DevTools → Network:

API routes retain /api/kanban/<feature> and /api/artifact/<feature>/<name> shape.
JSON responses retain feature_id, feature_number, current_feature field names.
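To check field names without reading each response by eye, save a response body from the Network panel to a file and run it through the sketch below. The sketch walks the whole JSON document, so it makes no assumption about where the fields are nested. missing_legacy_fields is a hypothetical helper. A given endpoint may legitimately return only some of the three names, so pass the subset you expect for that endpoint.

```python
import json
from typing import Any, Iterator

LEGACY_FIELDS = frozenset({"feature_id", "feature_number", "current_feature"})


def iter_keys(value: Any) -> Iterator[str]:
    """Yield every object key in a parsed JSON value, at any depth."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield key
            yield from iter_keys(child)
    elif isinstance(value, list):
        for item in value:
            yield from iter_keys(item)


def missing_legacy_fields(response_text: str, expected=LEGACY_FIELDS) -> list[str]:
    """Return the expected legacy field names absent from the response, sorted."""
    found = set(iter_keys(json.loads(response_text)))
    return sorted(set(expected) - found)
```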

3. Mode of work is derived at the CLI (Tranche B) — FR-008

3.1 Open an advisory invocation

spec-kitty advise "how should I split this refactor" --json

3.2 Inspect the trail

# Grab the invocation_id from the JSON output; call it $ID.
head -n 1 .kittify/events/profile-invocations/$ID.jsonl | python -m json.tool

Expected: the first line (the started event) has mode_of_work set to "advisory".
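The two manual steps in 3.2 are copying the invocation_id and reading the first trail line. They can be combined into one helper. The sketch assumes the --json output carries invocation_id as a top-level key, and that mode_of_work is a top-level field of the started event. The helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional

TRAIL_DIR = Path(".kittify/events/profile-invocations")


def invocation_id_from_output(cli_json: str) -> str:
    # ASSUMPTION: invocation_id is a top-level key of the CLI's --json output.
    return json.loads(cli_json)["invocation_id"]


def started_mode(trail_text: str) -> Optional[str]:
    """Return mode_of_work from the first (started) event, or None if absent."""
    lines = [line for line in trail_text.splitlines() if line.strip()]
    if not lines:
        return None
    return json.loads(lines[0]).get("mode_of_work")


def mode_for(invocation_id: str, trail_dir: Path = TRAIL_DIR) -> Optional[str]:
    """Read the trail file for *invocation_id* and return its recorded mode."""
    path = trail_dir / f"{invocation_id}.jsonl"
    return started_mode(path.read_text(encoding="utf-8"))
```

First save the CLI output, for example with spec-kitty advise "..." --json > /tmp/advise.json. Then mode_for(invocation_id_from_output(Path("/tmp/advise.json").read_text())) should return "advisory". The same call against the output of 3.3 should return "task_execution".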

3.3 Repeat for task execution

spec-kitty do "add a README badge" --json
# ... inspect: mode_of_work should be "task_execution"

3.4 Repeat for query

spec-kitty profiles list --json
# No invocation is opened — profiles list is a query with no InvocationRecord.
spec-kitty invocations list --limit 3 --json
# Same — query, no InvocationRecord opened.

Expected: neither query command opens an invocation record, so no new file appears under .kittify/events/profile-invocations/.
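Each opened invocation is written to its own $ID.jsonl file under .kittify/events/profile-invocations/. A before/after listing of that directory is therefore a quick way to confirm that query commands open nothing. A minimal sketch; snapshot and new_trail_files are hypothetical helpers.

```python
from pathlib import Path

TRAIL_DIR = Path(".kittify/events/profile-invocations")


def snapshot(trail_dir: Path = TRAIL_DIR) -> set[str]:
    """Names of the trail files currently present (empty if the directory is absent)."""
    if not trail_dir.is_dir():
        return set()
    return {p.name for p in trail_dir.glob("*.jsonl")}


def new_trail_files(before: set[str], after: set[str]) -> list[str]:
    """Trail files that appeared between two snapshots, sorted for stable output."""
    return sorted(after - before)
```

Take before = snapshot() and run the two query commands. Then check that new_trail_files(before, snapshot()) is empty.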

4. Correlation links (Tranche B) — FR-007, SC-003

4.1 Close a task-execution invocation with two artifacts and one commit

# Start a task-execution invocation:
spec-kitty ask implementer "implement the login handler" --json
# -> note $ID from the response

# ... do the work, commit it ...
COMMIT_SHA=$(git rev-parse HEAD)

spec-kitty profile-invocation complete \
    --invocation-id $ID \
    --outcome done \
    --artifact src/login/handler.py \
    --artifact tests/login/test_handler.py \
    --commit $COMMIT_SHA \
    --json

Expected JSON response includes:

"evidence_ref": null (no --evidence passed)
"artifact_links": ["src/login/handler.py", "tests/login/test_handler.py"] (both repo-relative, order preserved)
"commit_link": "<sha>"

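The three expectations can be checked mechanically against a saved --json response. The sketch assumes evidence_ref, artifact_links and commit_link are top-level keys of that response. check_completion is a hypothetical helper. artifact_links is compared as a list, so a reordering counts as a failure, which matches the "order preserved" expectation.

```python
import json
from typing import Sequence


def check_completion(
    response_text: str,
    artifacts: Sequence[str],
    commit_sha: str,
    expect_evidence: bool = False,
) -> list[str]:
    """Return problems with a completion response; empty means it matches 4.1."""
    resp = json.loads(response_text)
    problems = []
    has_evidence = resp.get("evidence_ref") is not None
    if has_evidence != expect_evidence:
        wanted = "a path" if expect_evidence else "null"
        problems.append(f"evidence_ref is {resp.get('evidence_ref')!r}, expected {wanted}")
    # List equality: same items in the same order as the --artifact flags.
    if resp.get("artifact_links") != list(artifacts):
        problems.append(f"artifact_links {resp.get('artifact_links')!r} != {list(artifacts)!r}")
    if resp.get("commit_link") != commit_sha:
        problems.append(f"commit_link {resp.get('commit_link')!r} != {commit_sha!r}")
    return problems
```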
4.2 Inspect the trail file

cat .kittify/events/profile-invocations/$ID.jsonl

Expected: five lines, in this order:

1. started event (with mode_of_work: "task_execution")
2. completed event
3. artifact_link with ref=src/login/handler.py
4. artifact_link with ref=tests/login/test_handler.py
5. commit_link with the recorded SHA
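Checking the order by hand gets error-prone as trails grow. The sketch below parses the JSONL file and compares the sequence of event kinds. This walkthrough does not name the field that holds the event kind, so the sketch assumes it is called event and lets you override it. The ref field on artifact_link lines is the one grepped in 4.3.

```python
import json
from typing import Any

EXPECTED_KINDS = ["started", "completed", "artifact_link", "artifact_link", "commit_link"]


def read_events(trail_text: str) -> list[dict[str, Any]]:
    """Parse a JSONL trail into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in trail_text.splitlines() if line.strip()]


def event_kinds(events: list[dict[str, Any]], kind_field: str = "event") -> list[Any]:
    # ASSUMPTION: the event kind lives in a field named "event"; pass the real
    # field name if the trail uses a different one.
    return [e.get(kind_field) for e in events]


def artifact_refs(events: list[dict[str, Any]], kind_field: str = "event") -> list[Any]:
    """Refs of artifact_link events, in file order."""
    return [e.get("ref") for e in events if e.get(kind_field) == "artifact_link"]
```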

4.3 Verify ref normalisation for an out-of-checkout path

spec-kitty ask implementer "write a scratch log" --json
# -> note $ID2

echo "scratch" > /tmp/scratch.log

spec-kitty profile-invocation complete \
    --invocation-id $ID2 \
    --outcome done \
    --artifact /tmp/scratch.log \
    --json

grep '"ref"' .kittify/events/profile-invocations/$ID2.jsonl

Expected: the persisted ref is "/tmp/scratch.log" (absolute, because the path is outside the checkout).
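Sections 4.1 and 4.3 exercise one rule: paths inside the checkout are stored repo-relative, and paths outside it are stored absolute. The sketch below illustrates that rule; it is not the spec-kitty implementation. It assumes relative inputs are relative to the repository root. It uses os.path.abspath rather than Path.resolve, so symlinks are not expanded. On macOS, where /tmp is a symlink to /private/tmp, /tmp/scratch.log therefore stays /tmp/scratch.log. normalise_ref is a hypothetical name.

```python
import os
from pathlib import Path


def normalise_ref(path: str, repo_root: str) -> str:
    """Repo-relative POSIX path if *path* is inside *repo_root*, else absolute."""
    root = Path(os.path.abspath(repo_root))
    # os.path.join ignores repo_root when *path* is already absolute.
    candidate = Path(os.path.abspath(os.path.join(repo_root, path)))
    try:
        return candidate.relative_to(root).as_posix()
    except ValueError:  # not under the checkout
        return candidate.as_posix()
```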

5. Mode enforcement at Tier 2 promotion (Tranche B) — FR-009, SC-004

5.1 Attempt evidence promotion on an advisory invocation

spec-kitty advise "should I refactor this module" --json
# -> note $ADVISE_ID

echo "some notes" > /tmp/notes.md

spec-kitty profile-invocation complete \
    --invocation-id $ADVISE_ID \
    --outcome done \
    --evidence /tmp/notes.md

Expected:

Exit code 2.
Error message contains: "Cannot promote evidence on invocation … mode is advisory; Tier 2 evidence is only allowed on task_execution or mission_step invocations."
The invocation file has no completed event (the invocation is still open).
No .kittify/evidence/$ADVISE_ID/ directory was created.
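The guard behind 5.1 can be pictured as a check that runs before anything is written. That would explain why the failed call leaves neither a completed event nor an evidence directory. This is a hypothetical sketch, not the code in complete_invocation. EvidenceNotAllowed stands in for the real InvalidModeForEvidenceError. The pass-through for a missing mode_of_work mirrors the bypass described under Rollback guidance.

```python
from typing import Optional

# Modes that may carry Tier 2 evidence, per the error text in 5.1.
EVIDENCE_MODES = frozenset({"task_execution", "mission_step"})


class EvidenceNotAllowed(ValueError):
    """Hypothetical stand-in for InvalidModeForEvidenceError."""


def check_evidence_allowed(
    invocation_id: str, mode_of_work: Optional[str], has_evidence: bool
) -> None:
    """Raise before any write if evidence is requested for a disallowed mode."""
    if not has_evidence:
        return  # a plain close is always allowed
    if mode_of_work is None:
        return  # records without a mode (e.g. pre-mission) are not checked
    if mode_of_work not in EVIDENCE_MODES:
        raise EvidenceNotAllowed(
            f"invocation {invocation_id}: mode {mode_of_work!r} does not allow Tier 2 evidence"
        )
```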

5.2 Close the advisory invocation cleanly

spec-kitty profile-invocation complete \
    --invocation-id $ADVISE_ID \
    --outcome done

Expected: succeeds, completed event appended, no evidence artifact.

6. SaaS read-model policy (Tranche B) — FR-010, SC-005

6.1 Read the operator policy table

Open docs/trail-model.md and navigate to the "SaaS Read-Model Policy" subsection. Confirm the table lists exactly 16 rows (4 modes × 4 event kinds) and that each row specifies project, include_request_text, and include_evidence_ref.

6.2 Predict projection from the table

Given the table alone, confirm you can predict — without reading code — the projection behaviour for:

(advisory, started) → projected, body omitted, no evidence.
(query, started) → no projection.
(mission_step, completed) → projected with body and evidence_ref.
(task_execution, artifact_link) → projected, body omitted, no evidence.
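Read as data, the policy table is a lookup keyed by (mode, event kind). The sketch encodes only the four rows listed above; the other twelve must be copied from docs/trail-model.md. It assumes "body omitted" corresponds to include_request_text=False. request_text is a hypothetical name for the body field.

```python
from typing import NamedTuple, Optional


class Projection(NamedTuple):
    project: bool
    include_request_text: bool
    include_evidence_ref: bool


# Partial table: only the four rows spelled out in 6.2.
POLICY = {
    ("advisory", "started"): Projection(True, False, False),
    ("query", "started"): Projection(False, False, False),
    ("mission_step", "completed"): Projection(True, True, True),
    ("task_execution", "artifact_link"): Projection(True, False, False),
}


def lookup(mode: str, kind: str) -> Projection:
    try:
        return POLICY[(mode, kind)]
    except KeyError:
        raise KeyError(f"no row for ({mode}, {kind}); copy it from docs/trail-model.md") from None


def project(event: dict, mode: str, kind: str) -> Optional[dict]:
    """Return the projected event, or None when the row says not to project."""
    rule = lookup(mode, kind)
    if not rule.project:
        return None
    out = dict(event)
    if not rule.include_request_text:
        out.pop("request_text", None)  # hypothetical body field name
    if not rule.include_evidence_ref:
        out.pop("evidence_ref", None)
    return out
```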

7. Tier 2 evidence stays local-only (Tranche B) — FR-011, SC-006

7.1 Read the deferral note

Open docs/trail-model.md and navigate to the "Tier 2 SaaS Projection — Deferred" subsection. Confirm:

Status is stated decisively: Tier 2 evidence remains local-only in 3.2.x.
The reasoning (D5) is present.
The revisit trigger is named.

7.2 Verify behaviour

Even with SaaS sync enabled and authenticated (this is the one optional step where you may turn auth on), evidence artifacts in .kittify/evidence/<invocation_id>/ are not uploaded. This behaviour is unchanged from 3.2.0a5, as the deferral note confirms.

8. Local-first invariant holds (Tranche B) — FR-012, NFR-007, SC-008

8.1 Ensure sync is disabled or unauthenticated

unset SPEC_KITTY_SAAS_TOKEN  # if set
# confirm routing returns effective_sync_enabled=False for this checkout

8.2 Run the full exercise above

Redo sections 3–5 above. Confirm each invocation file contains the events those sections describe.

8.3 Verify no propagation errors were logged

test -f .kittify/events/propagation-errors.jsonl && wc -l .kittify/events/propagation-errors.jsonl || echo "file absent"

Expected:

Either the file is absent, or it has zero lines.

8.4 Timing spot-check (NFR-001)

time spec-kitty advise "a trivial ask" --json

Expected: wall time is dominated by CLI startup, not by trail I/O. The write budget for the started event is ≤ 5 ms, enforced by the NFR-001 test.
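To see how a single trail append on your own disk compares with the 5 ms budget, the sketch below times plain JSONL appends. It is not the NFR-001 test, and it does not fsync or lock the file, which the real writer may do. The helper names and the sample record are hypothetical.

```python
import json
import statistics
import time
from pathlib import Path


def time_append_ms(path: Path, record: dict) -> float:
    """Append *record* as one JSON line and return the elapsed wall time in ms."""
    line = json.dumps(record) + "\n"
    start = time.perf_counter()
    with path.open("a", encoding="utf-8") as fh:
        fh.write(line)
    return (time.perf_counter() - start) * 1000.0


def median_append_ms(path: Path, runs: int = 100) -> float:
    """Median over *runs* appends, which smooths out one-off filesystem hiccups."""
    samples = [time_append_ms(path, {"event": "started", "n": i}) for i in range(runs)]
    return statistics.median(samples)
```

median_append_ms(Path("/tmp/append-probe.jsonl")) returns a value in milliseconds. Delete the probe file afterwards.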

9. Tracker hygiene at merge (FR-014, SC-007)

This is a manual checklist for the release owner when the mission merges to main. It does not run automatically.

□ Close #496 (Phase 4 host-surface breadth follow-on).
□ Close #701 (Phase 4 trail follow-on).
□ Update #466 (Phase 4 tracker) to reflect that Phase 4 follow-on has shipped.
□ Update #534 (spec-kitty explain) with a cross-link to #499 (DRG glossary addressability) and #759 (Phase 5 glossary foundation) as the unblocker.
□ Leave #461 (umbrella roadmap) open.
□ Verify the CHANGELOG unreleased section has the Tranche A + Tranche B entries.
□ Verify docs/trail-model.md links to docs/host-surface-parity.md and to the SaaS Read-Model Policy subsection.

10. Rollback guidance

If any Tranche B behaviour is found to be regressive after merge:

1. Correlation links: no rollback needed. Links are additive events on existing files; remove the flags from the CLI to stop producing them.
2. Mode enforcement: can be temporarily bypassed by passing an invocation that has no mode_of_work (pre-mission records, or a programmatically opened invocation that omits the field). The proper rollback is a hotfix reverting the InvalidModeForEvidenceError raise in complete_invocation.
3. SaaS policy: reverting the policy lookup in _propagate_one restores 3.2.0a5 behaviour exactly.
4. Dashboard wording: a wording revert is a three-file edit. Backend identifiers are untouched, so no data migration is needed.

Each rollback is a surgical revert; no schema change needs to be undone.