Quickstart: Phase 4 Closeout — Operator Walkthrough
Mission: phase-4-closeout-host-surfaces-and-trail-01KPWA5X Audience: operators and reviewers verifying Tranche A + Tranche B behaviour after merge.
This walkthrough exercises every user-visible behaviour this mission ships. Each section names the FR and SC that it validates.
Prerequisites
- Clean checkout of the branch that merged this mission.
- Python environment active (
source .venv/bin/activateor equivalent). - No SaaS auth token set (we will toggle auth on in one optional step only).
spec-kittyinstalled from this checkout.
1. Host-surface parity (Tranche A) — FR-001, FR-002, SC-001
1.1 Read the promoted matrix
less docs/host-surface-parity.md
Expected:
- One row per supported host surface (15 total: 13 slash-command + 2 Agent Skills).
- Every row has a
parity_statusofat_parity,partial, ormissing. - Every non-
at_parityrow has anotescolumn explaining the gap and remediation plan.
1.2 Spot-check one surface for inline parity content
Open one surface that the matrix marks at_parity with guidance_style=inline (e.g. .agents/skills/spec-kitty.advise/SKILL.md) and confirm it has all three sections:
- "Governance context injection" (how to inject
governance_context_text). - "Discover profiles / Get governance context" (advise/ask/do usage).
- "Close the record" (how to call
profile-invocation complete).
1.3 Spot-check one surface for pointer parity
Open one surface marked at_parity with guidance_style=pointer. Confirm it contains an explicit pointer to the canonical skill pack (e.g. "For the canonical advise/ask/do contract, see .agents/skills/spec-kitty.advise/SKILL.md.").
2. Dashboard wording (Tranche A) — FR-003, FR-004, SC-002
2.1 Start the dashboard
spec-kitty dashboard
2.2 Verify user-visible wording
Open the dashboard in a browser. Confirm:
- Mission selector label reads Mission Run:, not
Feature:. - Mission header reads Mission Run: <name>, not
Feature: <name>. - "Mission Run Overview" heading appears where "Feature Overview" used to.
- "Mission Run Analysis" heading appears where "Feature Analysis" used to.
- Empty state reads "Create your first mission…" and "…run
/spec-kitty.specifyto create your first mission run". - Unknown-mission fallback label reads "Unknown mission", not "Unknown feature".
- Diagnostics page: the active-mission diagnostic row reads "no mission context" when no mission is selected.
2.3 Verify backend identifiers are preserved (FR-004, C-007)
From DevTools → Elements panel:
- The selector
<select>still has idfeature-select(unchanged). - The selector container still has id
feature-selector-container. - CSS class
.feature-selectorstill applies.
From DevTools → Application → Cookies:
lastFeaturecookie is still set on selection (no cookie migration).
From DevTools → Network:
- API routes retain
/api/kanban/<feature>and/api/artifact/<feature>/<name>shape. - JSON responses retain
feature_id,feature_number,current_featurefield names.
3. Mode of work is derived at the CLI (Tranche B) — FR-008
3.1 Open an advisory invocation
spec-kitty advise "how should I split this refactor" --json
3.2 Inspect the trail
# Grab the invocation_id from the JSON output; call it $ID.
head -n 1 .kittify/events/profile-invocations/$ID.jsonl | python -m json.tool
Expected: mode_of_work field is "advisory".
3.3 Repeat for task execution
spec-kitty do "add a README badge" --json
# ... inspect: mode_of_work should be "task_execution"
3.4 Repeat for query
spec-kitty profiles list --json
# No invocation is opened — profiles list is a query with no InvocationRecord.
spec-kitty invocations list --limit 3 --json
# Same — query, no InvocationRecord opened.
Confirmed: query commands do not open invocation records.
4. Correlation links (Tranche B) — FR-007, SC-003
4.1 Close a task-execution invocation with two artifacts and one commit
# Start a task-execution invocation:
spec-kitty ask implementer "implement the login handler" --json
# -> note $ID from the response
# ... do the work, commit it ...
COMMIT_SHA=$(git rev-parse HEAD)
spec-kitty profile-invocation complete \
--invocation-id $ID \
--outcome done \
--artifact src/login/handler.py \
--artifact tests/login/test_handler.py \
--commit $COMMIT_SHA \
--json
Expected JSON response includes:
"evidence_ref": null(no--evidencepassed)"artifact_links": ["src/login/handler.py", "tests/login/test_handler.py"](both repo-relative, order preserved)"commit_link": "<sha>"
4.2 Inspect the trail file
cat .kittify/events/profile-invocations/$ID.jsonl
Expected four lines in order: 1. started event (with mode_of_work: "task_execution") 2. completed event 3. artifact_link with ref=src/login/handler.py 4. artifact_link with ref=tests/login/test_handler.py 5. commit_link with the recorded SHA
4.3 Verify ref normalisation for an out-of-checkout path
spec-kitty ask implementer "write a scratch log" --json
# -> note $ID2
echo "scratch" > /tmp/scratch.log
spec-kitty profile-invocation complete \
--invocation-id $ID2 \
--outcome done \
--artifact /tmp/scratch.log \
--json
grep '"ref"' .kittify/events/profile-invocations/$ID2.jsonl
Expected: the persisted ref is "/tmp/scratch.log" (absolute, because the path is outside the checkout).
5. Mode enforcement at Tier 2 promotion (Tranche B) — FR-009, SC-004
5.1 Attempt evidence promotion on an advisory invocation
spec-kitty advise "should I refactor this module" --json
# -> note $ADVISE_ID
echo "some notes" > /tmp/notes.md
spec-kitty profile-invocation complete \
--invocation-id $ADVISE_ID \
--outcome done \
--evidence /tmp/notes.md
Expected:
- Exit code 2.
- Error message contains: "Cannot promote evidence on invocation … mode is advisory; Tier 2 evidence is only allowed on task_execution or mission_step invocations."
- The invocation file has no
completedevent (the invocation is still open). - No
.kittify/evidence/$ADVISE_ID/directory was created.
5.2 Close the advisory invocation cleanly
spec-kitty profile-invocation complete \
--invocation-id $ADVISE_ID \
--outcome done
Expected: succeeds, completed event appended, no evidence artifact.
6. SaaS read-model policy (Tranche B) — FR-010, SC-005
6.1 Read the operator policy table
Open docs/trail-model.md and navigate to the "SaaS Read-Model Policy" subsection. Confirm the table lists exactly 16 rows (4 modes × 4 event kinds) and that each row specifies project, include_request_text, include_evidence_ref.
6.2 Predict projection from the table
Given the table alone, confirm you can predict — without reading code — the projection behaviour for:
(advisory, started)→ projected, body omitted, no evidence.(query, started)→ no projection.(mission_step, completed)→ projected with body and evidence_ref.(task_execution, artifact_link)→ projected, body omitted, no evidence.
7. Tier 2 evidence stays local-only (Tranche B) — FR-011, SC-006
7.1 Read the deferral note
Open docs/trail-model.md and navigate to the "Tier 2 SaaS Projection — Deferred" subsection. Confirm:
- Status is stated decisively: Tier 2 evidence remains local-only in 3.2.x.
- The reasoning (D5) is present.
- The revisit trigger is named.
7.2 Verify behaviour
Even with SaaS sync enabled and authenticated (if you want to optionally toggle this on), evidence artifacts in .kittify/evidence/<invocation_id>/ are not uploaded. This is the status quo from 3.2.0a5 and is confirmed by the deferral note.
8. Local-first invariant holds (Tranche B) — FR-012, NFR-007, SC-008
8.1 Ensure sync is disabled / unauthenticated
unset SPEC_KITTY_SAAS_TOKEN # if set
# confirm routing returns effective_sync_enabled=False for this checkout
8.2 Run the full exercise above
Redo sections 3–5 above. Confirm every invocation file is written correctly.
8.3 Verify no propagation errors were logged
test -f .kittify/events/propagation-errors.jsonl && wc -l .kittify/events/propagation-errors.jsonl || echo "file absent"
Expected:
- Either the file is absent, or it has zero lines.
8.4 Timing spot-check (NFR-001)
time spec-kitty advise "a trivial ask" --json
Expected wall time dominated by CLI startup, not by trail I/O. started event write budget is ≤ 5 ms (enforced by NFR-001 test).
9. Tracker hygiene at merge (FR-014, SC-007)
This is a manual checklist for the release owner when the mission merges to main. It does not run automatically.
- □ Close
#496(Phase 4 host-surface breadth follow-on). - □ Close
#701(Phase 4 trail follow-on). - □ Update
#466(Phase 4 tracker) to reflect that Phase 4 follow-on has shipped. - □ Update
#534(spec-kitty explain) with a cross-link to#499(DRG glossary addressability) and#759(Phase 5 glossary foundation) as the unblocker. - □ Leave
#461(umbrella roadmap) open. - □ Verify the CHANGELOG unreleased section has the Tranche A + Tranche B entries.
- □ Verify
docs/trail-model.mdlinks todocs/host-surface-parity.mdand to the SaaS Read-Model Policy subsection.
10. Rollback guidance
If any Tranche B behaviour is found to be regressive after merge:
1. Correlation links: no rollback needed — links are additive events on existing files; remove the flags from the CLI to stop producing them. 2. Mode enforcement: can be temporarily bypassed by passing an invocation that has no mode_of_work (pre-mission records, or a programmatically-opened invocation that omits the field). Proper rollback is a hotfix reverting the InvalidModeForEvidenceError raise in complete_invocation. 3. SaaS policy: reverting the policy lookup in _propagate_one restores 3.2.0a5 behaviour exactly. 4. Dashboard wording: a wording revert is a three-file edit; backend identifiers are untouched so no data migration is needed.
Each rollback is a surgical revert; no schema change needs to be undone.