Spec Kitty

└─ kitty-specs
   └─ Phase 4 Closeout: Host-Surface Breadth and Trail Follow-On

Contracts

host-surface-inventory.md

Contract: Host-Surface Inventory Matrix

Mission: phase-4-closeout-host-surfaces-and-trail-01KPWA5X
Covers: FR-001 (inventory), FR-002 (parity), FR-006 (guidance-gap coverage), NFR-003 (100% surface coverage)
Living location: kitty-specs/phase-4-closeout-host-surfaces-and-trail-01KPWA5X/artifacts/host-surface-inventory.md
Promoted location: docs/host-surface-parity.md (created by WP05 on mission close)

Purpose

One authoritative matrix, per mission, that lists every supported host surface and its parity status against the advise/ask/do governance-injection contract. Drives WP02–WP04 scope during Tranche A; becomes the durable operator-facing reference at closeout.

Row schema

Columns in this exact order:

#	Column	Type	Allowed values	Notes
1	`surface_key`	str	One of the keys in `AGENT_DIRS` from `src/specify_cli/upgrade/migrations/m_0_9_1_complete_lane_migration.py`	Canonical host key.
2	`directory`	str	e.g. `.claude/commands/`, `.agents/skills/spec-kitty.advise/`	Relative to repo root.
3	`kind`	str	`slash_command` or `agent_skill`	Derived from the surface category.
4	`has_advise_guidance`	bool	`yes` / `no`	Does the surface teach when to call `advise`/`ask`/`do`?
5	`has_governance_injection`	bool	`yes` / `no`	Does the surface teach how to inject `governance_context_text`?
6	`has_completion_guidance`	bool	`yes` / `no`	Does the surface teach how to call `profile-invocation complete`?
7	`guidance_style`	str	`inline` or `pointer`	`inline` hosts the content; `pointer` links to the canonical skill pack.
8	`parity_status`	str	`at_parity`, `partial`, or `missing`	Composite judgement from columns 4–7.
9	`notes`	str	free text	Captures per-surface rationale — especially required for `pointer` style per FR-006.
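The row schema above can be checked mechanically. The following is a minimal sketch, assuming rows are written in the pipe-delimited form used by the worked example in this contract; the names `InventoryRow` and `parse_row` are hypothetical and are not part of the shipped code.

```python
from dataclasses import dataclass

ALLOWED_KINDS = {"slash_command", "agent_skill"}
ALLOWED_STYLES = {"inline", "pointer"}
ALLOWED_STATUSES = {"at_parity", "partial", "missing"}


@dataclass(frozen=True)
class InventoryRow:
    """One matrix row, fields in the exact column order of the schema."""
    surface_key: str
    directory: str
    kind: str
    has_advise_guidance: bool
    has_governance_injection: bool
    has_completion_guidance: bool
    guidance_style: str
    parity_status: str
    notes: str


def _flag(cell: str) -> bool:
    """Map a yes/no cell to a bool; reject anything else."""
    if cell not in ("yes", "no"):
        raise ValueError(f"expected 'yes' or 'no', got {cell!r}")
    return cell == "yes"


def parse_row(line: str) -> InventoryRow:
    """Parse one pipe-delimited row (hypothetical helper).

    Assumes the notes cell contains no '|' character.
    """
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    if len(cells) != 9:
        raise ValueError(f"expected 9 columns, got {len(cells)}")
    key, directory, kind, advise, inject, complete, style, status, notes = cells
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    if style not in ALLOWED_STYLES:
        raise ValueError(f"unknown guidance_style: {style!r}")
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unknown parity_status: {status!r}")
    return InventoryRow(key, directory, kind, _flag(advise), _flag(inject),
                        _flag(complete), style, status, notes)
```

Checking `surface_key` against `AGENT_DIRS` is left out here because that mapping lives in the migration module named in column 1.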

Canonical host surface list

The 15 supported surfaces are:

Slash-command surfaces (13)

surface_key	directory	subdir
`claude`	`.claude/`	`commands/`
`copilot`	`.github/`	`prompts/`
`gemini`	`.gemini/`	`commands/`
`cursor`	`.cursor/`	`commands/`
`qwen`	`.qwen/`	`commands/`
`opencode`	`.opencode/`	`command/`
`windsurf`	`.windsurf/`	`workflows/`
`kilocode`	`.kilocode/`	`workflows/`
`auggie`	`.augment/`	`commands/`
`roo`	`.roo/`	`commands/`
`q`	`.amazonq/`	`prompts/`
`kiro`	`.kiro/`	`prompts/`
`agent`	`.agent/`	`workflows/`

Agent Skills surfaces (2)

surface_key	directory
`codex`	`.agents/skills/` (reads from tree directly)
`vibe`	`.agents/skills/` (via `.vibe/config.toml::skill_paths`)

Parity judgement rubric

parity_status	Condition
`at_parity`	All three guidance flags `yes` and the surface matches the content shape shipped in `.agents/skills/spec-kitty.advise/SKILL.md` and `src/doctrine/skills/spec-kitty-runtime-next/SKILL.md`.
`partial`	Some guidance flags `yes`, some `no`, or guidance is present but not aligned with the reference content shape.
`missing`	All three guidance flags `no`.
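The rubric reduces to a small decision function. This is a sketch only: `derive_parity_status` is a hypothetical name, and `matches_reference_shape` stands in for the reviewer's judgement on whether the surface matches the reference SKILL.md content shape, which this contract does not define as a computable check.

```python
def derive_parity_status(
    has_advise_guidance: bool,
    has_governance_injection: bool,
    has_completion_guidance: bool,
    matches_reference_shape: bool,
) -> str:
    """Apply the parity rubric to the three guidance flags."""
    flags = (has_advise_guidance, has_governance_injection, has_completion_guidance)
    if not any(flags):
        # All three flags are "no".
        return "missing"
    if all(flags) and matches_reference_shape:
        # All three flags are "yes" and the content shape is aligned.
        return "at_parity"
    # Mixed flags, or full flags with a misaligned content shape.
    return "partial"
```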

Example rows (worked)

| claude | .claude/commands/ | slash_command | yes | yes | yes | inline | at_parity | Priority slice shipped in 3.2.0a5. Uses src/doctrine/skills/spec-kitty-runtime-next/SKILL.md content. |

| copilot | .github/prompts/ | slash_command | no | no | no | pointer | missing | No governance-injection block present. WP04 will add a pointer to the canonical skill pack; .github/prompts/ is read into Copilot context via workspace-level prompts only. |

Promotion rules (WP05)

When Tranche A closes:

1. Copy the matrix verbatim to docs/host-surface-parity.md.
2. Add a short preamble to the promoted doc explaining what the matrix is and how it is kept up to date (any new host integration MUST add a row).
3. Link the promoted doc from:

docs/trail-model.md (under "Host surfaces that teach the trail" subsection).
README governance section (one-line pointer).

Acceptance

Mechanical: tests/specify_cli/docs/test_host_surface_inventory.py parses docs/host-surface-parity.md after WP05, asserts every surface_key from AGENT_DIRS has exactly one row, and asserts each row's parity_status is one of the three allowed values. Covers FR-001 / NFR-003.
Textual: Each row with parity_status != "at_parity" must have a non-empty notes column explaining the gap and the remediation plan. Covers FR-006.
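Both acceptance checks can be expressed over already-parsed rows. The sketch below is illustrative and is not the content of test_host_surface_inventory.py: `check_inventory` is a hypothetical helper, rows are assumed to be plain dicts keyed by column name, and `expected_keys` stands in for the keys of `AGENT_DIRS`.

```python
from collections import Counter

ALLOWED_STATUSES = {"at_parity", "partial", "missing"}


def check_inventory(rows: list[dict[str, str]], expected_keys: set[str]) -> list[str]:
    """Return a list of human-readable problems; empty means the matrix passes."""
    problems: list[str] = []
    counts = Counter(row["surface_key"] for row in rows)

    # Mechanical: every expected surface key has exactly one row.
    for key in sorted(expected_keys):
        if counts[key] != 1:
            problems.append(f"{key}: expected exactly one row, found {counts[key]}")
    for key in sorted(set(counts) - expected_keys):
        problems.append(f"{key}: not a known surface key")

    for row in rows:
        status = row["parity_status"]
        if status not in ALLOWED_STATUSES:
            # Mechanical: parity_status is one of the three allowed values.
            problems.append(f"{row['surface_key']}: invalid parity_status {status!r}")
        elif status != "at_parity" and not row["notes"].strip():
            # Textual: rows that are not at parity must explain the gap.
            problems.append(f"{row['surface_key']}: notes required when not at parity")
    return problems
```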

profile-invocation-complete.md

Contract: `spec-kitty profile-invocation complete` (extended)

Mission: phase-4-closeout-host-surfaces-and-trail-01KPWA5X
Covers: FR-007 (correlation), FR-009 (mode enforcement at promotion), FR-012 (local-first invariant)
Target file: src/specify_cli/cli/commands/profile_invocation.py (CLI) + src/specify_cli/invocation/executor.py::complete_invocation (runtime)

Purpose

Close an open profile invocation record and, optionally:

1. Promote its output to a Tier 2 evidence artifact (existing behaviour; now mode-gated).
2. Attach one or more correlation links to the invocation JSONL: artifact references and/or a single commit SHA (new).

Command Shape

spec-kitty profile-invocation complete \
    --invocation-id <id> \
    [--outcome done|failed|abandoned] \
    [--evidence <path>] \
    [--artifact <path>]... \
    [--commit <sha>] \
    [--json]

Flag semantics

Flag	Cardinality	Type	Required	Default	Description
`--invocation-id`	1	str (ULID)	yes	—	Target invocation file.
`--outcome`	0 or 1	enum `done` / `failed` / `abandoned`	no	`None`	Recorded on the `completed` event.
`--evidence`	0 or 1	path-or-string	no	`None`	Tier 2 promotion trigger. Mode-gated: rejected for `advisory` / `query`.
`--artifact`	≥ 0	path-or-string	no	—	Repeatable. Each value appends one `artifact_link` event.
`--commit`	0 or 1	str (SHA)	no	`None`	Singular. Appends one `commit_link` event.
`--json`	flag	bool	no	false	Existing behaviour — JSON output.
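The cardinalities in the table map onto ordinary option parsing. The sketch below uses the standard-library `argparse` purely for illustration; this contract does not state which CLI framework profile_invocation.py uses, and `build_parser` and the `json_output` destination name are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the flag table (not the shipped CLI)."""
    parser = argparse.ArgumentParser(prog="spec-kitty profile-invocation complete")
    parser.add_argument("--invocation-id", required=True)
    parser.add_argument("--outcome", choices=["done", "failed", "abandoned"], default=None)
    parser.add_argument("--evidence", default=None)
    # Repeatable: each occurrence appends one value, preserving input order.
    parser.add_argument("--artifact", action="append", default=[])
    # Singular: at most one commit SHA is recorded.
    parser.add_argument("--commit", default=None)
    parser.add_argument("--json", action="store_true", dest="json_output")
    return parser
```

Note that `argparse` silently keeps the last value when a non-repeatable option is given twice, so a real implementation that wants to reject a second `--commit` would need an explicit check.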

Execution order

On a successful invocation with all flags present, the runtime performs these steps in order:

1. Read started event (first line of the invocation JSONL).
2. Mode enforcement (FR-009): if --evidence is set and the derived mode_of_work ∈ {advisory, query}, raise InvalidModeForEvidenceError and exit without appending any new lines.
3. Append completed event (existing write_completed).
4. If --evidence is set (and mode check passed): resolve + normalise ref, then call existing promote_to_evidence().
5. For each --artifact <path>: resolve + normalise, append artifact_link event.
6. If --commit <sha>: append commit_link event.
7. Submit completed to SaaS propagator (existing behaviour). Correlation events are also submitted to the propagator, but projection is subject to POLICY_TABLE (see projection-policy.md).
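The ordering of steps 1 to 6 can be sketched against an in-memory event list. This is a simplified model, not the executor: `complete_in_memory` is a hypothetical name, the `ref` and `sha` keys are assumed field names, and evidence promotion (step 4), ref normalisation, the already-closed check, and propagation (step 7) are left out.

```python
from typing import Any, Iterable, Optional

EVIDENCE_BLOCKED_MODES = {"advisory", "query"}


class InvalidModeForEvidenceError(Exception):
    """Stand-in for the error class named in this contract."""


def complete_in_memory(
    events: list[dict[str, Any]],
    *,
    outcome: Optional[str] = None,
    evidence: Optional[str] = None,
    artifacts: Iterable[str] = (),
    commit: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Append completion and correlation events in the contract's order."""
    # Step 1: the started event is the first line of the record.
    started = events[0]
    mode = started.get("mode_of_work")  # None for pre-mission records

    # Step 2: mode enforcement runs before anything is appended.
    if evidence is not None and mode in EVIDENCE_BLOCKED_MODES:
        raise InvalidModeForEvidenceError(f"mode is {mode}")

    # Step 3: close the invocation first.
    events.append({"event": "completed", "outcome": outcome})

    # Steps 5 and 6: the correlation tail follows the completed event.
    for ref in artifacts:
        events.append({"event": "artifact_link", "ref": ref})
    if commit is not None:
        events.append({"event": "commit_link", "sha": commit})
    return events
```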

Why steps 5 and 6 run after step 3

The append-only invariant holds in both orderings, but closing the invocation first lets readers distinguish completed from the correlation tail. Running correlation writes after the completed event also means a filesystem failure on a correlation write leaves the invocation in a fully-closed state — a retry of complete with the remaining correlation flags is a clean append, not a recovery.

Error shapes

Condition	Error class	Exit code	Message guidance
`--invocation-id` points to missing file	`InvocationError`	2	"Invocation record not found: <id>."
Already-completed invocation	`AlreadyClosedError`	2	Existing.
`--evidence` supplied on `advisory` or `query`	`InvalidModeForEvidenceError` (new)	2	"Cannot promote evidence on invocation <id>: mode is <mode>; Tier 2 evidence is only allowed on task_execution or mission_step invocations."
Filesystem write fails for any event	`InvocationWriteError`	2	Existing.

Ref normalisation (for `--artifact` and `--evidence`)

Per data-model.md §6:

Resolve the input path. If resolution succeeds and the resolved path is under repo_root, persist the repo-relative string.
Otherwise persist the absolute resolved path.
If resolution raises (malformed input), persist the input verbatim — same fallback the existing executor.complete_invocation already uses for unreadable evidence refs.

This normalisation rule applies uniformly to both --artifact and --evidence, so correlation refs and evidence refs read the same.
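A minimal sketch of the three-branch rule follows. `normalise_ref` is a hypothetical name rather than the executor's actual helper, and the sketch assumes relative inputs are interpreted against `repo_root`; the contract itself only says "resolve the input path".

```python
from pathlib import Path


def normalise_ref(raw: str, repo_root: Path) -> str:
    """Return the string to persist for an --artifact or --evidence value."""
    try:
        root = repo_root.resolve()
        candidate = Path(raw)
        if not candidate.is_absolute():
            # Assumption: relative refs are taken relative to the checkout root.
            candidate = root / candidate
        resolved = candidate.resolve()
    except (OSError, RuntimeError, ValueError):
        # Resolution raised (malformed input): persist the input verbatim.
        return raw
    try:
        # Under repo_root: persist the repo-relative string.
        return resolved.relative_to(root).as_posix()
    except ValueError:
        # Outside repo_root: persist the absolute resolved path.
        return str(resolved)
```

The target file does not need to exist; non-strict resolution only normalises the path.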

JSON output (when `--json` is set)

Response shape extended to report appended correlation events:

{
  "result": "success",
  "invocation_id": "01HXYZ...",
  "outcome": "done",
  "evidence_ref": "kitty-specs/042-foo/evidence/snapshot.md",
  "artifact_links": ["kitty-specs/042-foo/tasks/WP03.md", "build/report.html"],
  "commit_link": "a1b2c3d4e5f67890..."
}

evidence_ref is present only when --evidence was supplied and promotion succeeded.
artifact_links is an array (possibly empty). Order matches the input order of --artifact flags.
commit_link is a string or null.
Existing fields (result, invocation_id, outcome) are unchanged.

On error, the existing JSON error envelope shape is used.
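The field rules for the success response can be summarised in a small builder. This is a sketch under the rules listed above; `build_success_payload` is a hypothetical name and does not claim to match the CLI's internal structure.

```python
import json
from typing import Any, Optional, Sequence


def build_success_payload(
    invocation_id: str,
    outcome: Optional[str],
    *,
    evidence_ref: Optional[str] = None,
    artifact_links: Sequence[str] = (),
    commit_link: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble the --json success response."""
    payload: dict[str, Any] = {
        "result": "success",
        "invocation_id": invocation_id,
        "outcome": outcome,
    }
    # evidence_ref appears only when evidence was supplied and promoted.
    if evidence_ref is not None:
        payload["evidence_ref"] = evidence_ref
    # artifact_links is always an array, in --artifact input order.
    payload["artifact_links"] = list(artifact_links)
    # commit_link is a string or null.
    payload["commit_link"] = commit_link
    return payload


if __name__ == "__main__":
    print(json.dumps(build_success_payload("01HXYZ...", "done"), indent=2))
```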

Invariants

No existing JSONL line is mutated. All new events are append-only (C-004).
Tier 1 unconditional. The completed event is written before any correlation append; a filesystem failure on correlation does not leave the invocation half-closed. Tier 1 must keep working with SaaS disabled, unauthenticated, or network-unreachable (C-002, FR-012).
No new top-level command. This is a flag extension on an existing subcommand (C-008).
Backwards-compatible. Omitting --artifact and --commit yields identical behaviour to 3.2.0a5. Pre-mission invocations (no mode_of_work field on started) accept --evidence — None mode skips enforcement.

Acceptance tests (selected)

These tests live in tests/specify_cli/invocation/test_correlation.py (new) and tests/specify_cli/invocation/test_invocation_e2e.py (extended):

1. complete with two --artifact values appends two artifact_link events in order.
2. complete with --commit abc123 appends exactly one commit_link event.
3. complete with --artifact kitty-specs/042/spec.md (under checkout) persists "kitty-specs/042/spec.md".
4. complete with --artifact /tmp/report.log (outside checkout) persists "/tmp/report.log".
5. complete with --artifact ./build/out.log persists "build/out.log" (repo-relative).
6. complete on an advisory invocation with --evidence path raises InvalidModeForEvidenceError; no completed event, no evidence artifact, no correlation events written.
7. complete on a task_execution invocation with --evidence path --artifact other --commit sha writes (in order) completed → artifact_link → commit_link, and promotes evidence to .kittify/evidence/<id>/.
8. complete with sync disabled writes all events locally; .kittify/events/propagation-errors.jsonl remains empty.
9. Second call to complete for the same invocation_id raises AlreadyClosedError before any mutation (existing behaviour preserved).

projection-policy.md

Contract: `src/specify_cli/invocation/projection_policy.py`

Mission: phase-4-closeout-host-surfaces-and-trail-01KPWA5X
Covers: FR-010 (SaaS read-model policy), FR-012 (local-first invariant), NFR-005 (typed + mypy-strict), NFR-007 (propagation-error quiet invariant)

Purpose

Define the single source of truth for how each (mode_of_work, event) pair projects to the SaaS timeline. Consumed by src/specify_cli/invocation/propagator.py::_propagate_one after the existing sync-gate. Documented for operators in docs/trail-model.md.

Public API

from dataclasses import dataclass
from enum import Enum

# Re-exported from projection_policy.py for caller convenience.
from specify_cli.invocation.modes import ModeOfWork

class EventKind(str, Enum):
    STARTED = "started"
    COMPLETED = "completed"
    ARTIFACT_LINK = "artifact_link"
    COMMIT_LINK = "commit_link"

@dataclass(frozen=True)
class ProjectionRule:
    project: bool
    include_request_text: bool
    include_evidence_ref: bool

POLICY_TABLE: dict[tuple[ModeOfWork, EventKind], ProjectionRule]

def resolve_projection(mode: ModeOfWork | None, event: EventKind) -> ProjectionRule: ...

The module exports exactly these symbols. Nothing else is public.

Table authority

POLICY_TABLE is the complete enumeration of 4 modes × 4 events = 16 entries. See data-model.md §5 for the full table.

Golden-path invariants (contract tests):

Row	Rule
`(TASK_EXECUTION, STARTED)`	`ProjectionRule(True, True, False)`
`(TASK_EXECUTION, COMPLETED)`	`ProjectionRule(True, True, True)`
`(MISSION_STEP, STARTED)`	`ProjectionRule(True, True, False)`
`(MISSION_STEP, COMPLETED)`	`ProjectionRule(True, True, True)`

Any change to these four rows requires an ADR and a migration note — they govern existing dashboard behaviour for active missions.

Expected zero-projection rows:

Row	Rule
any `(QUERY, *)`	`project=False`
`(ADVISORY, ARTIFACT_LINK)`	`project=False`
`(ADVISORY, COMMIT_LINK)`	`project=False`

Query invocations and advisory correlation events produce no SaaS timeline traffic.
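The exhaustiveness and zero-projection expectations can be checked against any candidate table. The sketch below is self-contained and therefore redefines stand-ins for `ModeOfWork`, `EventKind`, and `ProjectionRule`; the enum string values for `ModeOfWork` are assumptions, `missing_pairs` and `zero_projection_violations` are hypothetical helpers, and only the four golden-path rows are filled in because the remaining rows are specified in data-model.md §5, not here.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class ModeOfWork(str, Enum):  # stand-in; string values are assumed
    ADVISORY = "advisory"
    QUERY = "query"
    TASK_EXECUTION = "task_execution"
    MISSION_STEP = "mission_step"


class EventKind(str, Enum):
    STARTED = "started"
    COMPLETED = "completed"
    ARTIFACT_LINK = "artifact_link"
    COMMIT_LINK = "commit_link"


@dataclass(frozen=True)
class ProjectionRule:
    project: bool
    include_request_text: bool
    include_evidence_ref: bool


Table = dict[tuple[ModeOfWork, EventKind], ProjectionRule]

# Only the golden-path rows stated in this contract.
GOLDEN_ROWS: Table = {
    (ModeOfWork.TASK_EXECUTION, EventKind.STARTED): ProjectionRule(True, True, False),
    (ModeOfWork.TASK_EXECUTION, EventKind.COMPLETED): ProjectionRule(True, True, True),
    (ModeOfWork.MISSION_STEP, EventKind.STARTED): ProjectionRule(True, True, False),
    (ModeOfWork.MISSION_STEP, EventKind.COMPLETED): ProjectionRule(True, True, True),
}


def missing_pairs(table: Table) -> list[tuple[ModeOfWork, EventKind]]:
    """Pairs of the 4 x 4 grid that have no entry in the table."""
    return [pair for pair in product(ModeOfWork, EventKind) if pair not in table]


def zero_projection_violations(table: Table) -> list[tuple[ModeOfWork, EventKind]]:
    """Entries that must not project but are marked project=True."""
    silent = [(ModeOfWork.QUERY, event) for event in EventKind]
    silent += [
        (ModeOfWork.ADVISORY, EventKind.ARTIFACT_LINK),
        (ModeOfWork.ADVISORY, EventKind.COMMIT_LINK),
    ]
    return [pair for pair in silent if pair in table and table[pair].project]
```

A complete `POLICY_TABLE` would make `missing_pairs` return an empty list.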

`resolve_projection()` semantics

def resolve_projection(mode: ModeOfWork | None, event: EventKind) -> ProjectionRule:
    effective_mode = mode if mode is not None else ModeOfWork.TASK_EXECUTION
    return POLICY_TABLE.get((effective_mode, event), _DEFAULT_RULE)

mode is None → treated as TASK_EXECUTION. Rationale: pre-mission records projected under the old unconditional behaviour, which was effectively (TASK_EXECUTION, event) projection. Preserving that behaviour on upgrade means no dashboard regression.
Unknown (mode, event) pair → falls back to _DEFAULT_RULE = ProjectionRule(True, True, True). In practice the table is exhaustive for the enums as defined, so this path is only hit if a future EventKind value is added without the policy table being extended.

Consumer contract — `_propagate_one`

Modified sequence (diff from 3.2.0a5):

def _propagate_one(record: InvocationRecord_or_EventDict, repo_root: Path) -> None:
    # 1. Existing sync-gate — unchanged. Short-circuit on sync disabled.
    routing = resolve_checkout_sync_routing(repo_root)
    if routing is not None and not routing.effective_sync_enabled:
        return

    # 2. Existing auth/client lookup — unchanged.
    client = _get_saas_client(repo_root)
    if client is None:
        return

    # 3. NEW: consult policy.
    mode = _extract_mode(record)     # returns ModeOfWork | None
    event = _extract_event(record)   # returns EventKind
    rule = resolve_projection(mode, event)
    if not rule.project:
        return

    # 4. Existing envelope build — now respects include_request_text / include_evidence_ref.
    ...

The helper _extract_mode reads record.mode_of_work for InvocationRecord inputs and the mode_of_work key from the stored started event when the input is a correlation event dict. _extract_event maps the event field to EventKind.

Envelope field gating

When rule.include_request_text is False, the envelope for started events omits the request_text key entirely. Omission, not empty string — this keeps dashboard consumers able to distinguish "advisory started" (no body) from "task_execution started with empty request" (present body, empty).

Same rule for rule.include_evidence_ref: omit on False, include on True (only relevant for completed events where evidence_ref is present).
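The omit-versus-empty distinction can be sketched as follows. `gate_envelope_fields` is a hypothetical helper, the event is modelled as a plain dict, and `ProjectionRule` is redefined locally so the example is self-contained; the real envelope build in `_propagate_one` is not shown in this contract.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ProjectionRule:
    project: bool
    include_request_text: bool
    include_evidence_ref: bool


def gate_envelope_fields(event: dict[str, Any], rule: ProjectionRule) -> dict[str, Any]:
    """Return a copy of the event with gated keys removed, never blanked."""
    envelope = dict(event)
    if not rule.include_request_text:
        # Omit the key entirely; do not send an empty string.
        envelope.pop("request_text", None)
    if not rule.include_evidence_ref:
        envelope.pop("evidence_ref", None)
    return envelope
```

With this shape, a consumer can test `"request_text" in envelope` to tell a gated field from a present but empty one.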

Invariants

Policy evaluation is read-only. It never writes to disk, never raises an uncaught exception, and never blocks.
Policy evaluation runs after the sync-gate. If sync is disabled for the checkout (effective_sync_enabled=False), resolve_projection is never called — the short-circuit still owns the gate (C-002, FR-012).
Policy evaluation runs after authentication. Unauthenticated checkouts never reach policy evaluation.
Type exhaustiveness. mypy --strict passes; ModeOfWork and EventKind are closed sets.
Frozen dataclass. ProjectionRule is frozen so policy rows are shareable and immutable.
No operator-configurable override. This mission does not introduce YAML or env-var overrides (C-009, D4).

Acceptance tests (selected)

These tests live in tests/specify_cli/invocation/test_projection_policy.py (new) and extensions to test_invocation_e2e.py:

1. Every (ModeOfWork, EventKind) pair has an entry in POLICY_TABLE.
2. Golden-path rows match the expected rules (see table above).
3. resolve_projection(None, EventKind.STARTED) returns the TASK_EXECUTION / STARTED rule (null-tolerance).
4. _propagate_one with a mocked connected WebSocket client:
   - Drops advisory artifact_link events (no send_event call).
   - Emits task_execution started events (one send_event call, envelope includes request_text).
   - Emits task_execution completed events (envelope includes evidence_ref when supplied).
5. With sync disabled (effective_sync_enabled=False), resolve_projection is not called and no envelope is built; verified by mock assertion.
6. With user unauthenticated, resolve_projection is not called; verified by mock assertion.
7. propagation-errors.jsonl remains empty across 100 invocations under all four modes with sync disabled (NFR-007, SC-008).
