Contracts

host-surface-inventory.md

Contract: Host-Surface Inventory Matrix

Mission: phase-4-closeout-host-surfaces-and-trail-01KPWA5X
Covers: FR-001 (inventory), FR-002 (parity), FR-006 (guidance-gap coverage), NFR-003 (100% surface coverage)
Living location: kitty-specs/phase-4-closeout-host-surfaces-and-trail-01KPWA5X/artifacts/host-surface-inventory.md
Promoted location: docs/host-surface-parity.md (created by WP05 on mission close)

Purpose

One authoritative matrix, per mission, that lists every supported host surface and its parity status against the advise/ask/do governance-injection contract. Drives WP02–WP04 scope during Tranche A; becomes the durable operator-facing reference at closeout.

Row schema

Columns in this exact order:

| # | Column | Type | Allowed values | Notes |
|---|---|---|---|---|
| 1 | surface_key | str | One of the keys in AGENT_DIRS from src/specify_cli/upgrade/migrations/m_0_9_1_complete_lane_migration.py | Canonical host key. |
| 2 | directory | str | e.g. .claude/commands/, .agents/skills/spec-kitty.advise/ | Relative to repo root. |
| 3 | kind | str | slash_command or agent_skill | Derived from the surface category. |
| 4 | has_advise_guidance | bool | yes / no | Does the surface teach when to call advise/ask/do? |
| 5 | has_governance_injection | bool | yes / no | Does the surface teach how to inject governance_context_text? |
| 6 | has_completion_guidance | bool | yes / no | Does the surface teach how to call profile-invocation complete? |
| 7 | guidance_style | str | inline or pointer | inline hosts the content; pointer links to the canonical skill pack. |
| 8 | parity_status | str | at_parity, partial, or missing | Composite judgement from columns 4–7. |
| 9 | notes | str | free text | Captures per-surface rationale, especially required for pointer style per FR-006. |

Canonical host surface list

The 15 supported surfaces are:

Slash-command surfaces (13)

| surface_key | directory | subdir |
|---|---|---|
| claude | .claude/ | commands/ |
| copilot | .github/ | prompts/ |
| gemini | .gemini/ | commands/ |
| cursor | .cursor/ | commands/ |
| qwen | .qwen/ | commands/ |
| opencode | .opencode/ | command/ |
| windsurf | .windsurf/ | workflows/ |
| kilocode | .kilocode/ | workflows/ |
| auggie | .augment/ | commands/ |
| roo | .roo/ | commands/ |
| q | .amazonq/ | prompts/ |
| kiro | .kiro/ | prompts/ |
| agent | .agent/ | workflows/ |

Agent Skills surfaces (2)

| surface_key | directory |
|---|---|
| codex | .agents/skills/ (reads from tree directly) |
| vibe | .agents/skills/ (via .vibe/config.toml::skill_paths) |

Parity judgement rubric

| parity_status | Condition |
|---|---|
| at_parity | All three guidance flags yes and the surface matches the content shape shipped in .agents/skills/spec-kitty.advise/SKILL.md and src/doctrine/skills/spec-kitty-runtime-next/SKILL.md. |
| partial | Some guidance flags yes, some no, or guidance is present but not aligned with the reference content shape. |
| missing | All three guidance flags no. |
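The rubric can be read as a small decision function. The sketch below is illustrative only: the function name and the `matches_reference_shape` parameter are hypothetical, and whether a surface matches the reference content shape is assumed to be a reviewer's judgement passed in as a boolean.

```python
from typing import Literal

ParityStatus = Literal["at_parity", "partial", "missing"]


def derive_parity_status(
    has_advise_guidance: bool,
    has_governance_injection: bool,
    has_completion_guidance: bool,
    matches_reference_shape: bool,  # hypothetical input: reviewer's judgement
) -> ParityStatus:
    """Apply the parity rubric to one inventory row."""
    flags = (has_advise_guidance, has_governance_injection, has_completion_guidance)
    if not any(flags):
        # All three guidance flags are "no".
        return "missing"
    if all(flags) and matches_reference_shape:
        # All three flags are "yes" and the content shape matches the reference.
        return "at_parity"
    # Mixed flags, or guidance present but not aligned with the reference shape.
    return "partial"
```

Under this reading, a surface with all three flags set but a divergent content shape is `partial`, not `at_parity`.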

Example rows (worked)

| surface_key | directory | kind | has_advise_guidance | has_governance_injection | has_completion_guidance | guidance_style | parity_status | notes |
|---|---|---|---|---|---|---|---|---|
| claude | .claude/commands/ | slash_command | yes | yes | yes | inline | at_parity | Priority slice shipped in 3.2.0a5. Uses src/doctrine/skills/spec-kitty-runtime-next/SKILL.md content. |
| copilot | .github/prompts/ | slash_command | no | no | no | pointer | missing | No governance-injection block present. WP04 will add a pointer to the canonical skill pack; .github/prompts/ is read into Copilot context via workspace-level prompts only. |

Promotion rules (WP05)

When Tranche A closes:

1. Copy the matrix verbatim to docs/host-surface-parity.md.
2. Add a short preamble to the promoted doc explaining what the matrix is and how it is kept up to date (any new host integration MUST add a row).
3. Link the promoted doc from:

  • docs/trail-model.md (under "Host surfaces that teach the trail" subsection).
  • README governance section (one-line pointer).

Acceptance

  • Mechanical: tests/specify_cli/docs/test_host_surface_inventory.py parses docs/host-surface-parity.md after WP05, asserts every surface_key from AGENT_DIRS has exactly one row, and asserts each row's parity_status is one of the three allowed values. Covers FR-001 / NFR-003.
  • Textual: Each row with parity_status != "at_parity" must have a non-empty notes column explaining the gap and the remediation plan. Covers FR-006.
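The two acceptance checks could be expressed roughly as follows. This is a sketch, not the contents of test_host_surface_inventory.py: the helper names are hypothetical, the expected keys are passed in by the caller instead of being imported from AGENT_DIRS, and cells are split naively on `|`, so a notes cell containing a pipe character would need escaping or a real markdown parser.

```python
from collections import Counter

ALLOWED_PARITY = {"at_parity", "partial", "missing"}
COLUMNS = [
    "surface_key", "directory", "kind",
    "has_advise_guidance", "has_governance_injection", "has_completion_guidance",
    "guidance_style", "parity_status", "notes",
]


def parse_rows(markdown: str) -> list[dict[str, str]]:
    """Extract matrix rows from pipe-table lines, skipping header and separator."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        inner = line[1:-1] if line.endswith("|") else line[1:]
        cells = [cell.strip() for cell in inner.split("|")]
        if len(cells) != len(COLUMNS):
            continue  # not a matrix row
        first = cells[0]
        if first == "surface_key" or (first and set(first) <= set("-:")):
            continue  # header row or |---| separator
        rows.append(dict(zip(COLUMNS, cells)))
    return rows


def check_matrix(markdown: str, expected_keys: set[str]) -> list[str]:
    """Return a list of human-readable problems; empty means the matrix passes."""
    rows = parse_rows(markdown)
    counts = Counter(row["surface_key"] for row in rows)
    problems = []
    for key in sorted(expected_keys):
        if counts[key] != 1:
            problems.append(f"{key}: expected exactly one row, found {counts[key]}")
    for row in rows:
        status = row["parity_status"]
        if status not in ALLOWED_PARITY:
            problems.append(f"{row['surface_key']}: invalid parity_status {status!r}")
        elif status != "at_parity" and not row["notes"]:
            problems.append(f"{row['surface_key']}: notes required when not at_parity")
    return problems
```

Returning a list of problems instead of raising on the first one lets a single test run report every gap in the matrix.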

profile-invocation-complete.md

Contract: spec-kitty profile-invocation complete (extended)

Mission: phase-4-closeout-host-surfaces-and-trail-01KPWA5X
Covers: FR-007 (correlation), FR-009 (mode enforcement at promotion), FR-012 (local-first invariant)
Target file: src/specify_cli/cli/commands/profile_invocation.py (CLI) + src/specify_cli/invocation/executor.py::complete_invocation (runtime)

Purpose

Close an open profile invocation record and, optionally:

1. Promote its output to a Tier 2 evidence artifact (existing behaviour; now mode-gated).
2. Attach one or more correlation links to the invocation JSONL: artifact references and/or a single commit SHA (new).

Command Shape

spec-kitty profile-invocation complete \
    --invocation-id <id> \
    [--outcome done|failed|abandoned] \
    [--evidence <path>] \
    [--artifact <path>]... \
    [--commit <sha>] \
    [--json]

Flag semantics

| Flag | Cardinality | Type | Required | Default | Description |
|---|---|---|---|---|---|
| --invocation-id | 1 | str (ULID) | yes | | Target invocation file. |
| --outcome | 1 | enum done / failed / abandoned | no | None | Recorded on the completed event. |
| --evidence | 1 | path-or-string | no | None | Tier 2 promotion trigger. Mode-gated: rejected for advisory / query. |
| --artifact | ≥ 0 | path-or-string | no | | Repeatable. Each value appends one artifact_link event. |
| --commit | 0 or 1 | str (SHA) | no | None | Singular. Appends one commit_link event. |
| --json | flag | bool | no | false | Existing behaviour: JSON output. |

Execution order

On a successful invocation with all flags present, the runtime performs these steps in order:

1. Read started event (first line of the invocation JSONL).
2. Mode enforcement (FR-009): if --evidence is set and the derived mode_of_work ∈ {advisory, query}, raise InvalidModeForEvidenceError and exit without appending any new lines.
3. Append completed event (existing write_completed).
4. If --evidence is set (and mode check passed): resolve + normalise ref, then call existing promote_to_evidence().
5. For each --artifact <path>: resolve + normalise, append artifact_link event.
6. If --commit <sha>: append commit_link event.
7. Submit completed to SaaS propagator (existing behaviour). Correlation events are also submitted to the propagator, but projection is subject to POLICY_TABLE (see projection-policy.md).

Why steps 5 and 6 run after step 3

The append-only invariant holds in both orderings, but closing the invocation first lets readers distinguish completed from the correlation tail. Running correlation writes after the completed event also means a filesystem failure on a correlation write leaves the invocation in a fully-closed state — a retry of complete with the remaining correlation flags is a clean append, not a recovery.
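The ordering can be illustrated with an in-memory sketch that only plans which events would be appended. It is not the executor: `plan_completion_events` is a hypothetical helper, the `ref` and `sha` keys on the link events are assumed field names, and evidence promotion (step 4) and propagation (step 7) are left out.

```python
from typing import Optional, Sequence


class InvalidModeForEvidenceError(Exception):
    """Raised when --evidence is supplied on an advisory or query invocation."""


NON_PROMOTABLE_MODES = {"advisory", "query"}


def plan_completion_events(
    started: dict,
    *,
    outcome: Optional[str] = None,
    evidence: Optional[str] = None,
    artifacts: Sequence[str] = (),
    commit: Optional[str] = None,
) -> list[dict]:
    """Return the events that would be appended, in append order."""
    mode = started.get("mode_of_work")  # None for pre-mission records
    # Step 2: mode enforcement runs before anything is appended.
    if evidence is not None and mode in NON_PROMOTABLE_MODES:
        raise InvalidModeForEvidenceError(
            f"Cannot promote evidence on invocation {started.get('invocation_id')}: "
            f"mode is {mode}; Tier 2 evidence is only allowed on "
            "task_execution or mission_step invocations."
        )
    # Step 3: the completed event closes the invocation first.
    events = [{"event": "completed", "outcome": outcome}]
    # Step 5: one artifact_link per --artifact, in input order.
    for ref in artifacts:
        events.append({"event": "artifact_link", "ref": ref})  # "ref" is assumed
    # Step 6: at most one commit_link.
    if commit is not None:
        events.append({"event": "commit_link", "sha": commit})  # "sha" is assumed
    return events
```

Because the mode check raises before the list is built, a rejected call contributes no events at all, which matches the requirement that nothing is appended on InvalidModeForEvidenceError.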

Error shapes

| Condition | Error class | Exit code | Message guidance |
|---|---|---|---|
| --invocation-id points to missing file | InvocationError | 2 | "Invocation record not found: <id>." |
| Already-completed invocation | AlreadyClosedError | 2 | Existing. |
| --evidence supplied on advisory or query | InvalidModeForEvidenceError (new) | 2 | "Cannot promote evidence on invocation <id>: mode is <mode>; Tier 2 evidence is only allowed on task_execution or mission_step invocations." |
| Filesystem write fails for any event | InvocationWriteError | 2 | Existing. |

Ref normalisation (for --artifact and --evidence)

Per data-model.md §6:

  • Resolve the input path. If resolution succeeds and the resolved path is under repo_root, persist the repo-relative string.
  • Otherwise persist the absolute resolved path.
  • If resolution raises (malformed input), persist the input verbatim — same fallback the existing executor.complete_invocation already uses for unreadable evidence refs.

This normalisation rule applies uniformly to both --artifact and --evidence, so correlation refs and evidence refs read the same.
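A minimal sketch of the three-way rule using pathlib follows. The function name is hypothetical, and resolving relative inputs against repo_root (as opposed to the current working directory) is an assumption; data-model.md §6 remains the authority.

```python
from pathlib import Path


def normalise_ref(raw: str, repo_root: Path) -> str:
    """Normalise an --artifact / --evidence value for persistence."""
    try:
        root = repo_root.resolve()
        # Joining with an absolute `raw` discards `root`, so this handles both
        # relative and absolute inputs. Relative-to-root is an assumption.
        resolved = (root / raw).resolve()
    except (OSError, ValueError, RuntimeError):
        # Resolution raised (malformed input): persist the input verbatim.
        return raw
    try:
        # Under the checkout: persist the repo-relative string.
        return resolved.relative_to(root).as_posix()
    except ValueError:
        # Outside the checkout: persist the absolute resolved path.
        return str(resolved)
```

For example, `normalise_ref("./build/out.log", repo_root)` yields `build/out.log`, while a path outside the checkout comes back as its absolute resolved form.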

JSON output (when --json is set)

Response shape extended to report appended correlation events:

{
  "result": "success",
  "invocation_id": "01HXYZ...",
  "outcome": "done",
  "evidence_ref": "kitty-specs/042-foo/evidence/snapshot.md",
  "artifact_links": ["kitty-specs/042-foo/tasks/WP03.md", "build/report.html"],
  "commit_link": "a1b2c3d4e5f67890..."
}

  • evidence_ref is present only when --evidence was supplied and promotion succeeded.
  • artifact_links is an array (possibly empty). Order matches the input order of --artifact flags.
  • commit_link is a string or null.
  • Existing fields (result, invocation_id, outcome) are unchanged.

On error, the existing JSON error envelope shape is used.
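The field rules for the success payload can be sketched as a small builder. `build_success_payload` is a hypothetical helper name, not an existing function in the CLI; only the key names and their presence rules come from the contract above.

```python
import json
from typing import Optional, Sequence


def build_success_payload(
    invocation_id: str,
    outcome: Optional[str],
    artifact_links: Sequence[str] = (),
    commit_link: Optional[str] = None,
    evidence_ref: Optional[str] = None,
) -> dict:
    """Assemble the --json success response for `complete`."""
    payload: dict = {
        "result": "success",
        "invocation_id": invocation_id,
        "outcome": outcome,
    }
    if evidence_ref is not None:
        # Present only when --evidence was supplied and promotion succeeded.
        payload["evidence_ref"] = evidence_ref
    # Always an array, possibly empty; order follows the --artifact flags.
    payload["artifact_links"] = list(artifact_links)
    # A string, or None (serialised as JSON null).
    payload["commit_link"] = commit_link
    return payload


if __name__ == "__main__":
    print(json.dumps(build_success_payload("01HXYZ...", "done"), indent=2))
```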

Invariants

  • No existing JSONL line is mutated. All new events are append-only (C-004).
  • Tier 1 unconditional. The completed event is written before any correlation append; a filesystem failure on correlation does not leave the invocation half-closed. Tier 1 must keep working with SaaS disabled, unauthenticated, or network-unreachable (C-002, FR-012).
  • No new top-level command. This is a flag extension on an existing subcommand (C-008).
  • Backwards-compatible. Omitting --artifact and --commit yields identical behaviour to 3.2.0a5. Pre-mission invocations (no mode_of_work field on started) accept --evidence: a None mode skips enforcement.

Acceptance tests (selected)

These tests live in tests/specify_cli/invocation/test_correlation.py (new) and tests/specify_cli/invocation/test_invocation_e2e.py (extended):

1. complete with two --artifact values appends two artifact_link events in order.
2. complete with --commit abc123 appends exactly one commit_link event.
3. complete with --artifact kitty-specs/042/spec.md (under checkout) persists "kitty-specs/042/spec.md".
4. complete with --artifact /tmp/report.log (outside checkout) persists "/tmp/report.log".
5. complete with --artifact ./build/out.log persists "build/out.log" (repo-relative).
6. complete on an advisory invocation with --evidence path raises InvalidModeForEvidenceError; no completed event, no evidence artifact, no correlation events written.
7. complete on a task_execution invocation with --evidence path --artifact other --commit sha writes (in order) completed → artifact_link → commit_link, and promotes evidence to .kittify/evidence/<id>/.
8. complete with sync disabled writes all events locally; .kittify/events/propagation-errors.jsonl remains empty.
9. Second call to complete for the same invocation_id raises AlreadyClosedError before any mutation (existing behaviour preserved).

projection-policy.md

Contract: src/specify_cli/invocation/projection_policy.py

Mission: phase-4-closeout-host-surfaces-and-trail-01KPWA5X
Covers: FR-010 (SaaS read-model policy), FR-012 (local-first invariant), NFR-005 (typed + mypy-strict), NFR-007 (propagation-error quiet invariant)

Purpose

Define the single source of truth for how each (mode_of_work, event) pair projects to the SaaS timeline. Consumed by src/specify_cli/invocation/propagator.py::_propagate_one after the existing sync-gate. Documented for operators in docs/trail-model.md.

Public API

# Re-exported from projection_policy.py for caller convenience.
from specify_cli.invocation.modes import ModeOfWork

class EventKind(str, Enum):
    STARTED = "started"
    COMPLETED = "completed"
    ARTIFACT_LINK = "artifact_link"
    COMMIT_LINK = "commit_link"

@dataclass(frozen=True)
class ProjectionRule:
    project: bool
    include_request_text: bool
    include_evidence_ref: bool

POLICY_TABLE: dict[tuple[ModeOfWork, EventKind], ProjectionRule]

def resolve_projection(mode: ModeOfWork | None, event: EventKind) -> ProjectionRule: ...

The module exports exactly these symbols. Nothing else is public.

Table authority

POLICY_TABLE is the complete enumeration of 4 modes × 4 events = 16 entries. See data-model.md §5 for the full table.

Golden-path invariants (contract tests):

| Row | Rule |
|---|---|
| (TASK_EXECUTION, STARTED) | ProjectionRule(True, True, False) |
| (TASK_EXECUTION, COMPLETED) | ProjectionRule(True, True, True) |
| (MISSION_STEP, STARTED) | ProjectionRule(True, True, False) |
| (MISSION_STEP, COMPLETED) | ProjectionRule(True, True, True) |

Any change to these four rows requires an ADR and a migration note — they govern existing dashboard behaviour for active missions.

Expected zero-projection rows:

| Row | Rule |
|---|---|
| any (QUERY, *) | project=False |
| (ADVISORY, ARTIFACT_LINK) | project=False |
| (ADVISORY, COMMIT_LINK) | project=False |

Query invocations and advisory correlation events produce no SaaS timeline traffic.

resolve_projection() semantics

def resolve_projection(mode: ModeOfWork | None, event: EventKind) -> ProjectionRule:
    effective_mode = mode if mode is not None else ModeOfWork.TASK_EXECUTION
    return POLICY_TABLE.get((effective_mode, event), _DEFAULT_RULE)

  • mode is None → treated as TASK_EXECUTION. Rationale: pre-mission records projected under the old unconditional behaviour, which was effectively (TASK_EXECUTION, event) projection. Preserving that behaviour on upgrade means no dashboard regression.
  • Unknown (mode, event) pair → falls back to _DEFAULT_RULE = ProjectionRule(True, True, True). In practice the table is exhaustive for the enums as defined, so this path is only hit if a future EventKind value is added without the policy table being extended.

Consumer contract — _propagate_one

Modified sequence (diff from 3.2.0a5):

def _propagate_one(record: InvocationRecord_or_EventDict, repo_root: Path) -> None:
    # 1. Existing sync-gate — unchanged. Short-circuit on sync disabled.
    routing = resolve_checkout_sync_routing(repo_root)
    if routing is not None and not routing.effective_sync_enabled:
        return

    # 2. Existing auth/client lookup — unchanged.
    client = _get_saas_client(repo_root)
    if client is None:
        return

    # 3. NEW: consult policy.
    mode = _extract_mode(record)     # returns ModeOfWork | None
    event = _extract_event(record)   # returns EventKind
    rule = resolve_projection(mode, event)
    if not rule.project:
        return

    # 4. Existing envelope build — now respects include_request_text / include_evidence_ref.
    ...

The helper _extract_mode reads record.mode_of_work for InvocationRecord inputs and the mode_of_work key from the stored started event when the input is a correlation event dict. _extract_event maps the event field to EventKind.

Envelope field gating

When rule.include_request_text is False, the envelope for started events omits the request_text key entirely. Omission, not empty string — this keeps dashboard consumers able to distinguish "advisory started" (no body) from "task_execution started with empty request" (present body, empty).

Same rule for rule.include_evidence_ref: omit on False, include on True (only relevant for completed events where evidence_ref is present).
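The omission rule can be sketched as a filter over the event dict. `build_envelope` is a hypothetical helper and the real envelope will carry more structure; the point illustrated is that a gated key is removed, not blanked.

```python
def build_envelope(
    event: dict,
    *,
    include_request_text: bool,
    include_evidence_ref: bool,
) -> dict:
    """Copy the event and drop gated keys entirely when the rule says so."""
    envelope = dict(event)  # shallow copy; the stored event is never mutated
    if not include_request_text:
        envelope.pop("request_text", None)  # omit the key, do not set it to ""
    if not include_evidence_ref:
        envelope.pop("evidence_ref", None)
    return envelope
```

A consumer can then test `"request_text" in envelope` to tell an advisory start (key absent) from a task_execution start with an empty request (key present, empty string).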

Invariants

  • Policy evaluation is read-only. It never writes to disk, never raises an uncaught exception, and never blocks.
  • Policy evaluation runs after the sync-gate. If sync is disabled for the checkout (effective_sync_enabled=False), resolve_projection is never called — the short-circuit still owns the gate (C-002, FR-012).
  • Policy evaluation runs after authentication. Unauthenticated checkouts never reach policy evaluation.
  • Type exhaustiveness. mypy --strict passes; ModeOfWork and EventKind are closed sets.
  • Frozen dataclass. ProjectionRule is frozen so policy rows are shareable and immutable.
  • No operator-configurable override. This mission does not introduce YAML or env-var overrides (C-009, D4).

Acceptance tests (selected)

These tests live in tests/specify_cli/invocation/test_projection_policy.py (new) and extensions to test_invocation_e2e.py:

1. Every (ModeOfWork, EventKind) pair has an entry in POLICY_TABLE.
2. Golden-path rows match the expected rules (see table above).
3. resolve_projection(None, EventKind.STARTED) returns the TASK_EXECUTION / STARTED rule (null-tolerance).
4. _propagate_one with a mocked connected WebSocket client:

  • Drops advisory artifact_link events (no send_event call).
  • Emits task_execution started events (one send_event call, envelope includes request_text).
  • Emits task_execution completed events (envelope includes evidence_ref when supplied).

5. With sync disabled (effective_sync_enabled=False), resolve_projection is not called and no envelope is built; verified by mock assertion.
6. With user unauthenticated, resolve_projection is not called; verified by mock assertion.
7. propagation-errors.jsonl remains empty across 100 invocations under all four modes with sync disabled (NFR-007, SC-008).