Command-Contract Drift — Ground Truth, Inventory, and Guard Design
Author: Python Pedro (research op, bounded)
Scope: Issue #2007 systemic problem (1) — command-contract drift: skills/prompts/docs
reference CLI surfaces/flags that do not exist or are internal-only.
Method: Live Typer introspection of specify_cli.app (the registered command tree,
not a copy), cross-referenced against SOURCE templates (src/doctrine/missions/mission-steps/),
SOURCE skills (src/doctrine/skills/), and docs/. Generated agent copies under .claude/,
.amazonq/, etc. are out of scope — they propagate from SOURCE on spec-kitty upgrade.
1. Ground-truth CLI surface (the commands #2007 implicates)
Produced by walking typer.main.get_command(specify_cli.app) and listing each leaf
command's click Arguments and Options. This is authoritative as of pr/tool-surface-contract-residuals.
| #2007 claim | Surface as documented/expected | Actually registered? | Ground truth |
|---|---|---|---|
| 1 | spec-kitty doctrine list --kind <k> / doctrine show <id> |
NO | doctrine group only registers: fetch, new <kind> <id>, regenerate-graph, validate <path>, mission-type list, and subgroups org (init/validate), pack (assemble/validate). There is no list and no show. Canonical listing/inspection is the programmatic from doctrine.service import DoctrineService API (already documented later in the same skills) or reading the YAML under the doctrine packs. |
| 5 | agent context resolve — is --action required? |
yes (exists) | agent context resolve ARGS=[] OPTS=--action, --mission, --wp-id, --agent, --json. --action is required (the command fails with a missing---action error otherwise — confirmed by #2007 and by the prompts that already pass --action plan|tasks|implement|review). |
| 4 / 12 | agent mission setup-plan — is --mission required? |
flags optional | agent mission setup-plan ARGS=[] OPTS=--mission, --json. The snippet setup-plan --json is syntactically valid (both flags optional). The drift is behavioral: with exactly one mission present the resolver returns PLAN_CONTEXT_UNRESOLVED/FEATURE_CONTEXT_UNRESOLVED and says "pass --mission" instead of auto-selecting. Not a snippet typo — a resolver contract gap. |
| 16 | agent action implement / agent action review — --json? |
expected by callers | agent action implement ARGS=[wp_id] OPTS=--mission, --agent, --allow-sparse-checkout, --acknowledge-not-bulk-edit — no --json. agent action review ARGS=[wp_id] OPTS=--mission, --agent — no --json. Passing --json → Typer exit 2. |
| 16 | top-level spec-kitty implement vs agent action implement |
both exist | Top-level implement ARGS=[wp_id] OPTS=--mission, --auto-commit, --json, --recover, --base, --acknowledge-not-bulk-edit, --actor — has --json and is the internal allocator. The canonical agent surface (agent action implement) does not. This is the split #2007 names: JSON lives on the internal surface, not the canonical one. (--feature hard-removed in #1060.) |
| 13 | agent worktree repair |
NO | There is no agent worktree group at all. No worktree repair anywhere in the tree. |
| 13 | doctor workspaces --fix |
yes (exists) | doctor workspaces ARGS=[] OPTS=--fix, --json. This is the real recovery surface that hints should point to. (doctor coordination, doctor mission-state --fix, doctor sparse-checkout --fix also exist.) |
Supporting surfaces relevant to the inventory below:
charter contextARGS=[]OPTS=--action, --include, --mark-loaded, --json. The--mark-loadedbool auto-generates--no-mark-loaded, socharter context --action specify --no-mark-loaded --jsonis valid (false-positive trap — see guard design).agent mission finalize-tasks/agent tasks finalize-tasksboth registered, OPTS include--mission, --json, --validate-only.
2. Drifted-reference inventory (SOURCE only)
Classification: HARD = the snippet would fail (nonexistent command/flag); BEHAVIORAL = syntactically valid but contradicts the command's runtime contract; excluded = generated copies.
HARD drift — nonexistent command surfaces
| File:line | Wrong surface | Correct surface |
|---|---|---|
src/doctrine/skills/spec-kitty-charter-doctrine/SKILL.md:113 |
spec-kitty doctrine list --kind directive |
No CLI. Use DoctrineService (documented at SKILL.md:621) or read pack YAML. |
src/doctrine/skills/spec-kitty-charter-doctrine/SKILL.md:114 |
spec-kitty doctrine show <a-directive-id> |
No CLI. Same as above. |
src/doctrine/skills/spec-kitty-charter-doctrine/SKILL.md:116 |
spec-kitty doctrine list --kind tactic |
No CLI. |
src/doctrine/skills/spec-kitty-charter-doctrine/SKILL.md:117 |
spec-kitty doctrine show <a-tactic-id> |
No CLI. |
src/doctrine/skills/spec-kitty-charter-doctrine/SKILL.md:119 |
spec-kitty doctrine list --kind styleguide |
No CLI. |
src/doctrine/skills/spec-kitty-charter-doctrine/SKILL.md:120 |
spec-kitty doctrine show <a-styleguide-id> |
No CLI. |
src/doctrine/skills/spec-kitty-charter-doctrine/SKILL.md:348 |
spec-kitty doctrine list --kind directive |
No CLI. |
src/doctrine/skills/spec-kitty-charter-doctrine/SKILL.md:439 |
spec-kitty doctrine list --kind directive |
No CLI. |
src/doctrine/skills/spec-kitty-charter-doctrine/SKILL.md:440 |
spec-kitty doctrine list --kind tactic |
No CLI. |
src/doctrine/skills/spec-kitty-charter-doctrine/SKILL.md:441 |
spec-kitty doctrine list --kind paradigm |
No CLI. |
src/doctrine/skills/spec-kitty-charter-doctrine/SKILL.md:444 |
spec-kitty doctrine show DIRECTIVE_034 |
No CLI. |
src/doctrine/skills/spec-kitty-mission-system/SKILL.md:320 |
spec-kitty doctrine list --kind procedure |
No CLI. |
12 HARD-drift references in SOURCE skills (#2007 item 1). Note the same skill file
(spec-kitty-charter-doctrine) correctly documents the DoctrineService programmatic API at
line 621 — the CLI snippets are vestigial and contradict the skill's own canonical guidance.
BEHAVIORAL drift — valid snippet, wrong contract
| File:line | Snippet | Issue / correct guidance |
|---|---|---|
src/doctrine/missions/mission-steps/software-dev/plan/prompt.md:267 |
spec-kitty agent context resolve --mission <handle> --json |
Missing required --action. The other four prompts (tasks/tasks-outline/tasks-packages/implement/review) all pass --action; this plan-prompt line does not. Add --action plan. (#2007 item 5.) |
src/doctrine/missions/mission-steps/software-dev/plan/prompt.md:80 |
spec-kitty agent mission setup-plan --json (no --mission) |
Advertises no-flag operation; resolver returns PLAN_CONTEXT_UNRESOLVED even with one mission. (#2007 item 4.) Behavioral — resolve via auto-select-when-exactly-one OR require --mission in the prompt. |
src/doctrine/missions/mission-steps/software-dev/plan/prompt.md:241 |
spec-kitty agent mission setup-plan --json once without --mission |
Same as above — the prompt explicitly instructs a no-flag first attempt that the resolver rejects. (#2007 item 4/12.) |
Worktree-repair references — already correctly absent from SOURCE prompts/skills
agent worktree repairappears only indocs/plans/engineering-notes/naming-identity-ssot-strangler/*as tracker-sweep notes (#1890), explicitly flagged as "nonexistent command / follow-up". These are archaeology notes, not prescriptive snippets, so they are not active drift. The SOURCE setup-doctor skill's worktree-recovery section (common-failure-signatures.md:150-180) already uses the correctgit worktree prune+spec-kitty implement WP01recovery — noworktree repairreference there. #2007 item 13's drift is in the installed 3.2.0 build / generated code-paths, not in current SOURCE prompts. The guard should still cover this to prevent reintroduction.
Confirmed false-positives (do NOT flag)
charter context ... --no-mark-loaded(charter/prompt.md:260; charter-doctrine SKILL.md:602; charter-command-map.md:230) — valid Typer auto-negation of--mark-loaded.from specify_cli.* import ...python snippets inspec-kitty-bulk-edit-classification/SKILL.md(lines 92, 99, 138, 154) — these are real, in-venv module paths (specify_cli.mission_metadata,specify_cli.bulk_edit.occurrence_map), not the bogusspecify_cli.core.templates. #2007 item 9's bad import (from specify_cli.core.templates import ...) is not present anywhere in SOURCE — it was an ad-hoc command Robert typed, not a templated snippet. No SOURCE fix needed; the guard's scope isspec-kitty …CLI snippets, not arbitrarypython -clines.
Drift tally (SOURCE, not generated copies)
- HARD drift: 12 references (all
doctrine list/doctrine show). - BEHAVIORAL drift: 3 references (1 missing-
--action, 2 setup-plan no-flag). - Total actionable: 15 across 3 SOURCE files (2 skills + 1 mission-step prompt).
- Worst offender:
src/doctrine/skills/spec-kitty-charter-doctrine/SKILL.md— 11 of the 12 HARD hits.
3. CI command-snippet guard design
3.1 Precedent — this is a generalization, not a greenfield build
tests/architectural/test_docs_cli_reference_parity.py already ships:
scripts.docs._typer_walker.walk(app)→list[CommandPathEntry]with.path(tuple of segments),.hidden,.deprecated. This is the live registered surface extractor.test_skill_docs_profile_subcommands_are_registered(FR-018) — already scans one SKILL.md forspec-kitty agent profile <sub>tokens via regex and asserts each<sub>is registered.
The #2007 guard is that exact pattern, widened from the single agent profile family to all
spec-kitty … snippets across all SOURCE skills/prompts/docs, with flag-level validation added.
That dramatically lowers risk: the hard part (live-surface introspection, the env-flag ordering at
import, the SaaS-gated path handling) is solved and battle-tested.
3.2 Extraction strategy
- Source set: glob SOURCE only —
src/doctrine/skills/**/*.md,src/doctrine/skills/**/references/*.md,src/doctrine/missions/mission-steps/**/*.md, and (a second, looser tier)docs/**/*.md. Excludedocs/plans/engineering-notes/**(archaeology/notes, not prescriptive) and all generated agent dirs (.claude/,.amazonq/, …). - Snippet capture: extract fenced
```bashblocks, then within them match lines beginning (after optionaluv run,$,#-prefixed comments stripped) withspec-kitty. Tokenize the command path greedily over leading lower-case/hyphen words until the first token that is a flag (-/--), a placeholder, an argument value, or end-of-line. - Path validation: resolve the longest token-prefix that matches a registered
CommandPathEntry.path(groups + leaf). If no registered command path is a prefix of the tokens → HARD failure withfile:line → spec-kitty <tokens>and the nearest registered path as a suggestion. - Flag validation (tier 2, opt-in per file): for matched leaf commands, collect
--flagtokens and check membership in that command's option set (read from the click command'sparams, including auto-negations of bool flags). Unknown--flag→ failure. Keep this behind an allow-list initially because it has the most false-positive surface.
3.3 False-positive risks and mitigations
- Placeholders:
<handle>,<mission-slug>,<a-directive-id>,…,[OPTIONS]. Mitigation: only validate the command path (lower-case/hyphen words); stop tokenizing at the first<,[,…,$,{, uppercase, or quote. Placeholders never appear in a command path position. - Bool-flag auto-negation:
--no-mark-loaded,--no-auto-commit,--no-worktrees. Mitigation: build the option set from click params and add the--no-<name>form for everyis_flag/bool option. - Reference-table prose:
docs/api/*.mdprint--mark-loaded --no-mark-loadedas help text, not invocations. Mitigation: only scan inside```bashfences; ignore tables /Usage:dumps. - Continuation lines & pipes:
\-continued commands and| jqtails. Mitigation: join\continuations; cut at the first|,&&,;,>. python -clines: out of scope by construction (onlyspec-kitty …lines are validated).- Deprecated-but-present commands:
walk()already flags.deprecated; treat deprecated paths as valid-but-warn so the guard doesn't fight a separate deprecation banner test.
3.4 Allow-list placement
A module-level frozenset in the test file (mirroring _SKILL_DOCS in the existing parity test):
_SNIPPET_DRIFT_ALLOWLIST: frozenset[tuple[str, tuple[str, ...]]] keyed by
(relative_file, command_path_tuple). Start empty (after the 15 SOURCE fixes land) so the guard
is a true ratchet. Any intentional pseudo-command in docs must be added explicitly with a comment — the
same discipline as the ratchet baselines in tests/architectural/_baselines.yaml.
3.5 Tractability and rough size — G3 DevEx enabler: YES
- Tractable: high confidence. All infrastructure (live walker, env-flag import ordering, regex token scan) exists; this is a widen + flag-check on a proven pattern.
- Size: ~120–180 LOC for the guard test + ~40–60 LOC of shared snippet-extraction helper
(worth hoisting into
scripts/docs/next to_typer_walker.pyso the docs-lint job can reuse it). Path-level validation is the must-have (catches all 12 HARD hits); flag-level validation is a follow-on tier (catches the missing---actionclass) and carries the false-positive budget. - Wiring: runs in
tests/architectural/(already collected by the docs-contract fail-on-drift gate per commit04f2497ab). No new CI job needed. - Caveat: the guard catches snippet drift (HARD + missing-flag). It does not catch behavioral drift like setup-plan's no-flag contract (#2007 item 4) — that needs a resolver fix + regression test, not a snippet check. Be explicit about that boundary in the epic so the guard isn't oversold as covering all of problem (1).