Spec Kitty

└─ kitty-specs
   └─ Glossary DRG Residence and Executor Chokepoint

Mission Run:

📚 Docs ↗

Glossary DRG Residence and Executor Chokepoint

Phase 5 Foundation — Issue #467 Mission ID: 01KPTE0P5JVQFWESWV07R0XG4M Mission Slug: glossary-drg-chokepoint-01KPTE0P Target Branch: main Baseline: spec-kitty origin/main @ 2144544b (3.2.0a5, 2026-04-22)

Problem / Opportunity

Spec Kitty's governance system already validates terminology at mission-primitive time (the existing execute_with_glossary hook in doctrine.missions.glossary_hook). But the newer profile-invocation path — spec-kitty advise, ask, and do — bypasses this check entirely. When a host LLM issues an invocation, it receives governance context assembled from the Doctrine Reference Graph, but no signal about whether the request text itself contains terms that conflict with the project glossary.

This gap means:

Term drift in agent requests goes undetected until a human reviewer catches it.
The DRG has a GLOSSARY_SCOPE node kind and VOCABULARY relation already declared, but no individual glossary term nodes to back them — the DRG graph currently has scope-level placeholders, not per-term addressable nodes.
Hosts have no standard contract for what glossary observations to surface inline versus log quietly.

Phase 5 establishes the foundation: stable URN-addressed term nodes in the DRG, typed edges from action nodes to their applicable terms, and a deterministic chokepoint wired directly into ProfileInvocationExecutor that fires on every invocation.

Goal

Make every active glossary term a stable, DRG-addressable node (glossary:<id> URN). Wire typed vocabulary edges from action nodes to applicable term nodes. Integrate a deterministic, non-blocking glossary chokepoint into ProfileInvocationExecutor.invoke() that returns a structured observation bundle to the host on every advise / ask / do invocation.

User Scenarios and Testing

Scenario 1 — Invocation with no glossary conflict (golden path)

A project operator runs spec-kitty advise "summarise the WP status". The executor resolves the profile, assembles governance context, and runs the chokepoint against the active glossary terms for the resolved action. No conflicts are found. The InvocationPayload is returned with an empty glossary_observations.all_conflicts tuple. The host renders governance context normally — no inline glossary text appears. No glossary_checked event is written to the JSONL trail (clean invocations produce no trail noise).

Test: Assert that payload.glossary_observations.all_conflicts == () and payload.glossary_observations.high_severity == () and that payload.glossary_observations is present (not None).

Scenario 2 — High-severity conflict surfaced inline

A project has a glossary term "lane" (Spec Kitty meaning: parallel execution slot) that conflicts with informal usage. An operator issues spec-kitty do "move the WP to a new lane (channel)". The chokepoint detects SemanticConflict(severity=HIGH, conflict_type=INCONSISTENT) for "lane". The InvocationPayload.glossary_observations.high_severity list contains the conflict. The host (Codex or gstack) reads the bundle and prepends an inline warning before presenting governance context to the LLM.

Test: Assert that payload.glossary_observations.high_severity contains the conflict record, and that the JSONL trail entry also contains it.

Scenario 3 — Low/medium conflict logged only

An operator issues spec-kitty ask planner "what tasks are planned for the current sprint". The word "sprint" has a medium-severity ambiguity conflict (Agile sprint vs. casual use). The chokepoint classifies it as medium. The InvocationPayload.glossary_observations.high_severity is empty; the conflict appears only in the JSONL trail entry for this invocation.

Test: Assert payload.glossary_observations.high_severity == [] and that the trail JSONL for the invocation contains the medium-severity conflict.

Scenario 4 — Chokepoint failure does not block invocation

A bug in the index builder (e.g., corrupt DRG YAML) causes the chokepoint to raise an exception. The executor catches it, attaches GlossaryObservationBundle(error_msg="<description>", high_severity=[], conflicts=[]) to the payload, logs a warning, and returns the payload normally. The invocation completes; the host receives governance context without glossary observations.

Test: Assert that the executor completes without propagating the exception, that payload.glossary_observations.error_msg is non-empty, and that payload.invocation_id is still valid.

Scenario 5 — Term index is rebuilt from DRG, no operator step required

After a new term is added to a seed file and the DRG is regenerated, the chokepoint's lazy index loader picks up the new term on the next invocation without any manual cache invalidation command from the operator.

Test: Assert that after adding a glossary:<id> node and a vocabulary edge to a test DRG and invalidating the index, the next chokepoint call finds the new term.

Actors

Actor	Role
ProfileInvocationExecutor	Runtime-internal actor that runs the chokepoint synchronously during every profile invocation
Glossary term author (operator)	Maintains the seed files that populate the glossary store; indirectly controls which terms appear in the DRG
Host (Codex, gstack)	External LLM harness that reads `InvocationPayload` and decides how to surface high-severity observations inline
DRG graph	Passive data artifact whose `glossary:<id>` nodes and `vocabulary` edges drive the chokepoint's term index
Invocation trail (JSONL)	Passive sink for all chokepoint observations, including low/medium conflicts

Functional Requirements

ID	Requirement	Status
FR-001	Every active glossary term in the glossary store must have a `glossary:<id>` URN address stable across store rebuilds for the same canonical surface. The URN format and corresponding `DRGNode` representation are defined by `build_glossary_drg_layer()` in `src/specify_cli/glossary/drg_builder.py`. In the live invocation path, the chokepoint resolves terms via `GlossaryTermIndex` built directly from `GlossaryStore` (not from a persisted DRG YAML), consistent with the planning decision to use a runtime-computed in-memory layer.	Approved
FR-002	Each addressable glossary term must carry its canonical surface form as its `label` and be identified by `NodeKind.GLOSSARY` (value `"glossary"`, matching the `glossary:` URN prefix).	Approved
FR-003	The glossary DRG layer (`build_glossary_drg_layer()`) must produce `vocabulary` edges from every shipped action node to every term node in the applicable scopes. In the live chokepoint path, `spec_kitty_core` and `team_domain` terms are applied to all invocations regardless of the specific action URN (broad v1 applicability). Per-action scoping is deferred to a follow-on.	Approved
FR-004	The term index must be queryable by the chokepoint. In v1, the chokepoint applies all applicable-scope terms to every invocation uniformly (no per-action-URN filtering). A DRG-native action-URN → term-node query is deferred to a follow-on tranche.	Approved
FR-005	`ProfileInvocationExecutor.invoke()` must run the glossary chokepoint synchronously after governance-context assembly and before returning `InvocationPayload`.	Approved
FR-006	The chokepoint must tokenize the request text and match tokens against the action-scoped term set using deterministic string matching and lemmatization only. No LLM calls are permitted in the hot path.	Approved
FR-007	For each matched term, the chokepoint must classify any drift using the existing `SemanticConflict` model (`conflict_type`, `severity`, `confidence`).	Approved
FR-008	The chokepoint result must be encapsulated in a `GlossaryObservationBundle` and attached to the `InvocationPayload` returned from `invoke()`. The bundle must always be present (never `None`), even on a clean invocation.	Approved
FR-009	`GlossaryObservationBundle` must include: the list of matched term URNs, the list of high-severity `SemanticConflict` findings (surfaced to hosts), the list of all other findings (for trail-only writing), the count of tokens checked, and the chokepoint execution duration in milliseconds.	Approved
FR-010	If the chokepoint raises any exception, the executor must catch it, emit a warning-level log entry, attach a `GlossaryObservationBundle` with `error_msg` set and empty conflict lists, and return the payload normally. The invocation is never blocked or interrupted by chokepoint failure.	Approved
FR-011	The host contract for high-severity conflicts is: the host must render the `high_severity` conflict list as inline text in agent output before presenting the governance context to the LLM.	Approved
FR-012	Low- and medium-severity findings must be written to the local invocation JSONL trail under the invocation's trail entry, but must not appear in `InvocationPayload.glossary_observations.high_severity`.	Approved
FR-013	The `GlossaryChokepoint` class must be instantiatable without triggering any filesystem I/O. The term index must be lazily loaded on first use and cached for the lifetime of the executor instance.	Approved
FR-014	The Codex and gstack host guidance documents must be updated to describe the `glossary_observations` field, the `high_severity` rendering contract, and the expected behavior when `error_msg` is set.	Approved
FR-015	The term index must be rebuildable on demand via `build_index()`, which scans the active `GlossaryStore` for `ACTIVE` senses in the applicable scopes. No manual operator step is required after glossary seed files change.	Approved

Non-Functional Requirements

ID	Requirement	Threshold	Status
NFR-001	Chokepoint end-to-end latency on a request text up to 2,000 words with a term index of up to 500 terms.	p95 ≤ 50ms	Approved
NFR-002	Chokepoint overhead on a one-liner request text (≤50 words).	p95 ≤ 2ms	Approved
NFR-003	Term index initial load from a DRG containing up to 500 glossary term nodes.	≤ 20ms	Approved
NFR-004	Unit test coverage for the `GlossaryChokepoint` class and `GlossaryObservationBundle` model.	≥ 90% line coverage	Approved
NFR-005	Static type checking gate for all new source files.	`mypy --strict` zero errors	Approved

Constraints

ID	Constraint	Status
C-001	The chokepoint must never block or propagate an exception from within `ProfileInvocationExecutor.invoke()`. All exceptions from the chokepoint code path must be caught by the executor and result in an empty-bundle payload.	Approved
C-002	No LLM calls, HTTP requests, subprocess invocations, or blocking I/O operations are permitted inside the chokepoint hot path. Deterministic string matching and lemmatization only.	Approved
C-003	The `NodeKind.GLOSSARY` extension must be additive to the DRG schema. Existing `graph.yaml` files that contain no `glossary` nodes must still load and validate successfully without migration.	Approved
C-004	The existing `GlossaryStore`, `GlossaryScope`, `TermSense`, and `SemanticConflict` models must not be modified in a breaking way. Any new fields must be additive.	Approved
C-005	`InvocationPayload.__slots__` must be extended without breaking existing callers of `to_dict()`. The new `glossary_observations` slot must appear in the dict output.	Approved
C-006	WP5.4 (dashboard glossary tile), WP5.5 (glossary entity pages, #532), and WP5.6 (`spec-kitty charter lint`, #533) are out of scope for this tranche and must not be implemented.	Approved
C-007	Host LLM and harness own reading and generation. Spec Kitty owns routing, governance context assembly, glossary drift detection, validation, trail writing, and additive propagation.	Approved
C-008	`mark_loaded=False` must continue to be passed to `build_charter_context()` in `ProfileInvocationExecutor.invoke()`. The chokepoint must not alter this invariant.	Approved

Success Criteria

ID	Criterion
SC-001	A `spec-kitty advise` / `ask` / `do` invocation on a project with active glossary terms completes and returns an `InvocationPayload` that includes a non-null `glossary_observations` bundle within the p95 ≤ 50ms chokepoint budget.
SC-002	A high-severity `SemanticConflict` detected by the chokepoint appears as inline text in agent output for the Codex host path and the gstack host path, with no user configuration required.
SC-003	When the chokepoint raises a simulated exception in tests, the invocation completes with a valid `invocation_id` and an `error_msg`-populated bundle — no exception escapes to the caller.
SC-004	After DRG regeneration with a new `glossary:<id>` node and `vocabulary` edge, the next invocation's chokepoint finds the new term without any manual operator action.
SC-005	All new source files pass `mypy --strict` and `ruff check`. New lines achieve ≥ 90% test coverage in the unit suite.
SC-006	The existing invocation e2e test suite passes unchanged after this tranche lands.

Key Entities

Entity	Description
`GlossaryObservationBundle`	New data model returned by the chokepoint: matched term URNs, high-severity conflicts (surfaced to hosts), all conflicts (for trail), token count, duration, optional `error_msg`.
`GlossaryChokepoint`	New service class: accepts an action-scoped term set, tokenizes request text, matches terms, classifies conflicts, returns a `GlossaryObservationBundle`. Stateless except for the lazily loaded term index.
`GlossaryTermIndex`	Internal index structure built by scanning DRG `glossary:<id>` nodes and `vocabulary` edges. Cached per executor instance. Rebuildable without operator action.
`NodeKind.GLOSSARY`	New enum value (`= "glossary"`) added to the existing `NodeKind` StrEnum in `doctrine.drg.models`. The string value `"glossary"` governs the `glossary:` URN prefix via the existing DRG URN validator (`prefix == kind.value`).
`glossary:<id>` node	A DRG node representing one canonical glossary term. The `<id>` is the first 8 hex chars of `sha256(surface_text, utf-8)`.
`vocabulary` edge	An existing `Relation.VOCABULARY` edge in the DRG connecting action nodes to `glossary:<id>` term nodes.
Action-scoped term set	The set of all `glossary:<id>` nodes reachable from a given action URN via outbound `vocabulary` edges.
Invocation trail entry	The per-invocation JSONL record that receives all chokepoint observations, including low/medium conflicts not surfaced inline.

Assumptions

#	Assumption
A-1	The p95 ≤ 50ms performance target is achievable for request texts up to 2,000 words and term indexes up to 500 terms using pure Python string matching and lemmatization. This will be validated in WP02 benchmarks; if not achievable, ADR-5 will be opened to revise the threshold.
A-2	`mission_local` and `audience_domain` scoped terms are excluded from static DRG `vocabulary` edges in this tranche; they may be injected at runtime in a follow-on. This is acceptable for the Phase 5 foundation.
A-3	The `<id>` segment in `glossary:<id>` URNs will be derived as a short stable hash of the canonical surface form (lowercased, trimmed). Collision probability is negligible for realistic glossary sizes (< 10,000 terms).
A-4	The Codex and gstack host guidance updates are lightweight doc changes only; no new commands or API surfaces are required in those host codebases for this tranche.

Non-Goals and Deferred Follow-Ons

Item	Disposition
WP5.4: Dashboard glossary tile	Deferred — future tranche
WP5.5: Glossary entity pages with two-way backlinks (issue #532)	Deferred — future tranche
WP5.6: `spec-kitty charter lint` graph-native decay detection (issue #533)	Deferred — future tranche
`spec-kitty explain` (issue #534)	Explicitly excluded from Phase 5 scope
Mission rewrite / retrospective contract (issue #468)	Out of scope
Versioning + migration hardening beyond additive backward-compat (issue #469)	Out of scope
`mission_local` / `audience_domain` terms in static DRG edges	Deferred — follow-on; runtime injection pattern to be designed separately
SaaS projection of glossary observations	Deferred — Tier 2 / Tier 3 propagation patterns exist but glossary bundle projection not in scope here
Entity-level graph lint, orphan detection	Deferred — WP5.6 territory
ADR-5 (formal p95 measurement record)	To be drafted and published as part of WP02 benchmarking

Domain Language

Canonical term	Definition	Avoid
DRG	Doctrine Reference Graph — the typed, URN-addressed graph of doctrine artifacts	"doctrine graph", "doctrine graph YAML", "graph.yaml" (as a concept)
chokepoint	The synchronous middleware step in `ProfileInvocationExecutor.invoke()` that runs glossary checking	"filter", "validator", "gate", "hook" (when referring specifically to the executor-level integration)
observation bundle / `GlossaryObservationBundle`	The structured result returned by the chokepoint to the executor	"report", "findings", "result dict"
term node / `glossary:<id>` node	A DRG node representing one canonical glossary term	"glossary entry", "vocabulary item", "term record"
vocabulary edge	A `Relation.VOCABULARY` typed DRG edge from an action or profile node to a term node	"link", "association", "connection", "glossary edge"
action-scoped term set	All `glossary:<id>` nodes reachable from a given action node via `vocabulary` edges	"relevant terms", "applicable glossary", "term list for action"
host	The external LLM harness (Codex, gstack) that reads `InvocationPayload`	"client", "caller", "LLM" (when referring to the harness specifically)
invocation trail	The local JSONL log of invocation events; receives all chokepoint observations	"audit log", "event log", "invocation log" (use "trail" per `docs/trail-model.md`)

Minimum Implementation Sequence

Once this spec is approved, the following WP sequence delivers the smallest complete slice:

1. WP01 — DRG term node model and index builder Add NodeKind.GLOSSARY = "glossary". Implement glossary_urn() (SHA-256 8-hex stable URN), build_glossary_drg_layer() (produces DRGNode/DRGEdge objects for the full layer), build_index() (builds GlossaryTermIndex directly from GlossaryStore for the live chokepoint path), and _normalize() (suffix-stripping lemmatizer). Update DRG loader and validator for backward-compat (no glossary nodes in existing YAML = no error). Unit tests + mypy.

2. WP02 — Chokepoint class, observation bundle, and executor integration Implement GlossaryObservationBundle model. Implement GlossaryChokepoint with lazy index load, deterministic tokenizer + matcher, conflict classification via existing SemanticConflict. Wire into ProfileInvocationExecutor.invoke() with try/except safety wrapper. Benchmark chokepoint latency against p95 targets; draft ADR-5 with measurement data. Extend InvocationPayload.__slots__ with glossary_observations. Unit + integration tests + mypy.

3. WP03 — Observation surface and host guidance Define severity-routing contract in code (high → high_severity list in bundle; low/medium → trail JSONL only). Write chokepoint observation to the invocation trail entry. Update Codex host guidance doc. Update gstack host guidance doc. Invocation e2e tests to verify existing suite still passes. Verify SC-002 manually or via stub host test.