Glossary DRG Residence and Executor Chokepoint

Phase 5 Foundation — Issue #467 Mission ID: 01KPTE0P5JVQFWESWV07R0XG4M Mission Slug: glossary-drg-chokepoint-01KPTE0P Target Branch: main Baseline: spec-kitty origin/main @ 2144544b (3.2.0a5, 2026-04-22)


Problem / Opportunity

Spec Kitty's governance system already validates terminology at mission-primitive time (the existing execute_with_glossary hook in doctrine.missions.glossary_hook). But the newer profile-invocation path — spec-kitty advise, ask, and do — bypasses this check entirely. When a host LLM issues an invocation, it receives governance context assembled from the Doctrine Reference Graph, but no signal about whether the request text itself contains terms that conflict with the project glossary.

This gap means:

  • Term drift in agent requests goes undetected until a human reviewer catches it.
  • The DRG has a GLOSSARY_SCOPE node kind and VOCABULARY relation already declared, but no individual glossary term nodes to back them — the DRG graph currently has scope-level placeholders, not per-term addressable nodes.
  • Hosts have no standard contract for what glossary observations to surface inline versus log quietly.

Phase 5 establishes the foundation: stable URN-addressed term nodes in the DRG, typed edges from action nodes to their applicable terms, and a deterministic chokepoint wired directly into ProfileInvocationExecutor that fires on every invocation.


Goal

Make every active glossary term a stable, DRG-addressable node (glossary:<id> URN). Wire typed vocabulary edges from action nodes to applicable term nodes. Integrate a deterministic, non-blocking glossary chokepoint into ProfileInvocationExecutor.invoke() that returns a structured observation bundle to the host on every advise / ask / do invocation.


User Scenarios and Testing

Scenario 1 — Invocation with no glossary conflict (golden path)

A project operator runs spec-kitty advise "summarise the WP status". The executor resolves the profile, assembles governance context, and runs the chokepoint against the active glossary terms for the resolved action. No conflicts are found. The InvocationPayload is returned with an empty glossary_observations.all_conflicts tuple. The host renders governance context normally — no inline glossary text appears. No glossary_checked event is written to the JSONL trail (clean invocations produce no trail noise).

Test: Assert that payload.glossary_observations.all_conflicts == () and payload.glossary_observations.high_severity == () and that payload.glossary_observations is present (not None).

Scenario 2 — High-severity conflict surfaced inline

A project has a glossary term "lane" (Spec Kitty meaning: parallel execution slot) that conflicts with informal usage. An operator issues spec-kitty do "move the WP to a new lane (channel)". The chokepoint detects SemanticConflict(severity=HIGH, conflict_type=INCONSISTENT) for "lane". The InvocationPayload.glossary_observations.high_severity list contains the conflict. The host (Codex or gstack) reads the bundle and prepends an inline warning before presenting governance context to the LLM.

Test: Assert that payload.glossary_observations.high_severity contains the conflict record, and that the JSONL trail entry also contains it.

Scenario 3 — Low/medium conflict logged only

An operator issues spec-kitty ask planner "what tasks are planned for the current sprint". The word "sprint" has a medium-severity ambiguity conflict (Agile sprint vs. casual use). The chokepoint classifies it as medium. The InvocationPayload.glossary_observations.high_severity is empty; the conflict appears only in the JSONL trail entry for this invocation.

Test: Assert payload.glossary_observations.high_severity == [] and that the trail JSONL for the invocation contains the medium-severity conflict.

Scenario 4 — Chokepoint failure does not block invocation

A bug in the index builder (e.g., corrupt DRG YAML) causes the chokepoint to raise an exception. The executor catches it, attaches GlossaryObservationBundle(error_msg="<description>", high_severity=[], conflicts=[]) to the payload, logs a warning, and returns the payload normally. The invocation completes; the host receives governance context without glossary observations.

Test: Assert that the executor completes without propagating the exception, that payload.glossary_observations.error_msg is non-empty, and that payload.invocation_id is still valid.

Scenario 5 — Term index is rebuilt from DRG, no operator step required

After a new term is added to a seed file and the DRG is regenerated, the chokepoint's lazy index loader picks up the new term on the next invocation without any manual cache invalidation command from the operator.

Test: Assert that after adding a glossary:<id> node and a vocabulary edge to a test DRG and invalidating the index, the next chokepoint call finds the new term.


Actors

ActorRole
ProfileInvocationExecutorRuntime-internal actor that runs the chokepoint synchronously during every profile invocation
Glossary term author (operator)Maintains the seed files that populate the glossary store; indirectly controls which terms appear in the DRG
Host (Codex, gstack)External LLM harness that reads InvocationPayload and decides how to surface high-severity observations inline
DRG graphPassive data artifact whose glossary:<id> nodes and vocabulary edges drive the chokepoint's term index
Invocation trail (JSONL)Passive sink for all chokepoint observations, including low/medium conflicts

Functional Requirements

IDRequirementStatus
FR-001Every active glossary term in the glossary store must have a glossary:<id> URN address stable across store rebuilds for the same canonical surface. The URN format and corresponding DRGNode representation are defined by build_glossary_drg_layer() in src/specify_cli/glossary/drg_builder.py. In the live invocation path, the chokepoint resolves terms via GlossaryTermIndex built directly from GlossaryStore (not from a persisted DRG YAML), consistent with the planning decision to use a runtime-computed in-memory layer.Approved
FR-002Each addressable glossary term must carry its canonical surface form as its label and be identified by NodeKind.GLOSSARY (value "glossary", matching the glossary: URN prefix).Approved
FR-003The glossary DRG layer (build_glossary_drg_layer()) must produce vocabulary edges from every shipped action node to every term node in the applicable scopes. In the live chokepoint path, spec_kitty_core and team_domain terms are applied to all invocations regardless of the specific action URN (broad v1 applicability). Per-action scoping is deferred to a follow-on.Approved
FR-004The term index must be queryable by the chokepoint. In v1, the chokepoint applies all applicable-scope terms to every invocation uniformly (no per-action-URN filtering). A DRG-native action-URN → term-node query is deferred to a follow-on tranche.Approved
FR-005ProfileInvocationExecutor.invoke() must run the glossary chokepoint synchronously after governance-context assembly and before returning InvocationPayload.Approved
FR-006The chokepoint must tokenize the request text and match tokens against the action-scoped term set using deterministic string matching and lemmatization only. No LLM calls are permitted in the hot path.Approved
FR-007For each matched term, the chokepoint must classify any drift using the existing SemanticConflict model (conflict_type, severity, confidence).Approved
FR-008The chokepoint result must be encapsulated in a GlossaryObservationBundle and attached to the InvocationPayload returned from invoke(). The bundle must always be present (never None), even on a clean invocation.Approved
FR-009GlossaryObservationBundle must include: the list of matched term URNs, the list of high-severity SemanticConflict findings (surfaced to hosts), the list of all other findings (for trail-only writing), the count of tokens checked, and the chokepoint execution duration in milliseconds.Approved
FR-010If the chokepoint raises any exception, the executor must catch it, emit a warning-level log entry, attach a GlossaryObservationBundle with error_msg set and empty conflict lists, and return the payload normally. The invocation is never blocked or interrupted by chokepoint failure.Approved
FR-011The host contract for high-severity conflicts is: the host must render the high_severity conflict list as inline text in agent output before presenting the governance context to the LLM.Approved
FR-012Low- and medium-severity findings must be written to the local invocation JSONL trail under the invocation's trail entry, but must not appear in InvocationPayload.glossary_observations.high_severity.Approved
FR-013The GlossaryChokepoint class must be instantiatable without triggering any filesystem I/O. The term index must be lazily loaded on first use and cached for the lifetime of the executor instance.Approved
FR-014The Codex and gstack host guidance documents must be updated to describe the glossary_observations field, the high_severity rendering contract, and the expected behavior when error_msg is set.Approved
FR-015The term index must be rebuildable on demand via build_index(), which scans the active GlossaryStore for ACTIVE senses in the applicable scopes. No manual operator step is required after glossary seed files change.Approved

Non-Functional Requirements

IDRequirementThresholdStatus
NFR-001Chokepoint end-to-end latency on a request text up to 2,000 words with a term index of up to 500 terms.p95 ≤ 50msApproved
NFR-002Chokepoint overhead on a one-liner request text (≤50 words).p95 ≤ 2msApproved
NFR-003Term index initial load from a DRG containing up to 500 glossary term nodes.≤ 20msApproved
NFR-004Unit test coverage for the GlossaryChokepoint class and GlossaryObservationBundle model.≥ 90% line coverageApproved
NFR-005Static type checking gate for all new source files.mypy --strict zero errorsApproved

Constraints

IDConstraintStatus
C-001The chokepoint must never block or propagate an exception from within ProfileInvocationExecutor.invoke(). All exceptions from the chokepoint code path must be caught by the executor and result in an empty-bundle payload.Approved
C-002No LLM calls, HTTP requests, subprocess invocations, or blocking I/O operations are permitted inside the chokepoint hot path. Deterministic string matching and lemmatization only.Approved
C-003The NodeKind.GLOSSARY extension must be additive to the DRG schema. Existing graph.yaml files that contain no glossary nodes must still load and validate successfully without migration.Approved
C-004The existing GlossaryStore, GlossaryScope, TermSense, and SemanticConflict models must not be modified in a breaking way. Any new fields must be additive.Approved
C-005InvocationPayload.__slots__ must be extended without breaking existing callers of to_dict(). The new glossary_observations slot must appear in the dict output.Approved
C-006WP5.4 (dashboard glossary tile), WP5.5 (glossary entity pages, #532), and WP5.6 (spec-kitty charter lint, #533) are out of scope for this tranche and must not be implemented.Approved
C-007Host LLM and harness own reading and generation. Spec Kitty owns routing, governance context assembly, glossary drift detection, validation, trail writing, and additive propagation.Approved
C-008mark_loaded=False must continue to be passed to build_charter_context() in ProfileInvocationExecutor.invoke(). The chokepoint must not alter this invariant.Approved

Success Criteria

IDCriterion
SC-001A spec-kitty advise / ask / do invocation on a project with active glossary terms completes and returns an InvocationPayload that includes a non-null glossary_observations bundle within the p95 ≤ 50ms chokepoint budget.
SC-002A high-severity SemanticConflict detected by the chokepoint appears as inline text in agent output for the Codex host path and the gstack host path, with no user configuration required.
SC-003When the chokepoint raises a simulated exception in tests, the invocation completes with a valid invocation_id and an error_msg-populated bundle — no exception escapes to the caller.
SC-004After DRG regeneration with a new glossary:<id> node and vocabulary edge, the next invocation's chokepoint finds the new term without any manual operator action.
SC-005All new source files pass mypy --strict and ruff check. New lines achieve ≥ 90% test coverage in the unit suite.
SC-006The existing invocation e2e test suite passes unchanged after this tranche lands.

Key Entities

EntityDescription
GlossaryObservationBundleNew data model returned by the chokepoint: matched term URNs, high-severity conflicts (surfaced to hosts), all conflicts (for trail), token count, duration, optional error_msg.
GlossaryChokepointNew service class: accepts an action-scoped term set, tokenizes request text, matches terms, classifies conflicts, returns a GlossaryObservationBundle. Stateless except for the lazily loaded term index.
GlossaryTermIndexInternal index structure built by scanning DRG glossary:<id> nodes and vocabulary edges. Cached per executor instance. Rebuildable without operator action.
NodeKind.GLOSSARYNew enum value (= "glossary") added to the existing NodeKind StrEnum in doctrine.drg.models. The string value "glossary" governs the glossary: URN prefix via the existing DRG URN validator (prefix == kind.value).
glossary:<id> nodeA DRG node representing one canonical glossary term. The <id> is the first 8 hex chars of sha256(surface_text, utf-8).
vocabulary edgeAn existing Relation.VOCABULARY edge in the DRG connecting action nodes to glossary:<id> term nodes.
Action-scoped term setThe set of all glossary:<id> nodes reachable from a given action URN via outbound vocabulary edges.
Invocation trail entryThe per-invocation JSONL record that receives all chokepoint observations, including low/medium conflicts not surfaced inline.

Assumptions

#Assumption
A-1The p95 ≤ 50ms performance target is achievable for request texts up to 2,000 words and term indexes up to 500 terms using pure Python string matching and lemmatization. This will be validated in WP02 benchmarks; if not achievable, ADR-5 will be opened to revise the threshold.
A-2mission_local and audience_domain scoped terms are excluded from static DRG vocabulary edges in this tranche; they may be injected at runtime in a follow-on. This is acceptable for the Phase 5 foundation.
A-3The <id> segment in glossary:<id> URNs will be derived as a short stable hash of the canonical surface form (lowercased, trimmed). Collision probability is negligible for realistic glossary sizes (< 10,000 terms).
A-4The Codex and gstack host guidance updates are lightweight doc changes only; no new commands or API surfaces are required in those host codebases for this tranche.

Non-Goals and Deferred Follow-Ons

ItemDisposition
WP5.4: Dashboard glossary tileDeferred — future tranche
WP5.5: Glossary entity pages with two-way backlinks (issue #532)Deferred — future tranche
WP5.6: spec-kitty charter lint graph-native decay detection (issue #533)Deferred — future tranche
spec-kitty explain (issue #534)Explicitly excluded from Phase 5 scope
Mission rewrite / retrospective contract (issue #468)Out of scope
Versioning + migration hardening beyond additive backward-compat (issue #469)Out of scope
mission_local / audience_domain terms in static DRG edgesDeferred — follow-on; runtime injection pattern to be designed separately
SaaS projection of glossary observationsDeferred — Tier 2 / Tier 3 propagation patterns exist but glossary bundle projection not in scope here
Entity-level graph lint, orphan detectionDeferred — WP5.6 territory
ADR-5 (formal p95 measurement record)To be drafted and published as part of WP02 benchmarking

Domain Language

Canonical termDefinitionAvoid
DRGDoctrine Reference Graph — the typed, URN-addressed graph of doctrine artifacts"doctrine graph", "doctrine graph YAML", "graph.yaml" (as a concept)
chokepointThe synchronous middleware step in ProfileInvocationExecutor.invoke() that runs glossary checking"filter", "validator", "gate", "hook" (when referring specifically to the executor-level integration)
observation bundle / GlossaryObservationBundleThe structured result returned by the chokepoint to the executor"report", "findings", "result dict"
term node / glossary:<id> nodeA DRG node representing one canonical glossary term"glossary entry", "vocabulary item", "term record"
vocabulary edgeA Relation.VOCABULARY typed DRG edge from an action or profile node to a term node"link", "association", "connection", "glossary edge"
action-scoped term setAll glossary:<id> nodes reachable from a given action node via vocabulary edges"relevant terms", "applicable glossary", "term list for action"
hostThe external LLM harness (Codex, gstack) that reads InvocationPayload"client", "caller", "LLM" (when referring to the harness specifically)
invocation trailThe local JSONL log of invocation events; receives all chokepoint observations"audit log", "event log", "invocation log" (use "trail" per docs/trail-model.md)

Minimum Implementation Sequence

Once this spec is approved, the following WP sequence delivers the smallest complete slice:

1. WP01 — DRG term node model and index builder Add NodeKind.GLOSSARY = "glossary". Implement glossary_urn() (SHA-256 8-hex stable URN), build_glossary_drg_layer() (produces DRGNode/DRGEdge objects for the full layer), build_index() (builds GlossaryTermIndex directly from GlossaryStore for the live chokepoint path), and _normalize() (suffix-stripping lemmatizer). Update DRG loader and validator for backward-compat (no glossary nodes in existing YAML = no error). Unit tests + mypy.

2. WP02 — Chokepoint class, observation bundle, and executor integration Implement GlossaryObservationBundle model. Implement GlossaryChokepoint with lazy index load, deterministic tokenizer + matcher, conflict classification via existing SemanticConflict. Wire into ProfileInvocationExecutor.invoke() with try/except safety wrapper. Benchmark chokepoint latency against p95 targets; draft ADR-5 with measurement data. Extend InvocationPayload.__slots__ with glossary_observations. Unit + integration tests + mypy.

3. WP03 — Observation surface and host guidance Define severity-routing contract in code (highhigh_severity list in bundle; low/medium → trail JSONL only). Write chokepoint observation to the invocation trail entry. Update Codex host guidance doc. Update gstack host guidance doc. Invocation e2e tests to verify existing suite still passes. Verify SC-002 manually or via stub host test.