Implementation Plan: Glossary DRG Residence and Executor Chokepoint
Branch: main → main | Date: 2026-04-22 | Spec: spec.md Mission ID: 01KPTE0P5JVQFWESWV07R0XG4M Issue: #467 (Phase 5 Foundation)
Summary
Wire a deterministic, non-blocking glossary chokepoint into ProfileInvocationExecutor.invoke() that fires on every advise / ask / do invocation. Glossary terms are given stable glossary:<id> URNs; a runtime-computed in-memory layer (no persisted YAML, no operator sync step) materializes them as real DRGNode/DRGEdge/DRGGraph objects. The chokepoint tokenizes the request text, matches against the action-scoped term set using pure-Python suffix stripping, classifies conflicts via the existing SemanticConflict model, and returns a GlossaryObservationBundle attached to the InvocationPayload. High-severity findings surface inline to hosts; low/medium findings go to the JSONL trail only. Three work packages: (1) DRG node model + index builder, (2) chokepoint + executor integration + ADR-5 benchmarks, (3) severity routing + trail writes + host doc updates.
Technical Context
Language/Version: Python 3.11+ Primary Dependencies: doctrine.drg.models (DRGNode/DRGEdge/DRGGraph/NodeKind/Relation, existing), specify_cli.glossary. (GlossaryStore/TermSense/SemanticConflict, existing), specify_cli.invocation. (executor/writer, existing), hashlib + time (stdlib only — no new external dependencies) Storage: No new storage. Chokepoint observations appended as third JSONL event to existing per-invocation files at .kittify/events/profile-invocations/{invocation_id}.jsonl Testing: pytest + mypy --strict. ≥90% line coverage on new modules. Existing invocation e2e suite must pass unchanged. Target Platform: Linux / macOS (existing spec-kitty targets) Performance Goals: Chokepoint p95 ≤50ms on 2,000-word request + 500-term index; ≤2ms on ≤50-word request; index load ≤20ms Constraints: No LLM, no HTTP, no subprocess in hot path. Chokepoint failure never blocks invocation. mark_loaded=False invariant in build_charter_context() preserved. All new slots in InvocationPayload appear in to_dict() output.
Charter Check
Charter policies applicable to this plan:
| Policy | Applies | Disposition |
|---|---|---|
typer CLI framework | No — no new CLI commands in this tranche | ✓ |
ruamel.yaml for YAML parsing | No — no new YAML reads/writes | ✓ |
pytest + 90% coverage | Yes | All new modules get unit tests to ≥90% line coverage |
mypy --strict | Yes | All new .py files must pass with zero errors |
| Integration tests for CLI commands | Yes — executor is exercised by advise/ask/do | WP02 adds integration test; WP03 verifies e2e suite |
| DIRECTIVE_003 (Decision Documentation) | Yes — ADR-5 captures the p95 threshold decision | ADR-5 drafted in WP02 |
| DIRECTIVE_010 (Specification Fidelity) | Yes | Implementation traced against spec FRs in WP reviews |
No violations to justify.
Project Structure
Documentation
kitty-specs/glossary-drg-chokepoint-01KPTE0P/
├── spec.md # Approved specification
├── plan.md # This file
├── research.md # Phase 0 decisions (DRG model, URN derivation, lemmatization)
├── data-model.md # Phase 1 data model (GlossaryObservationBundle, GlossaryChokepoint, etc.)
└── tasks.md # Generated by /spec-kitty.tasks (not yet created)
Source Code
src/
├── doctrine/
│ └── drg/
│ └── models.py # MODIFY: add NodeKind.GLOSSARY = "glossary"
├── specify_cli/
│ ├── glossary/
│ │ ├── drg_builder.py # NEW: GlossaryDRGBuilder, build_glossary_drg_layer(), glossary_urn()
│ │ └── chokepoint.py # NEW: GlossaryChokepoint, GlossaryObservationBundle
│ └── invocation/
│ ├── executor.py # MODIFY: add chokepoint call, InvocationPayload.glossary_observations slot
│ └── writer.py # MODIFY: add write_glossary_observation() method
└── architecture/
└── adrs/
└── 2026-04-22-5-glossary-chokepoint-p95-measurement.md # NEW (ADR-5, drafted in WP02)
tests/
├── doctrine/
│ └── drg/
│ └── test_glossary_node_kind.py # NEW: NodeKind.GLOSSARY backward-compat tests
├── specify_cli/
│ ├── glossary/
│ │ ├── test_drg_builder.py # NEW: GlossaryDRGBuilder unit tests
│ │ └── test_chokepoint.py # NEW: GlossaryChokepoint unit tests (including failure mode)
│ └── invocation/
│ └── test_executor_glossary.py # NEW: executor integration tests (chokepoint wiring)
└── (existing invocation e2e suite) # Must pass unchanged after WP03
Work Package Definitions
WP01 — DRG Term Node Model and Index Builder
Goal: Establish the addressable glossary:<id> URN scheme and the in-memory builder that materializes glossary nodes/edges from the active GlossaryStore.
Scope: 1. Add NodeKind.GLOSSARY = "glossary" to src/doctrine/drg/models.py. Verify existing DRG YAML files load and validate without the new node kind present (backward-compat). 2. Implement glossary_urn(surface_text: str) -> str in src/specify_cli/glossary/drg_builder.py. Hash: hashlib.sha256(surface_text.encode()).hexdigest()[:8]. Log a warning on collision detection. 3. Implement build_glossary_drg_layer(store: GlossaryStore, applicable_scopes: set[GlossaryScope]) -> DRGGraph in drg_builder.py. Uses load_validated_graph() to get action node URNs from the shipped DRG; mints one DRGNode(kind=NodeKind.GLOSSARY) per active sense in applicable scopes; adds one DRGEdge(relation=Relation.VOCABULARY) per (action_urn → glossary_urn) pair. generated_by = "glossary-drg-builder-v1". 4. Implement build_index(store: GlossaryStore, applicable_scopes: set[GlossaryScope]) -> GlossaryTermIndex in drg_builder.py. Builds the surface_to_urn and surface_to_senses dicts from active senses. Also populates lemmatized-form aliases (suffix stripping) into surface_to_urn pointing to the canonical URN. 5. Implement the suffix-stripping normalizer as a module-level function _normalize(token: str) -> str in drg_builder.py. Rules: lowercase, strip punctuation, apply suffix list (-s, -es, -ing, -ed, -er, -ers, -tion, -tions, -ment, -ments) using regex; minimum stem length 3. 6. Unit tests in tests/doctrine/drg/test_glossary_node_kind.py: backward-compat DRG load, URN prefix validation, kind value string check. 7. Unit tests in tests/specify_cli/glossary/test_drg_builder.py: URN derivation (known inputs → known hashes), collision detection/warning, builder output shape (node count, edge count), index lookup, normalizer edge cases. 8. mypy --strict and ruff check pass on all new/modified files.
Acceptance criteria for WP01:
NodeKind.GLOSSARY.value == "glossary"✓glossary_urn("lane") == "glossary:d93244e7"(verified:sha256("lane".encode()).hexdigest()[:8]) ✓- A
DRGGraphreturned bybuild_glossary_drg_layercontains oneDRGNodeper unique canonical surface inapplicable_scopes, withkind = NodeKind.GLOSSARY✓ VOCABULARYedges exist from each shipped action node URN to each term node ✓- An existing
graph.yamlwithoutglossarynodes loads viaload_graph()without error ✓ build_index()returns an index whosesurface_to_urncontains both canonical and lemmatized forms ✓- All new tests pass;
mypy --strictreports zero errors ✓
No-touch boundary: GlossaryStore, TermSense, SemanticConflict, or any existing glossary module.
WP02 — GlossaryChokepoint and GlossaryObservationBundle
Depends on: WP01 (GlossaryTermIndex, build_index())
Goal: Implement GlossaryObservationBundle, GlossaryChokepoint, latency benchmarks, and ADR-5. No executor wiring in this WP — all executor.py and writer.py changes are in WP03.
Scope: 1. Implement GlossaryObservationBundle as a frozen dataclass in src/specify_cli/glossary/chokepoint.py. Fields: matched_urns: tuple[str, ...], high_severity: tuple[SemanticConflict, ...], all_conflicts: tuple[SemanticConflict, ...], tokens_checked: int, duration_ms: float, error_msg: str | None. Implement to_dict() -> dict[str, object] (JSONL-serializable form). 2. Implement GlossaryChokepoint in chokepoint.py. __init__(repo_root, applicable_scopes) — lazy, no I/O. run(request_text, invocation_id) -> GlossaryObservationBundle — tokenize, normalize, lookup, classify, return bundle. All exceptions caught inside run(). 3. Tokenizer in run(): split on whitespace and punctuation, lowercase, apply _normalize() from drg_builder.py, filter COMMON_WORDS from extraction.py, lookup in index.surface_to_urn. 4. Conflict classification: reuse the existing classifiers in specify_cli.glossary.conflict — do not implement a new one. For each matched term: construct ExtractedTerm(surface=surface, confidence=1.0, source="request_text"); call classify_conflict(term, senses); call score_severity(conflict_type, confidence=1.0, is_critical_step=False); call create_conflict(term, conflict_type, severity, candidate_senses, context="request_text"). Skip (no conflict added) when classify_conflict() returns None (single unambiguous active sense). URN is still added to matched_urns when the token matches, even with no conflict. 5. Benchmark: write tests/specify_cli/glossary/bench_chokepoint.py. Inputs: 500-term index, texts of 50/500/2000 words, 1000 iterations each. Record p95 per text size. 6. Draft architecture/adrs/2026-04-22-5-glossary-chokepoint-p95-measurement.md with benchmark results. 7. Unit tests in tests/specify_cli/glossary/test_chokepoint.py. Integration tests for the invoke() + payload wiring belong in WP03's test_executor_glossary.py.
Acceptance criteria for WP02:
GlossaryChokepoint.run()returnsGlossaryObservationBundlewitherror_msg=Noneon happy path ✓- Conflict classification uses
classify_conflict()/score_severity()/create_conflict()fromspecify_cli.glossary.conflict— no parallel severity model ✓ - Exception injection returns error-bundle without propagating ✓
- Benchmark p95 ≤50ms confirmed (or ADR-5 documents revised threshold) ✓
- All new tests pass;
mypy --strictandruff checkzero errors ✓
WP03 — Severity Routing, Trail Integration, and Host Guidance
Depends on: WP02 (GlossaryObservationBundle, InvocationPayload.glossary_observations)
Goal: Wire the observation bundle into the invocation trail, enforce the severity routing contract in code and docs, and verify the full e2e suite.
Scope: 1. Add write_glossary_observation(self, invocation_id: str, bundle: GlossaryObservationBundle) -> None to InvocationWriter in src/specify_cli/invocation/writer.py. Appends bundle.to_dict() as a JSON line to .kittify/events/profile-invocations/{invocation_id}.jsonl. Best-effort: all exceptions silently suppressed (consistent with _append_to_index()). Skip entirely for clean invocations: write only if bundle.all_conflicts or bundle.error_msg is not None. (The guard in code: if not bundle.all_conflicts and bundle.error_msg is None: return.) 2. Wire write_glossary_observation() into ProfileInvocationExecutor.invoke() immediately after write_started() (step 4 → step 5 in the sequence from data-model.md). 3. Identify existing Codex host guidance doc path (search for codex in docs/ or .agents/skills/). Update the relevant section to document:
4. Identify existing gstack host guidance doc path. Apply the same updates. 5. Update docs/trail-model.md to document the new "glossary_checked" event type under the Tier 1 section (it is appended to the same file as the started event, so it is effectively Tier 1 content). 6. Run the full existing invocation e2e test suite (tests/merge/test_profile_charter_e2e.py and any other invocation e2e tests). Confirm all pass without modification. 7. Add one e2e-style test in test_executor_glossary.py (or a new file) that exercises the full invoke() → write_glossary_observation() path and verifies the JSONL file for an invocation contains three lines: started, glossary_checked, and (after complete_invocation()) completed.
- The
glossary_observationsfield inInvocationPayloaddict - The
high_severityrendering contract: prepend inline text before governance context - Suggested inline format (from data-model.md)
- What to do when
error_msgis set (log warning, do not block)
Acceptance criteria for WP03:
- A
spec-kitty advisecall on a project with active glossary terms produces a.jsonlinvocation file with a"glossary_checked"event ✓ - A clean invocation (no conflicts, no errors) produces NO
"glossary_checked"event in the trail (to avoid noise) ✓ - The Codex and gstack host guidance docs contain the
glossary_observationsfield description and rendering contract ✓ docs/trail-model.mddocuments the"glossary_checked"event type ✓- Existing invocation e2e tests pass unchanged ✓
mypy --strictandruff checkpass on all modified files ✓
Risk Register
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| p95 benchmark exceeds 50ms on 2,000-word + 500-term inputs | Low-medium | Medium | WP02 benchmarks early; ADR-5 documents revised threshold if needed; worst-case is raising the threshold, not blocking the feature |
NodeKind.GLOSSARY = "glossary" naming breaks StrEnum convention | Low | Low | Convention check only — no runtime impact. The value "glossary" is unique and unambiguous |
Collision in glossary:<id> URN derivation | Very low | Low | WP01 includes collision detection with logged warning; fallback is retaining the first term (predictable behavior) |
Chokepoint adds unexpected import cycle between glossary and invocation packages | Low | Medium | Use lazy import inside invoke() method body; WP02 verifies with ruff check (which catches circular imports in most cases) |
write_glossary_observation() silently fails on filesystem error | Acceptable | Low | Trail write is best-effort (matches existing _append_to_index() pattern); invocation succeeds regardless |
Existing e2e tests break because InvocationPayload.to_dict() now includes glossary_observations | Low | Low | Tests that assert exact dict contents will need to add the new key; tests that use dict.get() or only check specific keys are unaffected. WP03 discovers and fixes these |
Branch Contract (final)
Current branch at plan start: main Planning/base branch: main Final merge target: main Strategy: Three sequential work packages on main (or in a feature branch if the implementer prefers a PR-per-WP workflow). Each WP must pass mypy --strict, ruff check, and pytest before the next WP begins.