Implementation Plan: Glossary DRG Residence and Executor Chokepoint

Branch: mainmain | Date: 2026-04-22 | Spec: spec.md Mission ID: 01KPTE0P5JVQFWESWV07R0XG4M Issue: #467 (Phase 5 Foundation)


Summary

Wire a deterministic, non-blocking glossary chokepoint into ProfileInvocationExecutor.invoke() that fires on every advise / ask / do invocation. Glossary terms are given stable glossary:<id> URNs; a runtime-computed in-memory layer (no persisted YAML, no operator sync step) materializes them as real DRGNode/DRGEdge/DRGGraph objects. The chokepoint tokenizes the request text, matches against the action-scoped term set using pure-Python suffix stripping, classifies conflicts via the existing SemanticConflict model, and returns a GlossaryObservationBundle attached to the InvocationPayload. High-severity findings surface inline to hosts; low/medium findings go to the JSONL trail only. Three work packages: (1) DRG node model + index builder, (2) chokepoint + executor integration + ADR-5 benchmarks, (3) severity routing + trail writes + host doc updates.


Technical Context

Language/Version: Python 3.11+ Primary Dependencies: doctrine.drg.models (DRGNode/DRGEdge/DRGGraph/NodeKind/Relation, existing), specify_cli.glossary. (GlossaryStore/TermSense/SemanticConflict, existing), specify_cli.invocation. (executor/writer, existing), hashlib + time (stdlib only — no new external dependencies) Storage: No new storage. Chokepoint observations appended as third JSONL event to existing per-invocation files at .kittify/events/profile-invocations/{invocation_id}.jsonl Testing: pytest + mypy --strict. ≥90% line coverage on new modules. Existing invocation e2e suite must pass unchanged. Target Platform: Linux / macOS (existing spec-kitty targets) Performance Goals: Chokepoint p95 ≤50ms on 2,000-word request + 500-term index; ≤2ms on ≤50-word request; index load ≤20ms Constraints: No LLM, no HTTP, no subprocess in hot path. Chokepoint failure never blocks invocation. mark_loaded=False invariant in build_charter_context() preserved. All new slots in InvocationPayload appear in to_dict() output.


Charter Check

Charter policies applicable to this plan:

PolicyAppliesDisposition
typer CLI frameworkNo — no new CLI commands in this tranche
ruamel.yaml for YAML parsingNo — no new YAML reads/writes
pytest + 90% coverageYesAll new modules get unit tests to ≥90% line coverage
mypy --strictYesAll new .py files must pass with zero errors
Integration tests for CLI commandsYes — executor is exercised by advise/ask/doWP02 adds integration test; WP03 verifies e2e suite
DIRECTIVE_003 (Decision Documentation)Yes — ADR-5 captures the p95 threshold decisionADR-5 drafted in WP02
DIRECTIVE_010 (Specification Fidelity)YesImplementation traced against spec FRs in WP reviews

No violations to justify.


Project Structure

Documentation

kitty-specs/glossary-drg-chokepoint-01KPTE0P/
├── spec.md           # Approved specification
├── plan.md           # This file
├── research.md       # Phase 0 decisions (DRG model, URN derivation, lemmatization)
├── data-model.md     # Phase 1 data model (GlossaryObservationBundle, GlossaryChokepoint, etc.)
└── tasks.md          # Generated by /spec-kitty.tasks (not yet created)

Source Code

src/
├── doctrine/
│   └── drg/
│       └── models.py                          # MODIFY: add NodeKind.GLOSSARY = "glossary"
├── specify_cli/
│   ├── glossary/
│   │   ├── drg_builder.py                     # NEW: GlossaryDRGBuilder, build_glossary_drg_layer(), glossary_urn()
│   │   └── chokepoint.py                      # NEW: GlossaryChokepoint, GlossaryObservationBundle
│   └── invocation/
│       ├── executor.py                        # MODIFY: add chokepoint call, InvocationPayload.glossary_observations slot
│       └── writer.py                          # MODIFY: add write_glossary_observation() method
└── architecture/
    └── adrs/
        └── 2026-04-22-5-glossary-chokepoint-p95-measurement.md   # NEW (ADR-5, drafted in WP02)

tests/
├── doctrine/
│   └── drg/
│       └── test_glossary_node_kind.py         # NEW: NodeKind.GLOSSARY backward-compat tests
├── specify_cli/
│   ├── glossary/
│   │   ├── test_drg_builder.py                # NEW: GlossaryDRGBuilder unit tests
│   │   └── test_chokepoint.py                 # NEW: GlossaryChokepoint unit tests (including failure mode)
│   └── invocation/
│       └── test_executor_glossary.py          # NEW: executor integration tests (chokepoint wiring)
└── (existing invocation e2e suite)            # Must pass unchanged after WP03

Work Package Definitions

WP01 — DRG Term Node Model and Index Builder

Goal: Establish the addressable glossary:<id> URN scheme and the in-memory builder that materializes glossary nodes/edges from the active GlossaryStore.

Scope: 1. Add NodeKind.GLOSSARY = "glossary" to src/doctrine/drg/models.py. Verify existing DRG YAML files load and validate without the new node kind present (backward-compat). 2. Implement glossary_urn(surface_text: str) -> str in src/specify_cli/glossary/drg_builder.py. Hash: hashlib.sha256(surface_text.encode()).hexdigest()[:8]. Log a warning on collision detection. 3. Implement build_glossary_drg_layer(store: GlossaryStore, applicable_scopes: set[GlossaryScope]) -> DRGGraph in drg_builder.py. Uses load_validated_graph() to get action node URNs from the shipped DRG; mints one DRGNode(kind=NodeKind.GLOSSARY) per active sense in applicable scopes; adds one DRGEdge(relation=Relation.VOCABULARY) per (action_urn → glossary_urn) pair. generated_by = "glossary-drg-builder-v1". 4. Implement build_index(store: GlossaryStore, applicable_scopes: set[GlossaryScope]) -> GlossaryTermIndex in drg_builder.py. Builds the surface_to_urn and surface_to_senses dicts from active senses. Also populates lemmatized-form aliases (suffix stripping) into surface_to_urn pointing to the canonical URN. 5. Implement the suffix-stripping normalizer as a module-level function _normalize(token: str) -> str in drg_builder.py. Rules: lowercase, strip punctuation, apply suffix list (-s, -es, -ing, -ed, -er, -ers, -tion, -tions, -ment, -ments) using regex; minimum stem length 3. 6. Unit tests in tests/doctrine/drg/test_glossary_node_kind.py: backward-compat DRG load, URN prefix validation, kind value string check. 7. Unit tests in tests/specify_cli/glossary/test_drg_builder.py: URN derivation (known inputs → known hashes), collision detection/warning, builder output shape (node count, edge count), index lookup, normalizer edge cases. 8. mypy --strict and ruff check pass on all new/modified files.

Acceptance criteria for WP01:

  • NodeKind.GLOSSARY.value == "glossary"
  • glossary_urn("lane") == "glossary:d93244e7" (verified: sha256("lane".encode()).hexdigest()[:8]) ✓
  • A DRGGraph returned by build_glossary_drg_layer contains one DRGNode per unique canonical surface in applicable_scopes, with kind = NodeKind.GLOSSARY
  • VOCABULARY edges exist from each shipped action node URN to each term node ✓
  • An existing graph.yaml without glossary nodes loads via load_graph() without error ✓
  • build_index() returns an index whose surface_to_urn contains both canonical and lemmatized forms ✓
  • All new tests pass; mypy --strict reports zero errors ✓

No-touch boundary: GlossaryStore, TermSense, SemanticConflict, or any existing glossary module.


WP02 — GlossaryChokepoint and GlossaryObservationBundle

Depends on: WP01 (GlossaryTermIndex, build_index())

Goal: Implement GlossaryObservationBundle, GlossaryChokepoint, latency benchmarks, and ADR-5. No executor wiring in this WP — all executor.py and writer.py changes are in WP03.

Scope: 1. Implement GlossaryObservationBundle as a frozen dataclass in src/specify_cli/glossary/chokepoint.py. Fields: matched_urns: tuple[str, ...], high_severity: tuple[SemanticConflict, ...], all_conflicts: tuple[SemanticConflict, ...], tokens_checked: int, duration_ms: float, error_msg: str | None. Implement to_dict() -> dict[str, object] (JSONL-serializable form). 2. Implement GlossaryChokepoint in chokepoint.py. __init__(repo_root, applicable_scopes) — lazy, no I/O. run(request_text, invocation_id) -> GlossaryObservationBundle — tokenize, normalize, lookup, classify, return bundle. All exceptions caught inside run(). 3. Tokenizer in run(): split on whitespace and punctuation, lowercase, apply _normalize() from drg_builder.py, filter COMMON_WORDS from extraction.py, lookup in index.surface_to_urn. 4. Conflict classification: reuse the existing classifiers in specify_cli.glossary.conflict — do not implement a new one. For each matched term: construct ExtractedTerm(surface=surface, confidence=1.0, source="request_text"); call classify_conflict(term, senses); call score_severity(conflict_type, confidence=1.0, is_critical_step=False); call create_conflict(term, conflict_type, severity, candidate_senses, context="request_text"). Skip (no conflict added) when classify_conflict() returns None (single unambiguous active sense). URN is still added to matched_urns when the token matches, even with no conflict. 5. Benchmark: write tests/specify_cli/glossary/bench_chokepoint.py. Inputs: 500-term index, texts of 50/500/2000 words, 1000 iterations each. Record p95 per text size. 6. Draft architecture/adrs/2026-04-22-5-glossary-chokepoint-p95-measurement.md with benchmark results. 7. Unit tests in tests/specify_cli/glossary/test_chokepoint.py. Integration tests for the invoke() + payload wiring belong in WP03's test_executor_glossary.py.

Acceptance criteria for WP02:

  • GlossaryChokepoint.run() returns GlossaryObservationBundle with error_msg=None on happy path ✓
  • Conflict classification uses classify_conflict() / score_severity() / create_conflict() from specify_cli.glossary.conflict — no parallel severity model ✓
  • Exception injection returns error-bundle without propagating ✓
  • Benchmark p95 ≤50ms confirmed (or ADR-5 documents revised threshold) ✓
  • All new tests pass; mypy --strict and ruff check zero errors ✓

WP03 — Severity Routing, Trail Integration, and Host Guidance

Depends on: WP02 (GlossaryObservationBundle, InvocationPayload.glossary_observations)

Goal: Wire the observation bundle into the invocation trail, enforce the severity routing contract in code and docs, and verify the full e2e suite.

Scope: 1. Add write_glossary_observation(self, invocation_id: str, bundle: GlossaryObservationBundle) -> None to InvocationWriter in src/specify_cli/invocation/writer.py. Appends bundle.to_dict() as a JSON line to .kittify/events/profile-invocations/{invocation_id}.jsonl. Best-effort: all exceptions silently suppressed (consistent with _append_to_index()). Skip entirely for clean invocations: write only if bundle.all_conflicts or bundle.error_msg is not None. (The guard in code: if not bundle.all_conflicts and bundle.error_msg is None: return.) 2. Wire write_glossary_observation() into ProfileInvocationExecutor.invoke() immediately after write_started() (step 4 → step 5 in the sequence from data-model.md). 3. Identify existing Codex host guidance doc path (search for codex in docs/ or .agents/skills/). Update the relevant section to document:

4. Identify existing gstack host guidance doc path. Apply the same updates. 5. Update docs/trail-model.md to document the new "glossary_checked" event type under the Tier 1 section (it is appended to the same file as the started event, so it is effectively Tier 1 content). 6. Run the full existing invocation e2e test suite (tests/merge/test_profile_charter_e2e.py and any other invocation e2e tests). Confirm all pass without modification. 7. Add one e2e-style test in test_executor_glossary.py (or a new file) that exercises the full invoke()write_glossary_observation() path and verifies the JSONL file for an invocation contains three lines: started, glossary_checked, and (after complete_invocation()) completed.

  • The glossary_observations field in InvocationPayload dict
  • The high_severity rendering contract: prepend inline text before governance context
  • Suggested inline format (from data-model.md)
  • What to do when error_msg is set (log warning, do not block)

Acceptance criteria for WP03:

  • A spec-kitty advise call on a project with active glossary terms produces a .jsonl invocation file with a "glossary_checked" event ✓
  • A clean invocation (no conflicts, no errors) produces NO "glossary_checked" event in the trail (to avoid noise) ✓
  • The Codex and gstack host guidance docs contain the glossary_observations field description and rendering contract ✓
  • docs/trail-model.md documents the "glossary_checked" event type ✓
  • Existing invocation e2e tests pass unchanged ✓
  • mypy --strict and ruff check pass on all modified files ✓

Risk Register

RiskLikelihoodImpactMitigation
p95 benchmark exceeds 50ms on 2,000-word + 500-term inputsLow-mediumMediumWP02 benchmarks early; ADR-5 documents revised threshold if needed; worst-case is raising the threshold, not blocking the feature
NodeKind.GLOSSARY = "glossary" naming breaks StrEnum conventionLowLowConvention check only — no runtime impact. The value "glossary" is unique and unambiguous
Collision in glossary:<id> URN derivationVery lowLowWP01 includes collision detection with logged warning; fallback is retaining the first term (predictable behavior)
Chokepoint adds unexpected import cycle between glossary and invocation packagesLowMediumUse lazy import inside invoke() method body; WP02 verifies with ruff check (which catches circular imports in most cases)
write_glossary_observation() silently fails on filesystem errorAcceptableLowTrail write is best-effort (matches existing _append_to_index() pattern); invocation succeeds regardless
Existing e2e tests break because InvocationPayload.to_dict() now includes glossary_observationsLowLowTests that assert exact dict contents will need to add the new key; tests that use dict.get() or only check specific keys are unaffected. WP03 discovers and fixes these

Branch Contract (final)

Current branch at plan start: main Planning/base branch: main Final merge target: main Strategy: Three sequential work packages on main (or in a feature branch if the implementer prefers a PR-per-WP workflow). Each WP must pass mypy --strict, ruff check, and pytest before the next WP begins.