Spec Kitty

└─ kitty-specs
   └─ Glossary Semantic Integrity Runtime for Mission Framework

Mission Run:

📚 Docs ↗

Data Model: Glossary Semantic Integrity Runtime

Feature: 041-mission-glossary-semantic-integrity Date: 2026-02-16 Status: Complete

Overview

This document defines the core entities, value objects, and relationships for the glossary semantic integrity runtime. All entities follow event sourcing principles (Feature 007) with append-only state changes.

Core Entities

TermSurface

Raw string representation of a term as observed in mission inputs/outputs.

Attributes:

surface_text (str): The actual text (e.g., "workspace", "mission", "WP")

Invariants:

Normalized form: lowercase, trimmed whitespace
Stem-light applied: "workspaces" → "workspace" (plural → singular)
Unique per scope (no duplicate surfaces in active glossary)

Example:

TermSurface(surface_text="workspace")

TermSense

Meaning of a TermSurface within a specific GlossaryScope.

Attributes:

surface (TermSurface): The term this sense defines
scope (GlossaryScope): Which scope this sense belongs to
definition (str): Human-readable meaning
provenance (Provenance): Who/when/why this sense was created
confidence (float): Confidence score 0.0-1.0 (extraction quality)
status (SenseStatus): active | deprecated | draft

Invariants:

One sense can exist per (surface, scope) pair
Deprecated senses remain in history but not in active resolution
Draft senses auto-created by extraction (low confidence) until promoted

Provenance fields:

actor_id (str): User ID or LLM actor who created this sense
timestamp (datetime): When created
source (str): Where it came from (e.g., "user_clarification", "metadata_hint", "auto_extraction")

Example:

TermSense(
    surface=TermSurface("workspace"),
    scope=GlossaryScope.TEAM_DOMAIN,
    definition="Git worktree directory for a work package",
    provenance=Provenance(
        actor_id="user:alice",
        timestamp=datetime(2026, 2, 16, 12, 0, 0),
        source="user_clarification"
    ),
    confidence=1.0,
    status=SenseStatus.ACTIVE
)

GlossaryScope

Enumeration of scope levels in the glossary hierarchy.

Values:

mission_local: Mission-specific temporary/working semantics (highest precedence)
team_domain: Language used by mission participants/contributors
audience_domain: Language for intended recipients (customers/stakeholders/users)
spec_kitty_core: Spec Kitty canonical terms (lowest precedence)

Resolution order: mission_local → team_domain → audience_domain → spec_kitty_core

Scope activation:

Mission start emits GlossaryScopeActivated for each active scope
Scopes without seed files are skipped cleanly (no error)

Example:

# Scope resolution for term "workspace"
# 1. Check mission_local (no match)
# 2. Check team_domain (2 matches - ambiguous!)
# 3. (Stop - conflict detected, clarification required)

SemanticConflict

Classification of a term conflict detected during semantic check.

Attributes:

term (TermSurface): The conflicting term
conflict_type (ConflictType): Type of conflict (see below)
severity (Severity): low | medium | high
confidence (float): Confidence in conflict detection (0.0-1.0)
candidate_senses (List[TermSense]): Possible meanings from scope resolution
context (str): Usage location (e.g., "step input: description field")

ConflictType values:

UNKNOWN: Term not found in any scope (no match in scope stack)
AMBIGUOUS: Multiple active senses in current scope stack, usage unqualified
INCONSISTENT: LLM output uses sense contradicting active glossary
UNRESOLVED_CRITICAL: Unknown/new critical term with low confidence, no resolved sense before generation

Severity scoring:

High: Ambiguous in critical step + low confidence OR unresolved critical term
Medium: Ambiguous in non-critical step OR unknown term with medium confidence
Low: Inconsistent usage in non-critical step OR unknown term with high confidence (likely safe)

Example:

SemanticConflict(
    term=TermSurface("workspace"),
    conflict_type=ConflictType.AMBIGUOUS,
    severity=Severity.HIGH,
    confidence=0.9,
    candidate_senses=[
        TermSense(..., definition="Git worktree directory"),
        TermSense(..., definition="VS Code workspace file")
    ],
    context="step input: requirements field"
)

StepCheckpoint

Minimal state for resuming step execution after conflict resolution.

Attributes:

mission_id (str): Which mission
run_id (str): Which run instance
step_id (str): Which step
strictness (Strictness): Resolved strictness mode (off/medium/max)
scope_refs (List[ScopeRef]): Active glossary scope versions
input_hash (str): SHA256 of step inputs (detect context changes)
cursor (str): Execution stage (e.g., "pre_generation_gate")
retry_token (str): Unique token for this checkpoint (UUID)
timestamp (datetime): When checkpoint created

ScopeRef structure:

scope (GlossaryScope): Which scope
version_id (str): Glossary version ID (for deterministic replay)

Usage:

1. Emit StepCheckpointed before generation gate 2. User resolves conflict 3. Resume loads checkpoint, verifies input_hash 4. If hash matches: resume from cursor 5. If hash differs: prompt for confirmation

Example:

StepCheckpoint(
    mission_id="041-mission",
    run_id="run-2026-02-16-001",
    step_id="step-specify-001",
    strictness=Strictness.MEDIUM,
    scope_refs=[
        ScopeRef(scope=GlossaryScope.MISSION_LOCAL, version_id="v1"),
        ScopeRef(scope=GlossaryScope.TEAM_DOMAIN, version_id="v3"),
    ],
    input_hash="abc123...",
    cursor="pre_generation_gate",
    retry_token="uuid-1234-5678",
    timestamp=datetime(2026, 2, 16, 12, 30, 0)
)

Middleware Components

GlossaryCandidateExtractionMiddleware

Extracts candidate terms from step inputs/outputs.

Inputs:

context (PrimitiveExecutionContext): Step inputs, metadata, config

Outputs:

context (modified): Adds extracted_terms field
Events: TermCandidateObserved (for each extracted term)

Extraction logic (see research.md Finding 3):

1. Load metadata hints (glossary_watch_terms, glossary_aliases, etc.) 2. Apply deterministic heuristics (quoted phrases, acronyms, casing, repeats) 3. Normalize (lowercase, trim, stem-light) 4. Score confidence (metadata > pattern > weak heuristic)

Example:

# Input context
context.inputs = {"description": "The workspace contains implementation files"}

# After extraction
context.extracted_terms = [
    ExtractedTerm(
        surface=TermSurface("workspace"),
        confidence=0.8,
        source="casing_pattern",
        context="description field"
    )
]

SemanticCheckMiddleware

Resolves extracted terms against scope hierarchy, detects conflicts.

Inputs:

context.extracted_terms (from extraction middleware)
Active glossary scopes (loaded from seed files + event log)

Outputs:

context.conflicts (List[SemanticConflict]): Detected conflicts
Events: SemanticCheckEvaluated (with findings, severity, recommended action)

Resolution logic:

1. For each extracted term:

2. Score severity based on step criticality + confidence 3. Emit SemanticCheckEvaluated with overall severity and recommended action

Resolve against scope order (mission_local → team_domain → audience_domain → spec_kitty_core)
If no match: conflict type = UNKNOWN
If 1 match: resolved (no conflict)
If 2+ matches: conflict type = AMBIGUOUS
If LLM output contradicts glossary: conflict type = INCONSISTENT

Example:

# Input
context.extracted_terms = [TermSurface("workspace")]

# After check
context.conflicts = [
    SemanticConflict(
        term=TermSurface("workspace"),
        conflict_type=ConflictType.AMBIGUOUS,
        severity=Severity.HIGH,
        candidate_senses=[...],
        context="description field"
    )
]

GenerationGateMiddleware

Blocks LLM generation on unresolved high-severity conflicts.

Inputs:

context.conflicts (from semantic check middleware)
context.strictness (resolved strictness mode)

Outputs:

If pass: continue to next middleware
If block: raise BlockedByConflict exception
Events: GenerationBlockedBySemanticConflict (if blocked)

Gate logic:

Strictness = off: always pass (no blocking)
Strictness = medium: block only if high-severity conflicts exist
Strictness = max: block if any unresolved conflicts exist

Example:

# Input
context.strictness = Strictness.MEDIUM
context.conflicts = [SemanticConflict(severity=Severity.HIGH, ...)]

# Output
raise BlockedByConflict(conflicts=context.conflicts)
# Emit: GenerationBlockedBySemanticConflict

ClarificationMiddleware

Renders ranked candidate senses, prompts user for resolution.

Inputs:

context.conflicts (from generation gate)
Interactive mode flag (CLI vs non-interactive)

Outputs:

User selection or async defer
Events: GlossaryClarificationRequested, GlossaryClarificationResolved, GlossarySenseUpdated

Clarification logic:

1. Sort conflicts by severity (high → medium → low), cap to 3 2. Render each conflict with Rich:

3. Prompt with typer.prompt():

4. Handle choice:

Term, context, scope, ranked candidate senses (by confidence)
Select candidate (1..N)
Custom sense (C)
Defer to async (D)
Candidate selected: emit GlossaryClarificationResolved, update glossary
Custom sense: emit GlossarySenseUpdated, add to glossary
Defer: emit GlossaryClarificationRequested, exit with blocked status

Non-interactive mode:

Auto-defer all conflicts
Emit GlossaryClarificationRequested for all high-severity
Exit with error code (generation still blocked)

Example:

# Interactive prompt
"""
🔴 High-severity conflict: "workspace"

Term: workspace
Context: "The workspace contains the implementation files"
Scope: team_domain (2 matches)

Candidate senses:
1. [team_domain] Git worktree directory for a work package (confidence: 0.9)
2. [team_domain] VS Code workspace configuration file (confidence: 0.7)

Select: 1-2 (candidate), C (custom sense), D (defer to async)
> 1

✅ Resolved: workspace = Git worktree directory for a work package
"""

ResumeMiddleware

Loads checkpoint from events, restores step execution context.

Inputs:

retry_token (from user retry request)
Event log (to load StepCheckpointed event)

Outputs:

Restored context (from checkpoint)
Resume from cursor (skip already-completed stages)

Resume logic:

1. Load latest StepCheckpointed event for this step_id 2. Verify input_hash matches current inputs

3. Load updated glossary state from GlossarySenseUpdated events 4. Resume from cursor ("pre_generation_gate") 5. Re-run generation gate with updated state

If changed: prompt user for confirmation
If unchanged: restore context

Example:

# Load checkpoint
checkpoint = load_checkpoint(step_id="step-specify-001")

# Verify inputs unchanged
if checkpoint.input_hash != hash_inputs(context.inputs):
    if not typer.confirm("Context changed. Proceed?"):
        raise AbortResume()

# Restore context
context.strictness = checkpoint.strictness
context.scope_refs = checkpoint.scope_refs

# Resume from cursor
if checkpoint.cursor == "pre_generation_gate":
    # Skip extraction and semantic check (already done)
    # Re-run generation gate (with updated glossary)
    run_generation_gate(context)

Relationships

TermSurface (1) ----< (N) TermSense
                         |
                         |--- GlossaryScope (1)
                         |--- Provenance (1)
                         |--- SenseStatus (enum)

SemanticConflict (1) ----< (N) TermSense (candidate_senses)
                    |
                    |--- TermSurface (1)
                    |--- ConflictType (enum)
                    |--- Severity (enum)

StepCheckpoint (1) ----< (N) ScopeRef
                   |
                   |--- Strictness (enum)

Middleware Pipeline:
  GlossaryCandidateExtractionMiddleware
    ↓ (emits TermCandidateObserved)
  SemanticCheckMiddleware
    ↓ (emits SemanticCheckEvaluated)
  GenerationGateMiddleware
    ↓ (emits GenerationBlockedBySemanticConflict if blocked)
  ClarificationMiddleware
    ↓ (emits GlossaryClarificationRequested/Resolved, GlossarySenseUpdated)
  ResumeMiddleware
    ↓ (loads StepCheckpointed, resumes from cursor)

State Transitions

TermSense Status

draft (auto-extracted, low confidence)
  ↓ (user clarification)
active (promoted by user selection)
  ↓ (user deprecation or newer sense)
deprecated (kept in history, not in active resolution)

Conflict Resolution Flow

Conflict detected (SemanticCheckEvaluated)
  ↓
Generation blocked (GenerationBlockedBySemanticConflict)
  ↓
Clarification requested (GlossaryClarificationRequested)
  ↓ (user resolves)
Clarification resolved (GlossaryClarificationResolved)
  ↓
Glossary updated (GlossarySenseUpdated)
  ↓
Resume from checkpoint (StepCheckpointed)
  ↓
Generation unblocked (continue execution)

Validation Rules

TermSurface

Must not be empty
Must be normalized (lowercase, trimmed)
Must be unique per scope

TermSense

Definition must not be empty
Confidence must be 0.0-1.0
Provenance must have actor_id and timestamp

SemanticConflict

Must have at least 1 candidate sense (for AMBIGUOUS type)
Severity must align with confidence (high severity → low confidence)

StepCheckpoint

mission_id, run_id, step_id must not be empty
input_hash must be valid SHA256 (64 hex chars)
retry_token must be valid UUID

Event Schema References

See contracts/events.md for canonical event schemas from Feature 007.

All events conform to spec-kitty-events package contracts. CLI imports events, not redefines them.