Implementation Plan: Model-Discipline Dispatch Binding
Branch: design/model-discipline-dispatch-2364 | Date: 2026-07-04 | Spec: spec.md (rev 2, full-evaluator scope) Input: kitty-specs/model-discipline-dispatch-binding-01KWPW36/spec.md
Summary
Bind the authored-but-dead model-discipline rule at the governed dispatch seam. Build the full routing machinery: a catalog loader (activation→YAML→ModelToTaskType→freshness), an action→task_type mapping, a deterministic objective-function evaluator (task_fit × weights under objective, with quality_first as the capability lever, applying tier_constraints + override_policy precedence), surfaced as an advisory recommendation on the InvocationPayload returned by ProfileInvocationExecutor.invoke(). Add an optional per-profile model/effort field on the profile.py domain model (schema regenerated from schema_models.py) as the override anchor. Make the charter references real doctrine: a new model-task-routing.tactic.yaml + a directive suggests edge + repoint the snake_case charter token; a one-line suggests edge for the already-existing autonomous-operation-protocol tactic. Populate a starter catalog. Install a refactor-stable all-sections no-dangling-reference invariant. Advisory-only (the seam never spawns a model); interactive delegation + gated/required modes are out of scope.
Technical Context
Language/Version: Python 3.11+ Primary Dependencies: pydantic (ModelToTaskType, AgentProfile), ruamel.yaml, existing specify_cli.invocation (executor/router/registry) + charter/doctrine packages, DRG (graph.yaml) + reference_resolver Storage: Filesystem — catalog YAML shipped as package data at src/doctrine/model_task_routing/catalog/model-to-task_type.yaml (loader resolves via importlib.resources); agent-profile.schema.yaml (generated); graph.yaml; references.yaml (generated bundle) Testing: pytest; red-first through spec-kitty dispatch / ProfileInvocationExecutor.invoke() payload contents (NOT the Pydantic model in isolation, NOT a stubbed recommendation). Evaluator unit-tested pure. Vary-with-catalog + load-via-DRG anti-fake assertions Target Platform: CLI (spec-kitty dispatch) + library (invoke()) Project Type: single (Python package) Performance Goals: N/A — one small catalog read + a bounded scorer per dispatch; advisory, off the hot render path Constraints: advisory-only (executor.py:150,181 — no model spawn); DRG-driven reference resolution (C-002); generated schema — regenerate schema_models.py, never hand-edit the .yaml (C-005); override_policy: advisory only (C-004); boundary vs #1049 (C-003) Scale/Scope: ~5-6 WPs; surfaces: invocation/{executor,router}.py, a new loader/evaluator module under model_task_routing/, profile.py + schema_models.py, charter.md + graph.yaml + a tactic file + a catalog instance, test_no_dead_modules.py allowlist, + tests
Charter Check
GATE: pass before Phase 0; re-check after design.
- Single canonical authority ✅ — one loader + one evaluator; the catalog becomes the single routing source. No parallel routing path.
- Architectural alignment / layering ✅ — loader/evaluator live with the catalog model (
src/doctrine/model_task_routing/); the dispatch seam (specify_cli.invocation) consumes them; charter references resolve via the DRG, not manual edits. Honors kernel←doctrine←charter←specify_cli. - ATDD-first / red-first ✅ — NFR-001 mandates red-first through
dispatch/invoke(); SC-001 pins the vary-with-catalog derivation check (anti-fake); NFR-004 pins evaluator determinism. - No new shadow paths ✅ — this adds the missing consumer to an existing schema; it does not create a second routing surface (#1049 stays a separate track).
- Catfooding ✅ — converts the dangling charter rule into loadable activated doctrine (the tactic + DRG edge), catfooding the doctrine system.
- Terminology ✅ — no legacy terms; run the terminology guard (charter/doctrine prose touched).
No violations → Complexity Tracking omitted.
Project Structure
src/doctrine/model_task_routing/
├── models.py # existing Pydantic model (remove from dead-module allowlist as loader lands)
├── loader.py # NEW — FR-001: activation→path→YAML→validate→freshness
└── evaluator.py # NEW — FR-003: objective-function scorer + override_policy precedence
src/doctrine/
├── tactics/built-in/model-task-routing.tactic.yaml # NEW — FR-006
└── graph.yaml # + tactic node + 2 directive→tactic suggests edges (FR-006/007)
src/specify_cli/invocation/
├── executor.py # FR-004: InvocationPayload recommendation slot + to_dict + wire in invoke()
└── task_class_map.py # NEW — FR-002: action/verb → catalog task_type
src/specify_cli/cli/commands/dispatch.py # FR-004: _render_rich_payload recommendation line
src/doctrine/agent_profiles/
├── profile.py # FR-005: optional model/effort field (kebab alias) on AgentProfile
└── schema_models.py # FR-005: schema source → regenerate agent-profile.schema.yaml
.kittify/charter/charter.md # FR-006: repoint model_task_routing → model-task-routing token
src/doctrine/model_task_routing/catalog/model-to-task_type.yaml # FR-008: populated catalog instance, shipped as package data at src/doctrine/model_task_routing/catalog/model-to-task_type.yaml (loader resolves via importlib.resources)
tests/
├── doctrine/test_model_task_routing_loader.py # FR-001
├── doctrine/test_model_task_routing_evaluator.py # FR-003/NFR-004 (pure scorer)
├── invocation/test_dispatch_recommendation.py # FR-004/SC-001 (red-first through dispatch, vary-with-catalog)
├── doctrine/test_task_class_map.py # FR-002
├── doctrine/test_agent_profile_model_field.py # FR-005/NFR-003
├── charter/test_model_task_routing_resolves.py # FR-006/007/SC-003 (load-via-DRG)
└── architectural/test_charter_references_resolve.py # FR-009 (all-sections invariant)
Structure Decision: single Python package. Loader + evaluator co-locate with the catalog model under src/doctrine/model_task_routing/ (doctrine layer owns routing data + logic); the dispatch seam consumes them. The no-dangling invariant is a behavioral architectural test (refactor-stable — parses → token prose, asserts references.yaml resolution).
Implementation Concern Map
> Concerns, not work packages. /spec-kitty.tasks translates these into WPs (expect ~5-6).
IC-01 — Catalog loader + task-class mapping
- Purpose: turn the dead catalog into a loadable, freshness-checked runtime source, and bridge dispatch verbs to catalog task_type vocabulary.
- Requirements: FR-001, FR-002; C-002, C-004.
- Surfaces:
model_task_routing/loader.py(new),invocation/task_class_map.py(new),test_no_dead_modules.py:251allowlist removal. - Depends-on: none (foundation).
- Risks: catalog path resolution from activation config; the action→task_type map is a live maintenance seam (must stay in sync with
DEFAULT_ROLE_CAPABILITIESverbs + the catalog vocabulary). Whole-file-invalid catalog → treated absent (non-fatal).
IC-02 — Routing evaluator (objective-function scorer)
- Purpose: deterministic recommendation from
task_fit×weightsunderobjective(quality_first = capability), applyingtier_constraints+override_policyprecedence. - Requirements: FR-003, NFR-004; C-004.
- Surfaces:
model_task_routing/evaluator.py(new), pure/stub-testable. - Depends-on: IC-01 (loaded catalog + task_type) + IC-04 (profile field for override precedence).
- Risks: precedence semantics under
advisorymust be deterministic (emit both catalog + profile candidates with provenance, enforce neither). Capability expressed via the quality objective — verify the scorer actually ranks strongest-fit for high-judgment task types.
IC-03 — Advisory payload wiring at the dispatch seam
- Purpose: surface the recommendation on
InvocationPayload, advisory + non-fatal, throughinvoke(). - Requirements: FR-004; NFR-001, NFR-002; C-001.
- Surfaces:
executor.py(__slots__slot +to_dict()+ wire ininvoke()at ~:196/205/273),cli/commands/dispatch.py(_render_rich_payload). - Depends-on: IC-01, IC-02.
- Risks: must be non-fatal (missing/stale catalog → absent recommendation, dispatch succeeds); red-first must go through
dispatch/invoke()payload (not the model in isolation). Runtime import discipline.
IC-04 — Agent-profile model/effort field (override anchor)
- Purpose: optional per-profile tier declaration that reaches the evaluator.
- Requirements: FR-005; NFR-003; C-005.
- Surfaces:
profile.pyAgentProfile(preferred_model/effort, kebab-alias field, load-bearing),schema_models.py(schema source) + regenerateagent-profile.schema.yaml; consumed by the evaluator (IC-02). Note:router.pyis untouched — the recommendation is computed inexecutor.py, notrouter.py(which does agent-selection only). - Depends-on: none (can co-tenant if ownership disjoint); consumed by IC-02.
- Risks: schema-only add is a silent no-op (
extra='ignore') — MUST land onprofile.py. Watch themodel_protected namespace (alias if needed). Regenerate the schema; never hand-edit the.yaml.
IC-05 — Doctrine artifact + DRG resolution + populated catalog
- Purpose: make the charter references real activated doctrine; ship catalog data.
- Requirements: FR-006, FR-007, FR-008; C-002.
- Surfaces:
model-task-routing.tactic.yaml(new),graph.yaml(tactic node + 2 directivesuggestsedges),charter.md(token repoint), the populated catalog instance,references.yaml/bundle regeneration. - Depends-on: none for authoring; SC-003 verified once edges land.
- Risks: resolution is DRG-driven — a references.yaml row without a
suggestsedge still dangles. The_LIBRARY/synthesis-manifest bundle regen is a generated-lockfile step that reds CI if skipped. Kind = tactic (confirmed). Repoint the token (cleaner than minting a snake_case artifact).
IC-06 — No-dangling-reference invariant
- Purpose: durable guard so section-level dangling references cannot recur.
- Requirements: FR-009; SC-004.
- Surfaces:
tests/architectural/test_charter_references_resolve.py(new). - Depends-on: IC-05 (goes green once the tokens resolve; red on pre-fix HEAD).
- Risks: must be refactor-stable — parse
→ \token\`` across ALL sections (exclude non-backticked prose), resolve to references.yaml id-suffix; no literal token list pinned.