Model-First Schema Generation

Field	Value
Date	2026-03-29
Scope	Pydantic models as single source of truth for all 10 doctrine YAML schemas
Related PRs	feature/agent-profile-implementation branch
Test result	1067 passed, 0 failures

Purpose

Establish a model-first architecture where Pydantic models are the single source of truth for every doctrine YAML schema. Before this work, 6 of 10 schemas were hand-written YAML; the remaining 4 (mission, import-candidate, model-to-task_type, agent-profile) had no Pydantic models at all. Drift between models and schemas was already observable in the original 6.

Architecture

Pydantic model  ──►  generate_schemas.py  ──►  *.schema.yaml
  (source of truth)     (post-processor)        (derived artifact)

scripts/generate_schemas.py calls model_json_schema() on each registered model, then applies a deterministic post-processing pipeline:

Inline enum refs — StrEnum $ref entries are replaced with inline type: string, enum: [...]. ArtifactKind gets a restricted subset; all other enums are inlined verbatim.
Rename definitions — $defs/PascalCase → definitions/snake_case.
Clean Pydantic artifacts — remove auto-generated title metadata, simplify anyOf: [{type: X}, {type: null}] to plain type: X, strip default: []/default: null.
Add minLength — required string fields without a pattern constraint get minLength: 1.
Add metadata — $schema, $id, title, description.
Per-type fixups — conditional rules (allOf), description annotations, format hints, oneOf conversions.
Key ordering — deterministic property ordering for clean diffs.

CI integration

python scripts/generate_schemas.py --check   # exits 1 if any schema is stale
python scripts/generate_schemas.py           # regenerate all schemas
python scripts/generate_schemas.py mission   # regenerate a single schema

Phase 1: Fix Doctrine Test Failures

Starting point: 13 test failures across tactics, procedures, styleguides, agent profiles, and cycle detection.

All 13 failures were resolved by correcting schema-model drift:

Tactic fixtures with extra properties that violated additionalProperties: false — folded content into step descriptions or a new notes field.
Procedure model missing anti_patterns, notes, and reason on ProcedureReference.
Styleguide/directive models missing id/schema_version pattern constraints.
Cycle detection tests needed updated after cross-reference field additions.

Result: 1065 tests passing, 0 failures.

Phase 2: Schema Generation for Original 6 Schemas

Created scripts/generate_schemas.py and Pydantic model updates:

Schema	Model location	Key changes
paradigm	`doctrine.paradigms.models`	Added `tactic_refs`, `directive_refs`, `opposed_by`
tactic	`doctrine.tactics.models`	Added `notes`, `opposed_by`; pattern constraints on `id`
directive	`doctrine.directives.models`	Added `opposed_by`; pattern on `id`/`schema_version`
procedure	`doctrine.procedures.models`	Added `ProcedureAntiPattern`, `anti_patterns`, `notes`
styleguide	`doctrine.styleguides.models`	Pattern constraints on `id`/`schema_version`
toolguide	`doctrine.toolguides.models`	`extra='forbid'` added

A shared Contradiction model was created in doctrine.shared.models and imported by paradigm, tactic, and directive models to represent opposed_by entries.

All models received extra='forbid' (matching additionalProperties: false in schemas).

Phase 3: Remaining 4 Schemas

Four schemas had no Pydantic models. Each required a different generation strategy due to schema complexity:

Mission (`doctrine.missions.models`)

Straightforward model with one complication: the states array accepts both bare strings and objects ({id, agent-profile}). Pydantic emits anyOf for this union; the hand-written schema uses oneOf. A fixup converts anyOf → oneOf in the generated schema.

The MissionTransition.from field uses alias="from" since from is a Python keyword. The by_alias=True flag ensures the schema emits from instead of from_state.

Import Candidate (`doctrine.import_candidates.models`)

The most complex schema — uses a top-level oneOf with two independent object schemas:

LegacyImportCandidate (WP01 baseline): schema_version, id with uppercase pattern, source.title + source.reference, target, status enum [proposed, accepted, rejected].
CurationImportCandidate (WP03 scaffold): richer source with type/publisher/url/accessed_on, classification.target_concepts, adaptation, status enum [proposed, reviewing, adopted, rejected], plus conditional: if status == adopted then resulting_artifacts is required with minItems: 1.

A top-level oneOf cannot be expressed by a single Pydantic model, so generate_schemas.py has a register_custom() path: it generates each variant independently, fully inlines all $ref definitions (each variant must be self-contained), and wraps them in the outer oneOf.

Model-to-Task Type (`doctrine.model_task_routing.models`)

The largest schema (251 lines) with 17 $defs entries including 7 StrEnums. The standard pipeline handles this with _inline_all_enum_refs() — a generalisation of the ArtifactKind-only inliner that replaces all StrEnum $ref entries. Format annotations (date-time, uri) and default: USD are added in the fixup phase.

Agent Profile (`doctrine.agent_profiles.schema_models`)

A new schema_models.py was created separate from the domain model in profile.py. The two models serve different purposes:

Concern	`profile.py` (domain)	`schema_models.py` (schema)
Purpose	Runtime object hydration, weighted matching, inheritance	Schema generation
Draft	N/A (Pydantic only)	Draft 2020-12 (was Draft-07)
Extra fields	`sentinel`, `excluding`	`tactic-references`, `toolguide-references`, `styleguide-references`, `self-review-protocol`
Role field	`Role` StrEnum + `BeforeValidator` coercion	Plain `str`
Context sources	3 fields	6 fields (adds `tactics`, `toolguides`, `styleguides`)

The schema model uses by_alias=True to emit kebab-case property names matching the YAML convention (profile-id, context-sources, etc.).

New Infrastructure in generate_schemas.py

Feature	Purpose
`by_alias` flag	Emit alias names in schema (mission, agent-profile)
`register_custom()`	Fully custom generators for non-standard schemas
`_inline_all_enum_refs()`	Inline all StrEnum `$ref` entries, not just ArtifactKind
`_inline_all_refs()`	Fully inline all `$ref` for self-contained oneOf variants
`_deep_order_variant()`	Key ordering inside `oneOf` entries

DDD Paradigm Artifact

As part of Phase 2, a Domain-Driven Design paradigm was authored:

File: src/doctrine/paradigms/shipped/domain-driven-design.paradigm.yaml
Coverage: Strategic DDD (Bounded Contexts, Context Mapping, Ubiquitous Language) and Tactical DDD (Aggregates, Entities, Value Objects, Domain Events, Repositories, Services).
References: 7 existing tactics, 3 existing directives.
Opposed by: Big Ball of Mud, CRUD-Everywhere, Anemic Domain Model.

Gaps (10+ missing tactics) are tracked in work/ddd-tactic-gaps.md.

Final State

Metric	Value
Total schemas	10 (all generated from models)
Hand-written schemas	0
Tests passing	1067
Tests failing	0
`--check` mode	All 10 OK