Model-First Schema Generation
| Field | Value |
|---|---|
| Date | 2026-03-29 |
| Scope | Pydantic models as single source of truth for all 10 doctrine YAML schemas |
| Related PRs | feature/agent-profile-implementation branch |
| Test result | 1067 passed, 0 failures |
Purpose
Establish a model-first architecture where Pydantic models are the single source of truth for every doctrine YAML schema. Before this work, 6 of 10 schemas were hand-written YAML; the remaining 4 (mission, import-candidate, model-to-task_type, agent-profile) had no Pydantic models at all. Drift between models and schemas was already observable in the original 6.
Architecture
Pydantic model ──► generate_schemas.py ──► *.schema.yaml
(source of truth) (post-processor) (derived artifact)
scripts/generate_schemas.py calls model_json_schema() on each registered
model, then applies a deterministic post-processing pipeline:
- Inline enum refs — StrEnum
$refentries are replaced with inlinetype: string, enum: [...].ArtifactKindgets a restricted subset; all other enums are inlined verbatim. - Rename definitions —
$defs/PascalCase→definitions/snake_case. - Clean Pydantic artifacts — remove auto-generated
titlemetadata, simplifyanyOf: [{type: X}, {type: null}]to plaintype: X, stripdefault: []/default: null. - Add minLength — required string fields without a
patternconstraint getminLength: 1. - Add metadata —
$schema,$id,title,description. - Per-type fixups — conditional rules (
allOf), description annotations, format hints,oneOfconversions. - Key ordering — deterministic property ordering for clean diffs.
CI integration
python scripts/generate_schemas.py --check # exits 1 if any schema is stale
python scripts/generate_schemas.py # regenerate all schemas
python scripts/generate_schemas.py mission # regenerate a single schema
Phase 1: Fix Doctrine Test Failures
Starting point: 13 test failures across tactics, procedures, styleguides, agent profiles, and cycle detection.
All 13 failures were resolved by correcting schema-model drift:
- Tactic fixtures with extra properties that violated
additionalProperties: false— folded content into step descriptions or a newnotesfield. - Procedure model missing
anti_patterns,notes, andreasononProcedureReference. - Styleguide/directive models missing
id/schema_versionpattern constraints. - Cycle detection tests needed updated after cross-reference field additions.
Result: 1065 tests passing, 0 failures.
Phase 2: Schema Generation for Original 6 Schemas
Created scripts/generate_schemas.py and Pydantic model updates:
| Schema | Model location | Key changes |
|---|---|---|
| paradigm | doctrine.paradigms.models |
Added tactic_refs, directive_refs, opposed_by |
| tactic | doctrine.tactics.models |
Added notes, opposed_by; pattern constraints on id |
| directive | doctrine.directives.models |
Added opposed_by; pattern on id/schema_version |
| procedure | doctrine.procedures.models |
Added ProcedureAntiPattern, anti_patterns, notes |
| styleguide | doctrine.styleguides.models |
Pattern constraints on id/schema_version |
| toolguide | doctrine.toolguides.models |
extra='forbid' added |
A shared Contradiction model was created in doctrine.shared.models and
imported by paradigm, tactic, and directive models to represent
opposed_by entries.
All models received extra='forbid' (matching additionalProperties: false
in schemas).
Phase 3: Remaining 4 Schemas
Four schemas had no Pydantic models. Each required a different generation strategy due to schema complexity:
Mission (doctrine.missions.models)
Straightforward model with one complication: the states array accepts both
bare strings and objects ({id, agent-profile}). Pydantic emits anyOf for
this union; the hand-written schema uses oneOf. A fixup converts
anyOf → oneOf in the generated schema.
The MissionTransition.from field uses alias="from" since from is a
Python keyword. The by_alias=True flag ensures the schema emits from
instead of from_state.
Import Candidate (doctrine.import_candidates.models)
The most complex schema — uses a top-level oneOf with two independent
object schemas:
- LegacyImportCandidate (WP01 baseline):
schema_version,idwith uppercase pattern,source.title + source.reference,target, status enum[proposed, accepted, rejected]. - CurationImportCandidate (WP03 scaffold): richer
sourcewithtype/publisher/url/accessed_on,classification.target_concepts,adaptation, status enum[proposed, reviewing, adopted, rejected], plus conditional: ifstatus == adoptedthenresulting_artifactsis required withminItems: 1.
A top-level oneOf cannot be expressed by a single Pydantic model, so
generate_schemas.py has a register_custom() path: it generates each
variant independently, fully inlines all $ref definitions (each variant
must be self-contained), and wraps them in the outer oneOf.
Model-to-Task Type (doctrine.model_task_routing.models)
The largest schema (251 lines) with 17 $defs entries including 7 StrEnums.
The standard pipeline handles this with _inline_all_enum_refs() — a
generalisation of the ArtifactKind-only inliner that replaces all StrEnum
$ref entries. Format annotations (date-time, uri) and default: USD
are added in the fixup phase.
Agent Profile (doctrine.agent_profiles.schema_models)
A new schema_models.py was created separate from the domain model in
profile.py. The two models serve different purposes:
| Concern | profile.py (domain) |
schema_models.py (schema) |
|---|---|---|
| Purpose | Runtime object hydration, weighted matching, inheritance | Schema generation |
| Draft | N/A (Pydantic only) | Draft 2020-12 (was Draft-07) |
| Extra fields | sentinel, excluding |
tactic-references, toolguide-references, styleguide-references, self-review-protocol |
| Role field | Role StrEnum + BeforeValidator coercion |
Plain str |
| Context sources | 3 fields | 6 fields (adds tactics, toolguides, styleguides) |
The schema model uses by_alias=True to emit kebab-case property names
matching the YAML convention (profile-id, context-sources, etc.).
New Infrastructure in generate_schemas.py
| Feature | Purpose |
|---|---|
by_alias flag |
Emit alias names in schema (mission, agent-profile) |
register_custom() |
Fully custom generators for non-standard schemas |
_inline_all_enum_refs() |
Inline all StrEnum $ref entries, not just ArtifactKind |
_inline_all_refs() |
Fully inline all $ref for self-contained oneOf variants |
_deep_order_variant() |
Key ordering inside oneOf entries |
DDD Paradigm Artifact
As part of Phase 2, a Domain-Driven Design paradigm was authored:
- File:
src/doctrine/paradigms/shipped/domain-driven-design.paradigm.yaml - Coverage: Strategic DDD (Bounded Contexts, Context Mapping, Ubiquitous Language) and Tactical DDD (Aggregates, Entities, Value Objects, Domain Events, Repositories, Services).
- References: 7 existing tactics, 3 existing directives.
- Opposed by: Big Ball of Mud, CRUD-Everywhere, Anemic Domain Model.
Gaps (10+ missing tactics) are tracked in work/ddd-tactic-gaps.md.
Final State
| Metric | Value |
|---|---|
| Total schemas | 10 (all generated from models) |
| Hand-written schemas | 0 |
| Tests passing | 1067 |
| Tests failing | 0 |
--check mode |
All 10 OK |