WP-Prompt Governance Contract — ATDD Findings
Author: Python Pedro
Date: 2026-05-16
Test file: tests/specify_cli/next/test_wp_prompt_governance_contract.py
Related architecture review: org-doctrine-layer-architecture-review.md — sections "Process architecture", "Root cause", and "Empirical addendum".
Purpose
These ATDD failing-first acceptance tests pin the contract for what the WP implement / review prompt must contain, so that the next mission has an executable specification to make pass. The tests encode the structural fixes called out in the architecture review's empirical addendum.
The test suite has nine substantive failures and fourteen passing wirings. The split is informative and is the input to the remediation mission spec.
Test result summary (run on feat/org-doctrine-layer at 76c9f3b4)
9 failed, 14 passed, 1 warning in 6.03s
Passing (entry-point IS wired)
| Test | What it proves |
|---|---|
TestImplementPromptInvokesCharterPipeline::test_build_wp_prompt_for_implement_calls_governance_context |
The implement prompt builder DOES call _governance_context(action="implement"). |
TestImplementPromptInvokesCharterPipeline::test_build_wp_prompt_for_review_calls_governance_context |
Same for the review action. |
TestImplementPromptInvokesCharterPipeline::test_governance_context_output_is_present_in_wp_prompt_text |
Whatever _governance_context returns IS embedded in the final prompt. |
TestImplementPromptContainsActionableGovernance::test_implement_prompt_terminology_canon_body_or_fetch_with_when_doing_rule |
Passes incidentally because the fixture charter's body itself contains the phrase "canonical term for a unit of governed work is Mission". When a charter body explicitly contains the rule prose, that prose appears in the resolved context. |
TestImplementPromptContainsActionableGovernance::test_implement_prompt_is_not_only_section_anchors |
Negative-form check that the prompt is not pure anchor list — also passes incidentally. |
TestProfileDirectivesSurfacedInWpPrompt::test_python_pedro_directive_024_locality_referenced_in_implement_prompt |
Passes only because the fixture charter mentions "Locality of Change" in its body. |
TestProfileDirectivesSurfacedInWpPrompt::test_python_pedro_directive_030_test_typecheck_gate_referenced |
Passes only because the fixture charter mentions "Test and Typecheck Quality Gate". |
TestProfileDirectivesSurfacedInWpPrompt::test_reviewer_renata_directive_032_conceptual_alignment_in_review_prompt |
Passes only because the fixture charter explicitly cites DIRECTIVE_032. |
TestProfileDirectivesSurfacedInWpPrompt::test_reviewer_renata_tactic_language_driven_design_in_review_prompt |
Passes via the fixture charter body. |
TestProfileDirectivesSurfacedInWpPrompt::test_profile_directives_use_doctrine_catalog_namespace_in_prompt |
Passes because the fixture charter cites DIRECTIVE_032 in catalog form. |
TestCharterContextResolverCompleteness::test_implement_action_context_includes_terminology_canon_body |
Passes because the body is in the fixture charter. |
TestCharterContextResolverCompleteness::test_implement_action_context_includes_code_review_checklist_body |
Same. |
TestCharterContextResolverCompleteness::test_implement_action_context_emits_no_template_set_fallback_diagnostic |
Passes — fixture charter declares template_set explicitly. |
TestCharterContextResolverCompleteness::test_implement_action_context_includes_profile_directive_references_when_profile_known |
Passes because the fixture charter cites a catalog directive that matches the regex. |
Interpretation. The wired-but-passive entry point works fine when the charter is hand-authored to include every rule the profiles would want. The system relies on the operator to manually mirror profile-directive content into the charter prose. That defeats the purpose of having profile-declared directive references in the first place.
Failing (substantive payload gaps)
These nine failures are the executable specification for the remediation mission:
| # | Test | Substantive gap |
|---|---|---|
| 1 | TestImplementPromptContainsActionableGovernance::test_implement_prompt_regression_vigilance_body_or_fetch_with_when_doing_rule |
The prompt contains the "Regression Vigilance" section ANCHOR but not the body. No fetch + when-doing rule either. |
| 2 | TestProfileDirectivesSurfacedInWpPrompt::test_python_pedro_directive_010_referenced_in_implement_prompt |
When the fixture charter does NOT explicitly cite DIRECTIVE_010 in its body, the system does not auto-surface it from the loaded profile's directive-references. The profile-declared link is silently ignored. |
| 3 | TestPromptReferencesAuthorityPaths::test_implement_prompt_references_glossary_path |
No docs/context/ pointer; no fetch command paired with a "when you introduce a term, consult …" rule. The implementer has no signposted way to discover the project glossary. |
| 4 | TestPromptReferencesAuthorityPaths::test_implement_prompt_references_adr_path |
Same for docs/adr/2.x/. |
| 5 | TestImplementTemplateForbidClauseIsHonest::test_template_either_drops_forbid_or_guarantees_governance_payload |
src/specify_cli/missions/software-dev/command-templates/implement.md still forbids the agent from calling charter context itself but does not declare a "Governance Payload Contract" that lists the bodies the prompt is guaranteed to carry. The trap from the architecture review is intact. |
| 6 | TestCharterDirectiveNamespaceCrossLink::test_charter_sync_emits_cross_link_when_body_cites_catalog_id |
When the charter body cites DIRECTIVE_032, spec-kitty charter sync extracts a DIR-NNN entry but does not preserve a structured references: cross-link to the doctrine catalog ID. The two namespaces stay disconnected. |
| 7 | TestPromptSelfSufficiency::test_implement_prompt_self_sufficiency |
Aggregate: prompt is missing adr_pointer, regression_vigilance_body_or_fetch, and fetch_command_with_when_doing. The implementer reading only the prompt cannot satisfy the architectural checks. |
| 8 | TestProjectCharterDeclaresResolverInputs::test_project_charter_declares_template_set |
.kittify/charter/charter.md (this project's own charter) has no template_set declaration, so spec-kitty's own missions get the fallback diagnostic. |
| 9 | TestProjectCharterDeclaresResolverInputs::test_project_charter_declares_available_tools |
Same for available_tools. |
What the failures collectively say
Reading the nine failures as one statement: the system invokes the right pipeline at the right boundary, but the pipeline emits only what the operator manually wrote into the charter prose. Three classes of content are systematically lost between the data sources and the agent-facing prompt:
Profile-declared directive and tactic references. The agent profile YAML is a source of truth that says "this profile cares about these directives and tactics". The WP prompt never reads from that source. (Test 2 alone proves this; tests 6–10 in the passing list pass only because the fixture charter accidentally restates what the profile already says.)
External authority paths. Glossary and ADR locations are nowhere in the rendered prompt. The agent has no signposted way to discover the canonical source of project terminology or architectural intent. (Tests 3, 4.)
Inter-namespace cross-links. Charter-extracted
DIR-NNNdirectives and doctrine-catalogDIRECTIVE_NNNdirectives are not linked at sync time, so the resolver cannot surface a catalog body when a charterDIR-NNNcites one by reference. (Test 6.)
Three additional structural problems compound:
The runtime template's forbid clause is unhonest. It tells the agent to trust the prompt as authoritative without listing what the prompt is contractually required to contain. (Test 5.)
Anchor-only section listings give the appearance of governance without the substance. "Regression Vigilance" appears in the prompt as a heading; the rules under that heading do not. (Test 1.)
Spec-kitty's own charter (the dogfood case) under-specifies its resolver inputs. Missing
template_setandavailable_toolsdeclarations cause the resolver to emit fallback diagnostics that obscure which directives the charter actually intends to inject. (Tests 8, 9.)
How these findings map to the remediation mission spec
Translating the nine failures into mission-level requirements (the next step):
| Failure(s) | Functional requirement candidate |
|---|---|
| 1 | FR — The charter context resolver MUST embed the full body (or a fetch command + "when …" conditional) of every charter section whose anchor is included in the resolved context. |
| 2 | FR — When build_charter_context(profile=...) is called with a profile ID, the resolver MUST iterate profile.directive_references and profile.tactic_references and surface each (by ID + body, or by fetch + when-doing rule). |
| 3, 4 | FR — The resolved context for actions in BOOTSTRAP_ACTIONS MUST include a Project authority paths: block naming docs/context/, docs/adr/2.x/, and any project-charter-declared additional authority directories. |
| 5 | FR — The runtime implement template MUST contain either (a) no forbid clause, or (b) a "Governance Payload Contract" section that explicitly enumerates the body sections, profile directives, glossary pointer, and ADR pointer the prompt is guaranteed to carry. |
| 6 | FR — spec-kitty charter sync MUST detect catalog-namespace citations (DIRECTIVE_NNN, tactic IDs) inside extracted directive bodies and emit a structured references: cross-link field in .kittify/charter/directives.yaml. |
| 7 | FR — Aggregate self-sufficiency: the rendered prompt for implement and review MUST satisfy the contracts above for every action invocation in a software-dev mission. |
| 8, 9 | FR — Operator-facing: spec-kitty's own .kittify/charter/charter.md MUST declare template_set and available_tools in machine-readable form (either YAML front-matter or a fenced YAML block parsed by charter sync). |
Plus one cross-cutting non-functional requirement:
| Failure | NFR candidate |
|---|---|
| All | NFR — The augmented governance payload MUST not bloat the WP prompt beyond a budget that keeps the prompt comfortably under the agent's prompt-token allowance for a typical software-dev WP. Tokens spent on governance must be measured; if a payload exceeds the budget, the system MUST automatically substitute fetch commands for the longest sections. |
Suggested mission name
wp-prompt-governance-payload (mission slug: wp-prompt-governance-payload-<ulid>)
Mission target branch: feat/org-doctrine-layer (the current branch — the next mission stays on the same integration branch so the post-merge mission-review can audit both deliverables together before the integration branch is promoted).
Self-test discipline
Per the architectural review's "dogfood gap" framing, the success criterion for this mission is not that the tests turn green. It is that, after the mission ships, the next mission run on spec-kitty itself — any next mission — surfaces in its WP prompts:
- the loaded profile's directive and tactic references,
- the project glossary path,
- the ADR path,
- the body (or fetch + when-doing) of every charter section critical to the action.
The test suite is the executable contract; satisfying it is necessary but not sufficient. Satisfying it on spec-kitty's own missions is the acceptance test that has previously been missing.