Mission Specification: WP Metadata & State Type Hardening

Mission Branch: feature/metadata-state-type-hardening
Created: 2026-04-06
Status: Draft
Upstream issues: #417 (bug), #410 (enhancement), #405 (enhancement), #361 (enhancement, Phase 1 only)
Related issues (not in scope): #418 (bug, hardcoded branch template), #422 (task-generation ownership inference), #423 (lane computation parallelism collapse)

Background

Three related structural problems exist in the work-package layer of spec-kitty:

1. #417: finalize-tasks --validate-only violates its own contract. Bootstrap runs unconditionally and silently rewrites WP frontmatter (dependencies, branch_strategy, and potentially more fields) before the validate/commit fork. Users who invoke --validate-only to confirm a manual repair find their repair destroyed.

2. #410: WP frontmatter (tasks/WP*.md) has no formal schema. All 15+ consuming modules read it as dict[str, Any] via raw .get() calls with no validation. Separately, tasks.md header parsing is duplicated in four places with mismatched depth expectations (## or ###, depending on the site; none match ####), causing silent dependency-parse failures when LLM-generated files use deeper headings.

3. #405: Lane/status transition logic is scattered across 46 files with 358 hardcoded lane string literals. Three separate LANES tuple definitions exist (one stale at 4 lanes). All intelligence about what a lane can do lives in module-level data structures and procedural guard functions rather than in the Lane enum itself.

These three issues share a root: the WP layer treats its own data structures as bags of unvalidated strings rather than typed domain objects.

User Scenarios & Testing

User Story 1 — Safe Feature State Inspection (Priority: P1)

A developer has manually patched WP frontmatter (e.g., restored stripped dependencies fields) and wants to confirm the feature is in a valid state before committing. They run finalize-tasks --validate-only expecting a read-only report.

Why this priority: This is a correctness bug. The current behavior silently destroys manual repairs, making it impossible to safely verify feature state. Fixing it unblocks all users who rely on --validate-only as a safe inspection tool.

Independent Test: Run --validate-only against a feature with manually edited WP frontmatter; verify git diff shows no changes afterward.

Acceptance Scenarios:

1. Given a WP file with manually set dependencies: [WP01, WP02], When finalize-tasks --validate-only runs, Then the dependencies field is unchanged on disk and git diff is empty.
2. Given any feature, When finalize-tasks --validate-only runs, Then no files on disk are modified (zero git diff output, no new commits).
3. Given --validate-only is invoked, When the feature has validation errors, Then errors are reported in the output without modifying any files.
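One way to exercise these scenarios is to hash every file before and after a validate-only run. The sketch below is illustrative only: snapshot_tree, finalize_tasks, and check_validate_only_is_read_only are hypothetical stand-ins, not the spec-kitty implementation, and the bootstrap step is reduced to a single file rewrite.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot_tree(root: Path) -> dict:
    """Map each file's relative path to the SHA-256 of its contents."""
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def finalize_tasks(feature_dir: Path, validate_only: bool = False) -> list:
    """Hypothetical stand-in: always validate, mutate only when allowed."""
    wp_files = sorted(feature_dir.glob("tasks/WP*.md"))
    errors = [
        f"{wp_file.name}: missing work_package_id"
        for wp_file in wp_files
        if "work_package_id:" not in wp_file.read_text(encoding="utf-8")
    ]
    if validate_only:
        return errors  # read-only path: the bootstrap stand-in is never reached
    for wp_file in wp_files:
        # Stand-in for bootstrap: any rewrite of the WP file counts as a mutation.
        text = wp_file.read_text(encoding="utf-8")
        wp_file.write_text(text + "\n<!-- bootstrapped -->\n", encoding="utf-8")
    return errors


def check_validate_only_is_read_only() -> bool:
    """Return True when a validate-only run leaves every file byte-identical."""
    with tempfile.TemporaryDirectory() as tmp:
        feature_dir = Path(tmp)
        (feature_dir / "tasks").mkdir()
        (feature_dir / "tasks" / "WP03.md").write_text(
            "---\nwork_package_id: WP03\ndependencies: [WP01, WP02]\n---\nBody\n",
            encoding="utf-8",
        )
        before = snapshot_tree(feature_dir)
        finalize_tasks(feature_dir, validate_only=True)
        return snapshot_tree(feature_dir) == before
```

In a real test the snapshot comparison could be replaced by, or combined with, an assertion that git diff produces no output.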


User Story 2 — Bootstrap Mutation Transparency (Priority: P1)

A developer runs finalize-tasks (without --validate-only) and wants to understand exactly which frontmatter fields bootstrap can overwrite, so they know which fields are safe to set manually and which will be regenerated.

Why this priority: Without a documented mutation surface, users cannot predict what finalize-tasks will do to their WP files. This leads to silent data loss (as documented in #417's F015 evidence).

Independent Test: A developer documentation artifact (mutation surface doc or ADR) exists and is accurate against the code.

Acceptance Scenarios:

1. Given the bootstrap implementation, When a developer reads the mutation surface document, Then every frontmatter field that bootstrap can write or overwrite is listed with its source (parsed from tasks.md, computed, or preserved).
2. Given a WP file with a non-empty dependencies field, When bootstrap runs in non-validate-only mode, Then the behavior (preserve or overwrite) matches what is documented.


User Story 3 — Robust tasks.md Header Parsing (Priority: P2)

A developer or LLM generates a tasks.md file that uses #### headings for work packages (nested under phase groupings). They run finalize-tasks and expect WP dependencies to be parsed correctly.

Why this priority: Header-depth mismatch was the immediate cause of the F015 silent dependency-strip incident. A one-line regex change per parsing site eliminates the entire class of problem.

Independent Test: Run finalize-tasks against a tasks.md with #### WP01 headings; verify dependencies frontmatter is populated correctly.

Acceptance Scenarios:

1. Given a tasks.md with WP headers at ## depth, When parsing runs, Then all WPs are detected (regression: existing behavior preserved).
2. Given a tasks.md with WP headers at ### or #### depth, When parsing runs, Then all WPs and their dependencies are detected correctly.
3. Given a tasks.md with WP headers at ##### depth (deeper than allowed), When parsing runs, Then those headings are NOT detected (boundary enforced).
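The depth boundary in these scenarios can be expressed with a bounded quantifier. The following is a minimal sketch, assuming WP identifiers look like WP followed by digits; WP_HEADING_RE and find_wp_headings are hypothetical names, and the actual patterns at the four parsing sites may differ.

```python
import re

# Two to four '#' characters, at least one space or tab, then a heading
# that contains a WP identifier. A fifth '#' breaks the match because the
# character after the hashes must be whitespace.
WP_HEADING_RE = re.compile(r"^#{2,4}[ \t]+.*?\b(WP\d+)\b", re.MULTILINE)


def find_wp_headings(markdown: str) -> list:
    """Return WP identifiers found in headings at depth 2 through 4."""
    return WP_HEADING_RE.findall(markdown)


SAMPLE = """# Tasks
## WP01 Setup
### Phase A
### WP02 Parser
#### WP03: Model
##### WP04 Too deep
"""

print(find_wp_headings(SAMPLE))
```

For the sample above, the depth-5 heading and the top-level title are skipped, while the ##, ###, and #### headings are all detected.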


User Story 4 — Typed WP Frontmatter (Priority: P2)

A developer adds a new consumer of WP frontmatter. Instead of guessing field names and types by reading source files, they import WPMetadata and get IDE completion, type checking, and a validation error at load time if a required field is missing.

Why this priority: Eliminates the category of runtime failures caused by field-name typos, missing required fields, and type mismatches across 15+ consumer modules.

Independent Test: A consumer module that previously used frontmatter.get("dependencies", []) now uses wp_meta.dependencies with full type safety; a WP file missing a required field raises a validation error on load.

Acceptance Scenarios:

1. Given a WP frontmatter file with all required fields, When loaded via read_wp_frontmatter(), Then a valid WPMetadata object is returned with typed attribute access.
2. Given a WP frontmatter file missing a required field (e.g., work_package_id), When loaded via read_wp_frontmatter(), Then a ValidationError is raised immediately at load time.
3. Given all active WP files in kitty-specs/, When a CI test runs WPMetadata.model_validate() against each, Then all pass without modification.
4. Given extra="forbid" is set after consumer migration, When a WP file contains an unrecognized field, Then a validation error is raised (strict schema enforced).
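A rough sketch of what such a model could look like under the Pydantic v2 API is shown below. It covers only a small subset of fields; treating title as required, the WP-plus-digits ID format, and the load_metadata helper are assumptions made for illustration, not the mission's final schema.

```python
import re
from typing import List, Optional

from pydantic import BaseModel, ConfigDict, Field, ValidationError, field_validator


class WPMetadata(BaseModel):
    """Illustrative subset of the WP frontmatter schema."""

    # extra="allow" keeps unknown keys during the migration phase;
    # the spec tightens this to extra="forbid" later.
    model_config = ConfigDict(frozen=True, extra="allow")

    work_package_id: str
    title: str  # assumed required for this sketch
    dependencies: List[str] = Field(default_factory=list)
    planning_base_branch: Optional[str] = None  # newer optional field

    @field_validator("work_package_id")
    @classmethod
    def _check_id_format(cls, value: str) -> str:
        # Assumed ID shape (WP followed by digits); not confirmed by the spec.
        if not re.fullmatch(r"WP\d+", value):
            raise ValueError(f"unexpected work package id: {value!r}")
        return value


def load_metadata(frontmatter: dict) -> WPMetadata:
    """Hypothetical helper: validate an already-parsed frontmatter mapping."""
    return WPMetadata.model_validate(frontmatter)


meta = load_metadata({"work_package_id": "WP01", "title": "Setup", "owner": "alice"})
print(meta.dependencies, meta.model_extra)

try:
    load_metadata({"title": "No id"})
except ValidationError as exc:
    print(f"rejected at load time: {exc.error_count()} error(s)")
```

Parsing the YAML and splitting the body are left out here; read_wp_frontmatter() would wrap a step like load_metadata after reading the file.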


User Story 5 — Single Source of Lane Transition Truth (Priority: P3)

A developer needs to add a new guard condition for a lane transition. Instead of searching 46 files for the right place, they open src/specify_cli/status/ and find the WPState protocol and its concrete implementations as the definitive source.

Why this priority: Reduces the surface area for transition-logic bugs and makes the status model self-documenting.

Independent Test: The WPState protocol and 9 concrete lane state classes exist; a property test harness proves their transition matrix is identical to the updated ALLOWED_TRANSITIONS frozenset (including the new in_review lane transitions).

Acceptance Scenarios:

1. Given the WPState protocol, When a developer calls state.allowed_targets(), Then they get the same set as the current ALLOWED_TRANSITIONS lookup for that lane.
2. Given the TransitionContext value object, When guard evaluation runs, Then the result is identical to the current string-keyed _run_guard() dispatch for every input combination.
3. Given the property test harness, When it runs against both old procedural code and the new WPState implementation, Then both produce identical results for all 25 allowed transition pairs (22 current + 3 net new from in_review promotion) and all guarded combinations.
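The equivalence check in scenario 1 can be sketched for the two review lanes, whose outbound targets are spelled out in FR-012a. This is an excerpt, not the full 25-pair matrix; the pair-based shape of ALLOWED_TRANSITIONS and the matrix_from_states helper are assumptions for illustration.

```python
from typing import FrozenSet, Iterable, Protocol, Tuple


class WPState(Protocol):
    """Minimal slice of the protocol: just the lane and its targets."""

    lane: str

    def allowed_targets(self) -> FrozenSet[str]: ...


class ForReviewState:
    lane = "for_review"

    def allowed_targets(self) -> FrozenSet[str]:
        # Pure queue state: a reviewer claims it, or it is blocked/canceled.
        return frozenset({"in_review", "blocked", "canceled"})


class InReviewState:
    lane = "in_review"

    def allowed_targets(self) -> FrozenSet[str]:
        return frozenset(
            {"approved", "done", "in_progress", "planned", "blocked", "canceled"}
        )


# Excerpt of the procedural table, assumed here to be a frozenset of
# (from_lane, to_lane) pairs.
ALLOWED_TRANSITIONS: FrozenSet[Tuple[str, str]] = frozenset(
    {
        ("for_review", "in_review"),
        ("for_review", "blocked"),
        ("for_review", "canceled"),
        ("in_review", "approved"),
        ("in_review", "done"),
        ("in_review", "in_progress"),
        ("in_review", "planned"),
        ("in_review", "blocked"),
        ("in_review", "canceled"),
    }
)


def matrix_from_states(states: Iterable[WPState]) -> FrozenSet[Tuple[str, str]]:
    """Flatten state objects into the same pair form as the procedural table."""
    return frozenset(
        (state.lane, target) for state in states for target in state.allowed_targets()
    )


assert matrix_from_states([ForReviewState(), InReviewState()]) == ALLOWED_TRANSITIONS
```

A full harness would build the matrix from all nine state classes and compare it with the complete table, then repeat the comparison for guard outcomes.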


User Story 6 — High-Touch Consumers Use State Object (Priority: P3)

A developer reading orchestrator_api/commands.py, next/decision.py, or dashboard/scanner.py no longer sees if current_lane == "planned" / elif "claimed" cascades. All lane-conditional logic in these three files delegates to WPState methods.

Why this priority: These three files account for the majority of the scattered lane logic (22 occurrences in orchestrator_api alone). Migrating them proves the pattern works and reduces the highest-density hotspot.

Independent Test: grep -r 'current_lane ==' src/specify_cli/orchestrator_api src/specify_cli/next src/specify_cli/dashboard returns no matches after migration.

Acceptance Scenarios:

1. Given orchestrator_api/commands.py, When lane-conditional logic executes, Then it delegates to WPState methods; no direct lane string comparisons remain in the file.
2. Given next/decision.py and dashboard/scanner.py, When lane bucketing or progress computation runs, Then it uses state.progress_bucket() or state.display_category(); no ad-hoc lane string sets remain.
3. Given the three migrated files, When the full test suite runs, Then all existing tests pass without modification.
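The shape of a migrated call site might look like the sketch below. The nine lane names and the doing alias come from this spec; the bucket names, the LaneState class, and the state_for and count_by_bucket helpers are hypothetical and chosen only to show the delegation pattern.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Lane(str, Enum):
    PLANNED = "planned"
    CLAIMED = "claimed"
    IN_PROGRESS = "in_progress"
    FOR_REVIEW = "for_review"
    IN_REVIEW = "in_review"
    APPROVED = "approved"
    DONE = "done"
    BLOCKED = "blocked"
    CANCELED = "canceled"


# The doing alias is resolved at the input boundary and never stored.
LANE_ALIASES = {"doing": "in_progress"}

# Hypothetical bucket names; the real categories are defined by the mission.
_BUCKETS = {
    Lane.PLANNED: "not_started",
    Lane.CLAIMED: "in_flight",
    Lane.IN_PROGRESS: "in_flight",
    Lane.FOR_REVIEW: "review",
    Lane.IN_REVIEW: "review",
    Lane.APPROVED: "finished",
    Lane.DONE: "finished",
    Lane.BLOCKED: "stalled",
    Lane.CANCELED: "finished",
}


@dataclass(frozen=True)
class LaneState:
    """Stand-in for a WPState implementation, reduced to one method."""

    lane: Lane

    def progress_bucket(self) -> str:
        return _BUCKETS[self.lane]


def state_for(raw_lane: str) -> LaneState:
    """Resolve aliases, then build the state object; unknown lanes raise ValueError."""
    return LaneState(Lane(LANE_ALIASES.get(raw_lane, raw_lane)))


def count_by_bucket(raw_lanes) -> dict:
    """Call site after migration: no lane string comparisons remain."""
    return dict(Counter(state_for(raw).progress_bucket() for raw in raw_lanes))
```

The consumer never compares lane strings; adding a lane means extending the state layer, not every call site.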


Edge Cases

  • What happens when --validate-only is combined with --json? The JSON output must not report any bootstrap mutations (or must report zero mutations).
  • What happens when a WP file contains both known fields and unknown extra fields during the extra="allow" migration phase? Unknown fields must be preserved (round-trip safe).
  • What happens when WPMetadata.model_validate() is called on a pre-0.11.0 WP file that lacks newer optional fields (e.g., planning_base_branch)? Optional fields must default gracefully.
  • What happens when a WPState concrete class is asked to transition to a lane not in its allowed_targets()? Must raise the same error as the current validate_transition() call.
  • What happens when two agents concurrently attempt to claim a for_review WP for review? The first agent's for_review -> in_review transition succeeds; the second agent's attempt must fail with WP_ALREADY_CLAIMED (FR-012b).
  • What happens when the property test harness encounters a guarded transition where the guard requires context the test doesn't supply? Test must use explicit TransitionContext fixtures for all guard-relevant cases.
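The concurrent review-claim case can be illustrated with a small in-memory sketch. WorkPackage, TransitionError, and claim_for_review are hypothetical names; only the WP_ALREADY_CLAIMED code and the for_review -> in_review transition come from this spec. The sketch is single-process and does not address how the real implementation makes the check-and-claim atomic.

```python
from dataclasses import dataclass
from typing import Optional


class TransitionError(Exception):
    """Hypothetical error type carrying a machine-readable code."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code


@dataclass
class WorkPackage:
    wp_id: str
    lane: str
    reviewer: Optional[str] = None


def claim_for_review(wp: WorkPackage, actor: str) -> None:
    """Move a WP from for_review to in_review on behalf of actor."""
    if wp.lane == "in_review":
        if wp.reviewer == actor:
            return  # assumption: re-claiming your own review is a no-op
        raise TransitionError(
            "WP_ALREADY_CLAIMED",
            f"{wp.wp_id} is already under review by {wp.reviewer}",
        )
    if wp.lane != "for_review":
        raise TransitionError(
            "INVALID_TRANSITION", f"cannot claim {wp.wp_id} from lane {wp.lane}"
        )
    wp.lane = "in_review"
    wp.reviewer = actor


wp = WorkPackage("WP01", "for_review")
claim_for_review(wp, "agent-a")  # first claim succeeds
try:
    claim_for_review(wp, "agent-b")  # second claim is rejected
except TransitionError as exc:
    print(exc.code)
```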

Requirements

Functional Requirements

| ID | Title | Description | Priority | Status |
|---|---|---|---|---|
| FR-001 | Validate-only skips bootstrap | finalize-tasks --validate-only must not run the bootstrap step; no files on disk may be modified during a validate-only invocation. | High | Open |
| FR-002 | Validate-only JSON contract | When --validate-only --json is used, the JSON output must not contain a bootstrap key reporting mutations; it may contain a validation key with the report only. | High | Open |
| FR-003 | Bootstrap mutation audit | A documentation artifact (ADR or developer note) lists every frontmatter field that bootstrap can write or overwrite, with source (parsed, computed, or conditionally preserved). | High | Open |
| FR-004 | tasks.md header depth flexibility | All four tasks.md header-parsing regex sites accept WP headings at ##, ###, and #### depth; headings at ##### or deeper are not matched. | High | Open |
| FR-005 | WPMetadata Pydantic model | A WPMetadata Pydantic model exists with typed fields for all known frontmatter keys, field validators, frozen=True, and extra="allow" initially. | Medium | Open |
| FR-006 | WPMetadata load function | A read_wp_frontmatter(path) convenience function returns a (WPMetadata, body) tuple; it raises ValidationError on load if required fields are absent or malformed. | Medium | Open |
| FR-007 | Consumer migration | All consumer modules currently accessing WP frontmatter via dict.get() are migrated to use WPMetadata typed attribute access. | Medium | Open |
| FR-008 | extra="forbid" tightening | After all consumers are migrated and all active WP files in kitty-specs/ pass validation, WPMetadata is tightened to extra="forbid". A CI test validates all WP files pass. | Medium | Open |
| FR-009 | WPState protocol | A WPState protocol (or abstract base class) defines the interface: lane, is_terminal, is_blocked, allowed_targets(), can_transition_to(target, ctx), transition(target, ctx), progress_bucket(), display_category(). | Medium | Open |
| FR-010 | Concrete lane state classes | Nine concrete WPState implementations exist, one per canonical lane: planned, claimed, in_progress, for_review, in_review, approved, done, blocked, canceled. The doing alias is resolved at input boundaries and does not get its own class. The former in_review alias is promoted to a first-class lane to resolve the parallel-execution review contention blind spot. | Medium | Open |
| FR-011 | TransitionContext value object | A TransitionContext dataclass (frozen) replaces the current 8-argument kwargs bag in guard evaluation. Fields: actor, workspace_context, subtasks_complete, evidence, review_ref (legacy compat), review_result (structured ReviewResult for in_review exits), reason, force, implementation_evidence_present. | Medium | Open |
| FR-012 | Property test equivalence harness | A property test suite proves the new WPState transition matrix and guard outcomes are identical to the current ALLOWED_TRANSITIONS frozenset and _run_guard() dispatch for all 25 allowed pairs (post-in_review promotion) and all guard-relevant combinations. | Medium | Open |
| FR-012a | in_review lane promotion | The in_review alias (LANE_ALIASES["in_review"] = "for_review") is removed and replaced with a first-class Lane.IN_REVIEW enum member, InReviewState concrete class, and associated transitions. for_review becomes a pure queue state (outbound: in_review, blocked, canceled only). in_review carries the reviewer's active-work transitions (outbound: approved, done, in_progress, planned, blocked, canceled). The (for_review, in_review) transition has an actor-required guard with conflict detection, preventing concurrent review claims on the same WP. | Medium | Open |
| FR-012b | Review claim conflict detection | When an agent attempts to transition a WP from for_review to in_review, and another actor has already claimed it (WP is in in_review with a different actor), the transition must fail with a WP_ALREADY_CLAIMED error analogous to the implementation claiming mechanism. | Medium | Open |
| FR-012c | Structured ReviewResult on in_review exit | Every outbound transition from in_review must carry a structured review result in the TransitionContext. Approval transitions (in_review -> approved, in_review -> done) require a ReviewResult with verdict="approved", reviewer identity, and reference. Rejection transitions (in_review -> in_progress, in_review -> planned) require a ReviewResult with verdict="changes_requested" and a review feedback reference. This unifies the currently asymmetric approval (DoneEvidence.review) and rejection (review_ref string) recording paths. | Medium | Open |
| FR-012d | Lane model documentation consistency | All user-facing and developer-facing documentation reflecting the lane model must be updated to the 9-lane state machine. Affected files: README.md (Mermaid state diagram), docs/explanation/kanban-workflow.md (lane definitions + transition table), docs/status-model.md (state machine section + guard table), docs/2x/runtime-and-missions.md (state machine reference), CLAUDE.md (stale 7-lane section). The WP05 ADR supersedes architecture/2.x/adr/2026-04-03-2-review-approval-and-integration-completion-are-distinct.md. | Medium | Open |
| FR-013 | LANES deduplication | The three duplicate LANES tuple definitions are collapsed: tasks_support.py and scripts/tasks/task_helpers.py import from the canonical status package; the stale 4-lane tuple in task_helpers.py is removed. | Low | Open |
| FR-014 | High-touch consumer migration | orchestrator_api/commands.py, next/decision.py, and dashboard/scanner.py are migrated to use WPState methods; no direct lane string comparisons remain in these three files. | Low | Open |
| FR-015 | Dashboard API TypedDict contracts | TypedDict response shapes are defined in src/specify_cli/dashboard/api_types.py for all JSON dashboard endpoints. Handler methods construct responses through these types. | Low | Open |
| FR-016 | Dashboard API contract test | A pytest contract test validates that the JS frontend references the same response keys that the Python TypedDict definitions declare (#361 Phase 1). | Low | Open |
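FR-011 and FR-012c together suggest a shape like the sketch below. The field names follow FR-011; the field types, the defaults, the ReviewResult layout, and the check_in_review_exit helper are assumptions made for illustration. The spec does not state which verdict applies to in_review -> blocked or in_review -> canceled, so the sketch only requires that a result is present for those.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ReviewResult:
    verdict: str  # "approved" or "changes_requested"
    reviewer: str
    reference: str


@dataclass(frozen=True)
class TransitionContext:
    """Frozen replacement for the kwargs bag; types here are assumed."""

    actor: Optional[str] = None
    workspace_context: Optional[str] = None
    subtasks_complete: bool = False
    evidence: Any = None
    review_ref: Optional[str] = None  # legacy compat
    review_result: Optional[ReviewResult] = None
    reason: Optional[str] = None
    force: bool = False
    implementation_evidence_present: bool = False


# Verdict each in_review exit must carry, per FR-012c.
_REQUIRED_VERDICT = {
    "approved": "approved",
    "done": "approved",
    "in_progress": "changes_requested",
    "planned": "changes_requested",
}


def check_in_review_exit(target: str, ctx: TransitionContext) -> Optional[str]:
    """Return None when the exit is acceptable, else a failure message."""
    if ctx.review_result is None:
        return "review_result is required when leaving in_review"
    required = _REQUIRED_VERDICT.get(target)
    if required is not None and ctx.review_result.verdict != required:
        return f"in_review -> {target} requires verdict={required!r}"
    return None
```

Because both dataclasses are frozen, a context can be shared across guard calls without any guard being able to alter it.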

Non-Functional Requirements

| ID | Title | Requirement | Category | Priority | Status |
|---|---|---|---|---|---|
| NFR-001 | Test suite regression-free | The full existing test suite passes without modification after each work package (beyond import path updates). Zero new test failures introduced. | Correctness | High | Open |
| NFR-002 | Event log format unchanged | The StatusEvent JSONL format is byte-for-byte identical before and after; no new fields added, no field renames in the event log. | Compatibility | High | Open |
| NFR-003 | Emit pipeline contract preserved | emit_status_transition() remains the single entry point for state changes; its public signature does not change; callers require no updates outside the three migrated consumer files. | Compatibility | High | Open |
| NFR-004 | WPMetadata round-trip safe | Any WP frontmatter file that passes WPMetadata.model_validate() must produce identical YAML when serialized back to disk (no field reordering, no value coercion, no loss of unknown extra fields during the extra="allow" phase). | Correctness | Medium | Open |
| NFR-005 | WPState instantiation cost | Constructing a WPState object from a lane string must complete in under 1 ms on reference hardware; bulk snapshot materialization must not regress by more than 5% compared to the current reducer. | Performance | Low | Open |
| NFR-006 | Dashboard operability preserved | The dashboard remains operationally functional throughout and after the migration. Dashboard API JSON responses produce identical structure and values before and after WP04 and WP06 consumer migrations. | Correctness | High | Open |
| NFR-007 | Linter non-regression | New and modified code must not increase the mypy or ruff error count for touched files. Boy Scout improvements (DIRECTIVE_025) should decrease the count where proportional. | Maintainability | Medium | Open |
| NFR-008 | CI stage isolation for status tests | Status-layer tests run in a dedicated CI stage parallel to core tests, providing faster feedback on status-layer regressions. | Efficiency | Low | Open |
| NFR-009 | Self Observation Protocol | Agents write structured work logs to work/observations/ during WP execution (implement, review, or coordination sessions). Logs capture work summaries, tooling friction, spec gaps, and recommendations for post-mission analysis. Advisory: must not block WP progression. | Observability | Low | Open |

Constraints

| ID | Title | Constraint | Category | Priority | Status |
|---|---|---|---|---|---|
| C-001 | No event log format change | StatusEvent JSONL field names, types, and ordering must not change. | Technical | High | Open |
| C-002 | Reducer and store untouched | status/reducer.py and status/store.py may not have behavioral changes; only import-path updates are permitted. | Technical | High | Open |
| C-003 | No WP file modifications required | All active WP files in kitty-specs/ must pass WPMetadata.model_validate() without any manual edits to those files. | Technical | High | Open |
| C-004 | Non-migrated consumers still work | The 40+ consumer modules NOT migrated in this feature must continue to function with the existing procedural API unchanged. The old validate_transition() and ALLOWED_TRANSITIONS remain accessible during the transition period. | Technical | High | Open |
| C-005 | No new runtime dependencies | This feature introduces no new third-party runtime dependencies beyond Pydantic (already present). | Technical | Medium | Open |

Key Entities

  • WPMetadata: Value object (Pydantic, frozen=True) representing the structured frontmatter of a WP prompt file. Fields cover identity (work_package_id, title), dependency graph (dependencies), branch contract (base_branch, base_commit, planning_base_branch, merge_target_branch), and optional planning metadata. Equality by attribute values; no mutable identity.
  • WPState: Protocol (or ABC) representing a work package's behavioral state at a given lane. Encapsulates allowed transitions, guard evaluation, terminal/blocked classification, and progress/display categorization. Nine concrete implementations (one per canonical lane, including the promoted in_review lane). Stateless value-like objects instantiated from a Lane enum value.
  • TransitionContext: Frozen dataclass carrying all inputs needed for guard evaluation. Replaces the heterogeneous 8-argument kwargs bag currently passed to _run_guard(). Immutable; equality by attributes.

Success Criteria

Measurable Outcomes

  • SC-001: finalize-tasks --validate-only produces zero bytes of git diff output against any feature directory in a clean working tree.
  • SC-002: A developer documentation artifact accurately lists every frontmatter field that the bootstrap step can write or overwrite (verified by code review against the implementation).
  • SC-003: finalize-tasks correctly parses dependencies from a tasks.md file using #### WP headers; previously silent failures now produce correct frontmatter.
  • SC-004: A CI test validates all WP files in kitty-specs/ against WPMetadata.model_validate() and passes with zero failures, without modifying any WP file.
  • SC-005: The property test harness asserts 100% equivalence between the new WPState transition matrix and the updated ALLOWED_TRANSITIONS for all allowed transition pairs (including in_review lane transitions) and all guarded transition combinations.
  • SC-006: After consumer migration, grep -r 'current_lane ==' src/specify_cli/orchestrator_api src/specify_cli/next src/specify_cli/dashboard returns zero matches.
  • SC-007: The full test suite (unit + integration) passes with zero regressions after each work package merges to the feature branch.
  • SC-008: A dashboard API contract test validates that all JSON endpoint response shapes match their TypedDict definitions and that the JS frontend references the same keys.
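The contract test behind SC-008 could work roughly as sketched here: collect the keys the frontend reads from a response object and compare them with the TypedDict declaration. FeatureSummary, its fields, the JS snippet, and both helper functions are hypothetical; a real test would read the actual frontend files and the types in api_types.py, and a regex scan like this only catches simple dotted access.

```python
import re
from typing import List, TypedDict


class FeatureSummary(TypedDict):
    """Hypothetical response shape for one dashboard endpoint."""

    feature_id: str
    lanes: List[str]
    progress: float


# Stand-in for frontend source that consumes the response as `data`.
JS_SOURCE = """
function render(data) {
  title.textContent = data.feature_id;
  bar.style.width = data.progress + "%";
  data.lanes.forEach(drawLane);
}
"""


def referenced_keys(js_source: str, variable: str = "data") -> set:
    """Collect property names accessed directly on the response variable."""
    pattern = rf"\b{re.escape(variable)}\.([A-Za-z_]\w*)"
    return set(re.findall(pattern, js_source))


def undeclared_keys(js_source: str, shape: type) -> set:
    """Keys the frontend reads that the TypedDict does not declare."""
    return referenced_keys(js_source) - set(shape.__annotations__)


assert undeclared_keys(JS_SOURCE, FeatureSummary) == set()
```

A key that is renamed on the Python side without a matching frontend change would then show up in the returned set and fail the test.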

Assumptions

  • Pydantic is already a runtime dependency of spec-kitty (v2.x API assumed).
  • All WP files in kitty-specs/ use UTF-8 YAML frontmatter delimited by ---.
  • The doing lane alias is resolved at input boundaries and never persisted in the event log; WPState for doing can delegate to InProgressState. The former in_review alias is promoted to a first-class lane; see FR-012a.
  • The partial Phase 2 consumer migration (three files) does not require the remaining consumers to be migrated before it can be merged; the old procedural API coexists with the State Object during the transition period.
  • The bootstrap mutation audit (FR-003) requires reading the finalize_tasks implementation in detail; this is scoped to the implementation research WP.
  • #418: The tasks/README.md template hardcoded the 2.x branch instead of using the resolved branch context. This was discovered during mission planning but has since been fixed in the rebased baseline (core/feature_creation.py). It is no longer active scope for mission 065 and must not be reintroduced while editing planning artifacts or dashboard surfaces.
  • #422: /spec-kitty.tasks can generate impossible WPs (ownership inference picks up read-only references, nonexistent files, overly broad wildcards) and incomplete lane graphs (WPs dropped from lanes.json). Mission 065's WPMetadata type hardening (WP03/WP04, extra="forbid") partially mitigates this by enabling validation of owned_files at parse time. The task-generation tooling fix itself is a separate mission.
  • #423: Lane computation silently erases declared parallelism via ownership/write-scope collapse. The typed owned_files: list[str] field in WPMetadata provides a foundation for future ownership-precision validation, but the lane planner reconciliation logic is outside this mission's scope.