Work Packages: WP Metadata & State Type Hardening
Inputs: Design documents from /kitty-specs/065-wp-metadata-state-type-hardening/
Prerequisites: plan.md (required), spec.md (user stories), research.md (decisions), data-model.md (entities)
Tests: Tests are included where the spec or doctrine mandates them (test-first-bug-fixing, ATDD, property equivalence, contract tests). All WPs follow DIRECTIVE_030 (quality gate) and DIRECTIVE_034 (test-first).
Organization: Fine-grained subtasks (Txxx) roll up into work packages (WPxx). Each work package must be independently deliverable and testable.
Prompt Files: Each work package references a matching prompt file in /tasks/ generated by /spec-kitty.tasks. Treat this file as the high-level checklist; keep deep implementation detail inside the prompt files.
Subtask Format: [Txxx] [P?] Description
- [P] indicates the subtask can proceed in parallel (different files/components).
- Include precise file paths or modules.
Path Conventions
- Single project: `src/specify_cli/`, `tests/`
Phase 1 — Bug Fix & Foundation
Work Package WP01: Validate-Only Bootstrap Fix (#417) (Priority: P0)
Goal: Guard write_frontmatter() calls in finalize_tasks() with not validate_only; audit and document the full mutation surface; update JSON output contract.
Independent Test: git diff is empty after --validate-only invocation against any mission with manually edited WP frontmatter.
Prompt: /tasks/WP01-validate-only-bootstrap-fix.md
Requirement Refs: FR-001, FR-002, FR-003, SC-001, SC-002
Estimated Prompt Size: ~250 lines
Included Subtasks
- ✅ T001 Write failing test for `--validate-only` frontmatter mutation (test-first-bug-fixing procedure)
- ✅ T002 Guard `write_frontmatter()` with `not validate_only` at `feature.py:1620`
- ✅ T003 Update JSON output contract: remove `bootstrap` key from `--validate-only` response
- ✅ T004 Document bootstrap mutation surface (developer note listing all 8 fields with source/condition)
Implementation Notes
- Follow `test-first-bug-fixing.procedure.yaml`: understand bug → choose test level → write failing test → verify fails for right reason → fix → full suite → commit together.
- The fix is a one-line guard at `feature.py:1620`. The test and documentation are the bulk of the work.
- The mutation surface document (T004) satisfies FR-003 and SC-002.
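
The guard (T002) and the contract change (T003) can be sketched as follows. This is a minimal illustration, not the code at `feature.py:1620`: the `finalize_tasks()` signature, the `bootstrap_fields` mapping, the `write_frontmatter` callback, and the report shape are all hypothetical stand-ins chosen so the example runs on its own.

```python
from typing import Callable, Dict


def finalize_tasks(
    wp_frontmatter: Dict[str, dict],
    bootstrap_fields: Dict[str, dict],
    write_frontmatter: Callable[[str, dict], None],
    validate_only: bool = False,
) -> dict:
    """Hypothetical stand-in: merge bootstrap fields and write unless validate-only."""
    report: dict = {"validated": sorted(wp_frontmatter)}
    for wp_id, frontmatter in wp_frontmatter.items():
        merged = {**frontmatter, **bootstrap_fields.get(wp_id, {})}
        # The guard: never touch disk in validate-only mode.
        if merged != frontmatter and not validate_only:
            write_frontmatter(wp_id, merged)
            # The "bootstrap" key only appears when something was written.
            report.setdefault("bootstrap", []).append(wp_id)
    return report


# Record writes in a list instead of touching the filesystem.
writes = []
frontmatter = {"WP01": {"title": "Validate-only fix"}}
bootstrap = {"WP01": {"base_commit": "abc1234"}}
dry_run = finalize_tasks(
    frontmatter, bootstrap, lambda wp_id, data: writes.append(wp_id), validate_only=True
)
print(dry_run, writes)
```

A failing test for T001 could assert the same two things at a higher level: no write call (or an empty `git diff`) and no `bootstrap` key in the JSON response.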
Parallel Opportunities
- T001 and T004 touch different files and can be started in parallel (test vs docs).
Dependencies
- None (starting package).
Risks & Mitigations
- The mutation loop may have changed since research.md was written; verify exact line numbers before patching.
Work Package WP02: tasks.md Header Regex Standardization (#410) (Priority: P0)
Goal: Update 5 regex sites across 3 files to accept ##, ###, and #### WP headers; add regression tests for each depth.
Independent Test: finalize-tasks correctly parses dependencies from a tasks.md using #### WP01 headers.
Prompt: /tasks/WP02-header-regex-standardization.md
Requirement Refs: FR-004, SC-003
Estimated Prompt Size: ~280 lines
Included Subtasks
- ✅ T005 Write regression tests for `##`, `###`, `####` header depths across all 5 regex sites
- ✅ T006 Fix `_parse_wp_sections_from_tasks_md()` in `feature.py:1953` to use `#{2,4}`
- ✅ T007 Fix `_infer_subtasks_complete()` in `emit.py:148,151` to use `#{2,4}`
- ✅ T008 Fix subtask checker in `tasks.py:305,310` + Boy Scout: remove unused `repo_root` param at `emit.py:441`
Implementation Notes
- Standardized patterns from research.md Finding 2:
  - WP header match: `^#{2,4}\s+(?:Work Package\s+)?(WP\d{2})(?:\b|:)`
  - Section boundary: `^#{2,4}.*\b{wp_id}\b`
  - Section end: `^#{2,4}\s+`
- Boy Scout (DIRECTIVE_025): remove unused `repo_root` param in `emit.py:441` while touching that file.
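
The standardized patterns can be exercised with a short check like the one below. Only the regex strings come from the notes above; the helper names `parse_wp_id` and `section_boundary` are hypothetical. The `{wp_id}` placeholder is filled in by concatenation here, on the assumption that an f-string would be awkward because `#{2,4}` also uses braces.

```python
import re

# WP header match, as listed in the implementation notes.
WP_HEADER = re.compile(r"^#{2,4}\s+(?:Work Package\s+)?(WP\d{2})(?:\b|:)")


def parse_wp_id(line):
    """Return the WP id from an h2-h4 header line, or None if it does not match."""
    match = WP_HEADER.match(line)
    return match.group(1) if match else None


def section_boundary(wp_id):
    """Build the section-boundary pattern for a single WP id."""
    return re.compile(r"^#{2,4}.*\b" + re.escape(wp_id) + r"\b")


# One sample per header depth, plus the two depths that must be rejected.
samples = [
    "## WP01: Fix",
    "### Work Package WP02: Regex",
    "#### WP03",
    "# WP01",
    "##### WP01",
]
for line in samples:
    print(repr(line), "->", parse_wp_id(line))
```

The h1 and h5 lines fall through because `#{2,4}` must be followed by whitespace, so a fifth `#` cannot be absorbed by backtracking.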
Parallel Opportunities
- T006, T007, T008 touch different files and can proceed in parallel after T005 writes the test harness.
Dependencies
- Depends on WP01 (shared `feature.py` file; WP01 must merge first to avoid conflicts).
Risks & Mitigations
- Overly permissive regex could match unintended headings; `#{2,4}` explicitly excludes `#` (h1) and `#####`+ (h5+).
Phase 2 — Typed Domain Models
Work Package WP03: WPMetadata Pydantic Model (#410) (Priority: P1)
Goal: Create WPMetadata Pydantic v2 model with typed fields, field validators, frozen=True, extra="allow", and a read_wp_frontmatter() loader. Add CI validation test.
Independent Test: All kitty-specs/ WP files pass WPMetadata.model_validate() without modification; ValidationError raised on malformed input.
Prompt: /tasks/WP03-wp-metadata-pydantic-model.md
Requirement Refs: FR-005, FR-006, SC-004, NFR-004, C-003, C-005
Estimated Prompt Size: ~350 lines
Included Subtasks
- ✅ T009 Create `src/specify_cli/status/wp_metadata.py` with `WPMetadata` model per data-model.md
- ✅ T010 Implement `read_wp_frontmatter()` loader function wrapping `FrontmatterManager`
- ✅ T011 Add field validators (work_package_id pattern, base_commit hex pattern, title min_length)
- ✅ T012 [P] Write unit tests for WPMetadata (valid, invalid, unknown extras preserved, round-trip)
- ✅ T013 [P] Write CI test validating all active `kitty-specs/` WP files pass `model_validate()`
Implementation Notes
- Use TDD red-green-refactor for model creation (new code, not migration).
- `extra="allow"` is critical for backward compatibility: unknown fields must be preserved.
- Round-trip safety (NFR-004): validate that serializing back to YAML preserves field order and values.
- Export `WPMetadata` and `read_wp_frontmatter` from `status/__init__.py`.
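
A minimal sketch of what such a model might look like, assuming Pydantic v2 is installed. The three fields are the ones named in T011; the exact patterns, the optionality of `base_commit`, and the `reviewer_notes` extra are assumptions for illustration, and data-model.md remains the authority on the real field set.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class WPMetadata(BaseModel):
    # frozen=True makes instances immutable; extra="allow" keeps unknown keys.
    model_config = ConfigDict(frozen=True, extra="allow")

    # Patterns below are illustrative guesses, not the project's real rules.
    work_package_id: str = Field(pattern=r"^WP\d{2}$")
    title: str = Field(min_length=1)
    base_commit: Optional[str] = Field(default=None, pattern=r"^[0-9a-f]{7,40}$")


raw = {
    "work_package_id": "WP03",
    "title": "WPMetadata Pydantic Model",
    "base_commit": "0a1b2c3d",
    "reviewer_notes": "kept as an extra",  # hypothetical unknown field
}
meta = WPMetadata.model_validate(raw)
print(meta.work_package_id, meta.model_extra)

try:
    WPMetadata.model_validate({"work_package_id": "wp3", "title": ""})
except ValidationError as exc:
    print(f"rejected with {exc.error_count()} errors")
```

Because unknown keys land in `model_extra` and are included in `model_dump()`, a loader built on this model can write frontmatter back without dropping fields it does not know about, which is the property T019 later gives up by switching to `extra="forbid"`.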
Parallel Opportunities
- T012 and T013 are independent test files that can be written in parallel.
Dependencies
- Depends on WP01 (establishes the baseline; avoids merge conflicts in shared files).
Risks & Mitigations
- Old WP files may have unexpected field values; CI test (T013) will surface any issues immediately.
- `FrontmatterManager` may return types that don't match Pydantic expectations; loader must handle type coercion.
Work Package WP04: WPMetadata Consumer Migration + extra=forbid (#410) (Priority: P1)
Goal: Migrate all consumer modules from frontmatter.get("...") to wp_meta.<field> access. After all consumers migrated, tighten to extra="forbid".
Independent Test: grep -r 'frontmatter\.get\|\.get("work_package_id' src/specify_cli/ returns no matches outside frontmatter.py; dashboard endpoints return identical JSON.
Prompt: /tasks/WP04-wp-metadata-consumer-migration.md
Requirement Refs: FR-007, FR-008, SC-004, NFR-001, NFR-006
Estimated Prompt Size: ~500 lines
Included Subtasks
- ✅ T014 Migrate `dependency_graph.py` to use `WPMetadata`
- ✅ T015 Migrate `feature.py` WP frontmatter reads to use `WPMetadata`
- ✅ T016 Migrate `task_profile.py`, `acceptance.py`, `status/bootstrap.py` to use `WPMetadata`
- ✅ T017 Migrate `dashboard/scanner.py` to use `WPMetadata` + Boy Scout: remove unused `features` param
- ✅ T018 Migrate `requirement_mapping.py` + remaining consumers + Boy Scout: extract duplicated regex
- ✅ T019 Tighten `extra="allow"` → `extra="forbid"` + update CI test
- ✅ T020 Verify dashboard API endpoints produce identical JSON before/after (NFR-006)
Implementation Notes
- Follow Strangler Fig tactic: one consumer per commit, `extra="allow"` coexists with legacy during migration.
- Quality gate verification after each commit (run focused tests + mypy).
- Boy Scout (DIRECTIVE_025): extract duplicated regex in `requirement_mapping.py:88`; remove unused `features` param in `scanner.py:278`; remove unused `use_legacy` in `acceptance.py:448`.
- T020 is the final dashboard operability validation before tightening.
Parallel Opportunities
- T014, T015, T016, T017, T018 touch different files and can be parallelized in principle, but the Strangler Fig tactic recommends sequential one-per-commit migration for safety.
Dependencies
- Depends on WP03 (WPMetadata model must exist before consumers can use it).
Risks & Mitigations
- Consumers may use frontmatter fields not yet in `WPMetadata`; the CI test from WP03 (T013) catches this.
- `extra="forbid"` may break if any WP file has truly unknown fields; defer tightening until all files pass.
Work Package WP05: WPState ABC + TransitionContext + Property Tests (#405) (Priority: P1)
Goal: Create WPState ABC with 9 concrete lane state classes (including promoted InReviewState), TransitionContext dataclass (with review_result field), factory function, property test harness proving transition equivalence, and documentation updates to 9-lane model.
Independent Test: Property tests pass proving identical transition matrix and guard outcomes vs existing ALLOWED_TRANSITIONS + _run_guard().
Prompt: /tasks/WP05-wp-state-abc-property-tests.md
Requirement Refs: FR-009, FR-010, FR-011, FR-012, FR-012a, FR-012b, FR-012c, FR-012d, SC-005, NFR-005, C-001, C-004
Estimated Prompt Size: ~650 lines
Included Subtasks
- ✅ T021 Write ADR for State Pattern design decision (ABC vs Protocol, DIRECTIVE_003); include `in_review` promotion rationale, supersede prior review ADR
- ✅ T022 Create `TransitionContext` frozen dataclass in `status/transition_context.py` with `review_result: ReviewResult | None` field (FR-012c)
- ✅ T023 Create `WPState` ABC in `status/wp_state.py` with interface per data-model.md
- ✅ T024 Implement 9 concrete lane state classes (including `InReviewState`) + factory function `wp_state_for()`; remove `in_review` from `LANE_ALIASES` (FR-012a); restrict `for_review` outbound to `{in_review, blocked, canceled}`
- ✅ T025 Write property tests: transition matrix equivalence (all allowed pairs vs `ALLOWED_TRANSITIONS` for 9 lanes)
- ✅ T026 Write property tests: guard equivalence (all guarded transitions vs `_run_guard()`, including `in_review` guards)
- ✅ T027 [P] Write `TransitionContext` unit tests + Boy Scout: extract duplicated error messages in `transitions.py`
- ✅ FR-012d: Update 5 documentation files to 9-lane model (README.md, kanban-workflow.md, status-model.md, runtime-and-missions.md, CLAUDE.md)
Implementation Notes
- ADR (T021) must be committed BEFORE implementation code (DIRECTIVE_003). ADR should include `in_review` promotion rationale and supersede `architecture/2.x/adr/2026-04-03-2-review-approval-and-integration-completion-are-distinct.md`.
- Use ZOMBIES TDD progression for 9 concrete classes: Zero → One → Many → Boundary → Interface → Exception → Simple.
- Property tests use explicit enumeration (not Hypothesis) over all allowed pairs and guard-relevant `TransitionContext` fixtures.
- `doing` alias resolution: `wp_state_for("doing")` returns `InProgressState`. No `DoingState` class.
- `in_review` is a first-class lane (FR-012a): `wp_state_for("in_review")` returns `InReviewState`. NOT aliased.
- `for_review` becomes a pure queue state: outbound only to `{in_review, blocked, canceled}` (FR-012a).
- `for_review → in_review` requires actor-required guard with conflict detection (FR-012b).
- All `in_review → *` transitions require structured `ReviewResult` in `TransitionContext` (FR-012c).
- Export `WPState`, `TransitionContext`, `wp_state_for`, `ReviewResult` from `status/__init__.py`.
- Boy Scout (DIRECTIVE_025): extract 2 duplicated error messages in `transitions.py` to constants.
- Documentation updates (FR-012d): 5 files must reflect 9-lane model before `for_review`.
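
The State Pattern shape and the explicit-enumeration equivalence check can be sketched as below. This is an illustration under stated assumptions, covering only three of the nine lanes: the `for_review` targets and the `doing` alias come from the notes above, while the outbound sets for `in_review` and `in_progress`, the `can_transition_to()` helper, and the `ALLOWED_TRANSITIONS` excerpt are hypothetical placeholders. The real interface is defined in data-model.md.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class WPState(ABC):
    @property
    @abstractmethod
    def lane(self) -> str:
        """Canonical lane name."""

    @abstractmethod
    def allowed_targets(self) -> frozenset:
        """Lanes this state may transition to."""

    def can_transition_to(self, target: str) -> bool:
        return target in self.allowed_targets()


def _lane_state(name: str, targets: frozenset) -> type:
    """Build one concrete frozen state class per lane (helper is hypothetical)."""
    members = {"lane": property(lambda self: name),
               "allowed_targets": lambda self: targets}
    class_name = "".join(p.title() for p in name.split("_")) + "State"
    return dataclass(frozen=True)(type(class_name, (WPState,), members))


# for_review targets are from FR-012a; the other two sets are assumptions.
ForReviewState = _lane_state("for_review", frozenset({"in_review", "blocked", "canceled"}))
InReviewState = _lane_state("in_review", frozenset({"in_progress", "blocked", "canceled"}))
InProgressState = _lane_state("in_progress", frozenset({"for_review", "blocked", "canceled"}))

LANE_ALIASES = {"doing": "in_progress"}  # in_review is deliberately not aliased
_STATES = {s.lane: s for s in (ForReviewState(), InReviewState(), InProgressState())}


def wp_state_for(lane: str) -> WPState:
    """Resolve aliases, then return the state object for the canonical lane."""
    canonical = LANE_ALIASES.get(lane, lane)
    if canonical not in _STATES:
        raise ValueError(f"unknown lane: {lane}")
    return _STATES[canonical]


# Hypothetical excerpt of the legacy matrix the new classes must reproduce.
ALLOWED_TRANSITIONS = frozenset(
    (source, target)
    for source, targets in {
        "for_review": ("in_review", "blocked", "canceled"),
        "in_review": ("in_progress", "blocked", "canceled"),
        "in_progress": ("for_review", "blocked", "canceled"),
    }.items()
    for target in targets
)


def matrix_mismatches() -> list:
    """Enumerate every (source, target) pair and report any disagreement."""
    lanes = sorted(_STATES)
    targets = sorted(set(lanes) | {t for _, t in ALLOWED_TRANSITIONS})
    return [(s, t) for s in lanes for t in targets
            if wp_state_for(s).can_transition_to(t) != ((s, t) in ALLOWED_TRANSITIONS)]


print(wp_state_for("doing").lane, matrix_mismatches())
```

Enumerating pairs explicitly keeps the check deterministic and makes a failing pair easy to read in test output, which fits the stated choice not to use Hypothesis.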
Parallel Opportunities
- T022 (TransitionContext) and T023/T024 (WPState + concrete classes) touch different files.
- T027 (TransitionContext tests) is independent of T025/T026 (WPState property tests).
Dependencies
- Depends on WP01 (establishes the baseline).
Risks & Mitigations
- Guard logic in `_run_guard()` may have subtle conditions not captured in research; property tests will surface discrepancies.
- `WPState` instantiation must be < 1 ms (NFR-005); frozen dataclasses are inherently lightweight.
Phase 3 — Consumer Migration
Work Package WP06: WPState Consumer Migration — High-Touch Trio (#405) (Priority: P2)
Goal: Migrate orchestrator_api/commands.py, next/decision.py, and dashboard/scanner.py to use WPState methods. Deduplicate LANES tuples.
Independent Test: grep -r 'current_lane ==' src/specify_cli/orchestrator_api src/specify_cli/next src/specify_cli/dashboard returns zero matches.
Prompt: /tasks/WP06-wp-state-consumer-migration.md
Requirement Refs: FR-013, FR-014, SC-006, NFR-001, NFR-006, C-004
Estimated Prompt Size: ~420 lines
Included Subtasks
- ✅ T028 Migrate `orchestrator_api/commands.py` + Boy Scout: handle empty except clause, extract help string constants
- ✅ T029 Migrate `next/decision.py` to use `WPState` methods
- ✅ T030 Migrate `dashboard/scanner.py` to use `WPState` methods
- ✅ T031 Deduplicate `LANES` tuples: `tasks_support.py` + `task_helpers.py` import from `status` package
- ✅ T032 Remove stale 4-lane tuple in `scripts/tasks/task_helpers.py`
- ✅ T033 Verify dashboard kanban bucketing produces identical results via `WPState.display_category()` (NFR-006)
Implementation Notes
- Follow Strangler Fig tactic: one consumer per commit. Old `validate_transition()` API preserved for non-migrated consumers (C-004).
- Key eliminations per plan.md:
  - `_RUN_AFFECTING_LANES` frozenset → `state.affects_run` property
  - `if current_lane == "planned" / elif "claimed"` cascades → `state.allowed_targets()` / `state.progress_bucket()`
- Boy Scout (DIRECTIVE_025): handle empty except at `commands.py:135`; extract 3 duplicated help strings to constants.
- T033 is the dashboard operability gate (NFR-006).
Parallel Opportunities
- T028, T029, T030 touch different files and can proceed in parallel.
- T031/T032 (LANES dedup) are independent of the consumer migrations.
Dependencies
- Depends on WP05 (WPState ABC and property tests must exist before consumer migration).
Risks & Mitigations
- Consumer migration may reveal guard conditions not covered by WP05 property tests; run full test suite after each migration commit.
- Old API must remain callable by 40+ non-migrated consumers; never remove `validate_transition()` or `ALLOWED_TRANSITIONS`.
Phase 4 — Infrastructure & Cross-Cutting
Work Package WP07: Status Test Suite CI Stage Split (Priority: P2)
Goal: Add fast-tests-status and integration-tests-status CI jobs; update existing jobs to exclude status test paths.
Independent Test: New CI jobs run; fast-tests-core no longer executes status tests.
Prompt: /tasks/WP07-ci-status-stage-split.md
Requirement Refs: NFR-008
Estimated Prompt Size: ~250 lines
Included Subtasks
- ✅ T034 [P] Add `fast-tests-status` CI job to `ci-quality.yml` (runs `tests/status/` + `tests/specify_cli/status/`)
- ✅ T035 [P] Add `integration-tests-status` CI job (needs: `fast-tests-status` + `fast-tests-core`)
- ✅ T036 Update `fast-tests-core` and `integration-tests-core` to `--ignore` status paths
- ✅ T037 Validate CI: push branch, verify new jobs appear and `fast-tests-core` excludes status tests
Implementation Notes
- Target CI graph from research.md Finding 4:

```
kernel-tests
├── fast-tests-doctrine (unchanged)
├── fast-tests-status (NEW)
├── fast-tests-core (modified: --ignore)
│   ├── integration-tests-doctrine (unchanged)
│   ├── integration-tests-status (NEW)
│   └── integration-tests-core (modified: --ignore)
```

- This WP is independent of all others and can be implemented at any point.
Parallel Opportunities
- T034 and T035 define new jobs; T036 modifies existing jobs. All changes are in one file (`ci-quality.yml`).
- The entire WP can run in parallel with any other WP.
Dependencies
- None (independent of all other WPs).
Risks & Mitigations
- CI job names must be unique; verify no collisions with existing job matrix entries.
- `--ignore` paths must match exactly; test locally with `pytest --collect-only --ignore=...` before pushing.
Work Package WP08: Dashboard API TypedDict Contracts (#361 Phase 1) (Priority: P2)
Goal: Define TypedDict response shapes for all JSON dashboard endpoints; migrate handlers to construct responses through these types; write JS ↔ Python contract test; apply Boy Scout JS fixes.
Independent Test: mypy passes on handler files; contract test validates JS frontend references the same keys as Python TypedDict definitions.
Prompt: /tasks/WP08-dashboard-api-typed-contracts.md
Requirement Refs: FR-015, FR-016, SC-008, NFR-006, NFR-007
Estimated Prompt Size: ~480 lines
Included Subtasks
- ✅ T038 Create `src/specify_cli/dashboard/api_types.py` with TypedDict definitions per data-model.md
- ✅ T039 Migrate `dashboard/handlers/features.py` to construct responses through TypedDict types
- ✅ T040 Migrate `dashboard/handlers/api.py` to construct responses through TypedDict types + Boy Scout: rename reassigned `path` variable
- ✅ T041 Write pytest contract test (`test_api_contract.py`) validating JS ↔ Python key alignment
- ✅ T042 [P] Boy Scout JS fixes: remove unused `artifactKey`, `.find()` → `.some()`, `isNaN()` → `Number.isNaN()`, `RegExp.exec()`
- ✅ T043 [P] Boy Scout JS fixes: optional chaining (9 sites), Promise rejection with Error (8 sites)
- ✅ T044 Run `mypy` on handler files, verify pass
Implementation Notes
- TypedDict definitions from data-model.md: `ArtifactInfo`, `KanbanStats`, `KanbanTaskData`, `KanbanResponse`, `HealthResponse`, `ResearchResponse`, `ArtifactDirectoryResponse`.
- `FeaturesListResponse` is the largest shape (~15 keys); finalize during implementation based on post-migration handler output.
- Contract test approach: parse `dashboard.js` for `.key` and `["key"]` property accesses on fetch responses; compare against TypedDict `__annotations__`.
- Boy Scout (DIRECTIVE_025): all JS fixes from Tier 1 and Tier 2 SonarCloud items assigned to WP08.
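
The contract test approach might be sketched like this. The `KanbanStats` name is from data-model.md, but its keys here (`total`, `done`), the JavaScript sample, and the helper names are hypothetical. The regex deliberately covers only statically visible accesses on one named variable.

```python
import re
from typing import TypedDict


class KanbanStats(TypedDict):
    # Key names are illustrative; data-model.md defines the real shape.
    total: int
    done: int


def js_property_accesses(js_source, variable):
    """Collect `.key` and ["key"] accesses on `variable` found in the source text."""
    pattern = re.compile(
        r"\b" + re.escape(variable)
        + r"""(?:\.([A-Za-z_$][\w$]*)|\[\s*["']([^"']+)["']\s*\])"""
    )
    return {dotted or bracketed for dotted, bracketed in pattern.findall(js_source)}


def undeclared_keys(js_source, variable, shape):
    """Keys the frontend reads that the TypedDict does not declare."""
    return js_property_accesses(js_source, variable) - set(shape.__annotations__)


JS_SAMPLE = """
const stats = await response.json();
render(stats.total, stats["done"]);
"""

print(sorted(js_property_accesses(JS_SAMPLE, "stats")))
print(undeclared_keys(JS_SAMPLE + "log(stats.pending);", "stats", KanbanStats))
```

A pytest version could parametrize over (variable, TypedDict) pairs and assert that `undeclared_keys` is empty for each. Method calls on the response object would show up as false positives, so the real test likely needs an allow-list.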
Parallel Opportunities
- T042 and T043 (JS fixes) are independent of T038-T041 (Python TypedDict work) and can proceed in parallel.
Dependencies
- Depends on WP04 (WPMetadata migration in `scanner.py`) and WP06 (WPState migration in `scanner.py`). Both must complete before dashboard TypedDict work can safely assess final response shapes.
Risks & Mitigations
- JS ↔ Python contract test may be fragile if JS uses dynamic property access; limit test scope to statically analyzable accesses.
- Dashboard handler response shapes may be more complex than data-model.md estimates; `FeaturesListResponse` may need expansion.
Dependency & Execution Summary
- Sequence: WP01 → {WP02, WP03, WP05} → {WP04, WP06} → WP08
- Independent: WP07 can run at any time, in parallel with everything.
- Parallelization: After WP01 merges, WP02/WP03/WP05 can run concurrently. After WP03→WP04 and WP05→WP06 complete, WP08 can start.
- MVP Scope: WP01 + WP02 (bug fixes, immediate user value).
WP01 (validate-only fix)
├── WP02 (regex fix) ← independent of WP03+
├── WP03 (WPMetadata model)
│   └── WP04 (consumer migration + extra=forbid)
│       └── WP08 (dashboard API TypedDict contracts) ← also depends on WP06
└── WP05 (WPState + TransitionContext + property tests)
    └── WP06 (consumer migration: orchestrator_api, next/decision, dashboard)
        └── WP08 (dashboard API TypedDict contracts)
WP07 (CI stage split) ← independent of all above
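
The execution order implied by the per-package Dependencies sections can be derived mechanically. The sketch below uses the standard-library `graphlib` module (Python 3.9+); the `DEPENDS_ON` mapping transcribes the dependencies stated in this file, and the `execution_waves` helper is a hypothetical name.

```python
from graphlib import TopologicalSorter

# Each work package maps to the packages it depends on.
DEPENDS_ON = {
    "WP01": set(),
    "WP02": {"WP01"},
    "WP03": {"WP01"},
    "WP04": {"WP03"},
    "WP05": {"WP01"},
    "WP06": {"WP05"},
    "WP07": set(),
    "WP08": {"WP04", "WP06"},
}


def execution_waves(depends_on):
    """Group work packages into waves; each wave can run in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(execution_waves(DEPENDS_ON), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

WP07 lands in the first wave only because it has no prerequisites; as noted above, it can be scheduled at any point.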
Requirements Coverage Summary
| Requirement ID | Covered By Work Package(s) |
|---|---|
| FR-001 | WP01 |
| FR-002 | WP01 |
| FR-003 | WP01 |
| FR-004 | WP02 |
| FR-005 | WP03 |
| FR-006 | WP03 |
| FR-007 | WP04 |
| FR-008 | WP04 |
| FR-009 | WP05 |
| FR-010 | WP05 |
| FR-011 | WP05 |
| FR-012 | WP05 |
| FR-012a | WP05 |
| FR-012b | WP05 |
| FR-012c | WP05 |
| FR-012d | WP05 |
| FR-013 | WP06 |
| FR-014 | WP06 |
| FR-015 | WP08 |
| FR-016 | WP08 |
| NFR-001 | WP01–WP08 (all) |
| NFR-002 | WP05, WP06 |
| NFR-003 | WP05, WP06 |
| NFR-004 | WP03 |
| NFR-005 | WP05 |
| NFR-006 | WP04, WP06, WP08 |
| NFR-007 | WP01–WP08 (all) |
| NFR-008 | WP07 |
| NFR-009 | WP01–WP08 (all) |
| C-001 | WP05 |
| C-002 | WP05, WP06 |
| C-003 | WP03 |
| C-004 | WP05, WP06 |
| C-005 | WP03, WP05 |
| SC-001 | WP01 |
| SC-002 | WP01 |
| SC-003 | WP02 |
| SC-004 | WP03, WP04 |
| SC-005 | WP05 |
| SC-006 | WP06 |
| SC-007 | WP01–WP08 (all) |
| SC-008 | WP08 |
Subtask Index (Reference)
| Subtask ID | Summary | Work Package | Priority | Parallel? |
|---|---|---|---|---|
| T001 | Write failing test for validate-only mutation | WP01 | P0 | No |
| T002 | Guard write_frontmatter with not validate_only | WP01 | P0 | No |
| T003 | Update JSON output contract for validate-only | WP01 | P0 | No |
| T004 | Document bootstrap mutation surface | WP01 | P0 | Yes |
| T005 | Write header depth regression tests | WP02 | P0 | No |
| T006 | Fix feature.py regex to #{2,4} | WP02 | P0 | Yes |
| T007 | Fix emit.py regex to #{2,4} | WP02 | P0 | Yes |
| T008 | Fix tasks.py regex + Boy Scout emit.py cleanup | WP02 | P0 | Yes |
| T009 | Create WPMetadata Pydantic model | WP03 | P1 | No |
| T010 | Implement read_wp_frontmatter() loader | WP03 | P1 | No |
| T011 | Add field validators | WP03 | P1 | No |
| T012 | Write WPMetadata unit tests | WP03 | P1 | Yes |
| T013 | Write CI validation test for kitty-specs/ WP files | WP03 | P1 | Yes |
| T014 | Migrate dependency_graph.py | WP04 | P1 | No |
| T015 | Migrate feature.py WP reads | WP04 | P1 | No |
| T016 | Migrate task_profile, acceptance, bootstrap | WP04 | P1 | No |
| T017 | Migrate dashboard/scanner.py + Boy Scout | WP04 | P1 | No |
| T018 | Migrate requirement_mapping + remaining + Boy Scout | WP04 | P1 | No |
| T019 | Tighten extra=allow to extra=forbid | WP04 | P1 | No |
| T020 | Verify dashboard API endpoint JSON identity | WP04 | P1 | No |
| T021 | Write ADR for State Pattern design decision | WP05 | P1 | No |
| T022 | Create TransitionContext dataclass | WP05 | P1 | Yes |
| T023 | Create WPState ABC | WP05 | P1 | No |
| T024 | Implement 9 concrete state classes (incl. InReviewState) + factory | WP05 | P1 | No |
| T025 | Property tests: transition matrix equivalence (9 lanes) | WP05 | P1 | No |
| T026 | Property tests: guard equivalence | WP05 | P1 | No |
| T027 | TransitionContext tests + Boy Scout transitions.py | WP05 | P1 | Yes |
| T028 | Migrate orchestrator_api/commands.py + Boy Scout | WP06 | P2 | Yes |
| T029 | Migrate next/decision.py | WP06 | P2 | Yes |
| T030 | Migrate dashboard/scanner.py to WPState | WP06 | P2 | Yes |
| T031 | Deduplicate LANES tuples | WP06 | P2 | Yes |
| T032 | Remove stale 4-lane tuple in task_helpers.py | WP06 | P2 | Yes |
| T033 | Verify dashboard kanban bucketing identity | WP06 | P2 | No |
| T034 | Add fast-tests-status CI job | WP07 | P2 | Yes |
| T035 | Add integration-tests-status CI job | WP07 | P2 | Yes |
| T036 | Update existing CI jobs to --ignore status paths | WP07 | P2 | No |
| T037 | Validate CI jobs run correctly | WP07 | P2 | No |
| T038 | Create dashboard/api_types.py with TypedDicts | WP08 | P2 | No |
| T039 | Migrate handlers/features.py to TypedDict types | WP08 | P2 | No |
| T040 | Migrate handlers/api.py + Boy Scout path rename | WP08 | P2 | No |
| T041 | Write JS ↔ Python contract test | WP08 | P2 | No |
| T042 | Boy Scout JS: unused vars, isNaN, RegExp | WP08 | P2 | Yes |
| T043 | Boy Scout JS: optional chaining, Promise rejection | WP08 | P2 | Yes |
| T044 | Run mypy on handler files, verify pass | WP08 | P2 | No |
> This file is the high-level checklist. See individual prompt files in tasks/ for detailed implementation guidance per work package.