Work Packages: WP Metadata & State Type Hardening

Inputs: Design documents from /kitty-specs/065-wp-metadata-state-type-hardening/
Prerequisites: plan.md (required), spec.md (user stories), research.md (decisions), data-model.md (entities)

Tests: Tests are included where the spec or doctrine mandates them (test-first-bug-fixing, ATDD, property equivalence, contract tests). All WPs follow DIRECTIVE_030 (quality gate) and DIRECTIVE_034 (test-first).

Organization: Fine-grained subtasks (Txxx) roll up into work packages (WPxx). Each work package must be independently deliverable and testable.

Prompt Files: Each work package references a matching prompt file in /tasks/ generated by /spec-kitty.tasks. Treat this file as the high-level checklist; keep deep implementation detail inside the prompt files.

Subtask Format: [Txxx] [P?] Description

  • [P] indicates the subtask can proceed in parallel (different files/components).
  • Include precise file paths or modules.

Path Conventions

  • Single project: src/specify_cli/, tests/

Phase 1 — Bug Fix & Foundation


Work Package WP01: Validate-Only Bootstrap Fix (#417) (Priority: P0)

Goal: Guard write_frontmatter() calls in finalize_tasks() with not validate_only; audit and document the full mutation surface; update JSON output contract.
Independent Test: git diff is empty after --validate-only invocation against any mission with manually edited WP frontmatter.
Prompt: /tasks/WP01-validate-only-bootstrap-fix.md
Requirement Refs: FR-001, FR-002, FR-003, SC-001, SC-002
Estimated Prompt Size: ~250 lines

Included Subtasks

  • ✅ T001 Write failing test for --validate-only frontmatter mutation (test-first-bug-fixing procedure)
  • ✅ T002 Guard write_frontmatter() with not validate_only at feature.py:1620
  • ✅ T003 Update JSON output contract — remove bootstrap key from --validate-only response
  • ✅ T004 Document bootstrap mutation surface (developer note listing all 8 fields with source/condition)

Implementation Notes

  • Follow test-first-bug-fixing.procedure.yaml: understand bug → choose test level → write failing test → verify fails for right reason → fix → full suite → commit together.
  • The fix is a one-line guard at feature.py:1620. The test and documentation are the bulk of the work.
  • The mutation surface document (T004) satisfies FR-003 and SC-002.
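
The guard from T002 can be sketched as follows. This is a minimal illustration, not the code in feature.py: `finalize_wp_file`, the injected `write_frontmatter` callback, the `lane` bootstrap field, and the returned dict are all hypothetical stand-ins. Only the idea of gating the write on `not validate_only` comes from the notes above.

```python
from pathlib import Path
from typing import Callable


def finalize_wp_file(
    wp_path: Path,
    frontmatter: dict,
    write_frontmatter: Callable[[Path, dict], None],
    validate_only: bool = False,
) -> dict:
    """Compute bootstrap fields, but persist them only outside validate-only mode.

    Hypothetical helper: the real finalize_tasks() loop may be shaped differently.
    """
    updated = dict(frontmatter)
    updated.setdefault("lane", "planned")  # illustrative bootstrap field only

    changed = updated != frontmatter
    written = changed and not validate_only
    if written:
        # The guard: the file is never touched when only validating.
        write_frontmatter(wp_path, updated)

    return {"path": str(wp_path), "would_change": changed, "written": written}
```

Injecting the writer keeps the sketch testable without touching disk; a failing test in the spirit of T001 would assert that the writer is never called when `validate_only=True`.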

Parallel Opportunities

  • T001 and T004 touch different files and can be started in parallel (test vs docs).

Dependencies

  • None (starting package).

Risks & Mitigations

  • The mutation loop may have changed since research.md was written; verify exact line numbers before patching.

Work Package WP02: tasks.md Header Regex Standardization (#410) (Priority: P0)

Goal: Update 5 regex sites across 3 files to accept ##, ###, and #### WP headers; add regression tests for each depth.
Independent Test: finalize-tasks correctly parses dependencies from a tasks.md using #### WP01 headers.
Prompt: /tasks/WP02-header-regex-standardization.md
Requirement Refs: FR-004, SC-003
Estimated Prompt Size: ~280 lines

Included Subtasks

  • ✅ T005 Write regression tests for ##, ###, #### header depths across all 5 regex sites
  • ✅ T006 Fix _parse_wp_sections_from_tasks_md() in feature.py:1953 to use #{2,4}
  • ✅ T007 Fix _infer_subtasks_complete() in emit.py:148,151 to use #{2,4}
  • ✅ T008 Fix subtask checker in tasks.py:305,310 + Boy Scout: remove unused repo_root param at emit.py:441

Implementation Notes

  • Standardized patterns from research.md Finding 2:
      • WP header match: ^#{2,4}\s+(?:Work Package\s+)?(WP\d{2})(?:\b|:)
      • Section boundary: ^#{2,4}.*\b{wp_id}\b
      • Section end: ^#{2,4}\s+
  • Boy Scout (DIRECTIVE_025): remove unused repo_root param in emit.py:441 while touching that file.
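
A quick way to sanity-check the WP header pattern above is to run it over headers at each depth. The pattern is copied from the notes; the sample text and the `find_wp_ids` helper are made up for illustration, and the use of `re.MULTILINE` is an assumption about how the real call sites apply it.

```python
import re

# Pattern copied from the notes above; MULTILINE makes ^ match at each line start.
WP_HEADER_RE = re.compile(
    r"^#{2,4}\s+(?:Work Package\s+)?(WP\d{2})(?:\b|:)",
    re.MULTILINE,
)

# Illustrative sample covering h1 through h5.
SAMPLE = """\
# WP00 is an h1 and must be ignored
## WP01
### Work Package WP02: Header regex
#### WP03: Deep header
##### WP04 is an h5 and must be ignored
"""


def find_wp_ids(text: str) -> list[str]:
    """Return WP ids found in headers at depth 2 to 4, in document order."""
    return WP_HEADER_RE.findall(text)


print(find_wp_ids(SAMPLE))  # ['WP01', 'WP02', 'WP03']
```

The h5 line is rejected because after at most four `#` characters the pattern requires whitespace, and the h1 line is rejected because at least two `#` characters are required.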

Parallel Opportunities

  • T006, T007, T008 touch different files and can proceed in parallel after T005 writes the test harness.

Dependencies

  • Depends on WP01 (shared feature.py file; WP01 must merge first to avoid conflicts).

Risks & Mitigations

  • Overly permissive regex could match unintended headings; #{2,4} explicitly excludes # (h1) and #####+ (h5+).

Phase 2 — Typed Domain Models


Work Package WP03: WPMetadata Pydantic Model (#410) (Priority: P1)

Goal: Create WPMetadata Pydantic v2 model with typed fields, field validators, frozen=True, extra="allow", and a read_wp_frontmatter() loader. Add CI validation test.
Independent Test: All kitty-specs/ WP files pass WPMetadata.model_validate() without modification; ValidationError raised on malformed input.
Prompt: /tasks/WP03-wp-metadata-pydantic-model.md
Requirement Refs: FR-005, FR-006, SC-004, NFR-004, C-003, C-005
Estimated Prompt Size: ~350 lines

Included Subtasks

  • ✅ T009 Create src/specify_cli/status/wp_metadata.py with WPMetadata model per data-model.md
  • ✅ T010 Implement read_wp_frontmatter() loader function wrapping FrontmatterManager
  • ✅ T011 Add field validators (work_package_id pattern, base_commit hex pattern, title min_length)
  • ✅ T012 [P] Write unit tests for WPMetadata (valid, invalid, unknown extras preserved, round-trip)
  • ✅ T013 [P] Write CI test validating all active kitty-specs/ WP files pass model_validate()

Implementation Notes

  • Use TDD red-green-refactor for model creation (new code, not migration).
  • extra="allow" is critical for backward compatibility — unknown fields must be preserved.
  • Round-trip safety (NFR-004): validate that serializing back to YAML preserves field order and values.
  • Export WPMetadata and read_wp_frontmatter from status/__init__.py.

Parallel Opportunities

  • T012 and T013 are independent test files that can be written in parallel.

Dependencies

  • Depends on WP01 (establishes the baseline; avoids merge conflicts in shared files).

Risks & Mitigations

  • Old WP files may have unexpected field values; CI test (T013) will surface any issues immediately.
  • FrontmatterManager may return types that don't match Pydantic expectations; loader must handle type coercion.

Work Package WP04: WPMetadata Consumer Migration + extra=forbid (#410) (Priority: P1)

Goal: Migrate all consumer modules from frontmatter.get("...") to wp_meta.<field> access. After all consumers migrated, tighten to extra="forbid".
Independent Test: grep -r 'frontmatter\.get\|\.get("work_package_id' src/specify_cli/ returns no matches outside frontmatter.py; dashboard endpoints return identical JSON.
Prompt: /tasks/WP04-wp-metadata-consumer-migration.md
Requirement Refs: FR-007, FR-008, SC-004, NFR-001, NFR-006
Estimated Prompt Size: ~500 lines

Included Subtasks

  • ✅ T014 Migrate dependency_graph.py to use WPMetadata
  • ✅ T015 Migrate feature.py WP frontmatter reads to use WPMetadata
  • ✅ T016 Migrate task_profile.py, acceptance.py, status/bootstrap.py to use WPMetadata
  • ✅ T017 Migrate dashboard/scanner.py to use WPMetadata + Boy Scout: remove unused features param
  • ✅ T018 Migrate requirement_mapping.py + remaining consumers + Boy Scout: extract duplicated regex
  • ✅ T019 Tighten extra="allow" → extra="forbid" + update CI test
  • ✅ T020 Verify dashboard API endpoints produce identical JSON before/after (NFR-006)

Implementation Notes

  • Follow Strangler Fig tactic: one consumer per commit, extra="allow" coexists with legacy during migration.
  • Quality gate verification after each commit (run focused tests + mypy).
  • Boy Scout (DIRECTIVE_025): extract duplicated regex in requirement_mapping.py:88; remove unused features param in scanner.py:278; remove unused use_legacy in acceptance.py:448.
  • T020 is the final dashboard operability validation before tightening.

Parallel Opportunities

  • T014, T015, T016, T017, T018 touch different files and can be parallelized in principle, but the Strangler Fig tactic recommends sequential one-per-commit migration for safety.

Dependencies

  • Depends on WP03 (WPMetadata model must exist before consumers can use it).

Risks & Mitigations

  • Consumers may use frontmatter fields not yet in WPMetadata; the CI test from WP03 (T013) catches this.
  • extra="forbid" may break if any WP file has truly unknown fields; defer tightening until all files pass.

Work Package WP05: WPState ABC + TransitionContext + Property Tests (#405) (Priority: P1)

Goal: Create WPState ABC with 9 concrete lane state classes (including promoted InReviewState), TransitionContext dataclass (with review_result field), factory function, property test harness proving transition equivalence, and documentation updates to 9-lane model.
Independent Test: Property tests pass proving identical transition matrix and guard outcomes vs existing ALLOWED_TRANSITIONS + _run_guard().
Prompt: /tasks/WP05-wp-state-abc-property-tests.md
Requirement Refs: FR-009, FR-010, FR-011, FR-012, FR-012a, FR-012b, FR-012c, FR-012d, SC-005, NFR-005, C-001, C-004
Estimated Prompt Size: ~650 lines

Included Subtasks

  • ✅ T021 Write ADR for State Pattern design decision (ABC vs Protocol, DIRECTIVE_003); include in_review promotion rationale, supersede prior review ADR
  • ✅ T022 Create TransitionContext frozen dataclass in status/transition_context.py with review_result: ReviewResult | None field (FR-012c)
  • ✅ T023 Create WPState ABC in status/wp_state.py with interface per data-model.md
  • ✅ T024 Implement 9 concrete lane state classes (including InReviewState) + factory function wp_state_for(); remove in_review from LANE_ALIASES (FR-012a); restrict for_review outbound to {in_review, blocked, canceled}
  • ✅ T025 Write property tests: transition matrix equivalence (all allowed pairs vs ALLOWED_TRANSITIONS for 9 lanes)
  • ✅ T026 Write property tests: guard equivalence (all guarded transitions vs _run_guard(), including in_review guards)
  • ✅ T027 [P] Write TransitionContext unit tests + Boy Scout: extract duplicated error messages in transitions.py
  • ✅ FR-012d: Update 5 documentation files to 9-lane model (README.md, kanban-workflow.md, status-model.md, runtime-and-missions.md, CLAUDE.md)

Implementation Notes

  • ADR (T021) must be committed BEFORE implementation code (DIRECTIVE_003). ADR should include in_review promotion rationale and supersede architecture/2.x/adr/2026-04-03-2-review-approval-and-integration-completion-are-distinct.md.
  • Use ZOMBIES TDD progression for 9 concrete classes: Zero → One → Many → Boundary → Interface → Exception → Simple.
  • Property tests use explicit enumeration (not Hypothesis) over all allowed pairs and guard-relevant TransitionContext fixtures.
  • doing alias resolution: wp_state_for("doing") returns InProgressState. No DoingState class.
  • in_review is a first-class lane (FR-012a): wp_state_for("in_review") returns InReviewState. NOT aliased.
  • for_review becomes a pure queue state: outbound only to {in_review, blocked, canceled} (FR-012a).
  • for_review → in_review requires actor-required guard with conflict detection (FR-012b).
  • All in_review → * transitions require structured ReviewResult in TransitionContext (FR-012c).
  • Export WPState, TransitionContext, wp_state_for, ReviewResult from status/__init__.py.
  • Boy Scout (DIRECTIVE_025): extract 2 duplicated error messages in transitions.py to constants.
  • Documentation updates (FR-012d): all 5 files must reflect the 9-lane model before this WP moves to for_review.
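
The notes above can be sketched as a small State Pattern skeleton. This is a hedged illustration covering only three of the nine lanes: the `ReviewResult` fields, the `actor` field, the `can_transition_to` method name, and the outbound targets of `in_progress` and `in_review` are assumptions. Only the `for_review` targets, the `doing` alias, and the `review_result` requirement come from this document. Conflict detection (FR-012b) is omitted.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewResult:
    """Hypothetical shape; the real fields are defined in data-model.md."""
    verdict: str
    reviewer: str


@dataclass(frozen=True)
class TransitionContext:
    actor: str | None = None  # assumed field name
    review_result: ReviewResult | None = None


class WPState(ABC):
    lane: str

    @abstractmethod
    def allowed_targets(self) -> frozenset[str]: ...

    def can_transition_to(self, target: str, ctx: TransitionContext) -> bool:
        return target in self.allowed_targets()


class InProgressState(WPState):
    lane = "in_progress"

    def allowed_targets(self) -> frozenset[str]:
        return frozenset({"for_review", "blocked", "canceled"})  # illustrative


class ForReviewState(WPState):
    lane = "for_review"

    def allowed_targets(self) -> frozenset[str]:
        # Pure queue state: outbound only to these three lanes (FR-012a).
        return frozenset({"in_review", "blocked", "canceled"})

    def can_transition_to(self, target: str, ctx: TransitionContext) -> bool:
        if target == "in_review" and not ctx.actor:
            return False  # actor-required guard (FR-012b)
        return super().can_transition_to(target, ctx)


class InReviewState(WPState):
    lane = "in_review"

    def allowed_targets(self) -> frozenset[str]:
        return frozenset({"in_progress", "blocked", "canceled"})  # illustrative

    def can_transition_to(self, target: str, ctx: TransitionContext) -> bool:
        if ctx.review_result is None:
            return False  # every in_review exit needs a ReviewResult (FR-012c)
        return super().can_transition_to(target, ctx)


_ALIASES = {"doing": "in_progress"}  # in_review is deliberately NOT aliased
_STATES = {cls.lane: cls for cls in (InProgressState, ForReviewState, InReviewState)}


def wp_state_for(lane: str) -> WPState:
    """Resolve aliases, then build the state object for the canonical lane."""
    return _STATES[_ALIASES.get(lane, lane)]()
```

Keeping alias resolution inside the factory means there is no `DoingState` class to maintain, which matches the note that `wp_state_for("doing")` returns `InProgressState`.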

Parallel Opportunities

  • T022 (TransitionContext) and T023/T024 (WPState + concrete classes) touch different files.
  • T027 (TransitionContext tests) is independent of T025/T026 (WPState property tests).

Dependencies

  • Depends on WP01 (establishes the baseline).

Risks & Mitigations

  • Guard logic in _run_guard() may have subtle conditions not captured in research; property tests will surface discrepancies.
  • WPState instantiation must be < 1 ms (NFR-005); frozen dataclasses are inherently lightweight.
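
To show how explicit enumeration can surface such discrepancies, here is a minimal equivalence check. Both tables are hypothetical stand-ins: the real `ALLOWED_TRANSITIONS` covers nine lanes and may not be a set of pairs, and `STATE_TARGETS` stands in for whatever the new state objects report as allowed targets.

```python
from itertools import product

# Hypothetical stand-in for the legacy transition table (subset of lanes).
ALLOWED_TRANSITIONS = frozenset({
    ("in_progress", "for_review"),
    ("in_progress", "blocked"),
    ("in_progress", "canceled"),
    ("for_review", "in_review"),
    ("for_review", "blocked"),
    ("for_review", "canceled"),
})

# Hypothetical stand-in for the targets reported by the new state objects.
STATE_TARGETS = {
    "in_progress": frozenset({"for_review", "blocked", "canceled"}),
    "for_review": frozenset({"in_review", "blocked", "canceled"}),
}

LANES = sorted({lane for pair in ALLOWED_TRANSITIONS for lane in pair})


def transition_mismatches() -> list[tuple[str, str]]:
    """Return every (source, target) pair where the two models disagree."""
    mismatches = []
    for source, target in product(LANES, repeat=2):
        legacy = (source, target) in ALLOWED_TRANSITIONS
        new = target in STATE_TARGETS.get(source, frozenset())
        if legacy != new:
            mismatches.append((source, target))
    return mismatches


def test_transition_matrix_equivalence() -> None:
    assert transition_mismatches() == []
```

Enumerating the full lane-by-lane product checks disallowed pairs as well as allowed ones, so a state class that accidentally permits an extra transition is caught too.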

Phase 3 — Consumer Migration


Work Package WP06: WPState Consumer Migration — High-Touch Trio (#405) (Priority: P2)

Goal: Migrate orchestrator_api/commands.py, next/decision.py, and dashboard/scanner.py to use WPState methods. Deduplicate LANES tuples.
Independent Test: grep -r 'current_lane ==' src/specify_cli/orchestrator_api src/specify_cli/next src/specify_cli/dashboard returns zero matches.
Prompt: /tasks/WP06-wp-state-consumer-migration.md
Requirement Refs: FR-013, FR-014, SC-006, NFR-001, NFR-006, C-004
Estimated Prompt Size: ~420 lines

Included Subtasks

  • ✅ T028 Migrate orchestrator_api/commands.py + Boy Scout: handle empty except clause, extract help string constants
  • ✅ T029 Migrate next/decision.py to use WPState methods
  • ✅ T030 Migrate dashboard/scanner.py to use WPState methods
  • ✅ T031 Deduplicate LANES tuples: tasks_support.py + task_helpers.py import from status package
  • ✅ T032 Remove stale 4-lane tuple in scripts/tasks/task_helpers.py
  • ✅ T033 Verify dashboard kanban bucketing produces identical results via WPState.display_category() (NFR-006)

Implementation Notes

  • Follow Strangler Fig tactic: one consumer per commit. Old validate_transition() API preserved for non-migrated consumers (C-004).
  • Key eliminations per plan.md:
      • _RUN_AFFECTING_LANES frozenset → state.affects_run property
      • if current_lane == "planned" / elif "claimed" cascades → state.allowed_targets() / state.progress_bucket()
  • Boy Scout (DIRECTIVE_025): handle empty except at commands.py:135; extract 3 duplicated help strings to constants.
  • T033 is the dashboard operability gate (NFR-006).
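
The shape of these eliminations can be illustrated with a before/after pair. Everything here is a sketch under assumptions: `LaneState` is a hypothetical minimal stand-in for a WPState subclass, and the bucket names and `affects_run` values are illustrative only, not taken from plan.md.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaneState:
    """Hypothetical stand-in for a WPState subclass."""
    lane: str
    affects_run: bool  # would replace membership in _RUN_AFFECTING_LANES
    bucket: str

    def progress_bucket(self) -> str:
        return self.bucket


# Illustrative values only.
STATES = {
    state.lane: state
    for state in (
        LaneState("planned", affects_run=False, bucket="todo"),
        LaneState("claimed", affects_run=True, bucket="active"),
        LaneState("in_progress", affects_run=True, bucket="active"),
    )
}


def progress_bucket_legacy(current_lane: str) -> str:
    """The kind of if/elif cascade the migration removes."""
    if current_lane == "planned":
        return "todo"
    elif current_lane == "claimed":
        return "active"
    elif current_lane == "in_progress":
        return "active"
    raise ValueError(current_lane)


def progress_bucket_migrated(current_lane: str) -> str:
    """After migration the consumer asks the state object instead."""
    return STATES[current_lane].progress_bucket()
```

Comparing the two functions over every lane is the same kind of identity check that T033 applies to dashboard kanban bucketing.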

Parallel Opportunities

  • T028, T029, T030 touch different files and can proceed in parallel.
  • T031/T032 (LANES dedup) are independent of the consumer migrations.

Dependencies

  • Depends on WP05 (WPState ABC and property tests must exist before consumer migration).

Risks & Mitigations

  • Consumer migration may reveal guard conditions not covered by WP05 property tests; run full test suite after each migration commit.
  • Old API must remain callable by 40+ non-migrated consumers; never remove validate_transition() or ALLOWED_TRANSITIONS.

Phase 4 — Infrastructure & Cross-Cutting


Work Package WP07: Status Test Suite CI Stage Split (Priority: P2)

Goal: Add fast-tests-status and integration-tests-status CI jobs; update existing jobs to exclude status test paths.
Independent Test: New CI jobs run; fast-tests-core no longer executes status tests.
Prompt: /tasks/WP07-ci-status-stage-split.md
Requirement Refs: NFR-008
Estimated Prompt Size: ~250 lines

Included Subtasks

  • ✅ T034 [P] Add fast-tests-status CI job to ci-quality.yml (runs tests/status/ + tests/specify_cli/status/)
  • ✅ T035 [P] Add integration-tests-status CI job (needs: fast-tests-status + fast-tests-core)
  • ✅ T036 Update fast-tests-core and integration-tests-core to --ignore status paths
  • ✅ T037 Validate CI: push branch, verify new jobs appear and fast-tests-core excludes status tests

Implementation Notes

  • Target CI graph from research.md Finding 4:

```
kernel-tests
├── fast-tests-doctrine (unchanged)
├── fast-tests-status (NEW)
├── fast-tests-core (modified: --ignore)
│   ├── integration-tests-doctrine (unchanged)
│   ├── integration-tests-status (NEW)
│   └── integration-tests-core (modified: --ignore)
```

  • This WP is independent of all others and can be implemented at any point.

Parallel Opportunities

  • T034 and T035 define new jobs; T036 modifies existing jobs. All changes are in one file (ci-quality.yml).
  • The entire WP can run in parallel with any other WP.

Dependencies

  • None (independent of all other WPs).

Risks & Mitigations

  • CI job names must be unique; verify no collisions with existing job matrix entries.
  • --ignore paths must match exactly; test locally with pytest --collect-only --ignore=... before pushing.

Work Package WP08: Dashboard API TypedDict Contracts (#361 Phase 1) (Priority: P2)

Goal: Define TypedDict response shapes for all JSON dashboard endpoints; migrate handlers to construct responses through these types; write JS ↔ Python contract test; apply Boy Scout JS fixes.
Independent Test: mypy passes on handler files; contract test validates JS frontend references the same keys as Python TypedDict definitions.
Prompt: /tasks/WP08-dashboard-api-typed-contracts.md
Requirement Refs: FR-015, FR-016, SC-008, NFR-006, NFR-007
Estimated Prompt Size: ~480 lines

Included Subtasks

  • ✅ T038 Create src/specify_cli/dashboard/api_types.py with TypedDict definitions per data-model.md
  • ✅ T039 Migrate dashboard/handlers/features.py to construct responses through TypedDict types
  • ✅ T040 Migrate dashboard/handlers/api.py to construct responses through TypedDict types + Boy Scout: rename reassigned path variable
  • ✅ T041 Write pytest contract test (test_api_contract.py) validating JS ↔ Python key alignment
  • ✅ T042 [P] Boy Scout JS fixes: remove unused artifactKey, .find() → .some(), isNaN() → Number.isNaN(), RegExp.exec()
  • ✅ T043 [P] Boy Scout JS fixes: optional chaining (9 sites), Promise rejection with Error (8 sites)
  • ✅ T044 Run mypy on handler files, verify pass

Implementation Notes

  • TypedDict definitions from data-model.md: ArtifactInfo, KanbanStats, KanbanTaskData, KanbanResponse, HealthResponse, ResearchResponse, ArtifactDirectoryResponse.
  • FeaturesListResponse is the largest shape (~15 keys); finalize during implementation based on post-migration handler output.
  • Contract test approach: parse dashboard.js for .key and ["key"] property accesses on fetch responses; compare against TypedDict __annotations__.
  • Boy Scout (DIRECTIVE_025): all JS fixes from Tier 1 and Tier 2 SonarCloud items assigned to WP08.
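
The contract test approach described above can be sketched as follows. The `KanbanStats` keys, the JS snippet, the endpoint path, and the assumption that fetch responses are bound to a variable named `data` are all hypothetical; the real test would read dashboard.js and the TypedDicts from api_types.py.

```python
import re
from typing import TypedDict


class KanbanStats(TypedDict):
    """Hypothetical keys; the real shape is defined in data-model.md."""
    total: int
    done: int


# Illustrative stand-in for the contents of dashboard.js.
JS_SOURCE = """
fetch('/api/kanban').then(r => r.json()).then(data => {
  render(data.total, data["done"]);
});
"""

# Matches static accesses: data.key, data["key"], and data['key'].
ACCESS_RE = re.compile(r"""\bdata(?:\.([A-Za-z_]\w*)|\[["']([A-Za-z_]\w*)["']\])""")


def js_keys(source: str) -> set[str]:
    """Collect property names the JS reads from the response object."""
    return {dot or bracket for dot, bracket in ACCESS_RE.findall(source)}


def undeclared_keys(source: str, shape: type) -> set[str]:
    """Keys the frontend reads that the TypedDict does not declare."""
    return js_keys(source) - set(shape.__annotations__)


print(sorted(undeclared_keys(JS_SOURCE, KanbanStats)))  # []
```

Because the regex only sees literal property names, dynamic accesses such as `data[name]` are silently skipped, which is consistent with limiting the test to statically analyzable accesses.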

Parallel Opportunities

  • T042 and T043 (JS fixes) are independent of T038-T041 (Python TypedDict work) and can proceed in parallel.

Dependencies

  • Depends on WP04 (WPMetadata migration in scanner.py) and WP06 (WPState migration in scanner.py). Both must complete before dashboard TypedDict work can safely assess final response shapes.

Risks & Mitigations

  • JS ↔ Python contract test may be fragile if JS uses dynamic property access; limit test scope to statically analyzable accesses.
  • Dashboard handler response shapes may be more complex than data-model.md estimates; FeaturesListResponse may need expansion.

Dependency & Execution Summary

  • Sequence: WP01 → {WP02, WP03, WP05} → {WP04, WP06} → WP08
  • Independent: WP07 can run at any time, in parallel with everything.
  • Parallelization: After WP01 merges, WP02/WP03/WP05 can run concurrently. After WP03→WP04 and WP05→WP06 complete, WP08 can start.
  • MVP Scope: WP01 + WP02 (bug fixes, immediate user value).
WP01 (validate-only fix)
├── WP02 (regex fix)              ← independent of WP03+
├── WP03 (WPMetadata model)
│   └── WP04 (consumer migration + extra=forbid)
│       └── WP08 (dashboard API TypedDict contracts)  ← also depends on WP06
└── WP05 (WPState + TransitionContext + property tests)
    └── WP06 (consumer migration: orchestrator_api, next/decision, dashboard)
        └── WP08 (dashboard API TypedDict contracts)
WP07 (CI stage split)             ← independent of all above
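
The execution order implied by this graph can be derived mechanically. The sketch below encodes the dependencies stated in each work package's Dependencies section and groups them into waves with the standard library's `graphlib`; the `execution_waves` helper is illustrative and not part of the project.

```python
from graphlib import TopologicalSorter

# Each WP maps to the set of WPs it depends on, as listed in this document.
DEPENDS_ON = {
    "WP01": set(),
    "WP02": {"WP01"},
    "WP03": {"WP01"},
    "WP04": {"WP03"},
    "WP05": {"WP01"},
    "WP06": {"WP05"},
    "WP07": set(),
    "WP08": {"WP04", "WP06"},
}


def execution_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group WPs into waves; everything within a wave can run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for index, wave in enumerate(execution_waves(DEPENDS_ON), start=1):
    print(f"wave {index}: {', '.join(wave)}")
# wave 1: WP01, WP07
# wave 2: WP02, WP03, WP05
# wave 3: WP04, WP06
# wave 4: WP08
```

WP07 lands in the first wave only because it has no prerequisites; as noted above, it can be scheduled at any point.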

Requirements Coverage Summary

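| Requirement ID | Covered By Work Package(s) |
|---|---|
| FR-001 | WP01 |
| FR-002 | WP01 |
| FR-003 | WP01 |
| FR-004 | WP02 |
| FR-005 | WP03 |
| FR-006 | WP03 |
| FR-007 | WP04 |
| FR-008 | WP04 |
| FR-009 | WP05 |
| FR-010 | WP05 |
| FR-011 | WP05 |
| FR-012 | WP05 |
| FR-012a | WP05 |
| FR-012b | WP05 |
| FR-012c | WP05 |
| FR-012d | WP05 |
| FR-013 | WP06 |
| FR-014 | WP06 |
| FR-015 | WP08 |
| FR-016 | WP08 |
| NFR-001 | WP01–WP08 (all) |
| NFR-002 | WP05, WP06 |
| NFR-003 | WP05, WP06 |
| NFR-004 | WP03 |
| NFR-005 | WP05 |
| NFR-006 | WP04, WP06, WP08 |
| NFR-007 | WP01–WP08 (all) |
| NFR-008 | WP07 |
| NFR-009 | WP01–WP08 (all) |
| C-001 | WP05 |
| C-002 | WP05, WP06 |
| C-003 | WP03 |
| C-004 | WP05, WP06 |
| C-005 | WP03, WP05 |
| SC-001 | WP01 |
| SC-002 | WP01 |
| SC-003 | WP02 |
| SC-004 | WP03, WP04 |
| SC-005 | WP05 |
| SC-006 | WP06 |
| SC-007 | WP01–WP08 (all) |
| SC-008 | WP08 |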

Subtask Index (Reference)

| Subtask ID | Summary | Work Package | Priority | Parallel? |
|---|---|---|---|---|
| T001 | Write failing test for validate-only mutation | WP01 | P0 | No |
| T002 | Guard write_frontmatter with not validate_only | WP01 | P0 | No |
| T003 | Update JSON output contract for validate-only | WP01 | P0 | No |
| T004 | Document bootstrap mutation surface | WP01 | P0 | Yes |
| T005 | Write header depth regression tests | WP02 | P0 | No |
| T006 | Fix feature.py regex to #{2,4} | WP02 | P0 | Yes |
| T007 | Fix emit.py regex to #{2,4} | WP02 | P0 | Yes |
| T008 | Fix tasks.py regex + Boy Scout emit.py cleanup | WP02 | P0 | Yes |
| T009 | Create WPMetadata Pydantic model | WP03 | P1 | No |
| T010 | Implement read_wp_frontmatter() loader | WP03 | P1 | No |
| T011 | Add field validators | WP03 | P1 | No |
| T012 | Write WPMetadata unit tests | WP03 | P1 | Yes |
| T013 | Write CI validation test for kitty-specs/ WP files | WP03 | P1 | Yes |
| T014 | Migrate dependency_graph.py | WP04 | P1 | No |
| T015 | Migrate feature.py WP reads | WP04 | P1 | No |
| T016 | Migrate task_profile, acceptance, bootstrap | WP04 | P1 | No |
| T017 | Migrate dashboard/scanner.py + Boy Scout | WP04 | P1 | No |
| T018 | Migrate requirement_mapping + remaining + Boy Scout | WP04 | P1 | No |
| T019 | Tighten extra=allow to extra=forbid | WP04 | P1 | No |
| T020 | Verify dashboard API endpoint JSON identity | WP04 | P1 | No |
| T021 | Write ADR for State Pattern design decision | WP05 | P1 | No |
| T022 | Create TransitionContext dataclass | WP05 | P1 | Yes |
| T023 | Create WPState ABC | WP05 | P1 | No |
| T024 | Implement 9 concrete state classes (incl. InReviewState) + factory | WP05 | P1 | No |
| T025 | Property tests: transition matrix equivalence (9 lanes) | WP05 | P1 | No |
| T026 | Property tests: guard equivalence | WP05 | P1 | No |
| T027 | TransitionContext tests + Boy Scout transitions.py | WP05 | P1 | Yes |
| T028 | Migrate orchestrator_api/commands.py + Boy Scout | WP06 | P2 | Yes |
| T029 | Migrate next/decision.py | WP06 | P2 | Yes |
| T030 | Migrate dashboard/scanner.py to WPState | WP06 | P2 | Yes |
| T031 | Deduplicate LANES tuples | WP06 | P2 | Yes |
| T032 | Remove stale 4-lane tuple in task_helpers.py | WP06 | P2 | Yes |
| T033 | Verify dashboard kanban bucketing identity | WP06 | P2 | No |
| T034 | Add fast-tests-status CI job | WP07 | P2 | Yes |
| T035 | Add integration-tests-status CI job | WP07 | P2 | Yes |
| T036 | Update existing CI jobs to --ignore status paths | WP07 | P2 | No |
| T037 | Validate CI jobs run correctly | WP07 | P2 | No |
| T038 | Create dashboard/api_types.py with TypedDicts | WP08 | P2 | No |
| T039 | Migrate handlers/features.py to TypedDict types | WP08 | P2 | No |
| T040 | Migrate handlers/api.py + Boy Scout path rename | WP08 | P2 | No |
| T041 | Write JS ↔ Python contract test | WP08 | P2 | No |
| T042 | Boy Scout JS: unused vars, isNaN, RegExp | WP08 | P2 | Yes |
| T043 | Boy Scout JS: optional chaining, Promise rejection | WP08 | P2 | Yes |
| T044 | Run mypy on handler files, verify pass | WP08 | P2 | No |

> This file is the high-level checklist. See individual prompt files in tasks/ for detailed implementation guidance per work package.