Implementation Plan: Tasks And Lane Stabilization
Branch: main | Date: 2026-04-06 | Spec: spec.md Input: Feature specification from kitty-specs/065-tasks-and-lane-stabilization/spec.md
Summary
Fix six confirmed bugs in the planning/tasks control plane so that finalize-tasks preserves dependency intent, --validate-only is genuinely non-mutating, lane computation is complete and explains collapse, mark-status handles both task formats, and generated command guidance includes required flags. All changes target existing Python runtime code, templates, and tests.
Technical Context
Language/Version: Python 3.11+ Primary Dependencies: typer (CLI), rich (console), ruamel.yaml (YAML), pathlib (filesystem) Storage: Filesystem only (YAML frontmatter, JSONL event logs, JSON manifests) Testing: pytest with 90%+ coverage on modified code, mypy --strict Target Platform: CLI tool (macOS, Linux) Project Type: Single project (existing spec-kitty codebase) Performance Goals: finalize-tasks completes in under 5 seconds for features with up to 20 WPs Constraints: Both agent mission finalize-tasks and agent tasks finalize-tasks must exhibit identical behavior (C-004). No regressions for mid-implementation features (C-005).
Charter Check
Testing: pytest with 90%+ coverage for new code. Integration tests for CLI commands. Type checking: mypy --strict, zero errors on changed files. Directives:
- DIRECTIVE_003 (Decision Documentation): All fix decisions documented in research.md with rationale and alternatives.
- DIRECTIVE_010 (Specification Fidelity): Implementation must match spec FR-001 through FR-015 and FR-010a/b.
No violations. Charter gates pass.
Project Structure
Documentation (this feature)
kitty-specs/065-tasks-and-lane-stabilization/
├── spec.md # Feature specification
├── plan.md # This file
├── research.md # Root cause analysis and decisions
└── checklists/
└── requirements.md # Quality checklist
Source Code (repository root)
src/specify_cli/
├── cli/commands/agent/
│ ├── mission.py # WP01: finalize_tasks (primary entry point)
│ ├── tasks.py # WP01: finalize_tasks (legacy entry point), WP04: mark_status
│ └── context.py # WP05: context resolve
├── lanes/
│ ├── compute.py # WP02+WP03: compute_lanes, _UnionFind, surface heuristics
│ └── models.py # WP02+WP03: ExecutionLane, LanesManifest, CollapseReport (new)
├── ownership/
│ ├── inference.py # WP02: infer_owned_files, src/** fallback
│ └── validation.py # WP02: _globs_overlap, validate_no_overlap
├── core/
│ └── paths.py # WP05: require_explicit_feature
├── context/
│ ├── middleware.py # WP05: --feature → --mission error fix
│ └── store.py # WP05: --feature → --mission error fix
├── shims/
│ └── generator.py # WP05: shim template --mission hint
├── frontmatter.py # WP01: FrontmatterManager (correct API)
├── tasks_support.py # WP01: set_scalar (broken for lists)
└── missions/software-dev/command-templates/
└── tasks.md # WP05: context resolve example, task format instructions
tests/
├── specify_cli/cli/commands/agent/
│ ├── test_feature_finalize_bootstrap.py # WP01: finalize + validate-only tests
│ └── test_tasks_canonical_cleanup.py # WP01: tasks.py finalize tests
├── tasks/
│ └── test_finalize_tasks_json_output_unit.py # WP01: JSON output schema tests
├── core/
│ ├── test_dependency_parser.py # WP01: shared parser tests
│ └── test_paths.py # WP05: require_explicit_feature tests
├── lanes/
│ ├── test_compute.py # WP03: lane computation tests
│ ├── test_models.py # WP03: model serialization tests
│ └── test_collapse_report.py # WP03: collapse report tests
├── ownership/
│ ├── test_inference.py # WP02: ownership inference tests
│ └── test_validation.py # WP02: glob validation tests
├── context/
│ ├── test_middleware.py # WP05: middleware error message tests
│ └── test_store.py # WP05: store error message tests
├── shims/
│ └── test_generator.py # WP05: shim template tests
└── git_ops/
├── test_atomic_status_commits_unit.py # WP04: mark-status tests
└── test_mark_status_pipe_table.py # WP04: pipe-table mutation tests
WP01 — Dependency and Frontmatter Truth
Issues: #406, #417 Files:
src/specify_cli/cli/commands/agent/mission.py(primary)src/specify_cli/cli/commands/agent/tasks.py(legacy)src/specify_cli/frontmatter.py(correct API)src/specify_cli/tasks_support.py(brokenset_scalar)
Change 1: Expand dependency parser to support bullet-list format
In mission.py around lines 1802-1819, add a third pattern after the existing two:
Pattern 3: Bullet-list dependencies under a "Dependencies" heading
Match "### Dependencies" (or "**Dependencies**") followed by
lines starting with "- WP##" (with optional trailing text).
The parser currently has:
- Pattern 1 (line 1802):
Depends on WP01, WP02— keep - Pattern 2 (line 1811):
Dependencies: WP01, WP02— keep - Pattern 3 (new): Bullet-list under a Dependencies heading
Implementation approach: After extracting section_content for each WP, scan for a line matching ^#{1,4}\s\?\?Dependencies\?\?\s$ (case-insensitive). If found, collect all subsequent lines matching ^\s[-]\s*(WP\d{2}) until the next heading or blank line.
Change 2: Consolidate tasks.py to use the same parser
In tasks.py lines 1665-1692, replace the independent regex with a call to a shared extraction function. Extract the parser from mission.py into a reusable module function (e.g., specify_cli.core.dependency_graph.parse_dependencies_from_tasks_md(content: str) -> dict[str, list[str]]).
Both mission.py and tasks.py call this single function. This satisfies C-004 (identical behavior).
Change 3: Disagree-loud on non-empty conflict (FR-002a)
After parsing dependencies from tasks.md, before writing frontmatter:
existing_deps = frontmatter.get("dependencies", [])
parsed_deps = dependencies_map.get(wp_id, [])
if existing_deps and parsed_deps and set(existing_deps) != set(parsed_deps):
# Non-empty disagreement: fail with diagnostic
errors.append(f"{wp_id}: frontmatter has {existing_deps}, "
f"tasks.md parsed {parsed_deps}")
If errors are non-empty, finalize-tasks fails before writing any files.
If parsed is empty and existing is non-empty: preserve existing (FR-002). If parsed is non-empty and existing is empty: write parsed. If both agree: write (idempotent).
Change 4: Fix set_scalar type mismatch in tasks.py
At tasks.py:1716, replace:
updated_front = set_scalar(frontmatter, "dependencies", deps)
with the FrontmatterManager API or direct dict assignment + write_frontmatter():
fm_dict, body = read_frontmatter(wp_file)
fm_dict["dependencies"] = deps
write_frontmatter(wp_file, fm_dict, body)
Change 5: Gate all mutations on validate_only (FR-004)
In mission.py: Move the validate_only check BEFORE the frontmatter write loop (before line 1419). When validate_only=True, accumulate would-be changes in a report dict but do not call write_frontmatter().
In tasks.py: Wrap wp_file.write_text() at line 1720 in if not validate_only:. Still compute the dependency map and report, but don't write.
Change 6: Accurate mutation reporting (FR-003)
In both implementations, track:
modified_wps: WPs whose frontmatter actually changedunchanged_wps: WPs whose frontmatter was already correctpreserved_wps: WPs where existing non-empty deps were kept (parsed was empty)
Report all three in JSON output.
Tests
test_bullet_list_dependency_parsing: tasks.md with bullet-list deps parsed correctlytest_inline_dependency_parsing_preserved: existing inline format still workstest_non_empty_disagreement_fails: frontmatter says [WP01], parser says [WP02] → errortest_empty_parse_preserves_existing: parser finds nothing, existing [WP01] kepttest_validate_only_no_file_writes: assert files byte-identical before/aftertest_validate_only_reports_would_be_changes: JSON output includes mutation reporttest_set_scalar_not_used_for_lists: verify FrontmatterManager API usedtest_both_entry_points_identical: same input → same output from both commands
WP02 — Lane Materialization Correctness
Issues: #422 (completeness half) Files:
src/specify_cli/lanes/compute.pysrc/specify_cli/ownership/inference.pysrc/specify_cli/ownership/validation.py
Change 1: Assert all executable WPs are lane-assigned (FR-006)
In compute.py after line 219 (after raw_groups = uf.groups()), add:
# Verify every executable WP appears in a group
assigned_wps = set()
for group in raw_groups.values():
assigned_wps.update(group)
missing = set(code_wp_ids) - assigned_wps
if missing:
raise LaneComputationError(
f"Executable WPs not assigned to any lane: {sorted(missing)}. "
f"Check dependency_graph keys and ownership_manifests."
)
Change 2: Diagnostic for planning-artifact exclusions (FR-006)
Add a planning_artifact_wps field to LanesManifest or return it alongside:
planning_wps = [wp_id for wp_id in all_wp_ids
if ownership_manifests.get(wp_id, None)
and ownership_manifests[wp_id].execution_mode == ExecutionMode.PLANNING_ARTIFACT]
Include in JSON output so operators can verify exclusions.
Change 3: Fail on unassignable executable WPs (FR-007)
In compute.py lines 177-181, when a WP has no ownership manifest AND is not a planning artifact, fail:
manifest = ownership_manifests.get(wp_id)
if manifest and manifest.execution_mode == ExecutionMode.PLANNING_ARTIFACT:
continue
if not manifest:
raise LaneComputationError(
f"Executable WP {wp_id} has no ownership manifest. "
f"Ensure owned_files and execution_mode are set in frontmatter."
)
code_wp_ids.append(wp_id)
Change 4: Warn on zero-match owned_files globs (FR-014)
In ownership/validation.py, add a new function:
def validate_glob_matches(manifests: dict[str, OwnershipManifest],
repo_root: Path) -> list[str]:
"""Warn when owned_files globs match zero files."""
warnings = []
for wp_id, manifest in manifests.items():
for glob_pattern in manifest.owned_files:
matches = list(repo_root.glob(glob_pattern))
if not matches:
warnings.append(
f"{wp_id}: owned_files glob '{glob_pattern}' "
f"matches zero files in repository"
)
return warnings
Call this during finalize-tasks and include warnings in JSON output.
Change 5: Warn on src/** fallback (FR-015)
In ownership/inference.py line 135-136, add a warning return:
if not globs:
globs = ["src/**"]
# Return alongside a warning flag that callers can surface
Modify infer_owned_files to return (globs, warnings) tuple, or add the warning to a module-level warning collector. The finalize-tasks caller surfaces this in its output.
Tests
test_all_executable_wps_in_lanes: N WPs in, N WPs appear in lanes.jsontest_planning_artifact_excluded_with_diagnostic: planning WP excluded, listed in summarytest_missing_manifest_fails_diagnostically: WP without manifest → error with WP nametest_zero_match_glob_warning: owned_files pointing to nonexistent path → warningtest_src_fallback_warning: WP with no paths → warning about synthetic scope
WP03 — Realistic Parallelism Preservation
Issues: #423 Files:
src/specify_cli/lanes/compute.pysrc/specify_cli/lanes/models.py
Change 1: Add CollapseReport data model
In models.py, add:
@dataclass(frozen=True)
class CollapseEvent:
wp_a: str
wp_b: str
rule: str # "dependency", "write_scope_overlap", "surface_heuristic"
evidence: str # e.g., "src/core/** overlaps src/core/utils/**"
@dataclass
class CollapseReport:
events: list[CollapseEvent]
independent_wps_collapsed: int # count of WPs independent in dep graph but same lane
Change 2: Record collapse events during union-find
In compute.py, modify the three union rules to record events:
collapse_events: list[CollapseEvent] = []
# Rule 1: Dependencies
for wp_id in code_wp_ids:
for dep in dependency_graph.get(wp_id, []):
if dep in uf._parent:
if uf.find(wp_id) != uf.find(dep):
collapse_events.append(CollapseEvent(
wp_a=wp_id, wp_b=dep,
rule="dependency",
evidence=f"{wp_id} depends on {dep}"
))
uf.union(wp_id, dep)
# Rule 2: Write-scope overlap
for wp_a, wp_b in find_overlap_pairs(code_manifests):
if uf.find(wp_a) != uf.find(wp_b):
# Find the specific overlapping globs for evidence
overlap_detail = _find_overlap_detail(code_manifests[wp_a], code_manifests[wp_b])
collapse_events.append(CollapseEvent(
wp_a=wp_a, wp_b=wp_b,
rule="write_scope_overlap",
evidence=overlap_detail
))
uf.union(wp_a, wp_b)
# Rule 3: Surface heuristic (refined — see Change 3)
Change 3: Refine Rule 3 — gate on non-disjoint ownership (FR-009)
Current Rule 3 (compute.py:204-216) unions any two WPs that share a surface keyword, regardless of ownership. Change to:
# Rule 3: Shared surfaces → same lane ONLY if ownership is not provably disjoint
for wp_a, wp_b in combinations(code_wp_ids, 2):
surfaces_a = set(wp_surfaces.get(wp_a, []))
surfaces_b = set(wp_surfaces.get(wp_b, []))
if surfaces_a & surfaces_b:
# Only merge if their owned_files actually overlap
if (wp_a in code_manifests and wp_b in code_manifests
and not _globs_are_disjoint(code_manifests[wp_a], code_manifests[wp_b])):
if uf.find(wp_a) != uf.find(wp_b):
shared = surfaces_a & surfaces_b
collapse_events.append(CollapseEvent(
wp_a=wp_a, wp_b=wp_b,
rule="surface_heuristic",
evidence=f"shared surfaces {shared} with non-disjoint ownership"
))
uf.union(wp_a, wp_b)
Add helper _globs_are_disjoint(manifest_a, manifest_b) -> bool that returns True when no glob pair from A overlaps with any glob from B (inverse of find_overlap_pairs for a single pair).
Change 4: Return CollapseReport from compute_lanes
Add collapse_report field to LanesManifest, or return a tuple (LanesManifest, CollapseReport). Include in JSON output from finalize-tasks.
Change 5: Count independent-WP collapses
After computing collapse events, cross-reference with dependency_graph to identify events where the two WPs have no direct or transitive dependency. Report this count in CollapseReport.independent_wps_collapsed.
Tests
test_disjoint_ownership_preserves_parallelism: WP A ownssrc/a/, WP B ownssrc/b/, same surface keyword → two lanes (not one)test_overlapping_ownership_with_surface_collapses: WP A ownssrc/core/, WP B ownssrc/core/utils/, same surface → one lanetest_collapse_report_includes_rule_and_evidence: collapse events contain rule name and specific overlap/surfacetest_collapse_report_counts_independent_collapses: independent WPs forced into same lane countedtest_dependency_collapse_not_reported_as_independent: dependent WPs in same lane → not flagged as independent collapsetest_existing_lane_assignments_unchanged: features with valid lanes.json still compute identically (regression)
WP04 — Mutable Task-State Compatibility
Issues: #438 Files:
src/specify_cli/cli/commands/agent/tasks.py(mark_status function)src/specify_cli/missions/software-dev/command-templates/tasks.md(generation instructions)
Change 1: Add pipe-table row parser to mark-status
In tasks.py around line 1340-1352, extend the search loop:
for task_id in task_ids:
task_found = False
for i, line in enumerate(lines):
# Strategy 1: Checkbox format (existing)
if re.search(rf'-\s*\[[ x]\]\s*{re.escape(task_id)}\b', line):
lines[i] = re.sub(r'-\s*\[[ x]\]', f'- {new_checkbox}', line)
updated_tasks.append(task_id)
task_found = True
break
# Strategy 2: Pipe-table format (new)
if _is_pipe_table_task_row(line, task_id):
lines[i] = _update_pipe_table_status(line, status)
updated_tasks.append(task_id)
task_found = True
break
Helper functions:
def _is_pipe_table_task_row(line: str, task_id: str) -> bool:
"""Match a pipe-delimited row containing the task ID."""
return bool(re.search(
rf'\|\s*{re.escape(task_id)}\s*\|', line
))
def _update_pipe_table_status(line: str, status: str) -> str:
"""Replace status marker in a pipe-table row.
Recognizes [P] (planned/pending), [D] (done), [x] (done),
[ ] (pending) in any pipe-delimited cell.
"""
if status == "done":
return re.sub(r'\[\s*[P ]\s*\]', '[D]', line)
else:
return re.sub(r'\[\s*[Dx]\s*\]', '[P]', line)
Change 2: Standardize future generation to checkbox format (FR-010a)
In missions/software-dev/command-templates/tasks.md, ensure the task-format instructions explicitly tell LLMs to generate checkbox format:
### Task Tracking Format
Use checkbox format for all task rows in tasks.md:
- □ T001 Description of task (WP01)
- □ T002 Another task (WP01)
Do NOT use pipe-table format for task tracking rows.
If any template currently shows pipe-table examples as the primary format, replace with checkbox examples.
Tests
test_mark_status_checkbox_done:- [ ] T001→- [x] T001test_mark_status_checkbox_pending:- [x] T001→- [ ] T001test_mark_status_pipe_table_done:| T001 | desc | WP01 | [P] |→| T001 | desc | WP01 | [D] |test_mark_status_pipe_table_pending:| T001 | desc | WP01 | [D] |→| T001 | desc | WP01 | [P] |test_mark_status_mixed_format: file has both checkbox and pipe-table rows, both updated correctlytest_legacy_pipe_table_file_editable: existing pipe-table tasks.md from mission 063 can be updatedtest_template_generates_checkbox_format: verify template instructions specify checkbox
WP05 — Command Ergonomics for External Agents
Issues: #434 Files:
src/specify_cli/context/middleware.pysrc/specify_cli/context/store.pysrc/specify_cli/shims/generator.pysrc/specify_cli/missions/software-dev/command-templates/tasks.mdsrc/specify_cli/core/paths.py
Change 1: Fix --feature → --mission in error messages (FR-012)
In middleware.py line 100, change:
"Run `spec-kitty agent context resolve --wp <WP> --feature <feature>` first, "
to:
"Run `spec-kitty agent context resolve --wp <WP> --mission <slug>` first, "
In store.py line 65, same change:
"Run `spec-kitty agent context resolve --wp <WP> --mission <slug>` "
Change 2: Add --mission hint to shim template (FR-011)
In shims/generator.py lines 53-76, add guidance line:
return (
f"<!-- spec-kitty-command-version: {version} -->\n"
"Run this exact command and treat its output as authoritative.\n"
"Do not rediscover context from branches, files, or prompt contents.\n"
"Pass --mission <slug> when operating on a specific mission.\n"
"\n"
f'`spec-kitty agent shim {command} --agent {agent_name} --raw-args "{arg_placeholder}"`\n'
)
Change 3: Fix tasks.md template context resolve example (FR-011)
In missions/software-dev/command-templates/tasks.md around lines 52-58, ensure the example includes --mission:
spec-kitty agent context resolve --action tasks --mission <mission-slug> --json
Audit all other command examples in the template (check-prerequisites, finalize-tasks, mark-status) for missing --mission.
Change 4: Improve require_explicit_feature error message (FR-013)
In core/paths.py lines 333-339, make the example a complete copy-pasteable command:
msg = (
f"Mission slug is required. Provide it explicitly: {flag}\n"
"No auto-detection is performed (branch scanning / env vars removed).\n"
f"{available}"
f"Example:\n spec-kitty agent context resolve --action tasks {flag.split()[0]} {example_slug} --json"
)
Tests
test_middleware_error_uses_mission_flag: error message contains--mission, not--featuretest_store_error_uses_mission_flag: error message contains--mission, not--featuretest_shim_template_mentions_mission: generated shim content includes--missionguidancetest_tasks_template_context_resolve_has_mission: template example includes--mission <slug>test_require_explicit_feature_example_is_complete: error message includes full copy-pasteable commandtest_all_tasks_template_commands_have_mission: grep allspec-kitty agentinvocations in template, assert all include--mission
Dependency Map
WP01 (dependency/frontmatter truth)
↑ no deps — can start immediately
owns: tasks.py (finalize_tasks function), mission.py, dependency_parser.py
WP02 (lane materialization correctness)
↑ no deps — can start immediately
owns: lanes/compute.py, ownership/inference.py, ownership/validation.py
WP03 (parallelism preservation)
↑ depends on WP02 (builds on lane computation changes, shares compute.py)
owns: lanes/models.py, lanes/compute.py (shared with WP02, sequenced by dep)
WP04 (task-state compatibility)
↑ depends on WP01 (shared tasks.py) and WP05 (shared tasks template)
owns: test files only; modifies tasks.py mark_status + tasks template under WP01/WP05 ownership
WP05 (command ergonomics)
↑ no deps — can start immediately
owns: context/middleware.py, context/store.py, shims/generator.py, core/paths.py, tasks template
Parallelism: WP01, WP02, WP05 can run in parallel (Phase 1). WP03 and WP04 can run in parallel with each other in Phase 2 (WP03 after WP02; WP04 after WP01+WP05).
Risk Assessment
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Dependency parser change breaks existing inline-format features | Medium | High | Regression tests with both formats; test against real kitty-specs/ artifacts |
| validate_only mutation guard misses an edge case | Low | High | Assert file checksums before/after in tests; test against both entry points |
| Rule 3 refinement changes existing valid lane assignments | Medium | Medium | Run compute_lanes on all existing features and compare output (C-005) |
| Pipe-table status marker regex too broad/narrow | Low | Medium | Test against real 063 tasks.md; cover edge cases in cell formatting |
| Shim regeneration needed after template change | Low | Low | Migration handles shim updates on next spec-kitty upgrade |
Charter Re-Check (Post-Design)
- Testing: Plan includes 30+ specific test cases across 5 WPs. 90%+ coverage target.
- Type checking: All new code must pass mypy --strict.
set_scalartype mismatch is being fixed (Change 4, WP01). - DIRECTIVE_003: All decisions documented in research.md with rationale and alternatives.
- DIRECTIVE_010: Plan maps every FR to specific file changes.
No new charter violations.