Feature Specification: Canonical Context Architecture Cleanup

Feature Branch: 057-canonical-context-architecture-cleanup Created: 2026-03-27 Status: Draft Input: Architectural analysis identifying four root failures in Spec Kitty's current design, with a five-move cleanup plan.

User Scenarios & Testing (mandatory)

User Story 1 - Agent Implements a Work Package Without Context Loss (Priority: P1)

An AI agent receives a work package assignment. The agent resolves context once at the start of work, receives a bound context token, and every subsequent command (review, move-task, mark-status, accept, merge) uses that same token. The agent never re-resolves the feature slug, WP ID, branch, or owned files. If the context token is missing or invalid, the command fails immediately with an actionable error.

Why this priority: Context rediscovery is the single largest source of command failures, slug collisions, and cross-WP contamination. Fixing this absorbs the most bug surface area.

Independent Test: A full implement-through-merge cycle for one WP using only --context <token> on every command. No heuristic resolution triggered at any step.

Acceptance Scenarios:

1. Given an agent calls spec-kitty agent context resolve --wp WP03 --feature 057-canonical-context-architecture-cleanup, When the context is persisted, Then all subsequent commands accepting --context <token> operate on exactly WP03 with no re-resolution. 2. Given a persisted context token referencing WP03, When an agent passes that token to spec-kitty agent tasks move-task --context <token> --to for_review, Then the command moves WP03 without parsing branch names, env vars, or task file paths. 3. Given a command is invoked without --context and without sufficient arguments for unambiguous resolution, When the CLI attempts execution, Then it fails with a clear error directing the user to resolve context first.


User Story 2 - Planning-Artifact WP Operates Without Worktree Confusion (Priority: P1)

A contributor works on a planning-artifact WP (e.g., writing spec.md, plan.md, or tasks.md). The WP's execution_mode is planning_artifact. The system does not create a sparse-checkout worktree. Instead, the contributor works directly on the planning surface in the main repository or in a dedicated planning worktree that includes exactly the owned files. No kitty-specs files are filtered, hidden, or confused by sparse-checkout boundaries.

Why this priority: Forcing planning-artifact WPs through the code-worktree pipeline is the root cause of most kitty-specs visibility failures and sparse-checkout confusion.

Independent Test: Create a feature, finalize tasks with at least one planning_artifact WP and one code_change WP. Implement the planning WP. Verify no sparse checkout is applied and the contributor can see and edit all owned planning files.

Acceptance Scenarios:

1. Given a WP with execution_mode: planning_artifact and owned_files: ["kitty-specs/057-*/spec.md"], When the agent runs spec-kitty implement WP01, Then the system either works in-repo or creates a planning worktree with full access to the owned files, not a sparse-checkout worktree. 2. Given a WP with execution_mode: code_change, When the agent runs spec-kitty implement WP02, Then the system creates a standard worktree as it does today. 3. Given a WP with execution_mode: planning_artifact, When the agent checks the worktree contents, Then all files listed in owned_files and authoritative_surface are present and writable.


User Story 3 - Status Is Read from One Authority (Priority: P1)

A project maintainer checks the kanban board. The board state is computed entirely from the canonical event log (status.events.jsonl). No frontmatter lane fields, no status.json projections, and no task-board caches are consulted as authoritative sources. Mutable status fields (lane, review_status, reviewed_by, progress summaries) do not exist in WP frontmatter. Derived views (status.json, board summaries) are regenerated from the event log on demand.

Why this priority: Multi-authority state is the root cause of drift, merge noise, and contradictory validators. Collapsing to one source eliminates an entire class of bugs.

Independent Test: Emit several status transitions via the event log. Verify that spec-kitty status, the CLI progress output, and any generated status.json all reflect exactly the event log state. Verify that WP frontmatter contains no mutable status fields.

Acceptance Scenarios:

1. Given a feature with 5 WPs and a sequence of status transitions in the event log, When a user runs spec-kitty status, Then the displayed board matches the event log exactly. 2. Given a WP file in tasks/*.md, When a user inspects its YAML frontmatter, Then only static metadata is present (title, dependencies, execution_mode, owned_files, authoritative_surface, wp_id, mission_id). No lane, review_status, reviewed_by, or progress fields exist. 3. Given a status.json file exists, When the event log is modified by a new transition, Then status.json is regenerated from the event log and never read as an input to status determination.


User Story 4 - Merge Completes Deterministically in a Dedicated Workspace (Priority: P2)

A maintainer merges a completed feature. The merge engine creates or reuses a dedicated merge workspace independent of whatever branch is checked out in the main repository. The merge loads the canonical mission context, computes merge tips from WP branches, applies managed conflict resolution for Spec Kitty-owned files, and persists state so an interrupted merge can resume. After merge, done-state is reconciled from actual merged git ancestry.

Why this priority: The current merge depends on checked-out branch state, causing target-branch/worktree collisions. A dedicated workspace makes merge deterministic and resumable.

Independent Test: Start a merge for a feature with 3 WPs. Interrupt it after WP01 merges. Resume and verify WP02/WP03 merge without re-processing WP01. Verify the main repo's checked-out branch was never changed.

Acceptance Scenarios:

1. Given a feature with 3 completed WPs, When the user runs spec-kitty merge --feature 057-canonical-context-architecture-cleanup, Then a dedicated merge workspace is created, WPs are merged in dependency order, and the user's main repo checkout is untouched. 2. Given a merge interrupted after WP01, When the user runs spec-kitty merge --resume, Then the merge continues from WP02 using persisted merge state. 3. Given Spec Kitty-owned files (status artifacts, generated manifests) conflict during merge, When the merge engine encounters these, Then managed conflict resolvers handle them automatically without manual intervention.


User Story 5 - Agent Shims Pass Through to CLI (Priority: P2)

A contributor using Claude Code (or any of the 12 supported agents) invokes /spec-kitty.implement WP03. The agent-specific command file is a thin shim that passes the raw arguments to spec-kitty agent workflow implement --context <token>. No workflow logic, recovery instructions, or argument parsing happens in the markdown command file. All behavior lives in the CLI. Internal-only skills are excluded from consumer installs via an allowlist in the packaged registry.

Why this priority: Copied prompt logic causes argument-semantic drift, stale prompts, and skill drift across 12 agent directories. Thin shims eliminate this entire class of maintenance burden.

Independent Test: Inspect a generated agent command file. Verify it contains only a CLI passthrough call with context argument forwarding. Verify no conditional logic, recovery instructions, or workflow state management is present in the file.

Acceptance Scenarios:

1. Given a project with Claude Code configured, When the user runs spec-kitty upgrade, Then .claude/commands/spec-kitty.implement.md contains only a thin passthrough to the CLI with --context support. 2. Given a project with all 12 agents configured, When the user inspects any agent command file, Then all files follow the same thin-shim pattern with no duplicated workflow logic. 3. Given the packaged skill registry, When a consumer project installs spec-kitty, Then internal-only skills are excluded based on the registry allowlist.


User Story 6 - Legacy Project Migrates to New Architecture (Priority: P1)

A maintainer with an existing Spec Kitty project runs any CLI command. The CLI detects that the project has not been migrated and refuses to operate, directing the user to run spec-kitty upgrade. The upgrade performs a one-shot migration: backfills immutable IDs (project_uuid, mission_id, stable wp_id), infers execution_mode and owned_files from existing artifacts, rebuilds canonical event log state from legacy frontmatter/status files, removes mutable status fields from frontmatter, and rewrites agent command files as thin shims.

Why this priority: Without migration, no existing project can use the new architecture. The migration is a first-class deliverable, not incidental tooling.

Independent Test: Take a legacy 2.1.x project with multiple features in various states. Run spec-kitty upgrade. Verify all immutable IDs are assigned, mutable frontmatter fields are removed, the event log is canonical, and all commands work against the new model.

Acceptance Scenarios:

1. Given a legacy project (pre-057), When a user runs any spec-kitty command, Then the CLI prints "Project requires migration. Run spec-kitty upgrade to continue." and exits with a non-zero code. 2. Given a legacy project, When the user runs spec-kitty upgrade, Then every feature gets a project_uuid, every WP gets a stable wp_id and inferred execution_mode/owned_files, and the canonical event log is rebuilt from legacy state. 3. Given a migrated project, When the user inspects WP frontmatter, Then no mutable status fields (lane, review_status, reviewed_by, progress) remain. Only static metadata is present. 4. Given a legacy project with features in mid-flight (some WPs done, some in progress), When the migration runs, Then the event log faithfully represents the pre-migration state and no status information is lost.


Edge Cases

  • What happens when a legacy project has corrupted or conflicting status across frontmatter and status.json? The migration should use a deterministic precedence order (event log > status.json > frontmatter) and log warnings for conflicts.
  • What happens when a context token references a WP that has been deleted or renumbered? The command should fail with a clear error referencing the stale token.
  • What happens when a planning-artifact WP's owned_files overlap with another WP's owned_files? Task finalization should reject overlapping ownership with an explicit validation error.
  • What happens when a merge workspace already exists from a previous interrupted merge of a different feature? The merge engine should detect the stale workspace, offer to clean it up, and refuse to proceed until resolved.
  • What happens when the upgrade gate detects a project schema version newer than the running CLI? The CLI should refuse to operate and instruct the user to upgrade the CLI, not the project.
  • What happens when a thin shim is invoked but no context token exists? The shim should instruct the agent to run context resolution first, not silently fall back to heuristic resolution.

Requirements (mandatory)

Functional Requirements

IDTitleUser StoryPriorityStatus
FR-001MissionContext objectAs an agent, I want a persisted context object with immutable IDs (project_uuid, mission_id, work_package_id, wp_code, feature_slug, target_branch, authoritative_repo, authoritative_ref (optional for planning_artifact WPs), owned_files, execution_mode, dependency_mode) so that every workflow command operates on bound identity without re-resolution.HighOpen
FR-002Context token CLI interfaceAs an agent, I want every workflow command to accept --context <token> so that I never trigger heuristic slug/branch/WP resolution after initial binding.HighOpen
FR-003Fail-fast on missing contextAs an agent, I want commands to fail immediately with an actionable error when invoked without sufficient context rather than falling back to heuristic resolution.HighOpen
FR-004WP execution_mode declarationAs a project maintainer, I want each WP to declare execution_mode (code_change or planning_artifact) at task-finalization time so that the system uses the correct workspace strategy.HighOpen
FR-005WP owned_files manifestAs a project maintainer, I want each WP to declare owned_files and authoritative_surface at task-finalization time so that file ownership is explicit and enforceable.HighOpen
FR-006Planning-artifact workspaceAs a contributor, I want planning-artifact WPs to work directly on the planning surface or a dedicated planning worktree without sparse checkout so that kitty-specs files are always visible and editable.HighOpen
FR-007Remove sparse checkout from critical pathAs a contributor, I want sparse checkout removed as a policy boundary so that file visibility is determined by the owned_files manifest, not checkout configuration.HighOpen
FR-008Canonical event log authorityAs a maintainer, I want the status event log to be the sole authority for mutable WP state so that no other file is consulted for lane, progress, or review status.HighOpen
FR-009Remove mutable frontmatter fieldsAs a maintainer, I want mutable status fields (lane, review_status, reviewed_by, progress summaries) removed from WP frontmatter and tasks.md so that frontmatter contains only static metadata.HighOpen
FR-010Derived view regenerationAs a maintainer, I want status.json, board summaries, dossier snapshots, and cached manifests to be regenerated from the event log on demand and never treated as merge-critical inputs.HighOpen
FR-011Tracked vs derived boundaryAs a project maintainer, I want the tracked/derived boundary to be explicit: tracked = human-authored mission/task artifacts + canonical event log; derived = status.json, board summaries, dossier snapshots, generated prompt surfaces, cached manifests.HighOpen
FR-012Dedicated merge workspaceAs a maintainer, I want merge to operate in a dedicated workspace independent of the main repo's checked-out branch so that merge is deterministic and does not collide with active work.MediumOpen
FR-013Resumable merge state machineAs a maintainer, I want merge state persisted so that an interrupted merge can resume from where it stopped without re-processing completed WPs.MediumOpen
FR-014Managed conflict resolutionAs a maintainer, I want Spec Kitty-owned files (status artifacts, generated manifests) to be auto-resolved during merge so that only human-authored conflicts require manual intervention.MediumOpen
FR-015Merge-complete reconciliationAs a maintainer, I want done-state reconciled from actual merged git ancestry after merge completes so that the event log reflects reality.MediumOpen
FR-016Thin agent shimsAs a contributor using any of the 12 supported agents, I want agent command files to be thin CLI passthroughs that forward raw arguments to spec-kitty with --context so that no workflow logic is duplicated in agent directories.MediumOpen
FR-017Internal skill allowlistAs a project maintainer, I want internal-only skills excluded from consumer installs via an explicit allowlist in the packaged registry so that end users do not see development-only capabilities.MediumOpen
FR-018One-shot migrationAs a maintainer of an existing project, I want a single spec-kitty upgrade command that backfills immutable IDs, infers execution_mode and owned_files, rebuilds canonical state from legacy artifacts, removes mutable frontmatter fields, and rewrites agent shims.HighOpen
FR-019Upgrade gateAs a maintainer, I want the CLI to refuse to operate on unmigrated projects and direct me to run spec-kitty upgrade so that there is no mixed-mode execution.HighOpen
FR-020Schema-version compatibilityAs a maintainer, I want the upgrade gate based on project schema/capability version rather than exact CLI patch version so that version deadlocks are avoided.HighOpen
FR-021Immutable identity fieldsAs a maintainer, I want project_uuid, mission_id, and stable wp_id assigned as immutable identity so that lexical slug collisions are resolved and identity is never ambiguous.HighOpen
FR-022Weighted progress from canonical modelAs a maintainer, I want weighted progress computed from the canonical lane model in one shared library usable by both CLI and future SaaS consumers so that progress calculation has a single implementation.MediumOpen
FR-023Canonical machine-readable outputsAs a SaaS consumer (future), I want stable canonical machine-readable outputs and schemas exposed by the CLI so that SaaS can adopt the new state model without Spec Kitty changes.MediumOpen

Non-Functional Requirements

IDTitleRequirementCategoryPriorityStatus
NFR-001Migration speedOne-shot migration completes in under 30 seconds for a project with up to 20 features and 200 WPs.PerformanceHighOpen
NFR-002Migration safetyMigration is atomic: if any step fails, the project is left in its pre-migration state with no partial writes.ReliabilityHighOpen
NFR-003Context resolution speedMissionContext resolution and persistence completes in under 500ms.PerformanceMediumOpen
NFR-004Test coverageAll new code achieves 90%+ test coverage with pytest. All public interfaces pass mypy --strict.QualityHighOpen
NFR-005Merge determinismGiven the same set of WP branches and event log, the merge engine produces identical results regardless of the main repo's checked-out branch or filesystem state.ReliabilityHighOpen
NFR-006Backward deletionThe cleanup removes at least the following: branch/slug heuristic detection, sparse-checkout policy enforcement, frontmatter multi-authority logic, move-task contamination heuristics, prompt-specific recovery instructions, git-noise filtering for generated artifacts, agent-template drift from copied commands.MaintainabilityHighOpen

Constraints

IDTitleConstraintCategoryPriorityStatus
C-001Big-bang releaseAll five architectural moves ship as one coordinated release. No intermediate compatibility shims between moves.ProcessHighOpen
C-002No SaaS changesSaaS persistence, Django views, and SaaS rollout timing are out of scope. The CLI produces canonical outputs; SaaS adopts them independently.ScopeHighOpen
C-003Fail-fast on unmigratedThe CLI must refuse to operate on unmigrated projects. No dual-authority or mixed-mode execution after release.TechnicalHighOpen
C-004No dual-authority stateAfter migration, no runtime path may consult both the event log and frontmatter/status.json for mutable state. The event log is the sole authority.TechnicalHighOpen
C-005Schema version gateThe upgrade gate must use project schema/capability version, not exact CLI patch version.TechnicalHighOpen
C-006Legacy as migration input onlyOld slug/branch/env heuristics and legacy state files may exist only as migration inputs, not as first-class runtime behavior.TechnicalHighOpen
C-007Agent shim generation continuesThin shim command files are still generated into agent directories. The change is removing workflow logic from them, not removing the files.TechnicalMediumOpen
C-008Python 3.11+All code targets Python 3.11+ using typer, rich, ruamel.yaml, pytest, mypy (strict).TechnicalHighOpen

Key Entities

  • MissionContext: Persisted bound-identity object containing project_uuid, mission_id, wp_id, feature_slug (display alias), target_branch, authoritative_repo, authoritative_ref, owned_files, execution_mode, dependency_mode, and completion_commands. Created by context resolution, consumed by all workflow commands via token reference.
  • WP Ownership Manifest: Per-WP declaration of execution_mode (code_change | planning_artifact), owned_files (glob patterns), and authoritative_surface (path to the canonical location for this WP's artifacts). Set at task-finalization time. Stored as static WP frontmatter.
  • Canonical Event Log: Append-only JSONL file (status.events.jsonl) per feature. Sole authority for mutable state (lane transitions, review status, progress). All other status representations are derived from this log.
  • Schema Version: Project capability version stored in project metadata. Used by the upgrade gate to determine compatibility. Based on schema capabilities, not CLI patch version.
  • Merge State: Persisted JSON state for resumable merge operations. Tracks feature, WP order, completed WPs, current WP, conflict status, and strategy. Lives in a dedicated merge workspace, not the main repo.
  • Agent Shim: Thin runtime-specific wrapper file installed into agent command directories. Contains only a CLI passthrough call with context forwarding. No workflow logic, recovery instructions, or argument parsing.

Success Criteria (mandatory)

Measurable Outcomes

  • SC-001: An agent can complete a full WP lifecycle (implement through merge) using only the --context <token> interface, with zero heuristic resolution calls triggered.
  • SC-002: WP frontmatter across all features contains zero mutable status fields after migration. All mutable state is served exclusively from the event log.
  • SC-003: Planning-artifact WPs complete without any sparse-checkout-related failures or file-visibility issues.
  • SC-004: The merge engine produces identical results regardless of the main repo's checked-out branch, verified by running the same merge from different checkout states.
  • SC-005: The one-shot migration successfully converts a legacy project with at least 10 features in mixed states (planned, in-progress, done) with zero status information loss, verified by comparing pre-migration board state to post-migration event-log-derived state.
  • SC-006: Agent command files across all 12 agent directories contain no workflow logic — only CLI passthrough calls — verified by automated content inspection.
  • SC-007: The following code paths are deleted or radically simplified after the release: branch/slug heuristic feature detection, sparse-checkout policy enforcement, frontmatter/status/task-board multi-authority logic, move-task contamination heuristics, prompt-specific recovery instructions, git-noise filtering for generated artifacts.
  • SC-008: The new canonical state model exposes stable machine-readable JSON schemas documented for future SaaS consumption.