Charter Golden-Path E2E (Tranche 1)

FieldValue
Mission ID01KQ806XN4TTJRAQGZWVPQP7V7
Mission slugcharter-golden-path-e2e-tranche-1-01KQ806X
Mission typesoftware-dev
Branch (planning, base, target)test/charter-e2e-827-tranche-1
Primary issuehttps://github.com/Priivacy-ai/spec-kitty/issues/827
Parent epichttps://github.com/Priivacy-ai/spec-kitty/issues/461
Created2026-04-27

Purpose

TLDR. Prove the shipped Charter epic works through public CLI from a fresh project, with no source-checkout pollution.

Context. Issue #827 needs a deterministic operator-path E2E spine before the rest of the P0/P1/P2 matrix can be built on top. This tranche delivers one product-repo test that drives Charter setup, bundle validation, synthesis, status, lint, mission advancement via spec-kitty next, and retrospect summary — all through public commands — and asserts the source checkout is unchanged afterward, so future tranches can extend the spine without fighting hidden helpers.

User Scenarios & Testing

Primary actor

A Spec Kitty maintainer or a CI runner executing the product test suite. They consume the test as evidence that the Charter epic still works the way an external operator uses it.

Primary scenario (happy path)

1. The maintainer / CI runner runs pytest tests/e2e/test_charter_epic_golden_path.py. 2. The test materialises a fresh project at a temp path that lives outside the source checkout, performs git init/config, and seeds it solely via public CLI: spec-kitty init . --ai codex --non-interactive. 3. The test exercises Charter setup as an operator would, in order: charter interview (--profile minimal --defaults --json), charter generate (--from-interview --json), charter bundle validate (--json), charter synthesize (--dry-run --json), charter synthesize (--json), charter status (--json), charter lint (--json). 4. The test scaffolds the smallest deterministic composed mission (preference order: software-devdocumentation → minimal custom mission that still walks the public runtime composition path), then advances it: spec-kitty next --agent test-agent --mission <handle> --json, then spec-kitty next --agent test-agent --mission <handle> --result success --json. 5. The test runs spec-kitty retrospect summary --json against the temp project and parses the returned envelope. 6. The test compares a git status --short snapshot of the source checkout taken before step 2 against one taken after step 5 and asserts they match exactly.

Primary exception (acceptable structured outcome)

If a public CLI step legitimately produces a documented non-error structured block — for example, next --result success reports an intentional missing guard artifact for the chosen mission — the test accepts that outcome and asserts the structured shape, without swallowing real product regressions. Any failure that would only pass by monkeypatching the dispatcher, step contract executor, DRG resolver, or frozen-template loader is reported as a test failure with full diagnostics (see FR-019).

Always-true invariants

  • The test only exercises public CLI surfaces. It does not import or call decide_next_via_runtime, _dispatch_via_composition, StepContractExecutor, run_terminus, apply_proposals, or any other private helper.
  • The temp project lives outside the source checkout for every assertion.
  • After the test finishes (pass or fail), the source checkout has zero new or modified tracked / untracked files in kitty-specs/, .kittify/, .worktrees/, docs/, profile-invocations/, or any other path inside the source repo.
  • The new test is marked @pytest.mark.e2e and @pytest.mark.slow.

Domain Language

Canonical termMeaningAvoid as synonym
Operator pathThe sequence of public spec-kitty CLI commands an external user runs"agent path", "internal flow"
Private helperAny in-process Python function not exposed via the published CLI"internal API", "utility"
Guard artifactA file or state the runtime requires before it will advance an action via next"input artifact", "precondition file"
Lifecycle recordA paired pre/post entry in .kittify/events/profile-invocations/*.jsonl for an issued action"event line", "log entry"
Source checkoutThe repository working tree that contains the test code itself"main repo", "repo root"
Fresh projectA temp directory initialized from scratch via spec-kitty init, never via fixture-copy of .kittify"scaffold project", "blank project"

Functional Requirements

IDRequirementStatus
FR-001A new test file tests/e2e/test_charter_epic_golden_path.py SHALL exist and SHALL be marked @pytest.mark.e2e and @pytest.mark.slow.Approved
FR-002The test SHALL drive the entire scenario through public CLI subprocess invocations, reusing the existing isolation helpers (tests/conftest.py::run_cli and tests/test_isolation_helpers.py::run_cli_subprocess).Approved
FR-003The test SHALL use a new fresh-project fixture (e.g. fresh_e2e_project) added to tests/e2e/conftest.py that initializes a temp project from scratch via spec-kitty init . --ai codex --non-interactive, rather than copying .kittify from the source checkout.Approved
FR-004The test SHALL execute, in order: init, charter interview --profile minimal --defaults --json, charter generate --from-interview --json, charter bundle validate --json, charter synthesize --dry-run --json, charter synthesize --json, charter status --json, charter lint --json.Approved
FR-005The test SHALL choose the smallest deterministic composed mission that exercises the public runtime / next composition path (preference order: software-devdocumentation → minimal custom mission) and SHALL document the chosen mission in the test file.Approved
FR-006The test SHALL issue exactly one composed action via spec-kitty next --agent test-agent --mission <handle> --json, and SHALL advance that action via spec-kitty next --agent test-agent --mission <handle> --result success --json.Approved
FR-007The test SHALL run spec-kitty retrospect summary --json against the temp project and SHALL parse the returned envelope.Approved
FR-008The test SHALL assert that every --json output produced by an expected-success command is parseable JSON and contains the documented public field names for that command.Approved
FR-009The test SHALL assert that .kittify/charter/charter.md exists in the temp project after the charter generate step.Approved
FR-010The test SHALL assert that charter bundle validate --json reports a success / compliance state.Approved
FR-011The test SHALL assert that charter synthesize --dry-run --json reports planned work and does NOT mutate .kittify/doctrine in the temp project.Approved
FR-012The test SHALL assert that charter synthesize --json writes .kittify/doctrine and provenance / manifest state in the temp project.Approved
FR-013The test SHALL assert that charter status --json reports a non-error state and that charter lint --json exits successfully or returns a documented warning-only status; warnings MUST NOT be silently downgraded.Approved
FR-014The test SHALL assert that the issued step from next --json exposes a non-null prompt-file path when the public JSON exposes a prompt-file field.Approved
FR-015The test SHALL assert that next --result success either advances exactly one action OR returns a documented structured block describing an intentionally missing guard artifact for the chosen mission.Approved
FR-016The test SHALL assert that .kittify/events/profile-invocations/*.jsonl in the temp project contains paired pre/post lifecycle records for the issued action, and that the recorded action name equals the actual mission step / action name (not a role-default verb such as analyze or audit, unless the step is literally audit).Approved
FR-017The test SHALL capture a baseline snapshot of git status --short in the source checkout before any temp-project work begins, and SHALL assert the post-test value matches the baseline.Approved
FR-018The test SHALL additionally assert that no new or modified files appear inside the source checkout under kitty-specs/, .kittify/, .worktrees/, docs/, or any profile-invocations/ path, even if .gitignore would mask them from git status.Approved
FR-019On any subprocess failure, the test SHALL include in the assertion message the command, cwd, return code, stdout, and stderr; and MAY render a compact (non-recursive-by-default) tree of the temp project.Approved
FR-020The fresh-project fixture SHALL clean up its temp directory after the test, regardless of pass or fail.Approved
FR-021If the public CLI surface differs from the recommended flow (different flag names, missing public alternative for a step, etc.), the test SHALL be adjusted to use the actual public surface, and the deviation SHALL be recorded in the PR description as a finding.Approved

Non-Functional Requirements

IDRequirementThresholdStatus
NFR-001The test SHALL complete on a current CI runner within a deterministic budget.≤ 180 seconds wall-clock per run, median across 5 consecutive runs.Approved
NFR-002The test SHALL be deterministic across runs on the same commit.0 spurious failures across 20 consecutive local runs on a clean source checkout.Approved
NFR-003The test code SHALL pass the project's standard quality gates.uv run ruff check tests/e2e/test_charter_epic_golden_path.py tests/e2e/conftest.py exits 0; uv run mypy --strict tests/e2e/test_charter_epic_golden_path.py exits 0.Approved
NFR-004The test SHALL be diagnosable on first failure without rerunning.A single failed run captures command, cwd, return code, stdout, and stderr for the failing step in the assertion message.Approved
NFR-005The test SHALL operate fully offline.Zero outbound network calls; SPEC_KITTY_ENABLE_SAAS_SYNC is not set; "Not authenticated, skipping sync" or equivalent benign output is acceptable.Approved
NFR-006The new test SHALL coexist with the existing E2E and integration regression slices.Running uv run pytest tests/e2e/ tests/next/ tests/integration/test_documentation_runtime_walk.py tests/integration/test_research_runtime_walk.py -q on the branch passes with no previously-green test newly failing.Approved

Constraints

IDConstraintStatus
C-001The test MUST NOT call any private helper, including but not limited to decide_next_via_runtime, _dispatch_via_composition, StepContractExecutor, run_terminus, and apply_proposals.Approved
C-002The test MUST NOT monkeypatch the dispatcher, step contract executor, DRG resolver, or frozen-template loader to make the flow pass.Approved
C-003The test MUST NOT modify the existing e2e_project fixture or the existing tests/e2e/test_cli_smoke.py legacy smoke behaviour.Approved
C-004The test MUST NOT use deprecated hidden CLI aliases when a public alternative exists.Approved
C-005The test MUST NOT depend on SaaS sync, hosted auth, or remote tracker connectivity.Approved
C-006All files written by the test MUST live under the temp project directory. Nothing may be written into the source checkout, including under ignored paths inside the source checkout.Approved
C-007The test file's pytest markers MUST place it in the existing e2e-cross-cutting CI lane, OR the PR MUST explicitly propose and justify a different gate.Approved
C-008The deliverables of this tranche are limited to (a) the new test file and (b) minimal additive helpers / fixtures in tests/e2e/conftest.py. No production behaviour changes, unless the E2E exposes a real blocking bug — in which case the bug is reported and tracked as a follow-up rather than fixed inside this tranche.Approved
C-009The test MUST NOT silently skip locally or in CI when it cannot establish a fresh project; if a public CLI command required by the flow cannot run from a fresh project, the test fails loudly and the failure is escalated as a product finding.Approved

Success Criteria

  • SC-001 A maintainer running the verification commands listed in start-here.md from a clean checkout of test/charter-e2e-827-tranche-1 sees the new golden-path test pass with exit code 0.
  • SC-002 After the test passes, git status --short in the source checkout shows zero new or modified entries.
  • SC-003 A reviewer can read the PR description and find: which public CLI commands the golden path executes, which Charter epic surfaces are covered by this tranche, which #827 items remain for follow-up, whether any product defects were discovered while writing the E2E, and the exact verification commands and results.
  • SC-004 The new test runs alongside the existing E2E suite without breaking any previously-green test in tests/e2e/, tests/next/, tests/integration/test_documentation_runtime_walk.py, or tests/integration/test_research_runtime_walk.py.
  • SC-005 A subsequent tranche of #827 can extend this spine (additional missions, dashboard coverage, plain-English scenarios, external canaries) without first having to refactor this tranche's test to use private helpers.

Key Entities

  • Temp project — a per-test directory outside the source checkout where the entire operator-path scenario plays out.
  • Fresh-project fixture — the new pytest fixture in tests/e2e/conftest.py that builds the temp project from spec-kitty init, never via fixture-copy of .kittify from the source checkout.
  • Source-pollution baseline — the git status --short snapshot captured in the source checkout before the test runs; the post-test snapshot must equal this baseline.
  • Lifecycle record file.kittify/events/profile-invocations/*.jsonl in the temp project; each issued action produces a paired pre/post entry whose action name matches the actual mission step.
  • Mission handle — the value passed to spec-kitty next --mission <handle>; may be mission_id (ULID), mid8, or mission_slug. Resolver disambiguates by mission_id; ambiguity is a structured error, not a silent fallback.

Assumptions

  • Default test agent identifier is test-agent, matching the flow shape recommended in start-here.md.
  • Default charter interview profile is minimal with --defaults, yielding a deterministic charter for any project that does not require human-supplied governance content.
  • The chosen composed mission can be advanced through at least one next issue + result cycle without requiring large generated content. If the preferred software-dev mission requires more artifact scaffolding than is reasonable here, the test falls back to documentation, then to a minimal custom mission, in that order — and the final choice is documented in the test.
  • The existing subprocess isolation pattern (run_cli / run_cli_subprocess) is sufficient for CLI invocations in this tranche; no new isolation helper is required.
  • The e2e-cross-cutting CI lane currently runs tests marked @pytest.mark.e2e + @pytest.mark.slow. If that mapping has changed, the PR notes will document the corrected gate (per C-007).
  • SPEC_KITTY_ENABLE_SAAS_SYNC is not set during this test; charter / next / retrospect paths are exercised in their local-only mode.

Out of Scope (this tranche)

  • External canaries living in the spec-kitty-end-to-end-testing repo.
  • Plain-English scenarios in spec-kitty-plain-english-tests.
  • Full CLI subprocess walks for every built-in mission type.
  • Browser- or dashboard-level coverage of Charter surfaces.
  • Any production behaviour rewrite, unless the E2E exposes a blocking bug whose minimum fix is also in scope; that case is called out explicitly in the PR.
  • Retrospective coverage beyond a single retrospect summary --json invocation.
  • Multi-mission or multi-action runtime walks.

Decisions

  • DM-01KQ807NKAS36HJPG6WBQN5C6G — Resolved. Use a new fresh_e2e_project fixture (option A). The existing e2e_project fixture, which copies .kittify from the source checkout, is not acceptable for this tranche. Reusing it would weaken the operator-path proof and preserve the exact class of hidden-coupling failures the tranche is meant to surface.

No open [NEEDS CLARIFICATION] markers.