Phase 0 Research: Shared Package Boundary Cutover

Mission: shared-package-boundary-cutover-01KQ22DS Date: 2026-04-25 Branch contract: planning/base main → merge target main

This research log resolves the three open architectural questions identified in plan.md before implementation begins.


R0-1: Where exactly should the internalized runtime live, and what is its import surface?

Decision

The internalized runtime lives at src/specify_cli/next/_internal_runtime/, colocated with the existing runtime_bridge.py, decision.py, and prompt_builder.py modules.

The public surface (what runtime_bridge.py and next_cmd.py import) mirrors exactly what they import from spec_kitty_runtime today:

SymbolFrom spec_kitty_runtime (pre-cutover)New location (post-cutover)
DiscoveryContextspec_kitty_runtimespecify_cli.next._internal_runtime
MissionPolicySnapshotspec_kitty_runtimespecify_cli.next._internal_runtime
MissionRunRefspec_kitty_runtimespecify_cli.next._internal_runtime
NextDecisionspec_kitty_runtimespecify_cli.next._internal_runtime
NullEmitterspec_kitty_runtimespecify_cli.next._internal_runtime
next_step (aliased runtime_next_step)spec_kitty_runtimespecify_cli.next._internal_runtime
provide_decision_answerspec_kitty_runtimespecify_cli.next._internal_runtime
start_mission_runspec_kitty_runtimespecify_cli.next._internal_runtime
ActorIdentityspec_kitty_runtime.schemaspecify_cli.next._internal_runtime.schema
load_mission_template_filespec_kitty_runtime.schemaspecify_cli.next._internal_runtime.schema
MissionRuntimeErrorspec_kitty_runtime.schemaspecify_cli.next._internal_runtime.schema
_read_snapshotspec_kitty_runtime.enginespecify_cli.next._internal_runtime.engine
engine (module reference)spec_kitty_runtime.enginespecify_cli.next._internal_runtime.engine
plan_nextspec_kitty_runtime.plannerspecify_cli.next._internal_runtime.planner

Internal sub-module layout:

specify_cli/next/_internal_runtime/
├── __init__.py        # re-exports the public surface above
├── models.py          # DiscoveryContext, MissionPolicySnapshot, MissionRunRef, NextDecision
├── emitter.py         # NullEmitter + emitter Protocol
├── lifecycle.py       # next_step, provide_decision_answer, start_mission_run
├── engine.py          # _read_snapshot, snapshot persistence
├── planner.py         # plan_next, DAG planner
└── schema.py          # ActorIdentity, load_mission_template_file, MissionRuntimeError

Rationale

diff surface auditable. runtime_bridge.py is the only module that consumes the full surface; its re-import lines change from from spec_kitty_runtime import (...) to from specify_cli.next._internal_runtime import (...) — every other file in the call graph imports from runtime_bridge (which already abstracts the runtime), not from spec_kitty_runtime directly.

Python import surface, even though it is shipped in the wheel. External Python importers must continue to use specify_cli.next.* public symbols, not reach into _internal_runtime.

scope specify_cli.* as the top layer; placing the new module under specify_cli leaves those rules undisturbed and adds only one new package-boundary rule (the C-001 enforcement) on top.

top-level package. The repo already has 4 (kernel, doctrine, charter, specify_cli); a fifth top-level for runtime would explode review surface and conflict with the in-flight runtime-mission-execution-extraction-01KPDYGW mission's own canonical-runtime location decision.

  • Colocation with runtime_bridge.py keeps the import paths short and the
  • Underscore prefix marks _internal_runtime as not part of the public CLI
  • Existing layer rules in tests/architectural/test_layer_rules.py already
  • No top-level package: spec C-009 explicitly forbids introducing a new

Alternatives considered

runtime-mission-execution-extraction-01KPDYGW mission already proposes src/runtime/ for a different extraction (the CLI's internal "next-step decisioning + agent dispatch" layer, not the standalone PyPI runtime package). Stomping on that namespace creates a merge conflict and confuses reviewers about which runtime is which.

owns CLI asset / home-directory bootstrap (different concern). Reusing the name would collide.

pre-cutover code is ~3 sub-modules (engine, planner, schema) plus public models. Inlining ~3kLoC into a single file destroys reviewability and breaks the existing function-decomposition style.

  • src/runtime/ as a new top-level package: rejected. The
  • src/specify_cli/runtime/: rejected. That directory already exists and
  • Inlining the runtime code into runtime_bridge.py: rejected. The

Behavior-equivalence strategy

WP01 captures behavior parity by: 1. Running spec-kitty next against a checked-in fixture mission with the currently-installed spec-kitty-runtime 0.4.3 and recording golden JSON snapshots in tests/fixtures/runtime_parity/. 2. Running the same fixture against the new internalized runtime and asserting byte-equal snapshots (modulo timestamp / path normalization, identical to how mission runtime-mission-execution-extraction-01KPDYGW does it). 3. Any delta forces an iteration on _internal_runtime before WP02 begins.


R0-2: What is the compatibility range strategy for events and tracker, and what

happens to [tool.uv.sources] and constraints.txt?

Decision

ItemPre-cutover statePost-cutover state
pyproject.toml events depspec-kitty-events==4.0.0 (exact pin)spec-kitty-events>=4.0.0,<5.0.0
pyproject.toml tracker depspec-kitty-tracker==0.4.2 (exact pin)spec-kitty-tracker>=0.4,<0.5
pyproject.toml runtime depAbsent (intentionally)Absent (locked in by FR-006 + arch test)
[tool.uv.sources] for eventsspec-kitty-events = { path = "../spec-kitty-events", editable = true }Empty / absent
[tool.uv.sources] for trackerAbsentAbsent
constraints.txtPin spec-kitty-events==4.0.0 to paper over runtime's transitive <4.0 pinRemoved (no longer needed)
uv.lockPinned to dev pathRegenerated; pins to PyPI versions

Rationale

documents a SemVer policy: minor / patch within 4.x are non-breaking; major bumps (5.x) are breaking. The compatibility range >=4.0.0,<5.0.0 exactly matches that policy.

in implement-review at the time of this plan. The conservative range >=0.4,<0.5 matches the currently-published 0.4.2 line and reserves a tightening pass for when the upstream mission lands. The consumer test contract (FR-009) is the real safety belt; a too-loose range can only break CI explicitly via that contract, never silently in production.

PR #779's CI failure: they masked the missing spec-kitty-runtime dependency during local dev because the editable events install pulled the runtime transitively, while CI installed from PyPI and exploded. Removing them is non-negotiable.

separately in docs/development/local-overrides.md (created by WP10), using uv workspace patterns or explicit --with-editable flags that do not touch the committed pyproject.toml.

spec-kitty-runtime-transitive spec-kitty-events<4.0 pin conflict (see DRIFT-1 in mission review 01KPWT8P). With spec-kitty-runtime no longer a dependency, the constraint has no purpose and is removed in WP08.

  • The events public-surface contract from events-pypi-contract-hardening-01KQ1ZK7
  • The tracker mission tracker-pypi-sdk-independence-hardening-01KQ1ZKK is still
  • Editable / path overrides in [tool.uv.sources] were the direct cause of
  • Developer overrides for cross-package work (events / tracker) are documented
  • constraints.txt exists today exclusively to paper over the

Alternatives considered

cross-package release lockstep the cutover exists to dissolve. Spec acceptance criterion A7 forbids this.

contract; a 5.x major bump is breaking and must require an explicit CLI version bump.

is fully servable by a separate developer-only override file. Committing editable overrides re-introduces the failure mode that rejected PR #779.

It would become a magnet for similar paper-over hacks. Better to delete it cleanly and document developer override patterns in docs/development/local-overrides.md.

  • Keep exact pins (==4.0.0, ==0.4.2): rejected. Exact pins recreate the
  • Open-ended ranges (>=4): rejected. SemVer minor windows are the right
  • Keep [tool.uv.sources] for dev convenience: rejected. The dev workflow
  • Convert constraints.txt into a generic dev constraint file: rejected.

Cross-repo handshake

81d5ccd4. The compatibility range chosen here is committed against that sha.

implement-review. WP07 (consumer-test contract) and WP08 (pyproject.toml range) explicitly note that the tracker range may need to be tightened on rebase if the upstream mission lands a contract change before this mission's closing PR is merged. The orchestrator picks up that delta automatically when the WP07 work resumes.

  • Events mission events-pypi-contract-hardening-01KQ1ZK7 is merged at sha
  • Tracker mission tracker-pypi-sdk-independence-hardening-01KQ1ZKK is in

R0-3: How does the clean-install CI job structurally guarantee absence of

spec-kitty-runtime?

Decision

WP09 adds a new job named clean-install-verification to .github/workflows/ci-quality.yml. The job:

1. Checks out the repo at the PR head. 2. Builds the CLI wheel with python -m build --wheel. 3. Spins up a fresh container (e.g. python:3.12-slim) with no pre-installed packages. 4. Inside the container, runs: ``bash pip install dist/spec_kitty_cli-*.whl pip list | grep -i spec-kitty-runtime && exit 1 || true # must NOT be installed spec-kitty --version ` 5. Checks out a known fixture mission committed at tests/fixtures/clean_install_fixture_mission/ and runs: `bash spec-kitty agent mission setup-plan --mission <fixture-handle> --json spec-kitty next --agent claude --mission <fixture-handle> --json ` 6. Asserts the JSON output contains "result": "success" and that the loop advanced at least one step (event log gained at least one StatusEvent` row). 7. Asserts wall-clock runtime ≤ 5 minutes (NFR-004).

The local-runnable counterpart is tests/integration/test_clean_install_next.py, which uses subprocess and a tmp_path venv to perform the same flow on a dev machine. It is gated behind a marker (@pytest.mark.distribution) so the main test suite stays fast.

Rationale

python:3.12-slim image has no leftover state from the workspace's editable installs, no cached spec-kitty-runtime, and no [tool.uv.sources] precedence. If the CLI works there, it works for end users.

dep" into a CI-enforced observation.

tests/next/test_next_command_integration.py uses; reusing the assertion pattern keeps reviewer cognitive load low.

  • Container isolation is the strongest available guarantee: a fresh
  • Wheel install (not pip install -e .) mirrors the user experience.
  • Explicit pip list | grep assertion turns "spec-kitty-runtime is not a
  • JSON-asserted advancement is the same contract the existing

Alternatives considered

suite has hundreds of unit tests that don't need an isolated environment; running all of them in a clean venv would blow the 5-minute budget.

prove what users experience.

tox for one job is an unnecessary new tool.

  • Run the existing test suite in a clean venv: rejected. The existing
  • Mock the absence of spec-kitty-runtime: rejected. Mocking does not
  • Use tox: rejected. The repo standardized on uv/pytest; introducing

CI surface details

required to merge.

it is the smallest possible mission scaffold that exercises one spec-kitty next step (a single planned WP, no dependencies).

  • The job key in ci-quality.yml is clean-install-verification.
  • It is added to the protect-main.yml required-check set so a green run is
  • Fixture mission lives under tests/fixtures/clean_install_fixture_mission/;

Documentation handoff

CHANGELOG.md to point at this job for operators who want to verify the cutover locally.

  • WP10 updates docs/development/local-overrides.md (new) and

Resolved clarifications

No [NEEDS CLARIFICATION] markers existed in spec.md. None added by this research log.

Inputs for Phase 1

The following Phase 1 artifacts use this research as ground truth:

before / after.

in spec-kitty-events at sha 81d5ccd4, scoped to the subset CLI uses.

spec-kitty-tracker 0.4.2 SDK, scoped to the subset CLI uses.

  • data-model.md — uses R0-1's import-surface table to model the structural
  • contracts/internal_runtime_surface.md — copies the symbol table from R0-1.
  • contracts/events_consumer_surface.md — derives from the public-surface doc
  • contracts/tracker_consumer_surface.md — derives from the published
  • quickstart.md — uses R0-3's clean-install recipe.