tests/contract/test_cross_repo_consumers.py

Context

After the shared-package-boundary cutover (shared-package-boundary-cutover-01KQ22DS, 2026-04-25), spec-kitty-events and spec-kitty-tracker are external PyPI dependencies consumed via their public surfaces. Compatibility ranges live in pyproject.toml; exact pins live in uv.lock. Contract tests under tests/contract/ are the load-bearing assertions that protect the CLI from upstream-shape regressions.

Prior to this ADR, a key contract test (tests/contract/test_cross_repo_consumers.py) hard-coded the expected spec-kitty-events version as "3.2.0". When the cutover bumped the package to 4.0.0, that test went red and stayed red. It was visible to the WP04 reviewer of mission stability-and-hygiene-hardening-2026-04-01KQ4ARB as the canonical example of a contract test drifting behind real package state. A hard-coded version string in a contract test is structurally guaranteed to drift the moment uv sync resolves a different version. Since the goal of tests/contract/ (per FR-023) is to be a hard mission-review gate, the pattern inverts the gate's purpose: instead of catching drift, it produces drift.

Decision

Every contract test that depends on a specific external-package version MUST resolve that version dynamically, rather than embedding it as a literal. The canonical resolution path is:

  1. Read uv.lock via tomllib and look up the package in the [[package]] array.
  2. If uv.lock is unavailable (e.g. clean-install CI run), fall back to importlib.metadata.version("<package-name>") and emit a RuntimeWarning.

For envelope-shape pinning we add a snapshot file at tests/contract/snapshots/spec-kitty-events-<resolved-version>.json generated by scripts/snapshot_events_envelope.py. The contract test loads the snapshot keyed by the resolved version. Bumping the package without regenerating the snapshot is, by design, a hard contract failure with a structured diagnostic that points at the snapshot script.
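
A version-keyed snapshot lookup might look like the sketch below. The helper name load_envelope_snapshot and the wording of the diagnostic are assumptions; only the snapshot path pattern and the script name come from this ADR, and the snapshot's internal JSON schema is deliberately left opaque.

```python
import json
from pathlib import Path

# Directory named in this ADR; the default can be overridden in tests.
SNAPSHOT_DIR = Path("tests/contract/snapshots")


def load_envelope_snapshot(version: str, snapshot_dir: Path = SNAPSHOT_DIR):
    """Load the envelope snapshot for the resolved spec-kitty-events version.

    Hypothetical helper: a missing snapshot is treated as a contract failure
    whose message points at the regeneration script.
    """
    path = snapshot_dir / f"spec-kitty-events-{version}.json"
    if not path.is_file():
        raise AssertionError(
            f"No envelope snapshot for spec-kitty-events {version} "
            f"(expected {path}). Regenerate it with: "
            "python scripts/snapshot_events_envelope.py --force"
        )
    return json.loads(path.read_text(encoding="utf-8"))
```

Raising AssertionError makes the missing snapshot show up as an ordinary test failure under pytest rather than as a collection error.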

This pattern is owned by WP05 of mission stability-and-hygiene-hardening-2026-04-01KQ4ARB (see kitty-specs/stability-and-hygiene-hardening-2026-04-01KQ4ARB/contracts/events-envelope.md and the WP05 task file).

Consequences

Positive

  • Drift becomes loud, not silent. A bump to spec-kitty-events without a paired snapshot regeneration fails the contract gate (FR-022 + FR-023) immediately, instead of leaving the test pinned to a stale version forever.
  • The two-step bump workflow is explicit. Operators update the package (uv sync, or edit the version range), regenerate the snapshot with python scripts/snapshot_events_envelope.py --force, and then verify with pytest tests/contract/. This is documented in docs/guides/contract-pinning.md.
  • The mission-review gate has teeth. FR-023 declares pytest tests/contract/ a hard blocker; by removing hard-coded versions we eliminate the failure mode where the gate is "always red" and therefore ignored.

Negative / costs

  • Contributors must run the snapshot script after a package bump. Forgetting to do so produces an obvious red test, but the extra step adds friction.
  • Snapshot files accumulate, one per version. They are small JSON files under tests/contract/snapshots/ and are intentionally version-controlled to serve as a historical anchor for regressions.

Operationally

  • Bumping spec-kitty-events is a 2-step PR:
    1. Edit pyproject.toml range and run uv lock.
    2. Run python scripts/snapshot_events_envelope.py --force and commit the new snapshot.
  • The dev workflow is documented in docs/guides/contract-pinning.md.

Alternatives considered

Alternative A — Hard-coded version literals in contract tests

This is the pre-2026-04-26 status quo. Rejected because:

  • It guarantees drift the moment uv.lock changes, with no signal at the bump site.
  • It conflicts directly with FR-023 (hard mission-review gate): a permanently red contract test makes the gate informational, not blocking.
  • The WP04 reviewer of this mission flagged it as the load-bearing example of why the contract surface needs resolved-version pinning.

Alternative B — Pin contract tests to pyproject.toml ranges (e.g. ">=4.0,<5")

Rejected because compatibility ranges describe what the CLI is allowed to install, not what it did install. A range-only contract test cannot distinguish 4.0.0 from 4.7.3 envelopes; the snapshot would have to be a union of all possible shapes within the range, which is the opposite of what a contract test should pin.
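
To make the ambiguity concrete, the sketch below checks a few versions against the ">=4.0,<5" range. The parsing is a deliberate simplification that assumes purely numeric dotted versions (it is not a PEP 440 implementation), and the helper names are hypothetical.

```python
def parse(version: str) -> tuple[int, ...]:
    """Split a purely numeric dotted version into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def satisfies_range(version: str, lower: str = "4.0", upper: str = "5") -> bool:
    """Return True if lower <= version < upper (simplified comparison)."""
    return parse(lower) <= parse(version) < parse(upper)


candidates = ["3.2.0", "4.0.0", "4.7.3", "5.0.0"]
admitted = [v for v in candidates if satisfies_range(v)]
print(admitted)  # ['4.0.0', '4.7.3']
```

Both 4.0.0 and 4.7.3 satisfy the range, so the range alone cannot select which envelope snapshot applies.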

Alternative C — Generate the snapshot at test time

Rejected because the snapshot would then trivially match itself; there would be no shipped artifact to compare against. The whole point of the snapshot is that it is a reviewed, version-controlled representation of the upstream contract at a known point in time.

References

  • Mission: kitty-specs/stability-and-hygiene-hardening-2026-04-01KQ4ARB/spec.md (FR-022, FR-023, FR-024, FR-025, FR-026)
  • Research: kitty-specs/stability-and-hygiene-hardening-2026-04-01KQ4ARB/research.md D8
  • Contract: kitty-specs/stability-and-hygiene-hardening-2026-04-01KQ4ARB/contracts/events-envelope.md
  • Companion ADR: architecture/2.x/adr/2026-04-25-1-shared-package-boundary.md
  • Dev workflow: docs/guides/contract-pinning.md
  • Implementation:
    • scripts/snapshot_events_envelope.py
    • tests/contract/test_events_envelope_matches_resolved_version.py
    • tests/contract/test_cross_repo_consumers.py (rewritten as part of WP05)
    • tests/contract/snapshots/spec-kitty-events-<version>.json