Running the test suite in parallel
The Spec Kitty test suite runs safely in parallel locally and in CI, typically at least 2× faster on a machine with four or more cores. This page explains the one correct local command, why it is shaped the way it is, and how to reproduce the coverage-neutrality gates CI uses.
For what to do when a test goes red on CI unrelated to your diff — budget gates vs. correctness flakes vs. environmental flakes, and why we never retry-to-green — see the test-flakiness handling policy.
The local command
PWHEADLESS=1 pytest tests/ -n auto --dist loadfile -p no:cacheprovider
# daemon/real-port tests run serially:
PWHEADLESS=1 pytest tests/sync/test_orphan_sweep.py -n0 -q
The first command runs the bulk of the suite across worker processes. The second command runs the daemon/real-port tests serially. Run both; the parallel command deliberately leaves the serial-only tests for the second pass.
Why --dist loadfile (never bare --dist load)
pytest-xdist supports several distribution modes. We always use loadfile:
loadfilekeeps every test that lives in the same file on a single worker. File-scoped fixtures (scope="module", file-level collection ordering, shared module state) keep working exactly as they do serially.load(the bare default) scatters a single file's tests across multiple workers. That breaks file-scoped fixtures and any test that relies on collection order within a file.
For that reason: always pass --dist loadfile; never use bare --dist load. CI uses loadfile for the same reason.
-p no:cacheprovider disables pytest's cache plugin so a parallel run never
races on the shared .pytest_cache directory.
Per-worker HOME isolation (the master enabler)
A parallel run never touches the real ~/.spec-kitty. Each pytest-xdist
worker — and the serial "master" run when you omit -n auto — gets its own
isolated home directory. The isolation is set up in tests/conftest.py:
pytest_configurepointsHOME/USERPROFILEand the XDG dirs (XDG_CONFIG_HOME,XDG_DATA_HOME,XDG_STATE_HOME) at a per-worker base before collection, so modules that bind a home-derived path at import time (for examplespecify_cli.sync.daemon.SPEC_KITTY_DIR) resolve into the isolated home.- An autouse, function-scoped fixture re-asserts the
HOME/USERPROFILE/ XDG env vars for every test, keyed by worker id, so call-timePath.home()reads are isolated too. It does not monkeypatchPath.home(that approach was the cycle-1 regression that broke ~16tests/synccases — the fixture relies onPath.home()natively resolvingHOMEviaexpanduser), so a test that sets up its own tmp home viasetenv('HOME', ...)cleanly overrides the per-worker baseline.
The per-worker base is keyed by the xdist test-run UID and the worker id, so two
workers in the same run get distinct homes (no collision) and successive runs do
not reuse stale state. The regression guard
tests/architectural/test_real_home_isolation_guard.py (SC-006) and
tests/test_worker_home_isolation.py prove this invariant.
Because the real ~/.spec-kitty is never bound, you do not need to back it up or
worry about a parallel run truncating your real queue.db.
The serial daemon pass
Per-worker HOME isolation protects per-user state, but it does not protect
OS-global resources such as real TCP ports or singleton daemons. Tests that bind
the reserved daemon port range (9400–9449) — tests/sync/test_orphan_sweep.py —
must run in their own serial pass:
PWHEADLESS=1 pytest tests/sync/test_orphan_sweep.py -n0 -q
-n0 forces serial execution even when xdist is installed. These tests are
excluded from the parallel pool so two workers never contend for the same port.
Volume env gates (SPEC_KITTY_ULID_VOLUME_FULL)
Some tests exercise large-volume ULID generation. By default they run at a reduced scale so the local default stays fast; the full scale is reachable via an env gate (and is exercised on the nightly/full path). The assertion logic is identical across scales — only the volume changes.
pytest <ulid_test> -q # reduced (fast, default)
SPEC_KITTY_ULID_VOLUME_FULL=1 pytest <ulid_test> -q # full (nightly parity)
Running the stability ratchet locally
Before any shard is flipped to parallel, it must pass the stability ratchet (C-RATCHET): N consecutive green parallel runs with no new flakes. The same entrypoint CI uses is available locally (the WP02 coverage-safety harness):
python -m tests._support.coverage_safety.ratchet -n 3 -- tests/agent -m "not slow"
Exit code 0 means all N runs were green and the flip is accepted; 1 means it
was rejected and the summary names any new or flaky failures. The Python API is
run_ratchet(...) from tests._support.coverage_safety. See
tests/_support/coverage_safety/README.md for the full harness (collection
equivalence and anti-vacuity mutation checks).
Validate the acceleration (copy-pasteable)
These are the mission's reproducible validation steps. Run them from the repo
root to confirm the parallel run is coverage-neutral and at least 2× faster than
serial. (.venv/bin/pytest is the synced project interpreter; substitute
pytest if you run it directly.)
# 1. Serial baseline (whole-suite wall clock) and a per-shard nodeid reference.
time .venv/bin/pytest tests/ -q -p no:cacheprovider # serial baseline
.venv/bin/pytest tests/charter --collect-only -q | sort > /tmp/charter-serial.nodeids
# 2. Collection equivalence: serial vs parallel must collect identical nodeids.
.venv/bin/pytest tests/charter -n auto --dist loadfile --collect-only -q \
| sort > /tmp/charter-par.nodeids
diff /tmp/charter-serial.nodeids /tmp/charter-par.nodeids # must be empty
# 3. Stability ratchet: 3 consecutive green parallel runs (the same gate CI uses).
python -m tests._support.coverage_safety.ratchet -n 3 -- \
tests/charter -m "fast and not windows_ci"
# 4. Parallel-vs-serial timing: target ≥2× faster on a ≥4-core machine.
time PWHEADLESS=1 .venv/bin/pytest tests/ -n auto --dist loadfile -p no:cacheprovider \
--deselect tests/sync/test_orphan_sweep.py
time PWHEADLESS=1 .venv/bin/pytest tests/sync/test_orphan_sweep.py -n0 -q # serial pass
# 5. Real home untouched: mtime/inode unchanged (or path still absent) after the run.
ls -la ~/.spec-kitty 2>/dev/null
Status: local default vs CI shard flips
- Local parallel command — validated and safe now. Per-worker HOME isolation
(WP04) is in place, so the local command above runs without touching your real
~/.spec-kitty. Use it today. - CI shard flips — pending 3×-green confirmation. The mission's CI shard
flips (WP05, in
.github/workflows/ci-quality.yml) are gated on three consecutive green CI runs before they are considered locked in. The full "≥2× faster" claim is host-dependent and is confirmed against CI; locally you should still see a clear speedup on a multi-core machine.