Phase 0 Research — refactor-stable-gate-substrate-01KWK3FY

Consolidates the pre-spec census (debugger-debbie) and the post-spec squad evidence (reviewer-renata, paula-patterns), all command-backed on this branch, 2026-07-03. Entry-point facts re-verified at plan time.

D1 — Design-P over Design-S (the mission ADR)

Decision: frozen tool-derived (file, qualname, token) comparands (Design-P); the live scanner re-derives and checks membership.

Evidence (renata's drift probe, real _ratchet_keys.composite_key):

Scenario A (blank line above the site, fixed seed): allow-key derives the WRONG token (('f','x = 1') vs live ('f','return primary_feature_dir_for_mission …')) → MATCH False. Seed re-derivation (Design-S) is NOT drift-immune with a fixed seed.

Scenario B (in-place token change, same seed line): both sides re-derive the NEW token → MATCH True. A content change to an allowlisted site is INVISIBLE under Design-S.

Design-P satisfies both halves: drift changes nothing (frozen token still found by the membership scan); a content change breaks the match (staleness guard fires).

In-tree precedents: Design-P = test_no_worktree_name_guess.py (_ALLOWED_SITES_FILES frozen composites + test_pinned_composites_still_live staleness guard + drift theater already shipping). Design-S = _RAW_JOIN_SITES (stays as-is in its own family; not propagated).

Alternatives considered: Design-S everywhere (REJECTED: empirically fails NFR-001); hand-typed tokens (REJECTED: _RAW_JOIN_SITES' own rule is tool-derived only; hand-typing invites typos and drift).
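The two scenarios can be reproduced in miniature. The sketch below is an illustration only: derive_key is a hypothetical, simplified stand-in for composite_key (it uses the innermost function name and the stripped line text), the file component of the key is omitted, and live_keys scans every non-blank line where the real scanner would scan only candidate sites.

```python
import ast


def derive_key(source: str, line: int) -> tuple[str, str]:
    """Hypothetical stand-in for composite_key: (enclosing function, stripped line text)."""
    qualname = "<module>"
    # ast.walk is breadth-first, so the last function spanning the line is the innermost.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.lineno <= line <= (node.end_lineno or node.lineno):
                qualname = node.name
    return qualname, source.splitlines()[line - 1].strip()


def live_keys(source: str) -> set[tuple[str, str]]:
    """Keys for every non-blank line: the set a membership scan would search."""
    lines = source.splitlines()
    return {derive_key(source, n) for n, text in enumerate(lines, 1) if text.strip()}


BASE = "def f():\n    x = 1\n    return primary(x)\n"
DRIFTED = "def f():\n\n    x = 1\n    return primary(x)\n"  # Scenario A: blank line added
CHANGED = "def f():\n    x = 1\n    return other(x)\n"      # Scenario B: token edited in place
SEED_LINE = 3                                                 # where the site sat in BASE
FROZEN = derive_key(BASE, SEED_LINE)                          # frozen once, at approval time

# Design-S: re-derive the allow-key from the fixed seed line on every run,
# then compare it with the key of the site where it actually lives now.
design_s_drift = derive_key(DRIFTED, SEED_LINE) == derive_key(DRIFTED, 4)
design_s_change = derive_key(CHANGED, SEED_LINE) == derive_key(CHANGED, 3)

# Design-P: look the frozen key up in whatever is live.
design_p_drift = FROZEN in live_keys(DRIFTED)
design_p_change = FROZEN in live_keys(CHANGED)

print(design_s_drift, design_s_change)  # False True: drift breaks it, edits are invisible
print(design_p_drift, design_p_change)  # True False: drift is harmless, edits are caught
```

Under these assumptions the seed-based form fails in both directions, while the frozen form tolerates the blank line and flags the edited token.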

D2 — Gate anatomy (verified at plan time)

GateAllowlistKey(enclosing_qualname: str, token_line: int) construction sites: loader :217, derive_live_key :270, scanners :455/:597; entry points check_canonicalizer_gate :462 and check_coord_authority_gate :604; int-line test constructors ~:645/:714/:1034/:1069 (~6). ALL convert in the same WP (mypy strict).

Violation messages print {rel_path}:{qualname}:{token_line} (:475/:618); the line survives as a non-authoritative locator so the jump-to ergonomics keep working.

Key collision facts (paula): qualname implement appears in BOTH workflow.py and implement.py entries; review appears twice within workflow.py. Today these are disambiguated ONLY by line. Post-conversion: by (file, token). Current tokens verified distinct (different LHS names), but the design documents the within-function collision rule: if two allowlisted sites in one function carry identical tokens, the entry covers N occurrences (count-qualified), mirroring the reference implementation's handling.

Consumers: zero external consumers of the YAML or the key type (grep-verified), so the conversion is contained in one file pair.
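One possible post-conversion shape for the key, shown as a sketch rather than the actual design: the field names file and token, the locator() helper, and the uncovered() function are assumptions made for illustration. The line is kept on the key but excluded from equality and hashing, and the collision rule is modelled as a per-entry count.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GateAllowlistKey:
    """Sketch of a converted key: identity is file + qualname + token."""

    file: str
    enclosing_qualname: str
    token: str
    # Locator only: excluded from == and hash(), so line drift cannot break a match.
    line: int | None = field(default=None, compare=False)

    def locator(self) -> str:
        """Jump-to string in the existing {rel_path}:{qualname}:{line} message format."""
        return f"{self.file}:{self.enclosing_qualname}:{self.line}"


def uncovered(allowed: dict[GateAllowlistKey, int], live: list[GateAllowlistKey]) -> Counter:
    """Live occurrences beyond what each count-qualified entry covers."""
    return Counter(live) - Counter(allowed)


# Hypothetical token text; the same site before and after line drift.
approved = GateAllowlistKey("workflow.py", "review", "result = join(parts)", line=40)
drifted = GateAllowlistKey("workflow.py", "review", "result = join(parts)", line=47)

# One approved occurrence, two identical live sites: one is left uncovered.
extra = uncovered({approved: 1}, [drifted, drifted])
print(sum(extra.values()), drifted.locator())
```

Because Counter subtraction keeps only positive counts, an entry that covers N occurrences silently absorbs up to N identical live sites and reports only the surplus.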

D3 — Audit anatomy (paula, verified)

The audits are copy-paste twins: no shared import, each defines its own frozen row dataclass + key() = f"{rel_path}:{line}" + main() checks:

untrusted_path_audit/audit.py: `SinkRow(rel_path, line, untrusted_source, sink_op)`; Check-2 (undercount, raw line) :379-398; Check-3 (`KNOWN_CANDIDATE_FILES`, path-level; already drift-immune, untouched).

surface_resolution_audit/audit.py: `ResolutionRow(rel_path, line, call_name, handle_source)` + `SelectionRow(rel_path, line, name, in_seam)`; TWO checks (:529, :549).

A THIRD raw-line compare duplicated in test_untrusted_path_containment.py:328.

Line-drop non-viability (paula's collision test): coarse `(path, source, sink_op)` identity collides 7/30 (untrusted) and 6/27 (surface); the undercount tripwire would be defeated by construction. STRUCK.

Chosen identity: (path, enclosing_qualname, token) via composite_key_from_file(path, line), derivable today from the existing row data (line as the one-time locator).

Split-brain: the surface inventory's line column is ALREADY consumed as a composite seed by test_single_mission_surface_resolver.py:464,498 while the same inventory's own audit compares raw lines. IC-03 reconciles (one identity model per inventory) without touching the resolver test.

Overcount hole: neither audit checks that inventory rows map to live sinks, so deleted sinks leave silent ghost rows. The NEW overcount guard closes this (modulo explicitly-tagged rows, e.g. rows kept for documentation of intentionally-removed sinks; the tag syntax is a data-model item).
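With the composite identity in place, the undercount and overcount checks reduce to two set differences. The following is a minimal sketch: InventoryRow, its tag field, and audit() are hypothetical names, and the real tag syntax is still a data-model item.

```python
from __future__ import annotations

from dataclasses import dataclass

Identity = tuple[str, str, str]  # (path, enclosing_qualname, token)


@dataclass(frozen=True)
class InventoryRow:
    identity: Identity
    tag: str = ""  # hypothetical: a non-empty tag marks a row kept on purpose


def audit(
    inventory: list[InventoryRow], live: set[Identity]
) -> tuple[list[Identity], list[Identity]]:
    """Return (undercount, overcount) between the inventory and the live scan."""
    recorded = {row.identity for row in inventory}
    # Undercount: a live sink that no inventory row describes.
    undercount = sorted(live - recorded)
    # Overcount: an untagged row whose sink no longer exists (a ghost row).
    must_be_live = {row.identity for row in inventory if not row.tag}
    overcount = sorted(must_be_live - live)
    return undercount, overcount


inventory = [
    InventoryRow(("a.py", "load", "open(path)")),
    InventoryRow(("a.py", "save", "write(path)")),                     # sink was deleted
    InventoryRow(("b.py", "old", "rm(path)"), tag="removed-by-design"),  # documented removal
]
live = {("a.py", "load", "open(path)"), ("c.py", "new", "read(path)")}

missing, ghosts = audit(inventory, live)
print(missing)  # [('c.py', 'new', 'read(path)')]
print(ghosts)   # [('a.py', 'save', 'write(path)')]
```

The tagged row is exempt from the overcount check only; it still counts as recorded, so it cannot mask an undercount.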

D4 — Freeze tooling (FR-003 design)

One-time converter: read YAML seeds → composite_key_from_file → write file: + token: (+ keep line: as locator). Fail-closed: a seed resolving to <module> scope, empty tokens, or a file that doesn't parse aborts with the offending entry named. The converter is a throwaway script recorded in the mission dir (not shipped tooling); the runtime staleness guard is the standing protection.

Runtime semantics: frozen key not found live → gate FAILS with "evict or re-approve" guidance (content changed or site removed; both require a human decision, exactly the review moment the gate exists to force).
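The fail-closed behaviour of the converter could look roughly like this. It is a sketch under assumptions: freeze_entry, FreezeError, and enclosing_qualname are hypothetical names, the qualname is a simplified dotted path rather than Python's real __qualname__, the output key qualname is assumed, and the out-of-range check is an extra guard not listed in the design.

```python
from __future__ import annotations

import ast


class FreezeError(ValueError):
    """Raised when a seed cannot be frozen; the message names the offending entry."""


def enclosing_qualname(tree: ast.AST, line: int) -> str:
    """Dotted name of the innermost function spanning the line, or <module>."""
    best = "<module>"

    def visit(node: ast.AST, prefix: str) -> None:
        nonlocal best
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                name = f"{prefix}.{child.name}" if prefix else child.name
                if child.lineno <= line <= (child.end_lineno or child.lineno):
                    if not isinstance(child, ast.ClassDef):
                        best = name
                    visit(child, name)
            else:
                visit(child, prefix)

    visit(tree, "")
    return best


def freeze_entry(path: str, source: str, line: int) -> dict:
    """Turn one (path, line) seed into a frozen entry, aborting on any doubt."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        raise FreezeError(f"{path}:{line}: file does not parse ({exc.msg})") from exc
    lines = source.splitlines()
    if not 1 <= line <= len(lines):  # extra guard, not part of the stated design
        raise FreezeError(f"{path}:{line}: seed line out of range")
    token = lines[line - 1].strip()
    if not token:
        raise FreezeError(f"{path}:{line}: empty token")
    qualname = enclosing_qualname(tree, line)
    if qualname == "<module>":
        raise FreezeError(f"{path}:{line}: seed resolves to <module> scope")
    # line is kept only as a locator; identity is file + qualname + token.
    return {"file": path, "qualname": qualname, "token": token, "line": line}


SRC = "import os\n\n\ndef cwd():\n    return os.getcwd()\n"
print(freeze_entry("m.py", SRC, 5))
```

Every failure path raises before anything is written, so a partially converted allowlist cannot be produced by accident.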

D5 — CT9 mechanics (census + renata's bar)

Bypass for verification: SPEC_KITTY_RUN_QUARANTINE=1 (conftest skip gate, tests/_support/quarantine.py).

15 verified-passing node ids (census 2026-07-03; re-verify on implement day): 1 retrospect help, 11 upgrade-ux, 1 decision-widen help, 1 decision-shape, 1 doctor-ops usage.

Stay-behind reasons to REWRITE: the 2 uv-tool cases are hermetic behavioral-drift failures (calls[0][1] is None, i.e. the env dict is no longer passed; missing --python 3.13 argv suffix), NOT "env-dependent". Upstream issue to file for the upgrade domain. A16 perf case (0.511s vs 0.5 budget) keeps its CI-timing rationale.

Determinism evidence: the real CI shard invocation form per the parallel-run rules (-n auto --dist loadfile, marker-scoped; serial -n0 for real-port/daemon classes), twice, plus a green CI run pre-merge.
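For orientation, a conftest skip gate of the kind named above generally has the following shape. This is not the contents of tests/_support/quarantine.py: the marker name quarantine and the helper quarantine_bypassed are assumptions, and only the environment variable comes from the notes.

```python
import os

BYPASS_ENV = "SPEC_KITTY_RUN_QUARANTINE"
MARKER = "quarantine"  # hypothetical marker name


def quarantine_bypassed(environ=None) -> bool:
    """True when the bypass variable is set to exactly "1"."""
    env = os.environ if environ is None else environ
    return env.get(BYPASS_ENV) == "1"


def pytest_collection_modifyitems(config, items):
    """Skip quarantined tests unless the bypass is on."""
    # Imported lazily so the helper above stays importable without pytest installed.
    import pytest

    if quarantine_bypassed():
        return
    skip = pytest.mark.skip(reason=f"quarantined; set {BYPASS_ENV}=1 to run")
    for item in items:
        if item.get_closest_marker(MARKER) is not None:
            item.add_marker(skip)
```

With a gate like this, a verification run sets the variable in the environment of the pytest invocation, and the marked nodes execute instead of being skipped.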

D6 — Doctrine mechanics (census, verified at plan time)

File: src/doctrine/styleguides/built-in/testing-principles.styleguide.yaml (schema: principles list; patterns/anti_patterns with name/description/good_example/bad_example). Activated in .kittify/config.yaml; DRG edges exist.

Regeneration: generate_graph (src/doctrine/drg/migration/extractor.py:767) → src/doctrine/graph.yaml; two byte-for-byte freshness gates (tests/doctrine/drg/migration/test_extractor.py + test_path_ref_resolver.py) enforce same-change regeneration.

Completion evidence: acceptance-time parse script (≥6 named principles matching the US3 enumeration, non-empty good/bad examples, ≥1 citing PR #2308), recorded in the acceptance matrix, NOT a standing suite test.
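The acceptance-time check could be sketched as a pure function over the already-parsed styleguide. Several things here are assumptions: the function name check_styleguide, that each principle is either a plain string or a mapping with a name key, and that the PR citation is looked for inside the example texts. Loading the YAML is left to the caller.

```python
from __future__ import annotations


def check_styleguide(doc: dict, required: set[str], citation: str = "#2308") -> list[str]:
    """Return a list of problems; an empty list means the completion bar is met."""
    problems: list[str] = []
    names = {p["name"] if isinstance(p, dict) else str(p) for p in doc.get("principles", [])}
    if len(names) < 6:
        problems.append(f"only {len(names)} named principles, need at least 6")
    for missing in sorted(required - names):
        problems.append(f"principle missing: {missing}")

    examples: list[str] = []
    for section in ("patterns", "anti_patterns"):
        for entry in doc.get(section, []):
            for key in ("good_example", "bad_example"):
                text = str(entry.get(key) or "").strip()
                if not text:
                    problems.append(f"{section}/{entry.get('name')}: empty {key}")
                examples.append(text)
    if not any(citation in text for text in examples):
        problems.append(f"no example cites {citation}")
    return problems


REQUIRED = {
    "invariants-over-shape",
    "negative-and-behavioral-forms-first",
    "size-metrics-belong-to-sonar",
    "convert-or-delete-never-re-pin",
    "shrink-only-count-ratchets-are-sanctioned",
    "transitional-shape-guards-need-a-retirement-path",
}
doc = {
    "principles": sorted(REQUIRED),
    "patterns": [
        {"name": "loc-gate-retirement", "good_example": "as in PR #2308", "bad_example": "LOC ceiling"}
    ],
    "anti_patterns": [],
}
print(check_styleguide(doc, REQUIRED))  # []
```

Returning problems as data rather than asserting keeps the script usable both as a one-off acceptance step and as evidence pasted into the acceptance matrix.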

D7 — Doctrine content outline (what the styleguide gains)

Principles (draft names; final wording at implement):

1. invariants-over-shape (arch/acceptance tests pin invariants; a test that reds on a clean refactor measured the wrong thing)
2. negative-and-behavioral-forms-first (forbidden-pattern-absent AST scans and behavioral pins through public surfaces, in preference to positive literal-presence scans)
3. size-metrics-belong-to-sonar (no LOC/complexity ceilings in pytest)
4. convert-or-delete-never-re-pin (with surviving-coverage proof)
5. shrink-only-count-ratchets-are-sanctioned (exception-set bounds change only when cleaning)
6. transitional-shape-guards-need-a-retirement-path (e.g. seam is-identity batteries → named migration of patch targets)

Patterns/anti-patterns each carry a good/bad example citing the PR #2308 precedents (LOC-gate retirement; literal-scan deletion with surviving coverage; the quarantine graveyard as the cost of shape-coupling).

D8 — Blockers/couplings

PR #2308 coupling resolved by branching on its tip (rebased to 84b2eeed0 at plan time); C-002 covers further drift.

The audit-twin CONSOLIDATION (a shared audit module) is a real improvement deliberately NOT attempted here. It is named in the FR-009 #2072 comment as follow-up material (avoids scope growth; the identity unification is the enabling step).

D9 — Post-tasks squad errata (2026-07-03; line refs re-verified on b824111e7)

Debbie's code-truth pass after the #2308/#2312 rebases — corrections to D2/D3/D5:

Gate file: constructor sites all EXACT (:164/:174 def+field, :217 loader, :262/:270 derive_live_key, :433/:455 + :576/:597 scanners, :462/:604 entry points, :475/:618 messages). Int-line TEST constructors = 10, not ~6 (regions :645-647, :714-722, :1034, :1069). YAML: 3+7 entries confirmed; 5 spot-check re-derivations all PASS.

Untrusted audit: Check-2 = :379-389 (the convert-target is the LOCATOR compare f"{rel_path}:{line}" at :383; SinkRow.key() is 3-part rel:line:sink_op); Check-3 starts :391 (untouched). Dup compare confirmed at containment test :328.

Surface audit: ResolutionRow Check-2 :526-534 (raw compare :528); SelectionRow check is "Check 4" :549-561; SelectionRow fields are (rel_path, line, call_name, in_seam_file) (:353-356).

WP03 premise correction (HIGH): the resolver test does NOT read inventory.md. It imports discover_rows (:164) and derives keys from the LIVE scan (:457/:464/:498); its _RAW_JOIN_SITES/_ALLOWLISTED_RAW_JOINS seeds live in the test file. The real coupling = the audit module's public shape. Split-brain narrative struck (spec rev 3).

Reference file: the staleness guard is test_name_compose_offenders_match_pinned_baseline (:420, stale_exemptions :465); drift theater test_composite_key_survives_line_drift (:936) + :991 + :1058. test_pinned_composites_still_live does not exist; citation fixed.

Quarantine truth table: 31 marked total; LOCAL 16 pass / 15 fail; CI 0 pass / 31 fail (run 28643092421). Marker distribution: retrospect 1, upgrade_ux 13, decision_widen 1, decision_shape 1, doctor_ops_cli 1, doctor_ops 1, mid8 2, no_checklist 1, daemon_reaper 10.

Coord fixtures path note: the two by-design YAML entries resolve at src/specify_cli/decisions/emit.py + src/specify_cli/widen/state.py (qualname-keyed, harmless).

D10 — FR-010 CI-green fold mechanics

The lane: quarantine visibility (non-blocking) in ci-quality.yml runs the 31 marked nodes with the bypass on; green = every node passes or skips on CI.

Adjudication inputs per test: the CI failure signature (scratchpad qlane.log capture), the local result, the quarantine reason, git history. Judge-the-test: stale → re-pin for CI reality; env-fragile assertion (Rich rendering, fresh-venv version skew) → remediate robustly; valueless → delete; real-product-signal out of mission domain → skip with issue ref (daemon-reaper #2309; uv-tool upstream issue).

Verification loop: local shard-form + differential skip evidence first; the authoritative gate is the lane run on the mission PR.
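The green criterion and the judge-the-test mapping are small enough to write down as data. In this sketch the names Finding, ACTION, lane_is_green, and needs_adjudication are hypothetical, as are the example node ids; the outcome strings follow pytest's report outcomes (passed, skipped, failed).

```python
from __future__ import annotations

from enum import Enum


class Finding(Enum):
    STALE = "stale"
    ENV_FRAGILE = "env-fragile assertion"
    VALUELESS = "valueless"
    OUT_OF_DOMAIN_SIGNAL = "real product signal outside the mission domain"


# Judge-the-test: each finding maps to exactly one remedy.
ACTION = {
    Finding.STALE: "re-pin for CI reality",
    Finding.ENV_FRAGILE: "remediate robustly",
    Finding.VALUELESS: "delete",
    Finding.OUT_OF_DOMAIN_SIGNAL: "skip with issue ref",
}

OK_OUTCOMES = ("passed", "skipped")


def lane_is_green(outcomes: dict[str, str]) -> bool:
    """Green means every node passes or skips on CI."""
    return all(outcome in OK_OUTCOMES for outcome in outcomes.values())


def needs_adjudication(outcomes: dict[str, str]) -> list[str]:
    """Node ids that still block a green lane."""
    return sorted(node for node, outcome in outcomes.items() if outcome not in OK_OUTCOMES)


# Hypothetical node ids, for illustration only.
run = {"tests/a.py::test_one": "passed", "tests/b.py::test_two": "failed", "tests/c.py::test_three": "skipped"}
print(lane_is_green(run), needs_adjudication(run))
```

Keeping the mapping total over Finding makes it obvious when a new kind of verdict has been introduced without a matching remedy.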