CaaCS DELTA — Did v3.1.10→v3.2.0 Move the Forensic Needle? (researcher-robbie)

Author: Researcher Robbie (CaaCS quantitative data engine — DELTA / before-after lens). Branch: design/naming-identity-ssot-alignment @ 3.2.0 (read-only; no commit, no branch switch). Range: v3.1.10 (6975ee2, 2026-06-04) .. v3.2.0 (40e5209, 2026-06-16) — 2317 commits, 2160 non-merge. Companion to: the 3-POV squad (corroboration-{priti,alphonso,paula}-*.md) and the static CaaCS snapshot (naming-identity-ssot-strangler/caacs-*.md). Their lenses were static (the shape at HEAD); this note is the temporal SHIFT — what the forensic metrics looked like before the strangle vs after.

Directives & tactic applied (governance)

DIRECTIVE_003 — Decision Documentation Requirement. Every number below carries the exact git show / git log / radon cc -s -a / git grep command that produced it, reproducible on this branch. No verdict is asserted without a traceable before→after pair.
Tactic forensic-repository-audit (CaaCS, after Tornhill's Your Code as a Crime Scene). Steps applied: (1) exclusion scope, (2) churn hotspots, (6) bug-hotspot, (7) complexity overlay (radon cc, since cloc absent — Python-only scope makes wc -l SLOC acceptable), and the change-coupling (co-change) extension. Adapted to a DELTA frame: every metric is computed at both tags. Failure modes honoured: no rename-following in bulk recipes (flagged per file where it bites); complexity overlay is mandatory (raw churn does not measure hardness); the v3.2.0 release is squash/rebase-divergent from v3.1.10 (git merge-base --is-ancestor returns false) so the in-range churn is the rebased-replay history, not a clean linear ancestry — counts are directional and I note where the squash collapses signal.
Modes: investigation (mine the two tag boundaries) + synthesis (reconcile with the squad).
Avoidance boundary: I supply the empirical before/after dataset; I do not adjudicate code-design correctness (alphonso/randy) nor make the final release call.

Exclusion list (tactic step 1): lockfiles, __pycache__, generated agent dirs (.claude/ etc., naturally outside src/), JSONL/JSON mission state. Surface is hand-curated Python source.

The before/after metric table — authorities vs consumers

Complexity via radon cc -s -a on the blob at each tag (git show <tag>:<path> | radon); SLOC via wc -l; in-range churn via git log v3.1.10..v3.2.0 -- <path>. maxCC = worst single block (ruff/Sonar ceiling is 15). Δ columns: SLOC and maxCC change. Decimal points normalised from the locale comma.

Authority modules (should COOL if strangling works)

File	v3.1.10 SLOC / avgCC / maxCC	v3.2.0 SLOC / avgCC / maxCC	ΔSLOC	ΔmaxCC	in-range churn (commits, +/−)	Verdict
`lanes/branch_naming.py`	321 / 3.3 / 8	844 / 2.8 / 8	+523	0	7 · +541/−18	COOL (grew in size as the grammar centralised, but avgCC fell 3.3→2.8 and maxCC held at 8 — pure additive primitives, no hot block)
`mission_runtime/context.py` (net-new; predecessor `core/execution_context.py`)	287 / 5.0 / 17 (as exec_context)	274 / 1.5 / 3	−13	−14	2 · +290/−15 (new) ; predecessor 9 · +86/−368	COOL — strongest (the authority was re-extracted simpler: avgCC 5.0→1.5, maxCC 17→3; predecessor hollowed by −368 lines into a shim)
`mission_runtime/resolution.py` (net-new)	—	815 / 4.2 / 11	(born)	(born ≤15)	5 · +899/−83	BORN-COLD (net-new authority, maxCC 11 under the 15 ceiling at birth)
`missions/_read_path_resolver.py` (net-new)	—	424 / 3.2 / 8	(born)	(born)	8 · +499/−74	BORN-COLD (maxCC 8, avgCC 3.2)
`coordination/surface_resolver.py` (net-new)	—	568 / 3.2 / 14	(born)	(born)	8 · +634/−65	BORN-COLD (maxCC 14 — at the ceiling but under it)
`coordination/types.py` (net-new)	—	160 / 1.2 / 2	(born)	(born)	4 · +166/−5	BORN-COLD (value-object module, avgCC 1.2)
`ownership/validation.py`	261 / 5.6 / 12	384 / 4.8 / 12	+123	0	7 · +146/−23	COOL (avgCC 5.6→4.8, maxCC held at 12)
`status/emit.py`	479 / 5.5 / 23	852 / 7.4 / 40	+373	+17	18 · +560/−187	HEAT (the one authority that warmed: avgCC ↑, maxCC 23→40, busiest authority by churn)
`status/reducer.py`	213 / 5.0 / 14	345 / 7.2 / 24	+132	+10	6 · +160/−28	HEAT (mild) (maxCC 14→24 — crossed the ceiling)
`status/transitions.py`	358 / 5.3 / 14	131 / 3.4 / 6	−227	−8	4 · +115/−342	COOL — clean (the matrix was extracted out: −227 SLOC, maxCC 14→6)

Consumer hotspots (should HEAT if under-adopted)

File	v3.1.10 SLOC / avgCC / maxCC	v3.2.0 SLOC / avgCC / maxCC	ΔSLOC	ΔmaxCC	in-range churn (commits, +/−, bugfix)	Verdict
`cli/commands/implement.py`	678 / 8.4 / 34	1355 / 6.9 / 57	+677	+23	34 · +1013/−336 · 28 bugfix	HEAT (doubled in size, maxCC 34→57; avgCC fell only because volume diluted it)
`cli/commands/merge.py`	1218 / 11.6 / 60	3340 / 7.9 / 102	+2122	+42	52 · +2791/−668 · 43 bugfix	HEAT — severe (2.7× SLOC, maxCC 60→102)
`cli/commands/agent/tasks.py`	3041 / 15.9 / 118	4539 / 11.9 / 178	+1498	+60	53 · +2020/−522 · 47 bugfix	HEAT — severe (already a god-module; maxCC 118→178)
`cli/commands/agent/workflow.py`	1816 / 13.2 / 71	2731 / 8.5 / 84	+915	+13	54 · +1789/−852 · 46 bugfix	HEAT (+50% SLOC, maxCC 71→84)
`cli/commands/agent/mission.py`	2165 / 11.6 / 158	3939 / 10.5 / 220	+1774	+62	60 · +2608/−833 · 46 bugfix	HEAT — severe (1.8× SLOC, maxCC 158→220, highest churn of the whole surface)
`dashboard/scanner.py`	784 / 5.8 / 21	891 / 5.6 / 22	+107	+1	11 · +322/−130 · 10 bugfix	STABLE (the one consumer that did NOT heat materially — small, contained)

Reading the table (the load-bearing before/after):

Every extracted authority is COOL or BORN-COLD by max-block complexity — except the status/ write-side (emit/reducer HEAT). mission_runtime/context.py is the cleanest proof the extraction worked: the same authority went avgCC 5.0→1.5, maxCC 17→3 while the old home (core/execution_context.py) was hollowed by −368 lines into a re-export shim. status/transitions.py lost 227 SLOC as the matrix was pulled out (maxCC 14→6).
Every consumer god-module HEATED hard — 4 of 6 grew their worst block past it: merge 60→102, tasks 118→178, mission 158→220. They grew +677…+2122 SLOC each and carry 28–47 bugfix commits apiece in-range. The danger migrated out of the seam and into the un-consolidated callers — exactly the squad's "extract-then-under-adopt".
The two exceptions are diagnostic. status/emit HEATED as an authority (the write-side is the live coordination strangler the goals-doc defers to 3.3.x — so its warming is scheduled, not drift). dashboard/scanner.py stayed STABLE — the one consumer small enough to carry its inline mid8 without bloating.

Coupling-shift finding (Question 3)

Change-coupling = co-change pairs among the strangled surface, mined with git log --name-only --pretty=format:'@%H' + an AWK pair-counter. The two windows are volume-comparable (consumer-cluster commits: 166 before vs 159 in-range), so per-commit density is a fair before/after.

Window	Σ consumer-pair co-changes	consumer-cluster commits	coupling density (pairs/commit)
BEFORE (`..v3.1.10`)	213	166	1.28
IN-RANGE (`v3.1.10..v3.2.0`)	170	159	1.07

Per-commit consumer coupling dropped ≈16%. And the intensity profile flattened: the single dominant pre-range edge (tasks↔workflow = 40 co-changes, the old whack-a-mole spine) fell to 16 in-range, with no single pair exceeding 18. So the tight 2-file ping-pong loosened.

But the coupling did NOT dissolve — it broadened. In-range the top edges spread across the whole orchestrator quad fairly evenly (tasks↔merge 18, mission↔workflow 18, mission↔tasks 16, tasks↔workflow 16, tasks↔implement 15) and agent/mission.py entered the hot cluster (absent pre-range top-12, now in 3 of the top edges). Net: the coupling moved from one fragile edge to a diffuse 5-node clique. Authority↔consumer coupling is the encouraging signal — consumers now co-change with status/emit (8×), surface_resolver (6×), _read_path_resolver (6×): they are starting to move with the seams, which is what adoption looks like mid-flight.

Verdict: coupling-density DROPPED (good), but the absolute co-change among the un-consolidated orchestrators PERSISTS as a diffuse clique (the adoption gap).

The ratchet law, quantified (Question 4)

Occurrence counts across src/**/*.py via git grep -E <pattern> <tag>. This is the empirical test of the squad's law: "what is ratcheted shrinks; what is not, grows."

Idiom	Ratcheted?	v3.1.10	v3.2.0	Δ	Reading
`from specify_cli.status.<sub>` deep-imports (status boundary bypass)	YES (`test_status_module_boundary.py`, shrinking allowlist)	182	43	−139 (−76%)	RATCHETED → SHRANK HARD. The single clearest "needle moved" number in the dataset.
`status/transitions.py` matrix SLOC	YES (extracted behind facade)	358	131	−227 SLOC	RATCHETED → SHRANK.
`kitty/mission…` f-strings (worktree-name guard scope)	PARTIAL (`test_no_worktree_name_guess.py`)	42	63	+21	Mixed — the guarded subset shrank (squad: 7→5) but the broad token grew as the grammar spread.
`mission_id[:8]` (mid8 hand-roll)	NO literal-ban ratchet (only 2 incidental test refs; neither a ban)	4	26	+22 (+550%)	UN-RATCHETED → GREW. 7 of the 26 are inside `mission_runtime` (legit authority-internal); ~19 are consumer hand-rolls.
`[:8]` (any 8-slice)	NO	15	46	+31	UN-RATCHETED → GREW.
`parents[2]` (path re-derive)	NO production ban (38 test refs are fixture math, not a guard)	4	10	+6 (+150%)	UN-RATCHETED → GREW.

The counter-evidence that proves the extraction is real, not cosmetic — the SSOT vocabulary spread in lockstep with (and to the same magnitude as) the hand-roll it is meant to replace:

Canonical SSOT surface	v3.1.10	v3.2.0	Δ
`mid8` token (the SSOT grammar)	67	605	+538 (9×)
`resolve_mid8` (canonical mid8 helper)	0	26	born + adopted to parity with the 26 raw `mission_id[:8]`
`resolve_action_context` (context SSOT entrypoint)	3	27	+24 (9×)
`branch_naming` SSOT import (consumers routing to the seam)	14	64	+50 (4.6×)

Quantified ratchet law — CORROBORATED. The one class that got a shrinking-allowlist ratchet (status boundary imports) collapsed 182→43. Every class left un-ratcheted (mission_id[:8] +550%, parents[2] +150%, [:8] +207%) grew. Meanwhile the canonical replacements were built and adopted at scale (mid8 9×, resolve_mid8 0→26, resolve_action_context 9×, branch_naming import 4.6×) — so this is co-existence, not failure: the SSOT is in place and climbing; the old idiom persists only where no ratchet forces the cut. The law is not a metaphor here — it is the literal mechanism separating the shrinking surfaces from the growing ones.

Per-goal forensic verdict (Question 5)

Goal	Forensic SHIFT evidence	Verdict
G1 Doctrine→runtime depth	Out of direct scope for this surface set, but the substrate cooled: `mission_runtime/{context,resolution}` born-cold (maxCC 3/11) give the `next` loop a clean context authority to render doctrine against. No authority for G1's own seam heated.	SUPPORTED-for-direction (substrate cooled; not the focus of this delta)
G2 Core-domain strangler → SSOT	The headline corroboration. Authorities COOL/born-cold (context maxCC 17→3; transitions −227 SLOC; 4 net-new authorities born ≤ ceiling), consumers HEAT (merge maxCC 60→102, mission 158→220), and the canonical grammar spread 9× (`mid8` 67→605). The extract→route arc is visible in the metric shift, not just the ADRs.	SUPPORTED — strongest (the before/after shift directly shows extract-then-under-adopt)
G3 DevEx & enablers	The ratchet that exists (status boundary) moved the needle 76% (182→43) — empirical proof the ratchet enabler works. The gap is enabler coverage: no `[:8]`/`parents[2]` literal-ban, so those grew. G3's job for 3.2.x is to extend the ratchet wall to the un-guarded idioms.	SUPPORTED — with a named coverage gap

Biggest un-closed adoption gap

It is NOT status. The squad/alphonso flagged status' ~245 bypass imports as the worry; the delta shows status is the domain that visibly CLOSED — deep-imports 182 → 43 (−76%) under its module-boundary ratchet, and transitions.py shed 227 SLOC. Status is the success story of this range.

The biggest un-closed gap is IDENTITY / mid8 derivation. It is the only domain where the authority is fully built and named (resolve_mid8 0→26, mid8 grammar 67→605) yet the raw hand-roll grew in lockstep (mission_id[:8] 4→26, +550%) because it has no literal-ban ratchet (confirmed: only 2 incidental test references, neither a guard). The SSOT and its bypass now sit at numerical parity (26 vs 26) with nothing forcing the cut. That is the canonical "extract-then-under-adopt" frozen in the metrics — and the cheapest, highest-ROI 3.2.x action is a mission_id[:8] literal-ban ratchet pointed at resolve_mid8, which would flip identity from the worst gap to a closing one exactly as the status boundary ratchet did.

Overall forensic corroboration verdict

The v3.1.10→v3.2.0 strangling DID move the forensic needle, asymmetrically and exactly as a mid-flight strangler-fig should. Authorities cooled or were born cold; consumers heated; the one domain with a real ratchet (status) collapsed its bypass surface 76%; the domains without one (identity, path-derivation) grew their bypass even as the SSOT was built beside it. The shift is real and directional (the extraction worked), incomplete (adoption lags), and self-limiting (the ratchet mechanism is proven to close gaps where applied) — which is the empirical signature of "extract authority → route consumers, even if adoption lags," not drift.

Reproduction (every command, per DIRECTIVE_003)

RADON=/home/stijn/.pyenv/versions/3.13.12/bin/radon
# complexity+SLOC at a tag:
git show v3.1.10:src/specify_cli/cli/commands/merge.py | $RADON cc -s -a -   # avg + per-block CC
git show v3.1.10:src/specify_cli/cli/commands/merge.py | wc -l               # SLOC
# in-range churn (squash/rebase-divergent range — directional):
git log --oneline v3.1.10..v3.2.0 -- <path> | wc -l
git log --numstat --format='' v3.1.10..v3.2.0 -- <path>                       # +/- lines
# occurrence delta (ratchet law):
git grep -h -E 'mission_id\[:8\]' v3.1.10 -- 'src/**/*.py' | wc -l
git grep -h -E 'mission_id\[:8\]' v3.2.0  -- 'src/**/*.py' | wc -l
# status boundary bypass:
git grep -h -E 'from specify_cli\.status\.[a-z]' v3.1.10 -- 'src/**/*.py' | wc -l   # 182
git grep -h -E 'from specify_cli\.status\.[a-z]' v3.2.0  -- 'src/**/*.py' | wc -l   # 43
# change-coupling: git log --name-only --pretty=format:'@%H' <range> -- <paths> | awk <pair-counter>

Caveats (tactic failure-modes): (1) v3.2.0 is squash/rebase-divergent from v3.1.10 (git merge-base --is-ancestor v3.1.10 v3.2.0 = false), so in-range churn is the rebased-replay history — directional, and the squash collapses some per-file granularity. (2) bulk recipes do not follow renames; mission_runtime/context.py is tracked against its predecessor core/execution_context.py manually (the −368-line hollowing is the predecessor's in-range delta). (3) Conventional-Commits inflates raw bugfix %, so the absolute bugfix counts are reported, not densities. (4) radon mi floors at 0.0 for the largest modules; maxCC is the load-bearing complexity series here, not MI.