CaaCS DELTA — Did v3.1.10→v3.2.0 Move the Forensic Needle? (researcher-robbie)
Author: Researcher Robbie (CaaCS quantitative data engine — DELTA / before-after lens).
Branch: design/naming-identity-ssot-alignment @ 3.2.0 (read-only; no commit, no branch switch).
Range: v3.1.10 (6975ee2, 2026-06-04) .. v3.2.0 (40e5209, 2026-06-16) — 2317 commits, 2160 non-merge.
Companion to: the 3-POV squad (corroboration-{priti,alphonso,paula}-*.md) and the static
CaaCS snapshot (naming-identity-ssot-strangler/caacs-*.md). Their lenses were static (the
shape at HEAD); this note is the temporal SHIFT — what the forensic metrics looked like
before the strangle vs after.
Directives & tactic applied (governance)
- DIRECTIVE_003 (Decision Documentation Requirement). Every number below carries the exact git show / git log / radon cc -s -a / git grep command that produced it, reproducible on this branch. No verdict is asserted without a traceable before→after pair.
- Tactic forensic-repository-audit (CaaCS, after Tornhill's Your Code as a Crime Scene). Steps applied: (1) exclusion scope, (2) churn hotspots, (6) bug-hotspot, (7) complexity overlay (radon cc; cloc is absent, and the Python-only scope makes wc -l line counts an acceptable SLOC proxy), plus the change-coupling (co-change) extension. Adapted to a DELTA frame: every metric is computed at both tags. Failure modes honoured: no rename-following in bulk recipes (flagged per file where it bites); the complexity overlay is mandatory (raw churn does not measure hardness); the v3.2.0 release is squash/rebase-divergent from v3.1.10 (git merge-base --is-ancestor v3.1.10 v3.2.0 exits non-zero, i.e. v3.1.10 is not an ancestor). The in-range churn is therefore the rebased-replay history, not a clean linear ancestry, so counts are directional and I note where the squash collapses signal.
- Modes: investigation (mine the two tag boundaries) + synthesis (reconcile with the squad).
- Avoidance boundary: I supply the empirical before/after dataset; I do not adjudicate code-design correctness (alphonso/randy) or make the final release call.
Exclusion list (tactic step 1): lockfiles, __pycache__, generated agent dirs (.claude/ etc., which sit outside src/ anyway), and JSONL/JSON mission state. The analysed surface is hand-curated Python source.
The before/after metric table — authorities vs consumers
Complexity via radon cc -s -a on the blob at each tag (git show <tag>:<path> | radon cc -s -a -); SLOC via
wc -l (physical lines, so blanks and comments are included); in-range churn via git log v3.1.10..v3.2.0 -- <path>.
maxCC = cyclomatic complexity of the worst single block (the ruff/Sonar ceiling applied here is 15). Δ columns: change in
SLOC and in maxCC. Decimal points normalised from the locale comma.
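The per-file SLOC / avgCC / maxCC triples in the tables that follow can be derived mechanically from one radon run and one line count per tag. Here is a minimal Python sketch. It assumes radon's cc -s output lists one block per indented line ending in a rank letter and the score in parentheses (e.g. "F 12:0 name - C (14)"). FileMetrics, metrics_from_radon and delta_row are hypothetical helpers, not part of radon or this repository. The mean is taken over every parsed block, so it approximates radon's own -a figure rather than guaranteeing an exact match.

```python
import re
from dataclasses import dataclass

# Assumed shape of a radon `cc -s` block line, e.g. "    F 12:0 emit_event - C (14)":
# block kind (Function/Method/Class), position, name, rank letter, score in parentheses.
BLOCK_LINE = re.compile(r"^\s+[FMC] \d+:\d+ .+ - [A-F] \((\d+)\)\s*$")


@dataclass(frozen=True)
class FileMetrics:
    sloc: int       # physical lines, as produced by `wc -l`
    avg_cc: float   # mean complexity over all parsed blocks
    max_cc: int     # worst single block ("maxCC")


def metrics_from_radon(radon_output: str, line_count: int) -> FileMetrics:
    """Derive avgCC/maxCC from radon block lines; the summary line is ignored."""
    scores = [
        int(m.group(1))
        for line in radon_output.splitlines()
        if (m := BLOCK_LINE.match(line))
    ]
    if not scores:
        return FileMetrics(line_count, 0.0, 0)
    return FileMetrics(line_count, round(sum(scores) / len(scores), 1), max(scores))


def delta_row(before: FileMetrics, after: FileMetrics, ceiling: int = 15) -> dict:
    """One before/after table row: ΔSLOC, ΔmaxCC and whether maxCC crossed the ceiling."""
    return {
        "d_sloc": after.sloc - before.sloc,
        "d_max_cc": after.max_cc - before.max_cc,
        "crossed_ceiling": before.max_cc <= ceiling < after.max_cc,
    }
```

Feeding in the table's own figures for status/reducer.py, delta_row(FileMetrics(213, 5.0, 14), FileMetrics(345, 7.2, 24)) yields ΔSLOC +132, ΔmaxCC +10 and crossed_ceiling True, which matches the "crossed the ceiling" verdict.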
Authority modules (should COOL if strangling works)
| File | v3.1.10 SLOC / avgCC / maxCC | v3.2.0 SLOC / avgCC / maxCC | ΔSLOC | ΔmaxCC | in-range churn (commits, +/−) | Verdict |
|---|---|---|---|---|---|---|
| lanes/branch_naming.py | 321 / 3.3 / 8 | 844 / 2.8 / 8 | +523 | 0 | 7 · +541/−18 | COOL (grew in size as the grammar centralised, but avgCC fell 3.3→2.8 and maxCC held at 8; pure additive primitives, no hot block) |
| mission_runtime/context.py (net-new; predecessor core/execution_context.py) | 287 / 5.0 / 17 (as execution_context.py) | 274 / 1.5 / 3 | −13 | −14 | 2 · +290/−15 (new); predecessor 9 · +86/−368 | COOL, strongest (the authority was re-extracted simpler: avgCC 5.0→1.5, maxCC 17→3; the predecessor was hollowed by −368 lines into a shim) |
| mission_runtime/resolution.py (net-new) | n/a | 815 / 4.2 / 11 | born | born (≤15) | 5 · +899/−83 | BORN-COLD (net-new authority; maxCC 11, under the 15 ceiling at birth) |
| missions/_read_path_resolver.py (net-new) | n/a | 424 / 3.2 / 8 | born | born | 8 · +499/−74 | BORN-COLD (maxCC 8, avgCC 3.2) |
| coordination/surface_resolver.py (net-new) | n/a | 568 / 3.2 / 14 | born | born | 8 · +634/−65 | BORN-COLD (maxCC 14: just under the 15 ceiling) |
| coordination/types.py (net-new) | n/a | 160 / 1.2 / 2 | born | born | 4 · +166/−5 | BORN-COLD (value-object module, avgCC 1.2) |
| ownership/validation.py | 261 / 5.6 / 12 | 384 / 4.8 / 12 | +123 | 0 | 7 · +146/−23 | COOL (avgCC 5.6→4.8, maxCC held at 12) |
| status/emit.py | 479 / 5.5 / 23 | 852 / 7.4 / 40 | +373 | +17 | 18 · +560/−187 | HEAT (the hottest authority: avgCC 5.5→7.4, maxCC 23→40, busiest authority by churn) |
| status/reducer.py | 213 / 5.0 / 14 | 345 / 7.2 / 24 | +132 | +10 | 6 · +160/−28 | HEAT, mild (maxCC 14→24: crossed the ceiling) |
| status/transitions.py | 358 / 5.3 / 14 | 131 / 3.4 / 6 | −227 | −8 | 4 · +115/−342 | COOL, clean (the matrix was extracted out: −227 SLOC, maxCC 14→6) |
Consumer hotspots (should HEAT if under-adopted)
| File | v3.1.10 SLOC / avgCC / maxCC | v3.2.0 SLOC / avgCC / maxCC | ΔSLOC | ΔmaxCC | in-range churn (commits, +/−, bugfix) | Verdict |
|---|---|---|---|---|---|---|
| cli/commands/implement.py | 678 / 8.4 / 34 | 1355 / 6.9 / 57 | +677 | +23 | 34 · +1013/−336 · 28 bugfix | HEAT (doubled in size, maxCC 34→57; avgCC fell only because the added volume diluted it) |
| cli/commands/merge.py | 1218 / 11.6 / 60 | 3340 / 7.9 / 102 | +2122 | +42 | 52 · +2791/−668 · 43 bugfix | HEAT, severe (2.7× SLOC, maxCC 60→102) |
| cli/commands/agent/tasks.py | 3041 / 15.9 / 118 | 4539 / 11.9 / 178 | +1498 | +60 | 53 · +2020/−522 · 47 bugfix | HEAT, severe (already a god-module; maxCC 118→178) |
| cli/commands/agent/workflow.py | 1816 / 13.2 / 71 | 2731 / 8.5 / 84 | +915 | +13 | 54 · +1789/−852 · 46 bugfix | HEAT (+50% SLOC, maxCC 71→84) |
| cli/commands/agent/mission.py | 2165 / 11.6 / 158 | 3939 / 10.5 / 220 | +1774 | +62 | 60 · +2608/−833 · 46 bugfix | HEAT, severe (1.8× SLOC, maxCC 158→220, highest churn of the whole surface) |
| dashboard/scanner.py | 784 / 5.8 / 21 | 891 / 5.6 / 22 | +107 | +1 | 11 · +322/−130 · 10 bugfix | STABLE (the one consumer that did NOT heat materially: small and contained) |
Reading the table (the load-bearing before/after):
- Every extracted authority is COOL or BORN-COLD by max-block complexity, except the status write-side (emit and reducer HEAT). mission_runtime/context.py is the cleanest proof the extraction worked: the same authority went avgCC 5.0→1.5, maxCC 17→3, while the old home (core/execution_context.py) was hollowed by −368 lines into a re-export shim. status/transitions.py lost 227 SLOC as the matrix was pulled out (maxCC 14→6).
- Every consumer god-module HEATED hard, and 4 of the 6 consumers grew their worst block by more than 20 CC points: implement 34→57, merge 60→102, tasks 118→178, mission 158→220. Those four grew by +677 to +2122 SLOC each and carry 28–47 bugfix commits apiece in-range. The danger migrated out of the seam and into the un-consolidated callers, exactly the squad's "extract-then-under-adopt".
- The two exceptions are diagnostic. status/emit HEATED as an authority (the write-side is the live coordination strangler the goals-doc defers to 3.3.x, so its warming is scheduled, not drift). dashboard/scanner.py stayed STABLE: it is the one consumer small enough to carry its inline mid8 without bloating.
Coupling-shift finding (Question 3)
Change-coupling = co-change pairs among the strangled surface, mined with
git log --name-only --pretty=format:'@%H' + an AWK pair-counter. The two windows are
volume-comparable (consumer-cluster commits: 166 before vs 159 in-range), so per-commit
density is a fair before/after.
| Window | Σ consumer-pair co-changes | consumer-cluster commits | coupling density (pairs/commit) |
|---|---|---|---|
| BEFORE (..v3.1.10) | 213 | 166 | 1.28 |
| IN-RANGE (v3.1.10..v3.2.0) | 170 | 159 | 1.07 |
Per-commit consumer coupling dropped ≈17% (1.28→1.07 pairs/commit). The intensity profile also flattened: the
single dominant pre-range edge (tasks↔workflow = 40 co-changes, the old whack-a-mole spine)
fell to 16 in-range, with no single pair exceeding 18. So the tight 2-file ping-pong loosened.
But the coupling did NOT dissolve — it broadened. In-range the top edges spread across the
whole orchestrator quad fairly evenly (tasks↔merge 18, mission↔workflow 18, mission↔tasks
16, tasks↔workflow 16, tasks↔implement 15) and agent/mission.py entered the hot cluster
(absent pre-range top-12, now in 3 of the top edges). Net: the coupling moved from one fragile
edge to a diffuse 5-node clique. Authority↔consumer coupling is the encouraging signal —
consumers now co-change with status/emit (8×), surface_resolver (6×), _read_path_resolver
(6×): they are starting to move with the seams, which is what adoption looks like mid-flight.
Verdict: coupling-density DROPPED (good), but the absolute co-change among the un-consolidated orchestrators PERSISTS as a diffuse clique (the adoption gap).
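The AWK pair-counter itself is not reproduced in this note. The following is a hedged Python stand-in for the same computation, not the script that produced the numbers above. It assumes the log was captured with git log --name-only --pretty=format:'@%H'. It treats a "consumer-cluster commit" as any commit touching at least one cluster file, which is my assumption about the denominator. Each unordered file pair is counted once per commit. parse_name_only_log and coupling are hypothetical names.

```python
from collections import Counter
from itertools import combinations
from typing import AbstractSet, List, Set, Tuple


def parse_name_only_log(log_text: str) -> List[Set[str]]:
    """Split `git log --name-only --pretty=format:'@%H'` output into per-commit file sets."""
    commits: List[Set[str]] = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("@"):
            commits.append(set())      # header line: a new commit starts here
        elif line and commits:
            commits[-1].add(line)      # a path touched by the current commit
    return commits


def coupling(commits: List[Set[str]], cluster: AbstractSet[str]) -> Tuple[Counter, int, float]:
    """Count co-changed pairs inside `cluster`; return (pair counts, cluster commits, density)."""
    pairs: Counter = Counter()
    cluster_commits = 0
    for files in commits:
        touched = sorted(files & cluster)   # sorted so each pair has one canonical order
        if not touched:
            continue
        cluster_commits += 1
        pairs.update(combinations(touched, 2))
    total = sum(pairs.values())
    density = total / cluster_commits if cluster_commits else 0.0
    return pairs, cluster_commits, density
```

Ranking pairs.most_common() gives the edge list (tasks↔workflow and the rest), and the density column is just the pair total over cluster commits: 213 / 166 ≈ 1.28 before, 170 / 159 ≈ 1.07 in-range.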
The ratchet law, quantified (Question 4)
Matching-line counts across src/**/*.py via git grep -h -E <pattern> <tag> | wc -l (a line with two hits counts once).
This is the empirical test of the squad's law: "what is ratcheted shrinks; what is not, grows."
| Idiom | Ratcheted? | v3.1.10 | v3.2.0 | Δ | Reading |
|---|---|---|---|---|---|
| from specify_cli.status.<sub> deep-imports (status boundary bypass) | YES (test_status_module_boundary.py, shrinking allowlist) | 182 | 43 | −139 (−76%) | RATCHETED → SHRANK HARD. The single clearest "needle moved" number in the dataset. |
| status/transitions.py matrix SLOC | YES (extracted behind facade) | 358 | 131 | −227 SLOC | RATCHETED → SHRANK. |
| kitty/mission… f-strings (worktree-name guard scope) | PARTIAL (test_no_worktree_name_guess.py) | 42 | 63 | +21 | Mixed: the guarded subset shrank (squad: 7→5) but the broad token grew as the grammar spread. |
| mission_id[:8] (mid8 hand-roll) | NO literal-ban ratchet (only 2 incidental test refs; neither is a ban) | 4 | 26 | +22 (+550%) | UN-RATCHETED → GREW. 7 of the 26 are inside mission_runtime (legitimate authority-internal use); the other 19 are consumer hand-rolls. |
| [:8] (any 8-slice) | NO | 15 | 46 | +31 (+207%) | UN-RATCHETED → GREW. |
| parents[2] (path re-derive) | NO production ban (38 test refs are fixture math, not a guard) | 4 | 10 | +6 (+150%) | UN-RATCHETED → GREW. |
The counter-evidence that the extraction is real, not cosmetic: the SSOT vocabulary spread in lockstep with (and at the same order of magnitude as) the hand-roll it is meant to replace.
| Canonical SSOT surface | v3.1.10 | v3.2.0 | Δ |
|---|---|---|---|
| mid8 token (the SSOT grammar) | 67 | 605 | +538 (9×) |
| resolve_mid8 (canonical mid8 helper) | 0 | 26 | +26 (born, and adopted to parity with the 26 raw mission_id[:8]) |
| resolve_action_context (context SSOT entrypoint) | 3 | 27 | +24 (9×) |
| branch_naming SSOT import (consumers routing to the seam) | 14 | 64 | +50 (4.6×) |
Quantified ratchet law — CORROBORATED. The one class that got a shrinking-allowlist ratchet
(status boundary imports) collapsed 182→43. Every class left un-ratcheted (mission_id[:8]
+550%, parents[2] +150%, [:8] +207%) grew. Meanwhile the canonical replacements were built and
adopted at scale (mid8 9×, resolve_mid8 0→26, resolve_action_context 9×, branch_naming
import 4.6×) — so this is co-existence, not failure: the SSOT is in place and climbing; the
old idiom persists only where no ratchet forces the cut. The law is not a metaphor here — it is
the literal mechanism separating the shrinking surfaces from the growing ones.
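The Δ and percentage columns of the two tables above are plain arithmetic over two line counts. Below is a small sketch. It assumes the counted sources are already loaded into a path→text mapping (e.g. from git show <tag>:<path>). count_matching_lines and ratchet_delta are hypothetical names. Python's re stands in for git grep -E; the two agree for simple patterns such as mission_id\[:8\].

```python
import re
from typing import Mapping, Optional, Tuple


def count_matching_lines(sources: Mapping[str, str], pattern: str) -> int:
    """Mirror `git grep -h -E <pattern> | wc -l`: count lines with at least one match."""
    rx = re.compile(pattern)
    return sum(
        1
        for text in sources.values()
        for line in text.splitlines()
        if rx.search(line)
    )


def ratchet_delta(before: int, after: int) -> Tuple[int, Optional[float]]:
    """Absolute and percentage change between the counts at two tags.

    The percentage is None when the idiom did not exist at the earlier tag
    (e.g. resolve_mid8, 0 -> 26), because growth from zero has no ratio.
    """
    delta = after - before
    pct = round(100 * delta / before, 1) if before else None
    return delta, pct
```

With the table's counts, ratchet_delta(182, 43) returns (-139, -76.4) for the status boundary, and ratchet_delta(4, 26) returns (22, 550.0) for the mid8 hand-roll.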
Per-goal forensic verdict (Question 5)
| Goal | Forensic SHIFT evidence | Verdict |
|---|---|---|
| G1 Doctrine→runtime depth | Out of direct scope for this surface set, but the substrate cooled: mission_runtime/{context,resolution} born-cold (maxCC 3/11) give the next loop a clean context authority to render doctrine against. No authority for G1's own seam heated. | SUPPORTED-for-direction (substrate cooled; not the focus of this delta) |
| G2 Core-domain strangler → SSOT | The headline corroboration. Authorities COOL/born-cold (context maxCC 17→3; transitions −227 SLOC; 4 net-new authorities born ≤ ceiling), consumers HEAT (merge maxCC 60→102, mission 158→220), and the canonical grammar spread 9× (mid8 67→605). The extract→route arc is visible in the metric shift, not just the ADRs. | SUPPORTED, strongest (the before/after shift directly shows extract-then-under-adopt) |
| G3 DevEx & enablers | The ratchet that exists (status boundary) moved the needle 76% (182→43): empirical proof the ratchet enabler works. The gap is enabler coverage: there is no [:8]/parents[2] literal-ban, so those grew. G3's job for 3.2.x is to extend the ratchet wall to the un-guarded idioms. | SUPPORTED, with a named coverage gap |
Biggest un-closed adoption gap
It is NOT status. The squad/alphonso flagged status' ~245 bypass imports as the worry; the
delta shows status is the domain that visibly CLOSED — deep-imports 182 → 43 (−76%) under
its module-boundary ratchet, and transitions.py shed 227 SLOC. Status is the success story of
this range.
The biggest un-closed gap is IDENTITY / mid8 derivation. It is the only domain where the
authority is fully built and named (resolve_mid8 0→26, mid8 grammar 67→605) yet the raw
hand-roll grew in lockstep (mission_id[:8] 4→26, +550%) because it has no literal-ban
ratchet (confirmed: only 2 incidental test references, neither a guard). The SSOT and its
bypass now sit at numerical parity (26 vs 26) with nothing forcing the cut. That is the
canonical "extract-then-under-adopt" frozen in the metrics — and the cheapest, highest-ROI
3.2.x action is a mission_id[:8] literal-ban ratchet pointed at resolve_mid8, which would
flip identity from the worst gap to a closing one exactly as the status boundary ratchet did.
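The proposed guard can be sketched in the same shrinking-allowlist shape as the status-boundary ratchet described above. This is a minimal sketch under assumptions. The function name, messages and allowlist contents are hypothetical, and it does not reproduce test_status_module_boundary.py's actual code. The point is the two-sided check. New offenders fail the check, and so do allowlisted files that have already been cleaned, so the allowlist can only shrink.

```python
import re
from typing import AbstractSet, List, Mapping

# The literal hand-roll the ratchet bans outside the allowlist.
HAND_ROLL = re.compile(r"mission_id\[:8\]")


def mid8_ratchet_violations(sources: Mapping[str, str],
                            allowlist: AbstractSet[str]) -> List[str]:
    """Shrinking-allowlist ratchet for raw mid8 slicing (hypothetical guard).

    (a) Any file outside the allowlist that hand-rolls mission_id[:8] is a violation.
    (b) Any allowlisted file that no longer does is a stale entry and must be removed,
        which is what forces the allowlist to shrink monotonically.
    """
    offenders = {path for path, text in sources.items() if HAND_ROLL.search(text)}
    problems = [f"new hand-roll (use resolve_mid8): {p}"
                for p in sorted(offenders - allowlist)]
    problems += [f"stale allowlist entry, remove it: {p}"
                 for p in sorted(allowlist - offenders)]
    return problems
```

In a test suite this would run as assert mid8_ratchet_violations(sources, ALLOWLIST) == [], with ALLOWLIST seeded from today's offenders (for example, the 7 authority-internal uses in mission_runtime) and trimmed as consumers move to resolve_mid8.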
Overall forensic corroboration verdict
The v3.1.10→v3.2.0 strangling DID move the forensic needle, asymmetrically and exactly as a mid-flight strangler-fig should. Authorities cooled or were born cold; consumers heated; the one domain with a real ratchet (status) collapsed its bypass surface 76%; the domains without one (identity, path-derivation) grew their bypass even as the SSOT was built beside it. The shift is real and directional (the extraction worked), incomplete (adoption lags), and self-limiting (the ratchet mechanism is proven to close gaps where applied) — which is the empirical signature of "extract authority → route consumers, even if adoption lags," not drift.
Reproduction (every command, per DIRECTIVE_003)
```bash
RADON=/home/stijn/.pyenv/versions/3.13.12/bin/radon
# complexity + SLOC at a tag:
git show v3.1.10:src/specify_cli/cli/commands/merge.py | $RADON cc -s -a -   # avg + per-block CC
git show v3.1.10:src/specify_cli/cli/commands/merge.py | wc -l              # SLOC
# in-range churn (squash/rebase-divergent range, so directional):
git log --oneline v3.1.10..v3.2.0 -- <path> | wc -l
git log --numstat --format='' v3.1.10..v3.2.0 -- <path>   # +/- lines
# occurrence delta (ratchet law):
git grep -h -E 'mission_id\[:8\]' v3.1.10 -- 'src/**/*.py' | wc -l   # 4
git grep -h -E 'mission_id\[:8\]' v3.2.0 -- 'src/**/*.py' | wc -l    # 26
# status boundary bypass:
git grep -h -E 'from specify_cli\.status\.[a-z]' v3.1.10 -- 'src/**/*.py' | wc -l   # 182
git grep -h -E 'from specify_cli\.status\.[a-z]' v3.2.0 -- 'src/**/*.py' | wc -l    # 43
# change-coupling: git log --name-only --pretty=format:'@%H' <range> -- <paths> | awk <pair-counter>
```
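The +/− figures in the churn columns come from summing the git log --numstat output listed above. Here is a minimal sketch of that step. It assumes the standard tab-separated numstat lines ("added<TAB>deleted<TAB>path"), where binary files report "-" for both counts. sum_numstat and format_churn are hypothetical helpers. The commit count itself comes from the git log --oneline | wc -l line.

```python
from typing import Tuple


def sum_numstat(numstat_text: str) -> Tuple[int, int]:
    """Total added/deleted lines from `git log --numstat --format=''` output."""
    added = deleted = 0
    for line in numstat_text.splitlines():
        parts = line.split("\t")
        # Skip blank separator lines and binary files (reported as "-\t-\tpath").
        if len(parts) < 3 or parts[0] == "-":
            continue
        added += int(parts[0])
        deleted += int(parts[1])
    return added, deleted


def format_churn(commits: int, added: int, deleted: int) -> str:
    """Render the table's churn cell, e.g. "7 · +541/−18"."""
    return f"{commits} · +{added}/\u2212{deleted}"
```

Because the range is squash/rebase-divergent, these sums describe the replayed history and need not equal the SLOC delta between the two tag blobs (compare dashboard/scanner.py: +322/−130 against +107 SLOC).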
Caveats (tactic failure-modes): (1) v3.2.0 is squash/rebase-divergent from v3.1.10
(git merge-base --is-ancestor v3.1.10 v3.2.0 = false), so in-range churn is the rebased-replay
history — directional, and the squash collapses some per-file granularity. (2) bulk recipes do
not follow renames; mission_runtime/context.py is tracked against its predecessor
core/execution_context.py manually (the −368-line hollowing is the predecessor's in-range delta).
(3) Conventional-Commits inflates raw bugfix %, so the absolute bugfix counts are reported, not
densities. (4) radon mi floors at 0.0 for the largest modules; maxCC is the load-bearing
complexity series here, not MI.