RED TEAM — Debugger Debbie: Refuting the 3.2.x Benefit / ROI Claims
Author: Debugger Debbie (RED TEAM — dialectical antithesis).
Branch: design/naming-identity-ssot-alignment @ 3.2.0 (read-only; no commit/switch).
Date: 2026-06-16.
Directives applied (debugger-debbie.agent.yaml): D-001 (find the owning boundary / structural
fork — applied adversarially, to ask whether threading moves the fork rather than closing it),
D-003 (persist falsified + surviving hypotheses so they cannot be re-litigated), D-030 (interrogate
whether the ratchet is a real producer-conformance gate or theater), D-032 (the divergence-matrix
lens — applied to ask whether threading creates a new divergence).
Posture. The white team (Pedro feasibility, Alphonso design-verdict, Paula CaaCS, Robbie delta) CONFIRMED: "the recurring defect class is inline re-derivation; threading the Context + extending the ratchet will ELIMINATE it; ROI is high." I argue the ANTITHESIS. I concede where the evidence forces me — and it forces me on two points. But the headline benefit claim — threading eliminates the class and the ROI is high — does not survive contact with the actual bug commits.
TL;DR — the six-line verdict
- Strongest benefit-refutation: the consumer-complexity ROI argument is built on
merge.py(CC 60→102, +2122 SLOC) — but only 2 of its 52 in-range commits touch naming/mid8; 12 are merge feature work (preflight/resume/conflict/atomicity), andmerge.pyholds zeroresolve_action_contextreferences, so it is not a threading candidate at all. The headline "cost of not consolidating" is feature growth threading cannot shrink. - Re-derivation is NOT the root cause of the worst recurrences. #1589, #1949, and the
#1990↔#1991 mirror pair were wrong-logic-in-the-authority bugs (strip-vs-verbatim,
double-suffix, wrong-surface-selection). The single SSOT had the wrong logic too and got
fixed inside the seam (
branch_naming.pytook +541 lines of fixes). Threading would not have prevented a single one of them. - The #1589 regression is the disproof, not the proof. It recurred after the seam
existed and after routing onto it (
7fbe47c65, "cycle 1, #1589 regression") — because the canonical seam applied the wrong twin (strip where verbatim was needed). Consolidation carried the bug to a new callsite; it did not stop it. - The ratchet is whack-a-mole + false security. It keys on the literal token
mid8(_MID8_TOKEN_RE = re.compile(r"\bmid8\b")) and 3 hard-coded AST idioms. It passes today with 14 live baremission_id[:8]sites in the tree — provably blind to the most-recurring class's most common form. Banningmid8[:8]re-spawns the class asmid[:8],raw_mid[:8],str(x)[0:8], a helper, or a comprehension (Robbie: the un-ratcheted form grew +550%). - The cure re-creates the disease, silently. ExecutionContext is a mutable
@dataclass(line 184) with a flat substrate consumers still read; threading the wrong fragment (meta/primary vs status/coord vs lanes/coord) reproduces the genesis split-brain (implement.py:1009-1018) and is byte-identical under flattened topology — it fails far from its cause. Re-derivation is at least self-contained and locally testable. - Net ROI verdict: LOW-to-NEGATIVE for "threading," MODEST-POSITIVE for the static
mid8()routing alone. Threading converges on exactly 2 sites that already hold a context; the other ~12 take the static fix anyway. The benefit claim conflates a cheap 2-site routing win with an expensive "thread the Context everywhere" narrative that the bug history does not pay for.
Refutation matrix
| # | Claimed benefit (white team) | Counter-case (Debbie) | Bug / commit evidence | Holds / survives? | Severity |
|---|---|---|---|---|---|
| R1 | "The recurring class is inline re-derivation; the SSOT removes it." | The worst recurrences were wrong logic in the authority, not duplicated derivation. A single SSOT had the same wrong logic and was fixed inside itself. | e2c12bd14 adds _idempotent_legacy_body to the seam (#1949 double-mid8); 7fbe47c65 fixes strip-vs-verbatim in the seam + composers (#1589); branch_naming.py took +541/−18 in-range (Robbie table). |
Claim FALSIFIED for the worst recurrences. Survives only for the trivial [:8] slice class. |
HIGH |
| R2 | "Threading the Context eliminates the class." | Threading touches 2 sites (implement.py:386, agent/mission.py:772). The other ~12 bare-[:8] sites + every bulk consumer hold no context and take the static mid8() fix — which is the actual cure, not threading. |
grep: 8 resolve_action_context call sites total; only implement.py/agent.mission.py re-derive [:8] while holding a context. Pedro himself: "adoption is ~5% done … only the 4 action-scoped orchestrators are threading candidates." |
"Threading eliminates the class" FALSIFIED. The static mid8() routing eliminates it; threading is a 2-site nicety. |
HIGH |
| R3 | "Threading is lower-risk / byte-identical (safe)." | The Context is a mutable @dataclass with a live flat substrate; threading the wrong fragment re-creates the genesis split-brain and is byte-identical under flattened topology → silent, fails far from cause. The cure is buggier than the self-contained re-derivation. |
context.py:184 @dataclass (not frozen) vs fragments frozen; genesis bug implement.py:1009-1018 (slug-derived empty mid8 read wrong surface); Pedro's own caveat §5.1 ("a value match is byte-identical under flattened topology and would pass a naive test while masking the wrong-surface bug"). |
Risk claim SURVIVES as a NEW risk class. Threading introduces a silent wrong-fragment failure mode re-derivation does not have. | HIGH |
| R4 | "Extend the ratchet → the class can never regrow ('what is ratcheted shrinks')." | The ratchet is a literal-shape matcher (\bmid8\b + 3 AST idioms). It passes right now with 14 un-routed mission_id[:8] sites. Ban one shape, the class reappears in another (mid[:8], raw_mid[:8], [0:8], helper, comprehension). Whack-a-mole + allow-list debt + false confidence. |
test_no_worktree_name_guess.py:75 _MID8_TOKEN_RE; pytest … → 3 passed with 14 live [:8] sites present; Robbie: mission_id[:8] 4→26 (+550%), [:8] 15→46, parents[2] 4→10 — all grew while "ratcheted." |
Whack-a-mole VERDICT CONFIRMED. The "shrinking allow-list" oracle is real only for the one literal shape it encodes; the class lives in the shapes it does not. | HIGH |
| R5 | "Consumer complexity (merge 60→102) is the cost of not consolidating; threading shrinks it." | merge.py's growth is feature work, not naming. 2/52 commits naming; 12/52 merge-feature; 0 context references → not a threading candidate. Threading cannot touch it; consolidation pays a large refactor cost to shrink complexity it does not own. |
git log v3.1.10..v3.2.0 -- merge.py: 52 commits, 2 naming, 12 preflight/resume/conflict/atomic; grep resolve_action_context merge.py → 0; Robbie table: merge +2122 SLOC · 43 bugfix. |
ROI claim FALSIFIED at its headline exemplar. The cited complexity is unrelated to the cure. | HIGH |
| R6 | "It's just non-adoption; route the consumers." | This is the convenient story. It lets the team defer the hard problems the same notes admit are the real disease: the WRITE/entry side (#1878), the coord/primary topology decision, and the genesis surface-selection. "Route the consumers" addresses the cheap read-slice; the expensive bugs (#1718, #1772 across ≥4 missions, #1990↔#1991, the two topology-flatten workarounds) are write/topology, explicitly out of scope. | Overview §6 non-goals: "#1878 write-side strangler … separate follow-on mission"; Paula Class 2: #1772 re-touched ≥4 missions, team flattened topology twice (92b5b3f85, 40ad64222) as a workaround. |
Anti-laziness lens CONFIRMED. The benefit is claimed for the slice that does not contain the recurring high-blast-radius bugs. | MEDIUM-HIGH |
| R7 | "mid8 single-derivation invariant makes threading provably safe." | The __post_init__ guard proves mid8 == mission_id[:8] — i.e. it only protects against a derivation divergence the bugs were never about. It does nothing for the wrong-surface / strip-vs-verbatim / topology bugs that are the actual recurrences. A guard that proves the easy thing and is silent on the hard thing is false assurance. |
context.py:98 __post_init__ raises on mid8 != mission_id[:8]; but #1589/#1990/#1991/genesis are surface-selection, not slice-value, bugs. |
SURVIVES as scope-mismatch. The invariant is correct and irrelevant to the recurring class's substance. | MEDIUM |
The causation forensics — is re-derivation the root cause? (R1/R2 in detail)
The white team's central claim is causal: re-derivation causes the bugs. I tested it against the actual fix commits. The recurring class decomposes into two sub-classes with different roots:
Sub-class A — wrong-surface / wrong-twin / wrong-compose LOGIC (the high-severity, recurring one)
These are the bugs that recurred across missions and were named after a bug class (01KTYGTE "name-vs-authority remediation"). Their root cause is wrong logic about which name/surface is correct — and that logic is wrong wherever it lives, including inside a single SSOT:
- #1949 double-mid8 (
<slug>-<mid8>-<mid8>): the fix is_idempotent_legacy_bodyadded to the seam (e2c12bd14). The bug was non-idempotent composition logic. A single composer with non-idempotent logic produces the same double-suffix. Threading a value object does not make a composer idempotent — fixing the composer does, and that fix is in the authority. - #1589 strip-vs-verbatim — the disproof. Fixed 2026-06-01; re-fixed as a regression
2026-06-15 (
7fbe47c65: "coordination composers reconstruct names verbatim (no NNN-strip) … (cycle 1, #1589 regression)"). This regression happened with the seam in place — the canonicalmission_dir_namestripsNNN-, and that stripping twin was wired where the verbatim twin was needed. Consolidation carried the wrong behavior to a new callsite. A single authority with astrip: boolchoice is exactly where this re-bit. (White team's own guardrail admits this: "Merging behind astrip: boolflag re-creates the #1589 class.") - #1990 ↔ #1991 mirror pair (same day,
5fdb26bb7/d3fdbc556): one callsite read primary when it should read coord; the sibling read coord when it should read primary. Two opposite wrong-surface bugs. If you thread a Context carrying three surfaces and the consumer reads the wrong fragment, you reproduce both bugs — now silently, because under a flattened topology the fragments are byte-identical. Threading relocates the decision; it does not make it correct.
Verdict on causation: for sub-class A, the root cause is wrong logic in the authority, not
re-derivation. The fixes landed inside branch_naming.py / the composers (+541 lines), which is
exactly what you would not expect if duplication-per-se were the cause. Threading would have
prevented none of A.
Sub-class B — bare [:8] hand-roll (the low-severity, ratchet-target one)
This is genuine duplication — ~14 sites slicing mission_id[:8]. But:
- It is not the bug class that recurred across missions — it is a tidiness debt. Paula grades
Shape A as "#1 most defect-generative" but Shape A is the compose logic (sub-class A), not the
[:8]slice. The[:8]slice's only cited harm is the potential to regrow, not a shipped P1. - The cure for B is
mid8()routing (the static slice), which the white team already scoped. Threading is irrelevant to ~12 of the 14 B-sites (they hold no context). So "threading eliminates the class" overstates a 2-site improvement on the least harmful sub-class.
Net: the benefit-bearing claim ("threading eliminates the recurring class") attaches the
expensive intervention (threading) to the sub-class it cannot fix (A) and over-credits it on the
sub-class a one-line mid8() swap already fixes (B).
The ratchet — false security in three movements (R4/D-030)
- It is a literal-shape matcher.
_MID8_TOKEN_RE = re.compile(r"\bmid8\b")+ three AST idioms (endswith(f"-{mid8}"),kitty/mission-f-string,f"{slug}-{mid8}"compose). Anything that does not spellmid8or match those exact shapes is invisible to it. - It is provably blind right now.
pytest tests/architectural/test_no_worktree_name_guess.py→ 3 passed — with 14 livemission_id[:8]sites insrc/(status/aggregate.py:250,doctrine_synthesizer/apply.py:745,831,implement.py:386,doctor.py:3070,3162,agent/workflow.py:292,agent/mission.py:772,git/sparse_checkout.py:286,context/mission_resolver.py:163,dashboard/scanner.py:438, …). The most-recurring class's most common form passes the gate today. - Extending it is whack-a-mole with allow-list debt. Robbie's own data: the un-ratcheted
mission_id[:8]grew +550% while the SSOT was built beside it. Add a[:8]ban and the class re-spawns asmid[:8](already live atworkflow.py:292),raw_mid[:8](agent/mission.py:772),str(mission_id)[0:8], a private helper, or a slice inside a comprehension — none of which the regex sees. Each new shape needs a new ratchet rule + the "narrowly-justified" allow-list grows. The "shrinking allow-list as completeness oracle" is a completeness oracle only for the shapes already encoded — a tautology, not a proof.
A producer-conformance gate (D-030) that passes with 14 live instances of the class it claims to guard is theater until extended, and an arms race after. It buys false confidence — the most dangerous kind, because a green ratchet invites the next author to assume the class is closed.
Concessions (where the benefit is real — D-003, honest)
I am a red team, not a denier. Two points the evidence forces me to concede:
- C1 — the static
mid8()routing has genuine, cheap ROI. Routing the ~14 bare-[:8]sites through onemid8()authority is a real, low-cost consolidation that removes a real (if low-severity) duplication. I concede this slice is worth doing. My refutation is that it is not "threading," and it fixes sub-class B, not the recurring sub-class A. - C2 — the extraction demonstrably worked where a real ratchet existed. Robbie's status
boundary number (deep-imports 182→43, −76%) is hard evidence that a shrinking-allow-list
ratchet can move the needle. I concede the mechanism works when the guard matches the actual
bypass shape. That is precisely why the
mid8literal-ban will under-perform it: the status guard matched a stable import shape; the mid8 class is shape-shifting, so the same mechanism will leak. (The concession sharpens the refutation, it does not soften it.)
Surviving hypotheses (persisted per D-003 so they are not re-litigated)
- H-SURVIVES-1: Threading prevents 0 of the cross-mission recurrences (#1589, #1949, #1990/#1991, genesis). They are authority-logic bugs; threading relocates the decision.
- H-SURVIVES-2: The ratchet, even extended, is a literal-shape arms race; it cannot prove the class closed, only that the encoded shapes are absent. It passes today with 14 live sites.
- H-SURVIVES-3: Threading introduces a new silent failure mode (wrong-fragment, byte-identical under flattened topology) that self-contained re-derivation does not have.
- H-SURVIVES-4: The cited consumer-complexity ROI (
merge.py60→102) is feature growth, not naming, and is not a threading candidate (0 context refs). - H-FALSIFIED-1 (mine): "Re-derivation per se is harmless" — falsified; the
[:8]slice is real (if minor) duplication worth routing (C1). - H-FALSIFIED-2 (mine): "The ratchet mechanism never works" — falsified by the status 182→43 result (C2); it works when the guard shape is stable.
Bottom line (net ROI)
The white team is right that an SSOT exists and that consumers re-derive. They are wrong that threading the Context will eliminate the recurring class and that the ROI is high.
- The recurring, cross-mission, high-blast-radius bugs (sub-class A) are authority-logic bugs; threading prevents none — the proof is that the #1589 regression recurred after the seam, via the seam.
- "Threading" converges on 2 sites; the actual cure for the duplication is a one-line
mid8()swap at ~14 sites — cheap, worth doing, and mis-labelled as threading. - The ratchet passes today with 14 live instances of the class it guards; extending it is an arms race that buys false confidence.
- The headline ROI exemplar (merge.py complexity) is feature work threading cannot touch.
Net verdict: Do the cheap static mid8()/composer routing (modest-positive ROI). Do NOT sell
it as a "thread-the-Context-everywhere" mission with a "can never regrow" ratchet — that narrative
over-credits the cure, under-states a new silent risk, and defers the actual recurring disease (the
write/topology side, #1878) under the convenient banner of "it's just non-adoption."