RED TEAM — Debugger Debbie: Refuting the 3.2.x Benefit / ROI Claims

Author: Debugger Debbie (RED TEAM — dialectical antithesis). Branch: design/naming-identity-ssot-alignment @ 3.2.0 (read-only; no commit/switch). Date: 2026-06-16. Directives applied (debugger-debbie.agent.yaml): D-001 (find the owning boundary / structural fork — applied adversarially, to ask whether threading moves the fork rather than closing it), D-003 (persist falsified + surviving hypotheses so they cannot be re-litigated), D-030 (interrogate whether the ratchet is a real producer-conformance gate or theater), D-032 (the divergence-matrix lens — applied to ask whether threading creates a new divergence).

Posture. The white team (Pedro feasibility, Alphonso design-verdict, Paula CaaCS, Robbie delta) CONFIRMED: "the recurring defect class is inline re-derivation; threading the Context + extending the ratchet will ELIMINATE it; ROI is high." I argue the ANTITHESIS. I concede where the evidence forces me — and it forces me on two points. But the headline benefit claim — threading eliminates the class and the ROI is high — does not survive contact with the actual bug commits.


TL;DR — the six-line verdict

  1. Strongest benefit-refutation: the consumer-complexity ROI argument is built on merge.py (CC 60→102, +2122 SLOC) — but only 2 of its 52 in-range commits touch naming/mid8; 12 are merge feature work (preflight/resume/conflict/atomicity), and merge.py holds zero resolve_action_context references, so it is not a threading candidate at all. The headline "cost of not consolidating" is feature growth threading cannot shrink.
  2. Re-derivation is NOT the root cause of the worst recurrences. #1589, #1949, and the #1990↔#1991 mirror pair were wrong-logic-in-the-authority bugs (strip-vs-verbatim, double-suffix, wrong-surface-selection). The single SSOT had the wrong logic too and got fixed inside the seam (branch_naming.py took +541 lines of fixes). Threading would not have prevented a single one of them.
  3. The #1589 regression is the disproof, not the proof. It recurred after the seam existed and after routing onto it (7fbe47c65, "cycle 1, #1589 regression") — because the canonical seam applied the wrong twin (strip where verbatim was needed). Consolidation carried the bug to a new callsite; it did not stop it.
  4. The ratchet is whack-a-mole + false security. It keys on the literal token mid8 (_MID8_TOKEN_RE = re.compile(r"\bmid8\b")) and 3 hard-coded AST idioms. It passes today with 14 live bare mission_id[:8] sites in the tree — provably blind to the most-recurring class's most common form. Banning mid8[:8] re-spawns the class as mid[:8], raw_mid[:8], str(x)[0:8], a helper, or a comprehension (Robbie: the un-ratcheted form grew +550%).
  5. The cure re-creates the disease, silently. ExecutionContext is a mutable @dataclass (line 184) with a flat substrate consumers still read; threading the wrong fragment (meta/primary vs status/coord vs lanes/coord) reproduces the genesis split-brain (implement.py:1009-1018) and is byte-identical under flattened topology — it fails far from its cause. Re-derivation is at least self-contained and locally testable.
  6. Net ROI verdict: LOW-to-NEGATIVE for "threading," MODEST-POSITIVE for the static mid8() routing alone. Threading converges on exactly 2 sites that already hold a context; the other ~12 take the static fix anyway. The benefit claim conflates a cheap 2-site routing win with an expensive "thread the Context everywhere" narrative that the bug history does not pay for.

Refutation matrix

# Claimed benefit (white team) Counter-case (Debbie) Bug / commit evidence Holds / survives? Severity
R1 "The recurring class is inline re-derivation; the SSOT removes it." The worst recurrences were wrong logic in the authority, not duplicated derivation. A single SSOT had the same wrong logic and was fixed inside itself. e2c12bd14 adds _idempotent_legacy_body to the seam (#1949 double-mid8); 7fbe47c65 fixes strip-vs-verbatim in the seam + composers (#1589); branch_naming.py took +541/−18 in-range (Robbie table). Claim FALSIFIED for the worst recurrences. Survives only for the trivial [:8] slice class. HIGH
R2 "Threading the Context eliminates the class." Threading touches 2 sites (implement.py:386, agent/mission.py:772). The other ~12 bare-[:8] sites + every bulk consumer hold no context and take the static mid8() fix — which is the actual cure, not threading. grep: 8 resolve_action_context call sites total; only implement.py/agent.mission.py re-derive [:8] while holding a context. Pedro himself: "adoption is ~5% done … only the 4 action-scoped orchestrators are threading candidates." "Threading eliminates the class" FALSIFIED. The static mid8() routing eliminates it; threading is a 2-site nicety. HIGH
R3 "Threading is lower-risk / byte-identical (safe)." The Context is a mutable @dataclass with a live flat substrate; threading the wrong fragment re-creates the genesis split-brain and is byte-identical under flattened topology → silent, fails far from cause. The cure is buggier than the self-contained re-derivation. context.py:184 @dataclass (not frozen) vs fragments frozen; genesis bug implement.py:1009-1018 (slug-derived empty mid8 read wrong surface); Pedro's own caveat §5.1 ("a value match is byte-identical under flattened topology and would pass a naive test while masking the wrong-surface bug"). Risk claim SURVIVES as a NEW risk class. Threading introduces a silent wrong-fragment failure mode re-derivation does not have. HIGH
R4 "Extend the ratchet → the class can never regrow ('what is ratcheted shrinks')." The ratchet is a literal-shape matcher (\bmid8\b + 3 AST idioms). It passes right now with 14 un-routed mission_id[:8] sites. Ban one shape, the class reappears in another (mid[:8], raw_mid[:8], [0:8], helper, comprehension). Whack-a-mole + allow-list debt + false confidence. test_no_worktree_name_guess.py:75 _MID8_TOKEN_RE; pytest …3 passed with 14 live [:8] sites present; Robbie: mission_id[:8] 4→26 (+550%), [:8] 15→46, parents[2] 4→10 — all grew while "ratcheted." Whack-a-mole VERDICT CONFIRMED. The "shrinking allow-list" oracle is real only for the one literal shape it encodes; the class lives in the shapes it does not. HIGH
R5 "Consumer complexity (merge 60→102) is the cost of not consolidating; threading shrinks it." merge.py's growth is feature work, not naming. 2/52 commits naming; 12/52 merge-feature; 0 context references → not a threading candidate. Threading cannot touch it; consolidation pays a large refactor cost to shrink complexity it does not own. git log v3.1.10..v3.2.0 -- merge.py: 52 commits, 2 naming, 12 preflight/resume/conflict/atomic; grep resolve_action_context merge.py0; Robbie table: merge +2122 SLOC · 43 bugfix. ROI claim FALSIFIED at its headline exemplar. The cited complexity is unrelated to the cure. HIGH
R6 "It's just non-adoption; route the consumers." This is the convenient story. It lets the team defer the hard problems the same notes admit are the real disease: the WRITE/entry side (#1878), the coord/primary topology decision, and the genesis surface-selection. "Route the consumers" addresses the cheap read-slice; the expensive bugs (#1718, #1772 across ≥4 missions, #1990↔#1991, the two topology-flatten workarounds) are write/topology, explicitly out of scope. Overview §6 non-goals: "#1878 write-side strangler … separate follow-on mission"; Paula Class 2: #1772 re-touched ≥4 missions, team flattened topology twice (92b5b3f85, 40ad64222) as a workaround. Anti-laziness lens CONFIRMED. The benefit is claimed for the slice that does not contain the recurring high-blast-radius bugs. MEDIUM-HIGH
R7 "mid8 single-derivation invariant makes threading provably safe." The __post_init__ guard proves mid8 == mission_id[:8] — i.e. it only protects against a derivation divergence the bugs were never about. It does nothing for the wrong-surface / strip-vs-verbatim / topology bugs that are the actual recurrences. A guard that proves the easy thing and is silent on the hard thing is false assurance. context.py:98 __post_init__ raises on mid8 != mission_id[:8]; but #1589/#1990/#1991/genesis are surface-selection, not slice-value, bugs. SURVIVES as scope-mismatch. The invariant is correct and irrelevant to the recurring class's substance. MEDIUM

The causation forensics — is re-derivation the root cause? (R1/R2 in detail)

The white team's central claim is causal: re-derivation causes the bugs. I tested it against the actual fix commits. The recurring class decomposes into two sub-classes with different roots:

Sub-class A — wrong-surface / wrong-twin / wrong-compose LOGIC (the high-severity, recurring one)

These are the bugs that recurred across missions and were named after a bug class (01KTYGTE "name-vs-authority remediation"). Their root cause is wrong logic about which name/surface is correct — and that logic is wrong wherever it lives, including inside a single SSOT:

  • #1949 double-mid8 (<slug>-<mid8>-<mid8>): the fix is _idempotent_legacy_body added to the seam (e2c12bd14). The bug was non-idempotent composition logic. A single composer with non-idempotent logic produces the same double-suffix. Threading a value object does not make a composer idempotent — fixing the composer does, and that fix is in the authority.
  • #1589 strip-vs-verbatim — the disproof. Fixed 2026-06-01; re-fixed as a regression 2026-06-15 (7fbe47c65: "coordination composers reconstruct names verbatim (no NNN-strip) … (cycle 1, #1589 regression)"). This regression happened with the seam in place — the canonical mission_dir_name strips NNN-, and that stripping twin was wired where the verbatim twin was needed. Consolidation carried the wrong behavior to a new callsite. A single authority with a strip: bool choice is exactly where this re-bit. (White team's own guardrail admits this: "Merging behind a strip: bool flag re-creates the #1589 class.")
  • #1990 ↔ #1991 mirror pair (same day, 5fdb26bb7 / d3fdbc556): one callsite read primary when it should read coord; the sibling read coord when it should read primary. Two opposite wrong-surface bugs. If you thread a Context carrying three surfaces and the consumer reads the wrong fragment, you reproduce both bugs — now silently, because under a flattened topology the fragments are byte-identical. Threading relocates the decision; it does not make it correct.

Verdict on causation: for sub-class A, the root cause is wrong logic in the authority, not re-derivation. The fixes landed inside branch_naming.py / the composers (+541 lines), which is exactly what you would not expect if duplication-per-se were the cause. Threading would have prevented none of A.

Sub-class B — bare [:8] hand-roll (the low-severity, ratchet-target one)

This is genuine duplication — ~14 sites slicing mission_id[:8]. But:

  • It is not the bug class that recurred across missions — it is a tidiness debt. Paula grades Shape A as "#1 most defect-generative" but Shape A is the compose logic (sub-class A), not the [:8] slice. The [:8] slice's only cited harm is the potential to regrow, not a shipped P1.
  • The cure for B is mid8() routing (the static slice), which the white team already scoped. Threading is irrelevant to ~12 of the 14 B-sites (they hold no context). So "threading eliminates the class" overstates a 2-site improvement on the least harmful sub-class.

Net: the benefit-bearing claim ("threading eliminates the recurring class") attaches the expensive intervention (threading) to the sub-class it cannot fix (A) and over-credits it on the sub-class a one-line mid8() swap already fixes (B).


The ratchet — false security in three movements (R4/D-030)

  1. It is a literal-shape matcher. _MID8_TOKEN_RE = re.compile(r"\bmid8\b") + three AST idioms (endswith(f"-{mid8}"), kitty/mission- f-string, f"{slug}-{mid8}" compose). Anything that does not spell mid8 or match those exact shapes is invisible to it.
  2. It is provably blind right now. pytest tests/architectural/test_no_worktree_name_guess.py3 passed — with 14 live mission_id[:8] sites in src/ (status/aggregate.py:250, doctrine_synthesizer/apply.py:745,831, implement.py:386, doctor.py:3070,3162, agent/workflow.py:292, agent/mission.py:772, git/sparse_checkout.py:286, context/mission_resolver.py:163, dashboard/scanner.py:438, …). The most-recurring class's most common form passes the gate today.
  3. Extending it is whack-a-mole with allow-list debt. Robbie's own data: the un-ratcheted mission_id[:8] grew +550% while the SSOT was built beside it. Add a [:8] ban and the class re-spawns as mid[:8] (already live at workflow.py:292), raw_mid[:8] (agent/mission.py:772), str(mission_id)[0:8], a private helper, or a slice inside a comprehension — none of which the regex sees. Each new shape needs a new ratchet rule + the "narrowly-justified" allow-list grows. The "shrinking allow-list as completeness oracle" is a completeness oracle only for the shapes already encoded — a tautology, not a proof.

A producer-conformance gate (D-030) that passes with 14 live instances of the class it claims to guard is theater until extended, and an arms race after. It buys false confidence — the most dangerous kind, because a green ratchet invites the next author to assume the class is closed.


Concessions (where the benefit is real — D-003, honest)

I am a red team, not a denier. Two points the evidence forces me to concede:

  • C1 — the static mid8() routing has genuine, cheap ROI. Routing the ~14 bare-[:8] sites through one mid8() authority is a real, low-cost consolidation that removes a real (if low-severity) duplication. I concede this slice is worth doing. My refutation is that it is not "threading," and it fixes sub-class B, not the recurring sub-class A.
  • C2 — the extraction demonstrably worked where a real ratchet existed. Robbie's status boundary number (deep-imports 182→43, −76%) is hard evidence that a shrinking-allow-list ratchet can move the needle. I concede the mechanism works when the guard matches the actual bypass shape. That is precisely why the mid8 literal-ban will under-perform it: the status guard matched a stable import shape; the mid8 class is shape-shifting, so the same mechanism will leak. (The concession sharpens the refutation, it does not soften it.)

Surviving hypotheses (persisted per D-003 so they are not re-litigated)

  • H-SURVIVES-1: Threading prevents 0 of the cross-mission recurrences (#1589, #1949, #1990/#1991, genesis). They are authority-logic bugs; threading relocates the decision.
  • H-SURVIVES-2: The ratchet, even extended, is a literal-shape arms race; it cannot prove the class closed, only that the encoded shapes are absent. It passes today with 14 live sites.
  • H-SURVIVES-3: Threading introduces a new silent failure mode (wrong-fragment, byte-identical under flattened topology) that self-contained re-derivation does not have.
  • H-SURVIVES-4: The cited consumer-complexity ROI (merge.py 60→102) is feature growth, not naming, and is not a threading candidate (0 context refs).
  • H-FALSIFIED-1 (mine): "Re-derivation per se is harmless" — falsified; the [:8] slice is real (if minor) duplication worth routing (C1).
  • H-FALSIFIED-2 (mine): "The ratchet mechanism never works" — falsified by the status 182→43 result (C2); it works when the guard shape is stable.

Bottom line (net ROI)

The white team is right that an SSOT exists and that consumers re-derive. They are wrong that threading the Context will eliminate the recurring class and that the ROI is high.

  • The recurring, cross-mission, high-blast-radius bugs (sub-class A) are authority-logic bugs; threading prevents none — the proof is that the #1589 regression recurred after the seam, via the seam.
  • "Threading" converges on 2 sites; the actual cure for the duplication is a one-line mid8() swap at ~14 sites — cheap, worth doing, and mis-labelled as threading.
  • The ratchet passes today with 14 live instances of the class it guards; extending it is an arms race that buys false confidence.
  • The headline ROI exemplar (merge.py complexity) is feature work threading cannot touch.

Net verdict: Do the cheap static mid8()/composer routing (modest-positive ROI). Do NOT sell it as a "thread-the-Context-everywhere" mission with a "can never regrow" ratchet — that narrative over-credits the cure, under-states a new silent risk, and defers the actual recurring disease (the write/topology side, #1878) under the convenient banner of "it's just non-adoption."