spec-kitty CaaCS Audit — 2026-05
Audit metadata
- Repository:
spec-kitty(fork at/home/stijn/Documents/_code/SDD/fork/spec-kitty) - Branch:
feat/caacs-doctrine - Commit SHA at audit time:
bc64dec6ee37dbbd6bc21a0a1aa3195f2bab1b57 - Audit date: 2026-05-08
- Window: 1 year (
--since="1 year ago"); velocity uses 2 years - Scope:
src/only (src/specify_cli/,src/doctrine/,src/charter/,src/kernel/, plus three empty leftover dirssrc/runtime/,src/dashboard/,src/constitution/that contain only__pycache__/and are flagged as cleanup candidates). - Exclusion list applied (vanity / generated / non-code):
**/__pycache__/**,**/.mypy_cache/**,**/CHANGELOG*.md,**/*.lock,uv.lock,poetry.lock. Generated agent directories (.claude/,.amazonq/, etc.) live outsidesrc/and are naturally excluded by the scope filter. No**/*.jsonexclusion was applied because there are very few JSON files insrc/and they are intentional (schemas, manifests,default-toolguides.json); excluding them would have hidden two legitimately churning schema files. - Tooling:
git2.x, Python 3.13.12,radon6.0.1.clocis not installed on this machine, so SLOC counts usewc -linstead (acceptable for the Python-only scope; documented under "Limitations"). - Producer: researcher subagent ("Reza"), instructed by the
forensic-repository-audittactic andlegacy-codebase-triageprocedure on this branch. Not architect-reviewed.
Methodology
The five core CaaCS recipes from the
forensic-repository-audit
tactic were executed verbatim against src/, with the exclusion list above
applied. The procedure
legacy-codebase-triage
specifies the additional steps (temporal coupling, complexity overlay, DDD-tentative
classification, Eisenhower triage). The exact commands run are:
# 1. Churn (top 50)
git log --format=format: --name-only --since="1 year ago" -- src/ \
':!**/CHANGELOG*.md' ':!**/__pycache__/**' ':!**/*.lock' ':!**/uv.lock' \
':!**/poetry.lock' ':!**/.mypy_cache/**' \
| grep -v '^$' | sort | uniq -c | sort -nr | head -50
# 2. Bus factor (overall + per top-15 hotspot)
git log --no-merges --since="1 year ago" --format='%an' -- src/ \
| sort | uniq -c | sort -nr
git log --since="1 year ago" --no-merges --format='%an' --follow -- <file> \
| sort | uniq -c | sort -nr # per file
# 3. Bug hotspots
git log -i -E --grep="fix|bug|broken|regress|hotfix" --since="1 year ago" \
--name-only --format='' --no-merges -- src/ <exclusions> \
| grep -v '^$' | sort | uniq -c | sort -nr
# 4. Velocity (2y, monthly)
git log --since="2 years ago" --format='%ad' --date=format:'%Y-%m' --no-merges -- src/ \
| sort | uniq -c
# 5. Firefighting
git log --oneline --since="1 year ago" --no-merges -- src/ \
| grep -iE 'revert|hotfix|emergency|rollback'
# 6. Temporal coupling: Python script (/tmp/caacs/coupling.py) walks
# `git log --no-merges --since="1y" --name-only --format=__COMMIT__%H -- src/`,
# emits all unordered file-pairs per multi-file commit, counts.
# 7. Complexity (Python only)
radon cc -a -s --total-average src/specify_cli src/doctrine src/charter src/kernel
radon mi -s src/specify_cli src/doctrine src/charter src/kernel | grep -E " - [BC] "
Inherited biases (per the tactic's failure_modes)
| Bias | Effect on this audit |
|---|---|
| Squash-merge distortion | Repo uses both squash and non-squash merges (mixed history). Velocity counts upstream activity reasonably faithfully; PR-level metadata not consulted. |
| Weak commit messages | Conventional Commits dominate (fix:, feat:, refactor:, docs:). The fix-grep matches body too — a few false positives (e.g. spec commits that contain "fix" in prose). Spot-checked, false-positive rate is low. |
| Vanity-file dominance | Lockfiles, __pycache__, CHANGELOG excluded. Spot-checked top 4 hotspots for insertion/deletion ratio — all show substantive code change, not formatting noise. |
| No rename-following by default | The bulk recipes do not follow renames. Per-file authorship for top-15 hotspots used --follow. Three known renames in this window (acceptance.py → acceptance/__init__.py, dashboard.py → cli/commands/dashboard.py, cli/commands/agent/feature.py deleted in 7428880c4) split history; documented in the hotspot table. |
| No complexity capture in raw git data | Mitigated via radon overlay (Python-only). |
| Bus factor is a question, not a verdict | This audit treats single-author concentration as an open question. |
Scope expansion (2026-05-09)
This audit was originally scoped to src/ only. On 2026-05-09 the scope was
broadened to src/ + tests/ + kitty-specs/ to address the two largest blind
spots flagged in the original "Limitations" section: (1) absence of a test-
coverage overlay on the F-rated functions, and (2) inability to see
kitty-specs/<mission>/ ↔ src/ temporal coupling.
What changed
- Scope:
src/→src/ + tests/ + kitty-specs/ - Additional vanity exclusions required by the expanded corpus:
**/status.events.jsonl,**/status.json,kitty-specs/**/tasks.md(mission-state churn, not authoring),kitty-specs/**/snapshot-latest.json,kitty-specs/**/dossiers/**(dossier state files). The original exclusions (lockfiles,__pycache__,CHANGELOG.md,.mypy_cache) are retained. - Window unchanged: still 1 year for churn / hotspots, 2 years for velocity.
- Commit at re-run:
81883352240c3f8e0249b78875f7fa140700418f(HEAD onfeat/caacs-doctrineadvanced since the original run; original tables below reflect SHAbc64dec6, the broader-scope tables in this section reflect the newer SHA. The 1y window means the two windows overlap by ~364 days; observed shape is consistent.).
Headline impact
- F1 (bus factor) — worse. Single-author concentration rises from 89.5% → 95.2% when tests and missions are folded in. Robert Douglass authored 2613 of 2744 commits across the full corpus. F1 is reaffirmed and intensified.
- F2 (hotspot list) — unchanged. The top-30 src/ files are identical;
the extra rows that surface in the full-corpus view are tests and mission
artifacts that belong to those same hotspots (e.g.
tests/sync/test_events.pynext tosync/emitter.py,kitty-specs/041-…/tasks/WP*.mdnext toglossary/*). The hotspot diagnosis is stable under scope expansion. - New findings (F15–F18): test-update lag on agent/* hotspots,
glossary middleware as an under-tested hotspot, mission/code temporal
coupling is commit-level decoupled by design (separate phases), and the
glossary subsystem cluster (mission 041 +
src/specify_cli/glossary/) is the second densest coupling cluster after agent/.
Executive summary (one paragraph)
Broadening the scope confirms — and somewhat sharpens — every finding from the
src/-only run. The 95.2% bus factor is the headline change; tests and mission
artifacts do not dilute the lone-author signal because the same author wrote
them too. The hotspot list does not move, which is reassuring (the recipes
are robust). The new signal worth acting on is the test-update lag on the
F2 cluster: agent/tasks.py and agent/workflow.py change with their tests
only ~30% of the time — i.e. ~70% of changes ship without test updates.
Combined with their F-rated complexity that's a non-trivial regression risk.
Mission ↔ code temporal coupling at the commit level is sparse (max 3
co-changes per pair) because the SDD pipeline deliberately separates planning
commits from implementation commits — that's not a defect, it's the design,
but it means the mission-driven causal coupling is invisible to single-commit
recipes and would need PR-grouping or trailer-based attribution to see clearly.
Top findings (executive summary)
- Project is very alive but contributor-monopolised. 1001 commits to
src/in the last year, accelerating in 2026 (288 commits in Feb alone). One author (Robert Douglass) authored 89.5% of allsrc/commits in the window (896 / 1001). Of the top 15 hotspots, 14 are >90% single-author. Bus factor is effectively 1 across the whole codebase. This is the dominant risk surfaced by the audit. src/specify_cli/cli/commands/agent/is a refactor target. Three files (tasks.py— 3746 SLOC;workflow.py— 1895 SLOC;mission.py— 2314 SLOC) together hold five of the seven worst-complexity functions in the repo (finalize_tasksCC=160,move_taskCC=139,statusCC=87,reviewCC=84,map_requirementsCC=74). They are also the top-three temporal-coupling pair (45 co-changes betweentasks.pyandworkflow.pyalone) and dominate the bug-hotspot table. Both unstable and known-defective = strongest refactor candidate the recipes can produce.- Firefighting frequency is low (~0.8% of commits). Eight matches across a year (2 reverts, 1 explicit hotfix, 5 "rollback" mentions that are feature-implementation references rather than emergency fixes). The team appears to trust the pipeline — this is a healthy signal that contradicts the "monopolised" risk above. The risk lives in knowledge transfer, not in merge discipline.
- Mission-template files churn alongside the CLI commands that consume them.
software-dev/command-templates/{specify,plan,tasks,implement,review,tasks-packages}.mdappear in the top-30 hotspots and co-change with the agent commands (e.g.tasks.md↔tasks.py: 18 co-changes,specify.md↔agent/tasks.py: 12). This is expected coupling by design (templates are the contract between runtime and mission), not a defect — but it is also evidence that the "template + dispatcher" boundary is the load-bearing seam in the system. - Three empty top-level src/ directories are leftovers from the
shared-package-boundary cutover (
src/runtime/,src/dashboard/,src/constitution/contain only stale__pycache__/). PerCLAUDE.mdthese were either moved intosrc/specify_cli/or extracted to PyPI dependencies. Recommend deletion.
Hotspot table (top 30, vanity-filtered)
Hotspots — src/ only (original)
Churn = commits touching the file in the 1y window. Bug commits = subset whose
commit message matches fix|bug|broken|regress|hotfix (case-insensitive). Bus factor is the percentage of those churn commits authored by the dominant author
(in every case below: Robert Douglass). CC max is the highest-rated cyclomatic
function in the file at HEAD (radon rank in parens). DDD column is
architect-ratified.
| # | File | Churn | Bug commits | SLOC | CC max | Top-author share | DDD (architect-ratified) |
|---|---|---|---|---|---|---|---|
| 1 | src/specify_cli/__init__.py |
119 | 25 | 224 | A | 49% (Robert; co-owned w/ Den, honjo, Bruno) | glue (CLI bootstrap) |
| 2 | src/specify_cli/cli/commands/agent/tasks.py |
87 | 74 | 3746 | F (160 finalize_tasks) |
98% | core (mission orchestration) |
| 3 | src/specify_cli/cli/commands/agent/workflow.py |
77 | 67 | 1895 | F (84 review) |
96% | core (lane/workflow dispatch) |
| 4 | src/specify_cli/cli/commands/implement.py |
67 | 55 | 718 | F (44) | 97% | core (workspace resolution) |
| 5 | src/specify_cli/cli/commands/merge.py |
56 | 42 | 1599 | F (63 _run_lane_based_merge_locked) |
96% | core (merge state machine) |
| 6 | src/specify_cli/cli/commands/agent/feature.py |
56 | 43 | DELETED | n/a | 100% | glue (deleted in 7428880c4, mission-id cutover) |
| 7 | src/specify_cli/cli/commands/init.py |
50 | 28 | 1018 | F (94 init) |
96% | supporting (project bootstrap) |
| 8 | src/specify_cli/cli/commands/__init__.py |
48 | 23 | 115 | A | 96% | glue (wiring) |
| 9 | src/specify_cli/sync/emitter.py |
42 | 30 | 1682 | C (avg) MI=C | 93% | supporting (SaaS sync) |
| 10 | src/specify_cli/missions/software-dev/command-templates/specify.md |
38 | 28 | n/a (markdown) | n/a | n/a | core (SDD methodology contract) |
| 11 | src/specify_cli/cli/commands/sync.py |
36 | 23 | 1462 | MI=C | 97% | supporting (sync CLI) |
| 12 | src/specify_cli/missions/software-dev/command-templates/tasks.md |
33 | 27 | n/a | n/a | n/a | core (SDD methodology contract) |
| 13 | src/specify_cli/cli/commands/dashboard.py |
33 | 24 | 142 (renamed from dashboard.py) |
n/a | 100% | glue (CLI shim) |
| 14 | src/specify_cli/glossary/middleware.py |
36 | 19 | 689 | n/a | 100% | supporting (glossary) |
| 15 | src/specify_cli/dashboard/static/dashboard/dashboard.js |
29 | 21 | n/a | n/a | n/a | supporting (UI) |
| 16 | src/specify_cli/dashboard/scanner.py |
28 | 24 | 785 | n/a | 93% | supporting (dashboard scanner) |
| 17 | src/specify_cli/missions/software-dev/command-templates/plan.md |
27 | 19 | n/a | n/a | n/a | core (SDD methodology contract) |
| 18 | src/specify_cli/missions/software-dev/command-templates/implement.md |
27 | 21 | n/a | n/a | n/a | core (SDD methodology contract) |
| 19 | src/specify_cli/upgrade/migrations/__init__.py |
26 | 20 | 89 | n/a | 100% | glue (migration registry) |
| 20 | src/specify_cli/sync/events.py |
26 | 19 | 499 | n/a | 96% | supporting (sync envelopes) |
| 21 | src/specify_cli/next/runtime_bridge.py |
26 | 25 | 2552 | F (46) MI=C | 96% | core (mission-next runtime bridge) |
| 22 | src/specify_cli/tasks_support.py |
25 | 17 | 31 | A | 100% | glue (re-export shim) |
| 23 | src/specify_cli/status/emit.py |
25 | 22 | 656 | E (40 batch) | 100% | core (status state machine) |
| 24 | src/specify_cli/core/worktree.py |
25 | 20 | 681 | n/a | 96% | core (git worktree mgmt) |
| 25 | src/specify_cli/glossary/__init__.py |
24 | 13 | n/a | n/a | 100% | supporting (glossary entrypoint) |
| 26 | src/specify_cli/cli/commands/charter.py |
23 | 18 | 2934 | E (38 interview) MI=C |
100% | supporting (charter CLI) |
| 27 | src/specify_cli/acceptance.py (now acceptance/__init__.py) |
22 | 17 | 793 | MI=B | 100% | supporting (acceptance workflow) |
| 28 | src/specify_cli/orchestrator_api/commands.py |
21 | 17 | 1097 | n/a | 100% | core (external orchestration API) |
| 29 | src/specify_cli/agent_utils/status.py |
21 | 15 | 570 | F (53 _display_status_board) |
100% | supporting (kanban renderer) |
| 30 | src/specify_cli/cli/commands/agent/status.py |
20 | 14 | 886 | n/a | 100% | supporting (status CLI) |
Cross-reference (procedure exit condition): files appearing in both top-10
churn and top-10 bug-hotspot lists are the strongest refactor candidates per the
tactic. That set is:
agent/tasks.py, agent/workflow.py, implement.py, merge.py, agent/feature.py
(deleted), init.py, commands/__init__.py, sync/emitter.py, cli/commands/sync.py.
DDD classifications above were ratified by Architect Alphonso (Stijn Dejongh) on 2026-05-08; the column may now be treated as authoritative for this audit run. Revisions and rationales are listed in the "DDD ratification notes (architect)" subsection below.
DDD ratification notes (architect)
src/specify_cli/missions/software-dev/command-templates/specify.md: tentative=supporting→ ratified=core— Mission templates encode the SDD methodology itself; they are the user-facing contract that differentiates spec-kitty from generic CLI scaffolders.src/specify_cli/missions/software-dev/command-templates/tasks.md: tentative=supporting→ ratified=core— Same rationale asspecify.md; the WP-decomposition contract is part of spec-kitty's differentiating methodology.src/specify_cli/missions/software-dev/command-templates/plan.md: tentative=supporting→ ratified=core— Same rationale; the plan template is a load-bearing piece of the SDD pipeline contract, not a swappable doc fragment.src/specify_cli/missions/software-dev/command-templates/implement.md: tentative=supporting→ ratified=core— Same rationale; defines the lane-aware execution handshake that no off-the-shelf tool replicates.src/specify_cli/acceptance/__init__.py: tentative=core→ ratified=supporting— Acceptance gating is a common workflow-tool pattern; the differentiating state-machine logic lives instatus/emit.pyand the merge pipeline, not here.
Hotspots — full corpus (src/ + tests/ + kitty-specs/)
Same 1y window. Same exclusions as src/-only plus: **/status.events.jsonl,
**/status.json, kitty-specs/**/tasks.md, kitty-specs/**/snapshot-latest.json,
kitty-specs/**/dossiers/**. Origin = which top-level directory the file
lives in.
| # | File | Churn | Origin | Note |
|---|---|---|---|---|
| 1 | src/specify_cli/__init__.py |
119 | src | unchanged from src/-only |
| 2 | src/specify_cli/cli/commands/agent/tasks.py |
87 | src | F2 hotspot |
| 3 | src/specify_cli/cli/commands/agent/workflow.py |
77 | src | F2 hotspot |
| 4 | src/specify_cli/cli/commands/implement.py |
67 | src | F2 hotspot |
| 5 | src/specify_cli/cli/commands/merge.py |
56 | src | F2 hotspot |
| 6 | src/specify_cli/cli/commands/agent/feature.py |
55 | src | deleted, pre-cutover hotspot |
| 7 | src/specify_cli/cli/commands/init.py |
50 | src | |
| 8 | src/specify_cli/cli/commands/__init__.py |
47 | src | wiring |
| 9 | src/specify_cli/sync/emitter.py |
42 | src | |
| 10 | src/specify_cli/missions/software-dev/command-templates/specify.md |
38 | src | |
| 11 | src/specify_cli/glossary/middleware.py |
36 | src | |
| 12 | src/specify_cli/cli/commands/sync.py |
36 | src | |
| 13 | src/specify_cli/missions/software-dev/command-templates/tasks.md |
33 | src | |
| 14 | src/specify_cli/dashboard.py |
33 | src | renamed → cli/commands/dashboard.py |
| 15 | src/specify_cli/dashboard/static/dashboard/dashboard.js |
29 | src | UI |
| 16 | src/specify_cli/sync/events.py |
28 | src | |
| 17 | src/specify_cli/dashboard/scanner.py |
28 | src | |
| 18 | src/specify_cli/missions/software-dev/command-templates/plan.md |
27 | src | |
| 19 | src/specify_cli/missions/software-dev/command-templates/implement.md |
27 | src | |
| 20 | kitty-specs/041-mission-glossary-semantic-integrity/tasks/WP03-term-extraction-implementation.md |
27 | kitty-specs | first non-src/ entry; glossary mission |
| 21 | src/specify_cli/upgrade/migrations/__init__.py |
26 | src | |
| 22 | src/specify_cli/next/runtime_bridge.py |
26 | src | |
| 23 | src/specify_cli/glossary/__init__.py |
26 | src | |
| 24 | src/specify_cli/tasks_support.py |
25 | src | |
| 25 | src/specify_cli/status/emit.py |
25 | src | |
| 26 | src/specify_cli/core/worktree.py |
25 | src | |
| 27 | tests/sync/test_events.py |
24 | tests | first tests/ entry; sync hotspot's test |
| 28 | kitty-specs/041-mission-glossary-semantic-integrity/tasks/WP11-type-safety-and-integration-tests.md |
24 | kitty-specs | glossary mission again |
| 29 | src/specify_cli/cli/commands/charter.py |
23 | src | |
| 30 | src/specify_cli/acceptance.py |
22 | src |
Interpretation: the top-19 of the full-corpus table is identical to
the src/-only top-19. The first non-src/ entry is rank 20: a single mission's
WP file (mission 041, glossary semantic integrity, has 11 task files all in
the top-100). This is mission 041 doing what missions are supposed to do —
churning through revisions before being merged — not a hotspot in the
"buggy file" sense. Rank 27 is the first tests/ entry (test_events.py),
which sits right next to its source (sync/emitter.py at rank 9). Conclusion:
the F2 hotspot list is robust under scope expansion — broadening to the
full corpus does not surface new structural hotspots; it surfaces (a) a
single high-revision mission and (b) tests that already track their hot
sources. F2 stands.
Temporal coupling (top 30 pairs, both files non-vanity)
Source: 586 multi-file src/-only commits in the 1y window. 239,639 unique pairs
emitted (Cartesian product is large; we sliced top 30).
| # | Co-changes | File A | File B | Note |
|---|---|---|---|---|
| 1 | 45 | cli/commands/agent/tasks.py |
cli/commands/agent/workflow.py |
Internal agent/ cluster |
| 2 | 34 | cli/commands/agent/tasks.py |
cli/commands/implement.py |
Mission orchestration ↔ implementation dispatch |
| 3 | 29 | cli/commands/agent/workflow.py |
cli/commands/implement.py |
Same cluster |
| 4 | 26 | cli/commands/implement.py |
cli/commands/merge.py |
Workspace-lifecycle pair |
| 5 | 22 | cli/commands/agent/tasks.py |
cli/commands/merge.py |
Same cluster |
| 6 | 22 | missions/software-dev/command-templates/plan.md |
…/specify.md |
Mission templates co-evolve |
| 7 | 21 | …/specify.md |
…/tasks.md |
Mission templates co-evolve |
| 8 | 21 | cli/commands/agent/feature.py |
cli/commands/agent/tasks.py |
Pre-cutover coupling (feature.py deleted) |
| 9 | 20 | sync/emitter.py |
sync/events.py |
Sync envelope ↔ emitter (expected) |
| 10 | 19 | cli/commands/agent/feature.py |
cli/commands/agent/workflow.py |
Pre-cutover |
| 11 | 18 | cli/commands/agent/tasks.py |
missions/software-dev/command-templates/tasks.md |
Template ↔ dispatcher seam |
| 12 | 18 | cli/commands/agent/feature.py |
cli/commands/implement.py |
Pre-cutover |
| 13 | 17 | cli/commands/agent/workflow.py |
cli/commands/merge.py |
Same cluster |
| 14 | 17 | cli/commands/agent/workflow.py |
…/tasks.md |
Template ↔ dispatcher seam |
| 15 | 16 | …/plan.md |
…/tasks.md |
Mission templates co-evolve |
| 16 | 15 | …/implement.md |
…/review.md |
Mission templates co-evolve |
| 17 | 15 | cli/commands/accept.py |
cli/commands/implement.py |
Lifecycle pair |
| 18 | 15 | cli/commands/agent/workflow.py |
…/specify.md |
Template ↔ dispatcher seam |
| 19 | 15 | cli/commands/implement.py |
…/tasks.md |
Template ↔ dispatcher seam |
| 20 | 15 | missions/software-dev/templates/task-prompt-template.md |
templates/task-prompt-template.md |
Duplicated template (two locations co-edited) |
| 21 | 15 | cli/commands/agent/feature.py |
…/tasks.md |
Pre-cutover |
| 22 | 14 | agent_utils/status.py |
cli/commands/agent/tasks.py |
Status renderer ↔ status owner |
| 23 | 14 | cli/commands/accept.py |
cli/commands/merge.py |
Lifecycle pair |
| 24 | 14 | cli/commands/agent/tasks.py |
core/worktree.py |
Mission orchestration ↔ git worktree |
| 25 | 14 | cli/commands/agent/workflow.py |
core/worktree.py |
Same |
| 26 | 14 | cli/commands/agent/feature.py |
cli/commands/merge.py |
Pre-cutover |
| 27 | 13 | dashboard/scanner.py |
dashboard/static/dashboard/dashboard.js |
UI ↔ scanner (expected) |
| 28 | 13 | …/tasks-packages.md |
…/tasks.md |
Templates |
| 29 | 13 | …/tasks-outline.md |
…/tasks-packages.md |
Templates |
| 30 | 13 | cli/commands/merge.py |
core/worktree.py |
Merge ↔ worktree (expected) |
Cluster signal: the agent/{tasks,workflow,feature}.py ↔ implement.py ↔
merge.py ↔ core/worktree.py graph contains the densest coupling. Of the top-30
pairs, 22 involve at least one of those six files. This is the system's
load-bearing transaction (mission state ↔ git worktree).
Anomaly worth a question: pair #20 — missions/software-dev/templates/task-prompt-template.md
co-edited 15 times with templates/task-prompt-template.md. Why does the same
template exist in two locations and require synchronised edits? This is a
candidate for connascence-of-meaning analysis (likely a stale duplicate after a
template-resolver chain change).
Cross-cutting temporal coupling (mission ↔ src/ ↔ tests/) (2026-05-09)
The src/-only run could not see two important coupling kinds: (a) mission spec ↔ source-file co-changes, and (b) source-file ↔ matching-test co-changes. Both are computed here on the full corpus.
Mission-directory ↔ src-file co-changes (top 20)
For each commit, all files in kitty-specs/<mission>/ are collapsed to a
single token (the mission directory) and paired with each touched src/ file.
Counts are deliberately low because the SDD pipeline lands planning commits
and implementation commits separately by design — a co-change here means
both the spec and the code were touched in the same commit (e.g. a fix that
also amended the spec, or an implementation commit that adjusted spec
metadata).
| # | Co-changes | Mission directory | Src file/dir |
|---|---|---|---|
| 1 | 5 | 012-documentation-mission/ |
src/specify_cli/cli/commands/ (any file) |
| 2 | 5 | 010-workspace-per-work-package-for-parallel-development/ |
src/specify_cli/cli/commands/ |
| 3 | 4 | 008-unified-python-cli/ |
src/specify_cli/cli/commands/ |
| 4 | 3 | auth-local-trust-and-multi-process-hardening-01KQW587/ |
src/specify_cli/auth/token_manager.py |
| 5 | 3 | auth-local-trust-and-multi-process-hardening-01KQW587/ |
src/specify_cli/auth/session_hot_path.py |
| 6 | 3 | 064-complete-mission-identity-cutover/ |
src/specify_cli/missions/software-dev/ |
| 7 | 3 | 067-runtime-recovery-and-audit-safety/ |
src/specify_cli/missions/software-dev/ |
| 8 | 3 | 065-tasks-and-lane-stabilization/ |
src/specify_cli/missions/software-dev/ |
| 9 | 3 | unified-charter-bundle-chokepoint-01KP5Q2G/ |
src/specify_cli/upgrade/migrations/ |
| 10 | 3 | phase-3-charter-synthesizer-pipeline-01KPE222/ |
src/charter/synthesizer/synthesize_pipeline.py |
| 11 | 3 | phase-3-charter-synthesizer-pipeline-01KPE222/ |
src/charter/synthesizer/errors.py |
| 12 | 3 | phase-3-charter-synthesizer-pipeline-01KPE222/ |
src/charter/synthesizer/write_pipeline.py |
| 13 | 3 | 005-refactor-mission-system/ |
src/specify_cli/cli/commands/ |
| 14 | 3 | 008-unified-python-cli/ |
src/specify_cli/upgrade/migrations/ |
| 15 | 3 | 004-modular-code-refactoring/ |
src/specify_cli/cli/commands/ |
| 16 | 3 | 025-cli-event-log-integration/ |
src/specify_cli/cli/commands/ |
| 17 | 3 | 010-workspace-per-work-package-for-parallel-development/ |
src/specify_cli/upgrade/migrations/ |
| 18 | 3 | 011-constitution-packaging-safety-and-redesign/ |
src/specify_cli/upgrade/migrations/ |
| 19 | 3 | 015-first-class-jujutsu-vcs-integration/ |
src/specify_cli/cli/commands/ |
| 20 | 3 | 010-workspace-per-work-package-for-parallel-development/ |
src/specify_cli/core/worktree.py |
Interpretation: the maximum is 5 co-changes/year for any
mission ↔ src pair. That is very low compared with the in-src/ pairs
(top pair = 45 co-changes/y). This means mission-driven causal coupling
is invisible to commit-level recipes — the SDD pipeline works as designed,
landing spec edits and code edits in separate commits. To see the real
mission ↔ code coupling we'd need PR-level grouping, branch-level
attribution, or trailer-based association (e.g. Mission: 041). Treat this
table as a sanity check (no surprising couplings) rather than a refactor
signal.
The clusters that do surface are revealing as secondary structure: the
charter synthesizer phase-3 mission cluster (rows 10–12) shows three
synthesizer modules co-changing with their mission spec — that's the
expected pattern when a mission is genuinely active. The auth hardening
mission (rows 4–5) shows the same pattern for auth/. Mission 010
(workspace-per-WP) ↔ core/worktree.py (row 20) is the strongest specific
file coupling and is also expected by design.
src-file ↔ test-file co-changes (top 20)
Per-pair count of commits touching both a src/ file and a tests/ file:
| # | Co-changes | Src file | Test file |
|---|---|---|---|
| 1 | 13 | sync/emitter.py |
tests/sync/test_events.py |
| 2 | 12 | cli/commands/agent/feature.py |
tests/specify_cli/test_cli/test_agent_feature.py |
| 3 | 12 | glossary/extraction.py |
tests/specify_cli/glossary/test_extraction.py |
| 4 | 11 | dashboard/scanner.py |
tests/test_dashboard/test_scanner.py |
| 5 | 11 | glossary/middleware.py |
tests/specify_cli/glossary/test_middleware.py |
| 6 | 9 | glossary/scope.py |
tests/specify_cli/glossary/test_scope.py |
| 7 | 8 | sync/emitter.py |
tests/sync/test_event_emission.py |
| 8 | 8 | sync/events.py |
tests/sync/test_event_emission.py |
| 9 | 8 | sync/events.py |
tests/sync/test_events.py |
| 10 | 8 | glossary/models.py |
tests/specify_cli/glossary/test_models.py |
| 11 | 7 | sync/batch.py |
tests/sync/test_batch_sync.py |
| 12 | 7 | sync/emitter.py |
tests/sync/conftest.py |
| 13 | 7 | cli/commands/agent/tasks.py |
tests/integration/test_task_workflow.py |
| 14 | 7 | status/transitions.py |
tests/specify_cli/status/test_transitions.py |
| 15 | 7 | cli/commands/init.py |
tests/specify_cli/test_cli/test_init_command.py |
| 16 | 7 | cli/commands/agent/tasks.py |
tests/unit/agent/test_tasks.py |
| 17 | 7 | runtime/bootstrap.py |
tests/unit/runtime/test_bootstrap.py |
| 18 | 7 | glossary/events.py |
tests/specify_cli/glossary/test_checkpoint_resume.py |
| 19 | 7 | glossary/events.py |
tests/specify_cli/glossary/test_event_emission.py |
| 20 | 6 | dashboard/static/dashboard/dashboard.js |
tests/test_dashboard/test_static.py |
Interpretation: the sync/ and glossary/ subsystems show the healthiest
src ↔ test co-change discipline (rows 1, 3, 4, 5, 6, 8–11, 18, 19 — ten of
the top twenty pairs are in those two subsystems). The agent/* cluster is
conspicuously absent from the top of this table — agent/tasks.py's
strongest test pair is rank 13 with 7 co-changes, despite the file having 87
churn commits. That's a 8% co-change rate at the strongest-pair level.
F2 hotspots: change-with-tests rate
For the F2 cluster specifically — how often does a commit that touches the hotspot also touch a matching test?
| Hotspot | Commits touching file | Also touches matching test | Rate |
|---|---|---|---|
cli/commands/agent/tasks.py |
87 | 26 | 29.9% |
cli/commands/agent/workflow.py |
77 | 21 | 27.3% |
cli/commands/agent/mission.py |
19 | 6 | 31.6% |
~70% of changes to the F2 cluster ship without a matching test update.
This is new evidence the src/-only run could not produce. Combined with the
F-rated cyclomatic complexity in those files (CC=160 finalize_tasks,
CC=139 move_task, CC=87 status, CC=84 review), this is the strongest
empirical case for a refactor-with-test-build-out on the agent/* cluster.
F2's priority is upgraded in the revised triage matrix below.
Test coverage proxy (2026-05-09)
For each top-20 hotspot, the matching test file(s) were located by name and
churn-compared. Ratio = test churn / src churn × 100. Heuristic flags:
UNTESTED (no matching test found), UNDER-TESTED (ratio < 10%),
test-heavy (ratio > 100%, healthy or potentially churny tests).
| Source | src churn | test churn | Ratio | Flag |
|---|---|---|---|---|
cli/commands/agent/tasks.py |
87 | 19 | 21.8% | under-tested |
cli/commands/agent/workflow.py |
77 | 30 | 39.0% | under-tested |
cli/commands/implement.py |
67 | 33 | 49.3% | under-tested-ish |
cli/commands/merge.py |
56 | 44 | 78.6% | borderline |
cli/commands/init.py |
50 | 51 | 102.0% | test-heavy (good) |
sync/emitter.py |
42 | 110 | 261.9% | test-heavy (good) |
glossary/middleware.py |
36 | 5 | 13.9% | under-tested |
cli/commands/sync.py |
36 | 110 | 305.6% | test-heavy (good) |
dashboard/scanner.py |
28 | 19 | 67.9% | borderline |
next/runtime_bridge.py |
26 | 50 | 192.3% | test-heavy (good) |
upgrade/migrations/__init__.py |
26 | 34 | 130.8% | test-heavy (good) |
glossary/__init__.py |
24 | 58 | 241.7% | test-heavy (good) |
status/emit.py |
25 | 31 | 124.0% | test-heavy (good) |
core/worktree.py |
25 | 28 | 112.0% | test-heavy (good) |
cli/commands/charter.py |
23 | 34 | 147.8% | test-heavy (good) |
acceptance.py |
22 | 24 | 109.1% | test-heavy (good) |
orchestrator_api/commands.py |
21 | 46 | 219.0% | test-heavy (good) |
agent_utils/status.py |
21 | 4 | 19.0% | under-tested |
cli/commands/agent/status.py |
20 | 18 | 90.0% | borderline |
cli/commands/accept.py |
20 | 21 | 105.0% | test-heavy (good) |
Three files flagged as under-tested by the ratio rule:
cli/commands/agent/tasks.py(21.8%)cli/commands/agent/workflow.py(39.0%)glossary/middleware.py(13.9%)agent_utils/status.py(19.0%)
The first two reinforce F2 — the agent/* cluster needs both a structural
refactor and a test-build-out. The third (glossary/middleware.py) is a
new finding (F16): the glossary subsystem looked healthy in the
src ↔ tests pair table (multiple healthy pairs in glossary/*) but
middleware specifically — the dispatch surface — is under-tested relative to
its 36 churn commits. The fourth (agent_utils/status.py) is the kanban
renderer with F-53 _display_status_board; under-tested at 19% ratio.
No hotspot from the top-20 is genuinely UNTESTED (initial heuristic
produced four false-negatives — acceptance.py, orchestrator_api/commands.py,
agent/status.py, accept.py — that have tests under non-conventional
names; corrected counts are shown above).
Caveat: ratio is a churn proxy, not coverage. A file with low test churn might be tested via integration tests that evolve more slowly than unit tests. The methodology cannot distinguish "tests don't change because the source's behavior is stable on the contract surface" from "tests are neglected." Treat the four under-tested flags as prompts to verify, not verdicts.
Bus factor / knowledge map
Overall (1y, src/-only commits):
| Author | Commits | Share |
|---|---|---|
| Robert Douglass | 896 | 89.5% |
| Stijn Dejongh | 33 | 3.3% |
| Den Delimarsky 🌺 | 32 | 3.2% |
| honjo-hiroaki-gtt | 5 | 0.5% |
| den (work) | 5 | 0.5% |
| Jerome LACUBE | 3 | 0.3% |
| Jerome Lacube | 3 | 0.3% |
| Bruno Borges | 3 | 0.3% |
| Zhiqiang ZHOU | 2 | 0.2% |
| Tanner | 2 | 0.2% |
| Ram | 2 | 0.2% |
| Brian Anderson | 2 | 0.2% |
| 13 others | 1 each | <0.2% each |
Per-hotspot single-author share (top 15, with --follow):
| File | Top author share |
|---|---|
__init__.py |
49% (Robert) — only file with significant co-authorship; reflects 1.x-era contributors (Den, Bruno, honjo) |
cli/commands/agent/tasks.py |
97.7% (Robert; 2 commits Stijn) |
cli/commands/agent/workflow.py |
96.1% (Robert; 2 Stijn, 1 Jerome) |
cli/commands/implement.py |
97.0% (Robert) |
cli/commands/merge.py |
96.4% (Robert) |
cli/commands/agent/feature.py (deleted) |
100% (Robert) |
cli/commands/init.py |
96.0% (Robert) |
cli/commands/__init__.py |
95.8% (Robert) |
sync/emitter.py |
92.9% (Robert; 3 Stijn) |
cli/commands/sync.py |
97.2% (Robert) |
cli/commands/dashboard.py (renamed) |
100% (Robert) |
glossary/middleware.py |
100% (Robert) |
dashboard/scanner.py |
92.9% (Robert) |
upgrade/migrations/__init__.py |
100% (Robert) |
sync/events.py |
96.4% (Robert) |
Open question (per the tactic): is this single-author concentration a "stable mature ownership" signal or a "knowledge bus factor" signal? The recipes cannot answer that — only conversation with the team can. But the combination of (a) 89.5% concentration, (b) high churn (still active), and (c) high bug-fix density on those same files leans toward bus factor over maturity.
Bus factor — full corpus reassessment (2026-05-09)
Re-running on src/ + tests/ + kitty-specs/ with the expanded exclusion list:
| Author | Commits | Share |
|---|---|---|
| Robert Douglass | 2613 | 95.2% |
| Stijn Dejongh | 58 | 2.1% |
| Den Delimarsky 🌺 | 32 | 1.2% |
| honjo-hiroaki-gtt | 5 | 0.2% |
| den (work) | 5 | 0.2% |
| Tanner | 3 | 0.1% |
| Jerome LACUBE | 3 | 0.1% |
| Jerome Lacube | 3 | 0.1% |
| Bruno Borges | 3 | 0.1% |
| 16 others | ≤2 each | <0.1% |
Total commits in window: 2744 (vs. 1001 in src/-only). The other ~1740
commits are tests/ and kitty-specs/ authoring — and 97.8% of those
non-src/ commits are also Robert. Concentration is not diluted by tests
or specs; it is intensified.
Per-hotspot (file + matching tests, full-corpus):
| Hotspot | Top-author share (file + tests) |
|---|---|
agent/tasks.py + matching tests |
97.7% (Robert; 2 Stijn) |
agent/workflow.py + matching tests |
96.1% (Robert; 2 Stijn, 1 Jerome) |
implement.py + matching tests |
97.0% (Robert; 2 Stijn) |
merge.py + matching tests |
96.4% (Robert; 2 Stijn) |
sync/emitter.py + matching tests |
93.4% (Robert; 8 Stijn) |
Verdict: broadening the lens to "the file plus its tests plus its spec
history" does not improve the bus factor for any top-15 hotspot. The
slight uptick in Stijn's share on sync/emitter.py (3 → 8) is the only
visible co-ownership pocket; everywhere else the same author wrote the
implementation, the tests, and the mission spec. F1 reaffirmed: bus factor
≈ 1 across the entire codebase.
The original audit speculated that single-author concentration might be "stable mature ownership". The full-corpus view rules that out: a mature codebase with healthy ownership would show (a) test files maintained by reviewers, (b) mission specs co-authored with planners, or (c) at least multiple committers landing fixes on the F-rated functions. None of these patterns appear. F1 is now classified bus factor, not maturity.
Firefighting signal
8 commits in the 1y window match revert|hotfix|emergency|rollback (out of 1001
on src/, ~0.8%). Spot-read of all 8:
cf997620c fix(review): enforce rollback feedback capture across 2.x flows
cd128b881 Revert "fix: Restore ClarificationMiddleware (WP06) after parallel branch merge"
2e52be19b fix: backport v0.15.2 hotfix to 2.x — branch detection, subprocess encoding, hook safety
4a9509ed6 feat(WP06): define software-dev v1 mission YAML with guards and rollback
bce568943 feat(WP10): add rollback-aware merge resolution and JSONL merge
87f3efb1f feat(WP03): add deterministic reducer with rollback-aware conflict resolution
192ca305d refactor: Remove rollback-task command
d864c9829 Revert "feat(merge): run from primary repo when invoked in worktree"
Of these 8, only 3 are genuine emergency events:
cd128b881— explicit revert of a parallel-branch-merge fixd864c9829— explicit revert of a merge-from-primary-repo feature2e52be19b— explicit hotfix backport fromv0.15.2to2.x
The other 5 are features/refactors that contain rollback in their scope (rollback-aware merge, rollback feedback capture, etc.), not rollback events.
Verdict: firefighting frequency is genuinely low (~0.3% of commits if we strip the false positives). The team appears to trust the merge pipeline. This is a healthy signal that contradicts the high single-author concentration risk — the lone author has, so far, kept the pipeline reliable. The risk is bus factor, not pipeline trust.
Velocity trend (24 months, monthly commit count on src/)
2025-08: 2 ▏
2025-09: 54 ████▎
2025-10: 47 ███▊
2025-11: 70 █████▌
2025-12: 45 ███▌
2026-01: 185 ██████████████▌
2026-02: 288 ██████████████████████▊
2026-03: 71 █████▌
2026-04: 210 ████████████████▌
2026-05: 29 ██▎ (partial month, audit on day 8)
(prior 12 months in the 2y window had no commits to src/ — repository began
significant activity in late 2025 / early 2026)
Project is decisively alive and accelerating: the 2026-Q1 commit count exceeds the previous five months combined. The 2026-03 dip is plausibly the gap between the 2.x cutover (2026-02 surge) and the post-cutover reliability work (2026-04 surge). Last 30 days: 193 src/ commits. Last 90 days: 550.
Triage matrix (Eisenhower, per procedure exit condition)
The procedure asks for a four-bucket assignment of audit findings. The 2026-05-09 scope expansion introduces new findings (F15-F18) which are woven into the buckets below; the original buckets are otherwise preserved.
Important + urgent (this week)
- Schedule a knowledge-transfer pairing on
cli/commands/agent/tasks.pyandagent/workflow.py. These are the two highest-churn-and-bug files, both96% single-author, both contain F-rated cyclomatic functions. If the lone author is unavailable for two weeks, work on these files stops.
- Question the duplicated template at
missions/software-dev/templates/task-prompt-template.mdvs.templates/task-prompt-template.md. 15 co-edits in 1y suggests neither has become canonical; one of them is dead-code-by-edit-count and the audit cannot tell which from history alone. - Test-update lag on F2 cluster (NEW, 2026-05-09).
agent/tasks.pyandagent/workflow.pychange with their tests only ~30% of the time. Pair the refactor (already in this bucket) with a test-build-out pass: every commit that lands a behavior change in the F-rated functions should include a matching unit test. The lone-author dynamic means there is currently nobody enforcing this on review.
Important + not urgent (this quarter)
- Refactor
cli/commands/agent/tasks.py(3746 SLOC, F-160finalize_tasks, F-139move_task). This is the single strongest "both unstable and known-defective" candidate the audit produces. A connascence-of-meaning pass on the function boundaries is the natural follow-up. Priority upgraded by the 2026-05-09 scope expansion: 21.8% test-churn ratio and 29.9% change-with-tests rate make this both structurally and behaviourally risky. Refactor MUST include a test build-out, not just a structural decomposition. - Refactor
cli/commands/agent/mission.py(2314 SLOC, F-160finalize_tasks) — notefinalize_tasksis inmission.py, nottasks.pydespite the name; this name-vs-location mismatch is itself a coupling smell. - Decompose
cli/commands/init.py(F-94init, 1018 SLOC). Worst MI in the CLI command layer (after charter). - Decompose
cli/commands/charter.py(2934 SLOC, MI=C, three E-rated functions). Charter interview, generate, status, and synthesize are effectively four CLI verbs jammed into one file. - Re-examine
next/runtime_bridge.py(2552 SLOC). Nominally a "bridge" but is itself a hotspot (rank #21 by churn, rank #7 by SLOC). Bridge that needs 26 commits/year and contains an F-46 function isn't a bridge, it's a hub. - Investigate
glossary/middleware.pytest gap (NEW, 2026-05-09). 36 src commits / 5 test commits = 13.9% ratio, the worst hot-file ratio in the corpus. The glossary subsystem otherwise shows healthy src ↔ test co-change discipline; middleware is the outlier. Likely either (a) tested via integration tests that span multiple sources (in which case ratio is misleading), or (b) the dispatch surface where most behaviour change happens but unit-test surfaces have lagged. Verify before refactoring. - Investigate
agent_utils/status.pytest gap (NEW, 2026-05-09). Kanban renderer with F-53_display_status_board, 21 src commits / 4 test commits = 19.0% ratio. Same diagnosis as middleware; verify whether integration tests exercise it before declaring under-tested.
Not important + not urgent (don't worry)
- Firefighting frequency. Genuinely low; do not invest in pipeline-trust remediation absent other signal.
- Velocity. Healthy and growing; no investment needed.
src/kernel/andsrc/charter/. Small (694 + 11,384 SLOC), low churn (8 + 38 commits/y), no F-rated complexity outsidecharter/compiler.py(MI=B). Stable subsystems.__init__.pyat 119 commits. Shape of the data: it's the package bootstrap and absorbs every new export. Replace question with a check: is it >300 SLOC of logic? (Answer: 224 SLOC — fine.)
Parallelisable (delegate / batch)
- Delete the three empty leftover dirs:
src/runtime/,src/dashboard/(top-level — note the real dashboard issrc/specify_cli/dashboard/),src/constitution/. They contain only stale__pycache__/. Trivial, batchable. - Recipe-level cleanup of D-rated migrations
(
m_0_10_8_fix_memory_structure.py— F-47,m_3_1_1_charter_rename.py— D-27,m_3_2_0_codex_to_skills.py— D-24,m_3_2_3_unified_bundle.py— D-22). Migrations are write-once-and-never-touch; refactoring them is parallelisable and low-risk.
Cross-cutting observations
- The
agent/directory is doing too much.cli/commands/agent/tasks.py,workflow.py,mission.py,status.py, and the deletedfeature.pytogether hold ~10,000 SLOC of CLI dispatch and contain six of the eight worst-complexity functions in the project. This is "everything-the-mission- touches-funnels-here" architecture. The CaaCS recipes cannot prescribe a refactor; they can only flag that the seam is overloaded. - Template ↔ dispatcher coupling is a designed feature, but its volume is
load-bearing. Pairs #11, #14, #18, #19 in the temporal-coupling table all
show command-template
.mdfiles co-changing with their CLI dispatchers ~12-18 times per year. The mission system's design contract (templates are the source-of-truth for agent prompts) is correct, but every change to a template requires a corresponding dispatcher change. Worth examining whether a template-loader abstraction could reduce that. - The
sync/package is a coherent cluster.emitter.py,events.py,background.py,batch.py,queue.py,client.pyall appear in the churn list and co-change with each other (not with the rest of the codebase). This is a healthy modular cluster — high internal coupling, low external coupling — and does not need a refactor; it needs a second maintainer. __init__.pyco-authorship history is a leading indicator. It is the only top-15 hotspot with non-trivial co-authorship (Den, Bruno, honjo all contributed). This reflects the 1.x-era contributor base, before the 2.x cutover concentrated authorship. The drop from "many contributors on__init__.py" to "one contributor on everything else" is the bus-factor transition the project lived through.- No
tests/were in scope. This is a deliberate scope decision but it blinds the audit to the most important counterweight: test coverage on the F-rated functions. A file with CC=160 and a 95% test-line-coverage harness is a different beast than one with CC=160 and no tests. The CaaCS recipes cannot tell them apart.
Multi-window refactor-candidate synthesis (2026-05-11)
This section executes the new tactic step
(forensic-repository-audit,
"Compile a multi-window refactor-candidate list") that was added to the
tactic after the 2026-05-09 re-run. The recipe runs two passes — full
history and a velocity-adjusted recent window — over the
src/ + tests/ + kitty-specs/ scope, filters to files present at HEAD via
git ls-files, and intersects the two top-30 lists to produce a refactor
candidate list anchored in code that still exists.
Commit at run time: f1cd634d5cc106dee805140f365c0dbc6822d430 (branch
feat/caacs-doctrine). Exclusion list is the same one used in the
"Scope expansion (2026-05-09)" re-run: lockfiles, __pycache__,
CHANGELOG.md, .mypy_cache, status.events.jsonl, status.json,
kitty-specs/**/tasks.md, kitty-specs/**/snapshot-latest.json,
kitty-specs/**/dossiers/**.
Window selection rationale
The velocity series in ## Velocity trend above shows monthly src/
commits going 2 → 54 → 47 → 70 → 45 → 185 → 288 → 71 → 210 → 29
from 2025-08 through 2026-05 (partial). The most recent four full
months (2026-01 through 2026-04) sum to 754 commits, against
218 for the prior four months (2025-09 through 2025-12) — a
3.5× quarter-over-quarter acceleration. Last-30-day count is 193,
last-90-day count is 550 (per the existing audit). By the tactic's
heuristic ("accelerating velocity → 3-6 month window"), spec-kitty is
unambiguously in the accelerating regime; a four-month window is
selected — short enough to filter out the pre-2026-Q1 lull, long
enough to capture both the 2026-02 surge (288 commits) and the
2026-04 surge (210 commits) so neither single-month spike dominates.
Repository-age caveat (worth noting up front): the first commit
touching src/, tests/, or kitty-specs/ is dated 2025-08-22 —
the repository is only ~8.5 months old at audit time. The
"full-history" pass therefore covers a window that is only slightly
larger than the existing one-year step-2 window, and the
multi-year-churn signal the tactic is designed to surface
(sustained hotspots that have never settled) cannot exist in
this codebase yet. The full-history pass here behaves more like an
"all-time" pass than a true multi-year baseline. Interpretation
adjusts accordingly.
Full-history pass (top 30, files present at HEAD)
git log --format=format: --name-only -- src/ tests/ kitty-specs/
with the exclusion list above, filtered by git ls-files. SLOC
column is wc -l of the file at HEAD; for markdown templates this
counts every line including blanks (the tactic uses SLOC as a rough
size proxy, not a strict logical-line count, and cloc is not
installed in this environment).
| # | Path | Full-history count | SLOC |
|---|---|---|---|
| 1 | src/specify_cli/__init__.py |
119 | 224 |
| 2 | src/specify_cli/cli/commands/agent/tasks.py |
87 | 3746 |
| 3 | src/specify_cli/cli/commands/agent/workflow.py |
77 | 1895 |
| 4 | src/specify_cli/cli/commands/implement.py |
67 | 718 |
| 5 | src/specify_cli/cli/commands/merge.py |
56 | 1599 |
| 6 | src/specify_cli/cli/commands/init.py |
50 | 1018 |
| 7 | src/specify_cli/cli/commands/__init__.py |
47 | 115 |
| 8 | src/specify_cli/sync/emitter.py |
42 | 1682 |
| 9 | src/specify_cli/missions/software-dev/command-templates/specify.md |
38 | 635 |
| 10 | src/specify_cli/glossary/middleware.py |
36 | 689 |
| 11 | src/specify_cli/cli/commands/sync.py |
36 | 1462 |
| 12 | src/specify_cli/missions/software-dev/command-templates/tasks.md |
33 | 674 |
| 13 | src/specify_cli/dashboard/static/dashboard/dashboard.js |
29 | 1568 |
| 14 | src/specify_cli/sync/events.py |
28 | 499 |
| 15 | src/specify_cli/dashboard/scanner.py |
28 | 785 |
| 16 | src/specify_cli/missions/software-dev/command-templates/plan.md |
27 | 359 |
| 17 | src/specify_cli/missions/software-dev/command-templates/implement.md |
27 | 257 |
| 18 | kitty-specs/041-mission-glossary-semantic-integrity/tasks/WP03-term-extraction-implementation.md |
27 | 255 |
| 19 | src/specify_cli/upgrade/migrations/__init__.py |
26 | 89 |
| 20 | src/specify_cli/next/runtime_bridge.py |
26 | 2552 |
| 21 | src/specify_cli/glossary/__init__.py |
26 | 204 |
| 22 | src/specify_cli/tasks_support.py |
25 | 31 |
| 23 | src/specify_cli/status/emit.py |
25 | 656 |
| 24 | src/specify_cli/core/worktree.py |
25 | 681 |
| 25 | tests/sync/test_events.py |
24 | 1211 |
| 26 | kitty-specs/041-mission-glossary-semantic-integrity/tasks/WP11-type-safety-and-integration-tests.md |
24 | 925 |
| 27 | src/specify_cli/cli/commands/charter.py |
23 | 2934 |
| 28 | src/specify_cli/orchestrator_api/commands.py |
21 | 1097 |
| 29 | src/specify_cli/agent_utils/status.py |
21 | 570 |
| 30 | tests/conftest.py |
20 | 822 |
Inline note: tests/conftest.py (rank 30) is the only newcomer relative
to the full-corpus step-2 table; it ranks 30 here because the step-2
top-30 cut absorbed the deleted-and-renamed entries (feature.py,
dashboard.py, acceptance.py) that have been removed from HEAD by
this pass's git ls-files filter.
Velocity-adjusted pass (top 30, --since="4 months ago")
Same recipe, same exclusions, same HEAD filter, --since="4 months ago".
| # | Path | 4m count | SLOC |
|---|---|---|---|
| 1 | src/specify_cli/cli/commands/agent/tasks.py |
85 | 3746 |
| 2 | src/specify_cli/cli/commands/agent/workflow.py |
76 | 1895 |
| 3 | src/specify_cli/cli/commands/implement.py |
67 | 718 |
| 4 | src/specify_cli/cli/commands/merge.py |
54 | 1599 |
| 5 | src/specify_cli/sync/emitter.py |
42 | 1682 |
| 6 | src/specify_cli/cli/commands/init.py |
42 | 1018 |
| 7 | src/specify_cli/missions/software-dev/command-templates/specify.md |
38 | 635 |
| 8 | src/specify_cli/glossary/middleware.py |
36 | 689 |
| 9 | src/specify_cli/cli/commands/sync.py |
36 | 1462 |
| 10 | src/specify_cli/cli/commands/__init__.py |
35 | 115 |
| 11 | src/specify_cli/missions/software-dev/command-templates/tasks.md |
33 | 674 |
| 12 | src/specify_cli/sync/events.py |
28 | 499 |
| 13 | src/specify_cli/missions/software-dev/command-templates/plan.md |
27 | 359 |
| 14 | src/specify_cli/missions/software-dev/command-templates/implement.md |
27 | 257 |
| 15 | kitty-specs/041-mission-glossary-semantic-integrity/tasks/WP03-term-extraction-implementation.md |
27 | 255 |
| 16 | src/specify_cli/next/runtime_bridge.py |
26 | 2552 |
| 17 | src/specify_cli/glossary/__init__.py |
26 | 204 |
| 18 | src/specify_cli/status/emit.py |
25 | 656 |
| 19 | tests/sync/test_events.py |
24 | 1211 |
| 20 | kitty-specs/041-mission-glossary-semantic-integrity/tasks/WP11-type-safety-and-integration-tests.md |
24 | 925 |
| 21 | src/specify_cli/dashboard/scanner.py |
23 | 785 |
| 22 | src/specify_cli/core/worktree.py |
23 | 681 |
| 23 | src/specify_cli/cli/commands/charter.py |
23 | 2934 |
| 24 | src/specify_cli/orchestrator_api/commands.py |
21 | 1097 |
| 25 | src/specify_cli/dashboard/static/dashboard/dashboard.js |
21 | 1568 |
| 26 | src/specify_cli/agent_utils/status.py |
21 | 570 |
| 27 | src/specify_cli/sync/__init__.py |
20 | 190 |
| 28 | src/specify_cli/status/models.py |
20 | 422 |
| 29 | src/specify_cli/status/__init__.py |
20 | 169 |
| 30 | src/specify_cli/missions/software-dev/command-templates/review.md |
20 | 194 |
Intersection (inner-join on path)
Files in both top-30 lists. Tag rule (verbatim from the tactic): both-in-this-AND-in-step-2 → urgent; in-this-but-not-step-2 → slow-burn; in-step-2-but-not-here → unsettled burst (handled below).
| F# | R# | Path | SLOC | In step-2 (1y full-corpus) | Tag |
|---|---|---|---|---|---|
| 2 | 1 | src/specify_cli/cli/commands/agent/tasks.py |
3746 | ✓ | urgent |
| 3 | 2 | src/specify_cli/cli/commands/agent/workflow.py |
1895 | ✓ | urgent |
| 4 | 3 | src/specify_cli/cli/commands/implement.py |
718 | ✓ | urgent |
| 5 | 4 | src/specify_cli/cli/commands/merge.py |
1599 | ✓ | urgent |
| 6 | 6 | src/specify_cli/cli/commands/init.py |
1018 | ✓ | urgent |
| 7 | 10 | src/specify_cli/cli/commands/__init__.py |
115 | ✓ | urgent |
| 8 | 5 | src/specify_cli/sync/emitter.py |
1682 | ✓ | urgent |
| 9 | 7 | src/specify_cli/missions/software-dev/command-templates/specify.md |
635 | ✓ | urgent |
| 10 | 8 | src/specify_cli/glossary/middleware.py |
689 | ✓ | urgent |
| 11 | 9 | src/specify_cli/cli/commands/sync.py |
1462 | ✓ | urgent |
| 12 | 11 | src/specify_cli/missions/software-dev/command-templates/tasks.md |
674 | ✓ | urgent |
| 13 | 25 | src/specify_cli/dashboard/static/dashboard/dashboard.js |
1568 | ✓ | urgent |
| 14 | 12 | src/specify_cli/sync/events.py |
499 | ✓ | urgent |
| 15 | 21 | src/specify_cli/dashboard/scanner.py |
785 | ✓ | urgent |
| 16 | 13 | src/specify_cli/missions/software-dev/command-templates/plan.md |
359 | ✓ | urgent |
| 17 | 14 | src/specify_cli/missions/software-dev/command-templates/implement.md |
257 | ✓ | urgent |
| 18 | 15 | kitty-specs/041-mission-glossary-semantic-integrity/tasks/WP03-term-extraction-implementation.md |
255 | ✓ | urgent |
| 20 | 16 | src/specify_cli/next/runtime_bridge.py |
2552 | ✓ | urgent |
| 21 | 17 | src/specify_cli/glossary/__init__.py |
204 | ✓ | urgent |
| 23 | 18 | src/specify_cli/status/emit.py |
656 | ✓ | urgent |
| 24 | 22 | src/specify_cli/core/worktree.py |
681 | ✓ | urgent |
| 25 | 19 | tests/sync/test_events.py |
1211 | ✓ | urgent |
| 26 | 20 | kitty-specs/041-mission-glossary-semantic-integrity/tasks/WP11-type-safety-and-integration-tests.md |
925 | ✓ | urgent |
| 27 | 23 | src/specify_cli/cli/commands/charter.py |
2934 | ✓ | urgent |
| 28 | 24 | src/specify_cli/orchestrator_api/commands.py |
1097 | — | slow-burn |
| 29 | 26 | src/specify_cli/agent_utils/status.py |
570 | — | slow-burn |
Unsettled-burst residual (step-2 entries that did NOT make this intersection):
src/specify_cli/__init__.py(step-2 #1) — 132 commits all-time but only 13 in the recent four months. Settled: the 2.x cutover churn here is mostly done.src/specify_cli/cli/commands/agent/feature.py— deleted at HEAD (7428880c4, mission-id cutover). Drops out by HEAD filter; not a refactor target.src/specify_cli/dashboard.py— renamed tocli/commands/dashboard.py. Drops out by HEAD filter; not a refactor target. The renamed filecli/commands/dashboard.pydid not accumulate enough commits post-rename to enter either top-30.src/specify_cli/acceptance.py— renamed toacceptance/__init__.py. Drops out by HEAD filter; not a refactor target.src/specify_cli/tasks_support.py— 25 commits all-time, 17 in the recent window, but only 31 SLOC: it is a thin re-export shim. Real but unimportant.src/specify_cli/upgrade/migrations/__init__.py— 26 commits all-time, 13 recent, 89 SLOC: the migration registry. Real but small; churn-by-registration not churn-by-design.
None of the unsettled-burst residual is a "recent spike that hasn't
settled". All five either (a) have legitimately cooled (__init__.py,
tasks_support.py, migrations/__init__.py) or (b) have been
deleted/renamed and so dropped by the HEAD filter. The
unsettled-burst category is empty for spec-kitty under this
window choice — every hotspot the step-2 1y view surfaced is either
still hot in the four-month window or has been resolved (cut, merged,
or renamed).
Interpretation
The multi-window synthesis adds two new signals relative to the
existing audit. First, it confirms F2 (the cli/commands/agent/*
merge.py+sync/emitter.py+next/runtime_bridge.pyrefactor cluster): the top-five urgent candidates are exactly the F2 cluster the existing audit named, and they hold rank in both windows (F#2-5 and R#1-4). Twenty-four of the twenty-six intersection files are already known step-2 hotspots, so the audit's existing refactor priority list is stable and re-confirmed rather than displaced. Second, it surfaces two genuinely new slow-burn candidates not in the step-2 top-30:orchestrator_api/commands.py(1097 SLOC, F#28 / R#24) andagent_utils/status.py(570 SLOC, F#29 / R#26). Both fell just outside the step-2 cut (which stopped at rank 30 =acceptance.pywith 22 commits /agent_utils/status.pywas 21 in the src/-only table but absent from the full-corpus 30) — the multi-window view promotes them into the inspect-this-quarter bucket.orchestrator_api/commands.pyalready appears in the open follow-ups list as item 5's neighbour, butagent_utils/status.py(the kanban renderer with an F-53_display_status_boardfunction) is newly elevated to detailed-inspection status by this pass.
The unsettled-burst category is empty: every step-2 entry that dropped out of this intersection did so for a settled reason (cooled churn, deletion, or rename) rather than a "recent burst that may not sustain". This is partly a consequence of the repository's youth — there has not been enough wall time for any 1y hotspot to settle into history — but it also means the existing F2 verdict is not threatened by hidden short-lived bursts. The list is what it claims to be.
- Scope (after 2026-05-09 expansion) =
src/ + tests/ + kitty-specs/. The remaining out-of-scope dirs arearchitecture/,docs/, and.github/. These are intentionally excluded (architecture docs and CI config are not what this audit is for). NoLifted by the 2026-05-09 expansion. See "Test coverage proxy" above; ratio-based proxy now exists, but it remains a churn proxy, not coverage.tests/overlay.clocnot installed. SLOC iswc -l(Python files only); language breakdown is implicit (everything in scope is Python except a few markdown templates and one JS dashboard file). For a pure-Python scope this is acceptable.- Rename-following is partial. Per-file authorship for the top-15 used
--follow; the bulk recipes (churn, bug-hotspots, temporal coupling) did not. Three concrete renames in the window are documented inline:acceptance.py→acceptance/__init__.py,dashboard.py→cli/commands/dashboard.py,cli/commands/agent/feature.pydeleted in7428880c4. Other renames likely exist and would push some files currently outside the top-30 closer to the top. - Bug-grep heuristic.
fix|bug|broken|regress|hotfixis the recipe- spec'd grep. It matches commit body too, producing a low rate of false positives (spec/feat commits whose body mentions "fix" in prose). Spot-checked on 5 random matches: 4/5 are genuine fix commits, 1/5 is a spec commit that mentions opportunistic fixes in passing. Acceptable noise floor. - Squash-merge mix. History contains both squashed PRs (one commit each) and non-squashed PRs (multiple commits). Velocity counts are biased slightly downward against squashed PRs but the bias is uniform across the window so the shape of the velocity curve is reliable.
- The "vanity-file dominance" check is informal. Top-4 hotspots were spot-checked for insertion-vs-deletion ratio (all show substantive churn). The full top-30 was not checked individually.
- DDD classifications are researcher-tentative — architect sign-off required before treating any cell of the DDD column in the hotspot table as authoritative. The justifications are one-line and the researcher has no charter context to lean on.
- Per the tactic's own out-of-scope clause: this audit is not a substitute for the brownfield-investigation skill (issues #665/#666); it is the evidence-gathering input to triage and connascence/premortem tactics, not a verdict.
- Mission state-file dominance (full-corpus run). Without the new
kitty-specs/**/tasks.md,kitty-specs/**/snapshot-latest.json, andkitty-specs/**/dossiers/**exclusions, mission-state files completely dominate churn (some snapshot-latest.json files have 24+ commits; some tasks.md files have 100+). Two such files leaked into the first uncleaned pass and were excluded in the published table. A future re-run on a fresh clone should re-validate this exclusion list — new state-file conventions may have been introduced. - Mission ↔ src coupling visibility (commit-level). As discussed in the
cross-cutting section, the SDD pipeline deliberately separates spec edits
from code edits at the commit level, so commit-pair recipes only see the
exception cases (post-merge spec amendments, doc fixes during
implementation). The full mission ↔ code coupling would require either
PR-level grouping (we don't have local PR metadata) or a
Mission:trailer convention (not yet adopted). This is a methodology gap, not a fix-now item. - Test-coverage proxy is a churn ratio, not coverage. A file with
"test-heavy" status might still have low line/branch coverage; an
"under-tested" file might be exercised heavily by integration tests that
rarely change. The four flagged files (
agent/tasks.py,agent/workflow.py,glossary/middleware.py,agent_utils/status.py) are prompts to verify, not verdicts. The recipes do not includecoverage.pyor any actual coverage tool; that would be a separate audit step. - 150 missions, 167 mission directories with churn. The
kitty-specs/tree is enormous (150 directories at the time of audit). Most of these are merged, archived, and quiescent; ~15-20 show recent activity. The "mission ↔ src" pair counts are dwarfed by sheer mission volume; if a future scope expansion narrows to "active missions only" (open WPs or unmerged), the signal-to-noise should improve.
Open follow-ups for cross-check (Phase 3)
The following items should be cross-referenced against open #822 sub-issues
by the planner:
- Refactor
cli/commands/agent/tasks.py(3746 SLOC, F-160 + F-139 + F-87 + F-74). - Refactor
cli/commands/agent/mission.py(2314 SLOC, contains F-160finalize_tasksdespite the name, suggesting a misplaced responsibility). - Decompose
cli/commands/charter.py(2934 SLOC, three E-rated functions, MI=C). - Decompose
cli/commands/init.py(F-94init). - Examine
next/runtime_bridge.py(2552 SLOC, F-46) — bridge or hub? - Resolve the duplicated
task-prompt-template.mdpair (15 co-edits/y at two paths). - Knowledge-transfer plan for
agent/,merge.py,implement.py,sync/emitter.py,glossary/middleware.py, andcore/worktree.py(all ≥96% single-author). - Delete empty leftover dirs
src/runtime/,src/dashboard/,src/constitution/. - Re-run this audit with
tests/in scope to overlay test coverage on the F-rated functions before scheduling refactors. - Re-run this audit with
kitty-specs/in scope to surface feature-spec ↔ source-code temporal coupling, which is invisible to the current run.
Scope-expansion changelog (2026-05-09)
What changed in this re-run, with one-line rationale per item:
- Added
## Scope expansion (2026-05-09)near the top — declares the expanded scope, exclusions, and headline impact (F1 worse, F2 unchanged). - Added new exclusions for kitty-specs/ state churn (
status.events.jsonl,status.json,tasks.md,snapshot-latest.json,dossiers/**) — without these, mission state files dominate the churn ranking and obscure code hotspots. - Renamed the original hotspot-table heading to
### Hotspots — src/ only (original)— preserves the original table intact while making room for the broader view. - Added
### Hotspots — full corpus (src/ + tests/ + kitty-specs/)— shows that broadening scope adds zero new structural hotspots; mission 041 WP files andtests/sync/test_events.pyare the only non-src/ entries in the top 30, both of which track existing hotspots. - Added
### Bus factor — full corpus reassessmentinside the existing Bus-factor section — re-runsgit shortlogon the full corpus (95.2% Robert vs 89.5% in src/-only) and per-hotspot file+tests authorship (no improvement). Reclassifies F1 from "bus factor or maturity, unclear" to definitively "bus factor". - Added
## Cross-cutting temporal coupling (mission ↔ src/ ↔ tests/)— three sub-tables: (a) mission ↔ src pair counts (low signal, by design), (b) src ↔ tests pair counts (sync/ and glossary/ healthy, agent/ lagging), (c) F2 cluster change-with-tests rate (~30%, new finding F15). - Added
## Test coverage proxy— ratio table for top-20 hotspots, flagging four under-tested files (F2 cluster + glossary middleware + agent_utils status renderer). - Updated the Limitations section — removed the (now-invalid) scope-only-src/ caveats; added five new caveats (10–13) about state-file exclusions, mission ↔ src commit-level invisibility, churn-vs-coverage proxy nature, and the size of the kitty-specs/ tree.
- Updated the Triage matrix — added test-update-lag bullet to the "Important + urgent" bucket; upgraded the agent/tasks.py refactor priority to require a test build-out; added two glossary-middleware and agent_utils/status investigation bullets to "Important + not urgent".
- Added this changelog at the bottom.
- Added (2026-05-11)
## Multi-window refactor-candidate synthesis (2026-05-11)between the cross-cutting observations and the limitations sections — executes the new "Compile a multi-window refactor-candidate list" step added to theforensic-repository-audittactic; uses a four-month velocity-adjusted window (accelerating regime per the existing velocity series), confirms F2, surfacesorchestrator_api/commands.pyandagent_utils/status.pyas new slow-burn candidates, and reports an empty unsettled-burst category.
The original audit-metadata, methodology, top-findings, hotspot table, temporal-coupling table (within src/), bus-factor overall table, firefighting section, velocity section, cross-cutting observations, and open follow-ups are preserved verbatim from the 2026-05-08 run. Only the eight items listed above were modified or added.
Generated 2026-05-08 by researcher subagent following the
forensic-repository-audit
tactic and
legacy-codebase-triage
procedure on branch feat/caacs-doctrine at commit
bc64dec6ee37dbbd6bc21a0a1aa3195f2bab1b57. Scope-expansion re-run on
2026-05-09 at commit 81883352240c3f8e0249b78875f7fa140700418f extended the
audit to src/ + tests/ + kitty-specs/; see "Scope expansion (2026-05-09)"
near the top and the changelog at the bottom for the diff. Not committed;
review-only artifact.