Code Review — 2026-03-25
| Field | Value |
|---|---|
| Version | 2.1.2 |
| Date | 2026-03-25 |
| Reviewer | OpenCode (claude-sonnet-4.6) |
| Scope | Full source tree (src/) against architecture/2.x |
| Tools | ruff, mypy --strict, pytest, manual architectural trace |
Summary
| Severity | Count | Category |
|---|---|---|
| Critical | 3 | Real bugs: dropped kwarg, type mismatch, dashboard blocking retry loop |
| High | 14 | Broken __all__ exports, undefined names, None-dereference |
| Medium | ~104 | Lost tracebacks (65), complexity violations (39) |
| Low / Style | 823 | Auto-fixable: deprecated typing, f-string, unused imports |
| Architecture | 5 | Boundary violations (4 documented, 1 undocumented) |
| Canon | 5+ | Feature terminology in non-deprecated user-facing surfaces |
1. Critical Bugs
C1 — Dropped --mission argument on spec-kitty specify
File: src/specify_cli/cli/commands/lifecycle.py:33
# Current — WRONG
agent_feature.create_feature(mission_slug=slug, mission=mission, json_output=json_output)
create_feature (defined at cli/commands/agent/feature.py:501) has no mission kwarg.
Its parameters are mission_type and mission_legacy.
The result: every call to spec-kitty specify --mission <type> silently drops the mission
type. The feature is created with no mission type, falling back to whatever default the
resolver provides. mypy confirms: error: Unexpected keyword argument "mission" for "create_feature".
C2 — Type mismatch in merge/executor.py
File: src/specify_cli/merge/executor.py:596–614
# Line 596 — no text=True → CompletedProcess[bytes]
result = subprocess.run(["git", "rev-parse", ...], capture_output=True, check=False)
# Line 606 — REASSIGNS to text=True → CompletedProcess[str]
result = subprocess.run(["git", "status", "--porcelain"], ..., text=True, encoding="utf-8")
# Line 614 — reads .stdout as str, but variable declared as CompletedProcess[bytes]
if result.stdout.strip():
Both runs reuse the same result variable. The first run is typed as CompletedProcess[bytes]
by mypy, the second reassigns it to CompletedProcess[str]. The .stdout.strip() call on
line 614 operates on the second (str) result and works at runtime, but the type contract is
broken and the variable reuse obscures which result is being checked where.
C3 — Dashboard retry loop blocks on dead process; test timeout
File: src/specify_cli/dashboard/lifecycle.py:364–369
retry_delays = [0.1] * 10 + [0.25] * 40 + [0.5] * 20 # ~20 seconds total
for delay in retry_delays:
if _check_dashboard_health(port, project_dir_resolved, token):
...
return url, port, True
time.sleep(delay)
The process liveness check (_is_process_alive(pid)) only runs after the full 20-second
retry loop completes. When the dashboard process dies immediately after startup, the loop
still runs to completion before reporting failure.
This causes the failing test
test_health_timeout_with_dead_process_still_fails to time out (>10 s) inside
time.sleep. It also means real users wait 20 seconds before seeing a startup failure.
2. High Severity
H1 — Broken __all__ exports in glossary/models.py
Files: src/specify_cli/glossary/models.py and 7 downstream modules
models.py does not export SemanticConflict, Severity, ConflictType, SenseRef, or
TermSurface via __all__. mypy reports attr-defined errors propagating through:
glossary/strictness.py(3 errors)glossary/scope.py(1 error)glossary/checkpoint.py(1 error)glossary/exceptions.py(2 errors)glossary/resolution.py(1 error)glossary/pipeline.py(3 errors)glossary/conflict.py(5 errors, includingName "models.SemanticConflict" is not defined)glossary/attachment.py(1 error)glossary/__init__.py(9 errors)
The entire glossary subsystem has broken type exports. In strict mypy this is 27 errors
from a single missing __all__.
H2 — Undefined names (F821/F822)
| File | Name | Risk |
|---|---|---|
cli/commands/agent/tasks.py:726 |
Any used in "Any \| None" annotation |
NameError if annotation evaluated at runtime |
cli/commands/sync.py:39 |
timedelta in "timedelta" annotation |
Same risk |
core/agent_config.py:26 |
Path used in annotation without import |
Same risk |
scripts/tasks/acceptance_support.py:55–66 |
__all__ declares 10 names populated by globals().update() |
Breaks static analysis; any downstream from acceptance_support import * is opaque |
H3 — Potential None dereference
File: src/specify_cli/merge/status_resolver.py:492
Expression type is str | None, variable type is str. No None-guard before use. mypy:
error: Incompatible types in assignment.
3. Medium Severity
M1 — Lost exception tracebacks (65 instances, B904)
Every CLI command error handler pattern across cli/commands/ raises typer.Exit(1) bare
inside an except block:
except SomeError as exc:
console.print(f"[red]Error:[/red] {exc}")
raise typer.Exit(1) # loses exc as __cause__
The original exception is not chained. When a user reports an error with a log, the
traceback has no cause chain to exc. Use raise typer.Exit(1) from exc or
raise typer.Exit(1) from None (if intentional suppression).
M2 — Cyclomatic complexity violations (39 functions, C901)
The project threshold is 15. Worst offenders:
| Function | Complexity | File |
|---|---|---|
merge |
75 | cli/commands/agent/feature.py |
run_enhanced_verify |
68 | verify_enhanced.py |
move_task |
63 | cli/commands/agent/tasks.py |
implement (workflow) |
57 | cli/commands/agent/workflow.py |
merge_workspace_per_wp |
49 | cli/commands/agent/feature.py |
map_requirements |
47 | requirement_mapping.py |
execute_merge |
36 | merge/executor.py |
review |
32 | cli/commands/agent/workflow.py |
implement (cmd) |
31 | cli/commands/implement.py |
status |
30 | cli/commands/agent/workflow.py |
merge at complexity 75 is effectively untestable by mutation testing. The mutmut config
does not cover cli/commands/agent/feature.py, meaning regressions in merge semantics are
undetected by the mutation gate.
M3 — Silent try/except/pass (30 instances, S110)
Notable cases:
cli/commands/agent/feature.py:749— auto-commit failure is silently swallowed. A failed commit produces no observable signal to the caller.cli/commands/accept.py:32— event emission failure suppressed entirely.
4. Low / Style (Auto-fixable)
| Rule | Count | Category |
|---|---|---|
| UP045 | 186 | Optional[X] → X \| None |
| UP006 | 134 | List[X], Dict[X,Y] → list[X], dict[X,Y] |
| F541 | 109 | f-strings without placeholders (f"Progress: ") |
| F401 | 85 | Unused imports |
| UP035 | 43 | Deprecated typing imports |
| UP017 | 36 | timezone.utc → datetime.UTC |
| SIM10x | 33 | Collapsible conditions, reimplemented builtins |
| UP037 | 10 | Quoted annotations |
All 592 flagged as [*] are auto-fixable with ruff --fix src/.
Additionally, manifest.py:92 contains a logic error:
if not in_frontmatter and in_frontmatter == False:
The second clause is a tautology of the first. The condition is equivalent to
if not in_frontmatter. This is either dead code or a miswritten transition guard.
5. Architecture Divergences vs. architecture/2.x
Four divergences are already documented in docs/architecture/04_implementation_mapping/README.md.
One is not.
DIV-1 — CLI bypasses Control Plane → Orchestration boundary (documented)
lifecycle.py:33 calls agent_feature.create_feature() directly from the Control Plane
layer without routing through Orchestration. Multiple handlers in cli/commands/agent/feature.py
write directly to kitty-specs/ without going through status/emit.py.
DIV-2 — Kitty-core and Orchestration both write kitty-specs/ (documented)
Planning artifacts (spec.md, plan.md, meta.json) are written directly at lines 600–810 of
cli/commands/agent/feature.py, bypassing the Event Store interface contract.
DIV-3 — Connector concept is implicit (documented)
No formal Connector interface. Current "adapter" is a rendered markdown template. No
dispatch mechanism abstraction exists for SDK, shell, or remote API alternatives.
DIV-4 — Dashboard reads filesystem directly (documented)
dashboard/scanner.py reads WP frontmatter directly rather than querying through the
Event Store interface.
DIV-5 — doctrine has undocumented reverse dependency on specify_cli (NOT documented)
The architecture/2.x documents state doctrine has "zero dependency on specify_cli".
This is false:
src/doctrine/missions/glossary_hook.py:83lazy-importsfrom specify_cli.glossary.attachment import GlossaryAwarePrimitiveRunnerat call time.src/charter/catalog.py:25lazy-importsfrom specify_cli.runtime.home import get_package_asset_rootat call time.
The glossary_hook.py case is partially addressed in ADR 2026-03-25-1-glossary-type-ownership
and pr305-review-resolution-plan.md (Track 2), but the fix is not yet merged. The
charter/catalog.py reverse dependency has no ADR and is not mentioned in the
implementation mapping.
6. Terminology Canon Violations
Per AGENTS.md: canonical term is Mission; Feature is prohibited in non-deprecated
user-facing language.
Active violations in user-visible help strings:
| File | Line | Violation |
|---|---|---|
cli/commands/lifecycle.py |
27 | help="Feature name or slug (e.g., user-authentication)" |
cli/commands/agent/feature.py |
49 | help="Feature lifecycle commands for AI agents" |
cli/commands/agent/feature.py |
503 | help="Feature slug (e.g., 'user-auth')" |
cli/commands/research.py |
44 | tracker.add("feature", ...) — internal label, surfaces to users |
cli/commands/implement.py |
563 | Example slug "001-my-feature" in --mission help — perpetuates the term |
7. Security Notes
| Rule | File | Assessment |
|---|---|---|
| S608 | sync/queue.py:442,461,517,523 |
False positive — parameterised ? placeholders are safe. Add # noqa: S608 with justification comment. |
| S310 | dashboard/lifecycle.py:146,231,460,475,482 |
All target http://127.0.0.1:<port> — localhost only, low risk. httpx (already a dependency) would be a cleaner replacement. |
| S108 | dossier/tests/test_api.py, test_indexer.py |
Hardcoded /tmp/ in tests — non-portable on Windows. Use tmp_path pytest fixture. |
8. mypy Quarantine Scope
The [tool.mypy.overrides] ignore_errors = true block in pyproject.toml covers 74 modules —
the majority of production specify_cli code. Strict mode is effectively inactive for most
of the codebase. The quarantine comment says "Transitional", but the list has grown with
recent features (e.g., mission_v1, dossier, orchestrator_api) rather than shrinking.
Test Results
1701 passed, 7 skipped, 18 xfailed, 90 warnings
1 FAILED: test_health_timeout_with_dead_process_still_fails (C3 above)
90 warnings include 18 DeprecationWarning: --mission is deprecated; use --mission instead
triggered by tests that still use the deprecated --mission flag.