Mission 079: CI Hardening and Lint Cleanup

Mission type: software-dev Status: draft Target branch: main Architect assessment: work/079-alphonso-assessment.md Related issues: #548 (path-based CI filtering), #549 (per-module CI job split)


Overview

The spec-kitty codebase currently carries three classes of accumulated technical debt that each raise the cost of every pull request:

1. Static analysis violations — ruff style/quality violations and mypy type errors are reported by CI on every run but never fixed. They accumulate noise in CI feedback, make the lint gate meaningless, and obscure new regressions introduced by contributors.

2. Monolithic CI test jobsfast-tests-core and integration-tests-core run the entire src/specify_cli/ surface as a single undifferentiated unit. A one-line change to the dashboard triggers the merge suite, the sync suite, the review suite, and more. Every contributor pays the full cost regardless of what they touched.

3. Lack of CI path awareness — CI jobs run unconditionally on every PR, even when only documentation files changed. A 21-line ADR commit consumed 4+ minutes of CI time across jobs that had zero relevance to the change.

This mission resolves all three problems in dependency order: lint/type correctness first, then CI structure, then CI path filtering.


Problem Statement

Lint and Type Violations

The codebase has open ruff and mypy violations that CI reports but does not enforce. This means:

the tests and the current MissionDossier/ArtifactRef dataclass signatures)

errors, creating false confidence

  • The lint job is advisory, not authoritative — contributors cannot trust it as a quality gate
  • New violations introduced by PRs are masked in a sea of pre-existing noise
  • The dossier test suite is testing the wrong data shape (mypy reveals schema drift between
  • Several files contain stale # type: ignore suppressions that no longer correspond to real

Monolithic Test Jobs

All modules in src/specify_cli/ are tested as one unit. This produces:

  • Slow CI feedback for targeted changes (a reviewer changing one file waits for 12 unrelated modules)
  • No path to per-module filtering because there are no per-module targets to filter against
  • No per-module coverage floors, so a module with 0% coverage can hide behind a high-coverage neighbor

Missing Path Awareness

CI runs unconditionally. This produces:

protection settings to avoid blocking legitimate PRs

  • Full suite execution on documentation-only commits
  • No ability to skip the Python test runner for Markdown/ADR changes
  • Required checks defined against monolithic jobs — any future split must coordinate with branch

Goals

1. Bring the codebase to a clean ruff + mypy baseline so the lint gate is authoritative 2. Split CI test jobs along functional module boundaries so each module has its own fast + integration job pair 3. Add path-aware triggering so documentation-only commits bypass Python test execution 4. Ensure all changes respect internal module dependency ordering (modules that depend on others must not be split in an order that breaks their dependency chain) 5. Coordinate branch protection required-check updates so no PR is blocked during or after the transition


Actors

ActorRole
ContributorDeveloper opening a PR; affected by CI job scope and feedback quality
ReviewerApproves PRs; relies on CI gate accuracy to know what was checked
MaintainerManages branch protection rules and required checks
CI systemExecutes jobs; subject to path filter and module split configuration

User Scenarios

Scenario A — Contributor opens a code PR touching one module

A contributor changes a file in src/specify_cli/merge/. Under the current system, all 12+ module clusters run. After this mission:

  • Only the merge module's fast + integration jobs run automatically
  • The contributor gets focused feedback within the merge module's coverage scope
  • If merge depends on review and next, those jobs also run (dependency-ordered triggering)
  • Unrelated modules (dashboard, sync, release) are skipped

Scenario B — Contributor opens a docs-only PR

A contributor adds an ADR or updates docs/. Under the current system, the full test suite runs. After this mission:

  • Python test and lint jobs are skipped
  • Only markdownlint and commitlint run
  • Required checks are satisfied via skip-pass shim jobs so branch protection is not blocked

Scenario C — Reviewer reads the lint job output

A reviewer looks at the lint CI job for a new PR. Under the current system, the output contains hundreds of pre-existing violations masking any new ones. After this mission:

  • The ruff and mypy reports contain only violations introduced by the PR under review
  • The lint gate is authoritative: a clean run means the code is clean

Scenario D — Contributor adds a new test to an existing module

A contributor adds a test to tests/review/. The per-module job for review runs. Coverage for that module is computed independently against a calibrated floor for the review module.


Requirements

Functional Requirements

IDRequirementStatus
FR-001All ruff violations present on main at mission start must be resolved: ARG001 in catalog.py, SIM108 in resolver.py, B009 in glossary_hook.py, SIM105 in _safe_re.py, E402/UP035/F401 cluster in acceptance.pyproposed
FR-002All mypy errors present on main at mission start must be resolved: dossier test schema drift (call-arg mismatches in test_snapshot.py), stale # type: ignore comments across 8 files, bare generic types in state_contract.py, acceptance_matrix.py, doctrine/missions/repository.py, and migration files, no-any-return violations in version_utils.py, upgrade/feature_meta.py, policy/audit.py, and doctrine/missions/repository.py, missing return type annotations in sync/config.py, type incompatibility in merge/conflict_resolver.py, and incompatible assignment in cli/commands/materialize.pyproposed
FR-003The requests library type stubs must be added as a development dependency so mypy can type-check sync/body_transport.py without errorsproposed
FR-004The CI workflow must define a dedicated fast-tests job and integration-tests job for each of the following module clusters: dashboard, merge, review, sync, next, lanes, upgrade, missions, cli, post_merge, release, orchestrator_api, and a residual core-misc clusterproposed
FR-005Each per-module CI job must scope its test invocation only to that module's source and test paths; it must not run tests for other modulesproposed
FR-006Each per-module CI job must have an independently configured coverage floor appropriate to that module's current coverage levelproposed
FR-007The CI job DAG must express the verified import-graph dependency ordering between module clusters. The authoritative structure (from research.md R-01) is: sync (Tier 0, leaf) → status (Tier 1) → {review, next, lanes, dashboard, upgrade} (Tier 2) → {cli, orchestrator_api} (Tier 3). merge is a Tier 0 leaf node with no dependencies on next or review. All DAG edges must be derived from the verified import graph, not assumed.proposed
FR-008The monolithic fast-tests-core and integration-tests-core jobs must be replaced by the per-module jobs; they must not run alongside the new jobsproposed
FR-009The report and quality-gate jobs must be updated to collect results from all new per-module jobsproposed
FR-010A documentation-only PR (changes limited to .md, architecture/, docs/, kitty-specs/*) must not trigger Python test jobs, ruff, or mypyproposed
FR-011A documentation-only PR must still trigger markdownlint and commitlintproposed
FR-012A PR touching only src/kernel/ must not trigger per-module jobs for other modulesproposed
FR-013Skip-pass shim jobs must be added for any per-module job that is a required branch-protection check, so that PRs where those jobs are skipped are not blockedproposed
FR-014The orchestrator-boundary.yml workflow must add a path filter so it only runs when src/specify_cli/orchestrator_api/** is touchedproposed
FR-015The check-spec-kitty-events-alignment.yml workflow must add a path filter so it only runs when src/specify_cli/sync/**, pyproject.toml, or the events package version changesproposed
FR-016All test files in per-module test directories must have at least one pytest marker applied (fast, git_repo, slow, integration, or e2e); unmarked test functions must be catalogued before the CI job split so coverage accounting is accurateproposed
FR-017Tests in tests/next/ and tests/missions/ currently marked git_repo that do not require a real git repository (i.e., they only need filesystem isolation) must be identified; any confirmed shift-left candidate must be re-marked fast and refactored to remove the git subprocess dependencyproposed

Non-Functional Requirements

IDRequirementThresholdStatus
NFR-001A documentation-only PR must not spend more than 2 minutes in CI (markdownlint + commitlint only)≤ 2 minutes total CI wall-clock timeproposed
NFR-002A PR touching a single module cluster must complete CI in under 5 minutes from push to all relevant jobs finishing≤ 5 minutes for single-module PRsproposed
NFR-003After the ruff/mypy cleanup (FR-001/FR-002), the lint job must report zero violations on main0 ruff violations, 0 mypy errors on mainproposed
NFR-004No existing passing test must be broken by the mypy fixes (dossier schema drift fixes must not change test behavior, only fix the call signatures to match the current data model)100% of previously passing tests remain passingproposed
NFR-005Per-module coverage floors must be set no lower than 2 percentage points below the actual current coverage for that module; the 2% buffer accounts for natural test-run variance while still functioning as a "do not regress" gatecoverage_floor(module) ≥ current_coverage(module) − 2%proposed

Constraints

IDConstraintStatus
C-001Branch protection required checks must remain satisfied throughout the migration; no PR must be blocked during or after the CI job splitproposed
C-002The lint fixes (FR-001/FR-002) must be delivered before the CI job split (FR-004/FR-008) so coverage floors are computed against a clean baselineproposed
C-003Module dependency order (FR-007) must be determined from runtime code inspection, not assumed; the implementation agent must verify import/call graphs before defining DAG edgesproposed
C-004Auto-fixable ruff violations must be fixed using the tool's own fix mode, not manual edits, to avoid introducing new issuesproposed
C-005# type: ignore removals must be verified to not re-introduce the original error (each removal must be tested with a mypy run targeting that file)proposed
C-006The requests stubs (FR-003) must be added only to the development dependency group, not the runtime dependenciesproposed
C-007The dossier test fixes (FR-002, schema drift) must align to the current production signatures of MissionDossier and ArtifactRef; they must not modify the production dataclasses themselvesproposed

Out of Scope

  • Bandit security issues (the 1127 Low + 23 Medium issues reported by bandit are tracked via Sonar and managed separately)
  • B608 SQL injection false positives in sync/queue.py (the parameterized queries are safe; this is annotated separately)
  • Full mypy --strict compliance across all modules not listed in FR-002 (broader mypy cleanup is a future effort)
  • CI performance beyond what the job split and path filtering deliver (no caching strategy changes, no parallelization beyond job DAG)
  • Modifying the production behavior of any runtime module (this mission is quality/CI infrastructure only)

Success Criteria

1. After all WPs are merged to main, ruff check src/ reports zero violations 2. After all WPs are merged to main, mypy --strict src/ reports zero errors for all files listed in FR-002 3. A docs-only PR (ADR markdown only) completes CI in under 2 minutes with only markdownlint and commitlint running 4. A PR touching only the dashboard module does not trigger the merge, sync, or review test jobs 5. The merge test job only starts after both review and next test jobs have passed 6. All currently passing tests continue to pass after the lint/type fixes are applied 7. The quality-gate job passes on a clean main after the full migration


Dependencies and Assumptions

External Dependencies

  • GitHub branch protection settings must be updated in tandem with CI job renames (owner coordination required before WP07 merge)
  • dorny/paths-filter action (or equivalent) available in the GitHub Actions marketplace

Internal Dependencies (Execution Order)

WP01 (ruff auto-fix)  ─┐
WP02 (E402 root cause) ─┤─→ WP06 (CI job split) ─→ WP07 (path filtering)
WP03 (dossier schema)  ─┤
WP04 (stale ignores)   ─┤
WP05 (bare generics)   ─┘

WP01–WP05 are independent of each other and can be executed in parallel (separate lanes). WP06 depends on WP01–WP05 being merged (clean baseline required for coverage floor calibration). WP07 depends on WP06 (path filters need per-module job targets to be meaningful).

Assumptions

  • A1: Module runtime dependency graph (for FR-007) can be determined by inspecting import relationships in src/specify_cli/. The implementation agent for WP06 must verify this before encoding DAG edges.
  • A2: Per-module coverage floors can be calibrated at low-but-honest values initially (e.g., the actual current coverage for that module) and tightened in a follow-up mission.
  • A3: The dossier/tests/test_snapshot.py errors are caused by schema drift in MissionDossier and ArtifactRef dataclasses, not logic errors in the test scenarios themselves. If inspection reveals the tests are testing deprecated behavior, that is out of scope and must be escalated.
  • A4: GitHub treats paths-filter skipped jobs as satisfying required checks. If a required check cannot be satisfied by a shim, it must be temporarily removed from branch protection before WP07 merges.