Contracts
canonicalization-rule-pipeline.md
Contract — Canonicalization Rule Pipeline
Mission: quality-devex-hardening-3-2-01KRJGKH
Requirement: FR-011
Module: src/specify_cli/migration/canonicalization.py (new)
Tactic: chain-of-responsibility-rule-pipeline, Transformer flavor
Purpose
Lift the implicit Transformer pipeline that currently lives flattened in _canonicalize_status_row onto a typed Protocol so that:
1. Each rule is independently testable as a pure function.
2. Rule ordering becomes explicit and auditable.
3. A second consumer (rebuild_state.py rule sequence) reuses the same Protocol, so the two-consumer bar for an abstraction is met.
4. Future migration code (frontmatter migration, sync envelope canonicalization) has an opinionated shape to follow.
Public surface
# src/specify_cli/migration/canonicalization.py
from typing import Protocol, TypeVar, Generic, Sequence
from dataclasses import dataclass

State = TypeVar("State")


@dataclass(frozen=True)
class MigrationContext:
    mission_slug: str
    mission_id: str
    line_number: int
    generated_ids: list[str] | None = None


@dataclass(frozen=True)
class CanonicalStepResult(Generic[State]):
    state: State
    actions: tuple[str, ...] = ()
    error: str | None = None

    @classmethod
    def passthrough(cls, state: State) -> "CanonicalStepResult[State]":
        return cls(state=state, actions=(), error=None)


@dataclass(frozen=True)
class CanonicalPipelineResult(Generic[State]):
    state: State | None
    actions: tuple[str, ...]
    error: str | None


class CanonicalRule(Protocol[State]):
    def __call__(self, state: State, ctx: MigrationContext) -> CanonicalStepResult[State]:
        ...


def apply_rules(
    rules: Sequence[CanonicalRule[State]],
    state: State,
    ctx: MigrationContext,
) -> CanonicalPipelineResult[State]:
    """Thread state through rules; short-circuit on the first error."""
    accumulated_actions: list[str] = []
    current = state
    for rule in rules:
        result = rule(current, ctx)
        accumulated_actions.extend(result.actions)
        if result.error is not None:
            return CanonicalPipelineResult(
                state=None,
                actions=tuple(accumulated_actions),
                error=result.error,
            )
        current = result.state
    return CanonicalPipelineResult(
        state=current,
        actions=tuple(accumulated_actions),
        error=None,
    )
Invariants
- A rule is a pure function: no I/O, no globals, no in-place mutation of state beyond the returned value.
- A rule that does not apply MUST return CanonicalStepResult.passthrough(state). It does not raise to signal "not applicable".
- A rule that detects a hard error MUST return CanonicalStepResult(state=state, actions=..., error=<reason>). The runner short-circuits.
- The actions tuple is part of the contract: callers consume it for migration manifests / audit trails.
- MigrationContext.generated_ids is the one mutable element on the context, by design: rules append minted IDs so the caller observes them. Documented exception.
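The invariants above can be illustrated with two small rules. This is a minimal sketch, not the module's actual code: `StepResult` is a non-generic stand-in for `CanonicalStepResult[Row]`, the rule names are illustrative, and `HYPOTHETICAL_DEFAULT_LANE` is a placeholder because the real default value is not stated in this contract.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

Row = dict[str, Any]

# Placeholder only: the real default for from_lane is not given in the contract.
HYPOTHETICAL_DEFAULT_LANE = "planned"


@dataclass(frozen=True)
class MigrationContext:
    mission_slug: str
    mission_id: str
    line_number: int
    generated_ids: list[str] | None = None


@dataclass(frozen=True)
class StepResult:
    """Non-generic stand-in for CanonicalStepResult[Row]."""
    state: Row
    actions: tuple[str, ...] = ()
    error: str | None = None


def rule_default_from_lane(state: Row, ctx: MigrationContext) -> StepResult:
    """Defaulting rule: passthrough when not applicable, else return a new dict."""
    if state.get("from_lane") is not None:
        return StepResult(state=state)  # not applicable: passthrough, no raise
    # Build a new dict instead of mutating the input in place.
    updated = {**state, "from_lane": HYPOTHETICAL_DEFAULT_LANE}
    return StepResult(state=updated, actions=("defaulted_from_lane",))


def rule_require_to_lane(state: Row, ctx: MigrationContext) -> StepResult:
    """Validation rule: report a hard error as a value so the runner can stop."""
    if not state.get("to_lane"):
        return StepResult(state=state, error="missing required to_lane")
    return StepResult(state=state)
```

Both rules return values only; neither raises nor touches the input dict, which is what makes them testable as plain input/output pairs.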
Composition rules
- Rules are declared as an ordered tuple[CanonicalRule[State], ...] at module scope. Tuple, not list, to signal immutability.
- Order is part of the contract: early-exit / sanity rules first, then renames / normalizations, then defaults, then validation.
- Adding or removing a rule is a localized change: only the rule body and the tuple entry. No re-reading of the whole pipeline.
Testing contract
Per function-over-form-testing + tdd-red-green-refactor:
1. Characterization tests first (before the refactor commit). Fixture rows drawn from .kittify/migrations/mission-state/ capture today's canonicalization output for the monolithic _canonicalize_status_row. The refactor commit MUST leave those tests green.
2. Per-rule unit tests (tests/unit/migration/test_canonicalization_rules.py). Parametrized (input_state, ctx, expected_result) triples per rule. Pure value-transformer tests, with no structural assertions.
3. End-to-end pipeline tests (tests/integration/migration/test_canonicalization_pipeline.py). Realistic input fixtures; assert on final CanonicalPipelineResult.state and actions tuple.
4. No tests on the runner beyond the obvious short-circuit assertion. The runner is a 10-line function; testing rule interaction is the integration layer's job.
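The single runner test allowed by item 4 might look like the following. It is a sketch under assumptions: `Step` and the tuple-returning `apply_rules` are compact stand-ins for `CanonicalStepResult` and the contract's runner, inlined only so the example is self-contained.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Step:
    """Stand-in for CanonicalStepResult."""
    state: dict
    actions: tuple[str, ...] = ()
    error: str | None = None


Rule = Callable[[dict, object], Step]


def apply_rules(rules: Sequence[Rule], state: dict, ctx: object):
    """Compact copy of the runner; returns (state, actions, error)."""
    actions: list[str] = []
    current = state
    for rule in rules:
        result = rule(current, ctx)
        actions.extend(result.actions)
        if result.error is not None:
            return None, tuple(actions), result.error
        current = result.state
    return current, tuple(actions), None


def test_runner_short_circuits_on_first_error():
    calls: list[str] = []

    def ok(state, ctx):
        calls.append("ok")
        return Step(state=state, actions=("ok_ran",))

    def fail(state, ctx):
        calls.append("fail")
        return Step(state=state, actions=("fail_ran",), error="boom")

    def never(state, ctx):
        calls.append("never")
        return Step(state=state)

    state, actions, error = apply_rules((ok, fail, never), {}, ctx=None)

    assert state is None                      # state is dropped on error
    assert actions == ("ok_ran", "fail_ran")  # actions up to the failure are kept
    assert error == "boom"
    assert calls == ["ok", "fail"]            # the third rule never runs
```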
Migration of _canonicalize_status_row
Pre-refactor body (motivating example, ~80 lines):
def _canonicalize_status_row(data, *, mission_slug, mission_id, line_number, generated_ids=None):
    if "event_type" in data or "event_name" in data:
        return _CanonicalRowResult(row=None, actions=("quarantined_non_status_event",))
    row = dict(data)
    actions = []
    for old, new in STATUS_ROW_ALIASES.items():
        ...
    for key in sorted(FORBIDDEN_LEGACY_KEYS - {"feature_slug"}):
        ...
    row["mission_slug"] = ...
    row["mission_id"] = ...
    if not _valid_event_id(row.get("event_id")):
        ...
    if not row.get("at"):
        ...
    if row.get("from_lane") is None:
        ...
    if not row.get("to_lane"):
        return _CanonicalRowResult(row=None, actions=tuple(actions), error="missing required to_lane")
    if not row.get("wp_id"):
        return _CanonicalRowResult(row=None, actions=tuple(actions), error="missing required wp_id")
    for key in ("from_lane", "to_lane"):
        ...
    return _CanonicalRowResult(row=row, actions=tuple(actions))
Post-refactor:
_RULES: tuple[CanonicalRule[Row], ...] = (
    _rule_reject_non_status_event,
    _rule_apply_aliases,
    _rule_strip_legacy_keys,
    _rule_stamp_identity,
    _rule_mint_event_id,
    _rule_default_at,
    _rule_default_from_lane,
    _rule_require_to_lane,
    _rule_require_wp_id,
    _rule_normalize_lanes,
)


def _canonicalize_status_row(data, *, mission_slug, mission_id, line_number, generated_ids=None) -> _CanonicalRowResult:
    ctx = MigrationContext(
        mission_slug=mission_slug,
        mission_id=mission_id,
        line_number=line_number,
        generated_ids=generated_ids,
    )
    result = apply_rules(_RULES, dict(data), ctx)
    return _CanonicalRowResult.from_pipeline(result)
Each _rule_* function is ~5–10 lines, pure, individually testable.
Reuse in rebuild_state.py
migration/rebuild_state.py contains an analogous rule sequence (different state shape — frontmatter dict instead of status-event row). The same Protocol covers it because State is generic. The two-consumer bar is met; the abstraction is justified per the rule-pipeline-pattern-survey.md recommendation.
Architectural catalog update
When this contract lands, the WP that introduces migration/canonicalization.py updates architecture/2.x/04_implementation_mapping/code-patterns.md entry 1 to cite the new module as the canonical Transformer-flavor implementation. The existing catalog entry already names this module as the planned canonical implementation; the update flips "planned" to "in-tree".
Out of scope for this contract
- Validator-flavor pipelines (existing in audit/, charter_lint/). They use a different return type (list[Finding]) and a different runner. The unifying tactic notes describe both flavors; the Protocols stay distinct.
- Scorer-flavor pipelines (existing in agent_profiles/repository.py). One consumer only; not abstracted.
- Generalization of the runner to a generic RuleEngine covering all three flavors. Rejected in work/findings/rule-pipeline-pattern-survey.md: the shapes are different abstractions, not one.
stale-lane-auto-rebase-classifier-policy.md
Contract — Stale-Lane Auto-Rebase Classifier Policy (ADR Draft)
Mission: quality-devex-hardening-3-2-01KRJGKH
Requirement: FR-006 (#771)
Constraint: C-007 (ADR MUST land and be linked from the WP before implementation begins)
Module: src/specify_cli/merge/conflict_classifier.py (new), src/specify_cli/lanes/auto_rebase.py (new)
Promotion path: This draft is promoted to architecture/2.x/adr/2026-05-14-1-stale-lane-auto-rebase-classifier-policy.md once the operator accepts the rules. Initial status: PROPOSED.
Context
spec-kitty merge currently fail-stops on stale lanes, i.e. when a lane branch has not incorporated changes from the mission branch that conflict with files the lane also touched. Pre-mission analysis (work/findings/771-auto-rebase-stale-lanes.md) documented a 30-minute rote-merge cost per 10-WP mission, all of it spent on additive-only conflict shapes that a machine can resolve safely.
The user-facing risk of attempting to auto-merge is that a wrongly-classified semantic conflict silently combines incompatible code. The classifier MUST default to fail-safe: when no rule matches, halt and surface the conflict to the operator.
Decision
Adopt a closed-list rule classifier plus a fail-safe default. Each rule is keyed on:
1. File pattern (glob or specific path).
2. Conflict shape (what the conflict markers contain).
3. Resolution (the merged output the auto-rebase orchestrator writes back, plus an audit-log rule ID).
Any file or conflict shape not matching an explicit rule resolves to Manual and halts.
Rules (initial set)
R-PYPROJECT-DEPS-UNION
File pattern: pyproject.toml (top of repo).
Conflict scope: [project.dependencies] array entries, [project.optional-dependencies.*] arrays, [dependency-groups.*] arrays.
Conflict shape: Both sides added distinct entries (by package name) to the same array; no shared entry was modified.
Resolution: Auto. Union of entries, deduplicated by package name (case-insensitive), preserving the canonical TOML formatting: one entry per line, alphabetically sorted if the existing file is sorted, otherwise insertion order preserved (match the surrounding file's pattern).
Counter-example: If both sides modified the version specifier on the same package (e.g. one side httpx >=0.27, the other httpx >=0.28), resolve to Manual, because version conflicts are semantic.
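A rough sketch of the union-with-fallback logic for this rule follows. The three-way inputs (`base`, `ours`, `theirs`), the function names, and the regex-based name extraction are assumptions made for illustration; a real implementation would more likely parse requirements with a dedicated library. Returning `None` stands in for "resolve to Manual".

```python
from __future__ import annotations

import re

# Assumption: a requirement string starts with the package name.
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def package_name(requirement: str) -> str:
    """Extract the case-insensitive package name from a requirement string."""
    match = _NAME_RE.match(requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    return match.group(1).lower()


def union_dependencies(
    base: list[str], ours: list[str], theirs: list[str]
) -> list[str] | None:
    """Return the merged array, or None when the conflict is not additive-only."""
    base_by_name = {package_name(req): req for req in base}
    merged = dict(base_by_name)
    for side in (ours, theirs):
        side_by_name = {package_name(req): req for req in side}
        # A side that removed a base entry is not an additive change.
        if not base_by_name.keys() <= side_by_name.keys():
            return None
        for name, req in side_by_name.items():
            # Same package with a different specifier is a semantic conflict.
            if name in merged and merged[name] != req:
                return None
            merged.setdefault(name, req)
    entries = list(merged.values())
    # Match the surrounding file: sort only if the base array was sorted.
    if base == sorted(base, key=str.lower):
        entries.sort(key=str.lower)
    return entries
```

With `base = ["httpx>=0.27", "ruamel-yaml"]`, one side adding `"requests-mock"` and the other adding `"freezegun"`, the function returns the four entries in alphabetical order; if either side changes the `httpx` specifier it returns `None`.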
R-INIT-IMPORTS-UNION
File pattern: **/__init__.py (any package init).
Conflict scope: The block of from X import Y / import X statements at the top of the file.
Conflict shape: Both sides added distinct import lines (different X or different Y); no shared import was modified.
Resolution: Auto. Union of import lines, sorted by ruff after the union. The auto-rebase orchestrator runs ruff --fix --select I001 <file> after writing the merged content, treating any non-zero exit from ruff as a fallback to Manual.
Counter-example: If one side renamed an existing import target (e.g. from .auth import AuthFlow → from .auth import OAuthFlow), the rule does not match (it's a modify, not an add), so resolve to Manual.
R-URLS-LIST-UNION
File pattern: **/urls.py (Django-style) or any file whose conflicting region is bounded by a recognizable list constant (_URLS = [, URL_PATTERNS = [, etc.).
Conflict scope: Entries inside the list constant.
Conflict shape: Both sides added distinct entries; no shared entry was modified.
Resolution: Auto. Union of entries, preserving the file's original ordering convention (alphabetical if sorted, insertion order otherwise).
Counter-example: If both sides modified the same entry's pattern or handler, resolve to Manual.
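The ordering convention for this rule could be detected roughly as follows. This is a sketch only: it assumes the conflict has already been classified as additive, that entries are available as plain strings, and that the pre-conflict `base` list is what gets sampled for sortedness; the function name is hypothetical.

```python
def union_list_entries(
    base: list[str], ours: list[str], theirs: list[str]
) -> list[str]:
    """Union two additive edits of a list constant, keeping the file's ordering style."""
    merged = list(ours)  # keeps base entries plus our additions in their order
    seen = set(merged)
    for entry in theirs:
        if entry not in seen:  # append only entries we have not seen yet
            merged.append(entry)
            seen.add(entry)
    # If the unmodified list was alphabetical, keep the union alphabetical;
    # otherwise leave entries in insertion order.
    if base == sorted(base):
        merged.sort()
    return merged
```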
R-UVLOCK-REGENERATE
File pattern: uv.lock (exact path at repo top).
Conflict scope: Any.
Resolution mode: special. uv.lock is not classified as Auto/Manual for textual merge. Instead, the auto-rebase orchestrator:
1. Holds a global file lock via specify_cli.core.file_lock to prevent concurrent regenerations across lanes.
2. Discards both sides of the conflict (the file is fully regenerated).
3. Runs uv lock --no-upgrade from the repo root.
4. Commits the regenerated uv.lock.
If uv lock exits non-zero, the orchestrator halts with the stderr surfaced to the operator.
R-DEFAULT-MANUAL
File pattern: any file not matched by the rules above.
Conflict scope: any.
Resolution: Manual with reason="no classifier rule matched <file_path>".
This rule is always last in the rule list. It is the fail-safe default mandated by NFR-005.
Rule list ordering
RULES: tuple[ClassifierRule, ...] = (
    R_PYPROJECT_DEPS_UNION,
    R_INIT_IMPORTS_UNION,
    R_URLS_LIST_UNION,
    R_UVLOCK_REGENERATE,  # special-cased in the orchestrator
    R_DEFAULT_MANUAL,
)
First match wins. R_DEFAULT_MANUAL is always reachable because no preceding rule has an unbounded pattern.
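First-match evaluation with a fail-safe fallback might be sketched as below. The `ClassifierRule` and `Resolution` shapes, the `shape_matches` predicate, and the use of `fnmatch` for file patterns are all assumptions, since this draft does not define them. For brevity the default is expressed as the fall-through after the loop rather than as a final tuple entry.

```python
from __future__ import annotations

from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Callable, Sequence


@dataclass(frozen=True)
class Resolution:
    mode: str      # "auto" or "manual"
    rule_id: str
    reason: str = ""


@dataclass(frozen=True)
class ClassifierRule:
    rule_id: str
    file_pattern: str                     # glob matched against the file path
    shape_matches: Callable[[str], bool]  # predicate over the conflict hunk text


def classify(
    rules: Sequence[ClassifierRule], file_path: str, hunk_text: str
) -> Resolution:
    """Return the first matching rule's resolution; default to Manual."""
    for rule in rules:
        if not fnmatch(file_path, rule.file_pattern):
            continue
        try:
            matched = rule.shape_matches(hunk_text)
        except Exception as exc:  # any predicate failure is treated as Manual
            return Resolution("manual", rule.rule_id, f"shape predicate raised: {exc!r}")
        if matched:
            return Resolution("auto", rule.rule_id)
    return Resolution(
        "manual", "R-DEFAULT-MANUAL", f"no classifier rule matched {file_path}"
    )
```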
Fail-safe invariants (NFR-005)
1. Any input not exactly matching one of the named rules MUST resolve to Manual.
2. A rule MUST resolve to Manual if its conflict shape predicate raises ANY exception during evaluation. The classifier wraps each rule's shape predicate in a try/except that defaults to Manual on raise.
3. The orchestrator MUST verify, after applying an Auto resolution, that the resulting file is syntactically valid for its type. For pyproject.toml: tomllib.loads succeeds. For Python files: ast.parse succeeds. If validation fails, the orchestrator reverts the file to its pre-merge state and reports Manual(reason="post-merge validation failed: ...").
Operator-visible behavior
When all conflicts in a lane resolve to Auto
The orchestrator:
1. Applies each Auto resolution by writing the merged text and staging the file.
2. Runs the orchestrator's post-merge step (uv lock if uv.lock was conflicted; ruff --fix --select I001 if any __init__.py was conflicted).
3. Creates a merge commit on the lane branch with message "auto-rebase: <N> conflicts resolved by classifier rules [R-PYPROJECT-DEPS-UNION, ...]".
4. Continues the outer merge pipeline as if the lane had been merged cleanly.
When any conflict in a lane resolves to Manual
The orchestrator:
1. Reverts any partial auto-resolutions in the lane worktree (git merge --abort).
2. Halts the outer merge pipeline.
3. Emits the same actionable error message that spec-kitty merge emits today: instructs the operator to run git merge <mission-branch> in the lane worktree and resolve manually.
4. Reports per-lane status in AutoRebaseReport.classifications for any future audit.
When uv.lock regeneration fails
The orchestrator:
1. Aborts the lane merge.
2. Surfaces the uv lock stderr to the operator.
3. Records the failure in AutoRebaseReport.halt_reason.
4. Does NOT retry: operator intervention is required (likely a pyproject.toml issue that survived R-PYPROJECT-DEPS-UNION).
Examples
Example 1: R-PYPROJECT-DEPS-UNION (auto-resolve)
Lane A's pyproject.toml:
[project]
dependencies = [
    "httpx>=0.27",
    "requests-mock",
    "ruamel-yaml",
]
Lane B's pyproject.toml:
[project]
dependencies = [
    "httpx>=0.27",
    "freezegun",
    "ruamel-yaml",
]
Mission branch's pyproject.toml (after Lane A merged):
[project]
dependencies = [
    "httpx>=0.27",
    "requests-mock",
    "ruamel-yaml",
]
Lane B is stale; conflict on the dependencies array. R-PYPROJECT-DEPS-UNION matches.
Auto-resolved result:
[project]
dependencies = [
    "freezegun",
    "httpx>=0.27",
    "requests-mock",
    "ruamel-yaml",
]
(Sorted because the existing file in this example was sorted — match the surrounding pattern.)
Example 2: Counter-example — version specifier conflict (Manual)
Lane A adds httpx>=0.27; Lane B adds httpx>=0.28. R-PYPROJECT-DEPS-UNION does NOT match (the rule's shape predicate excludes same-package version drift). Resolves to Manual. The orchestrator halts and the operator decides.
Example 3: R-INIT-IMPORTS-UNION (auto-resolve)
Lane A's apps/collaboration/__init__.py:
from .auth import AuthFlow
from .flags import FeatureFlags
Lane B's apps/collaboration/__init__.py:
from .flags import FeatureFlags
from .sync import SyncClient
R-INIT-IMPORTS-UNION matches.
Auto-resolved result (after ruff --fix --select I001):
from .auth import AuthFlow
from .flags import FeatureFlags
from .sync import SyncClient
Example 4: Counter-example — modification of an existing import (Manual)
Lane A changes from .auth import AuthFlow to from .auth import OAuthFlow. Lane B adds from .sync import SyncClient. R-INIT-IMPORTS-UNION does NOT match because Lane A modified an existing import (not added a new one). Resolves to Manual.
Testing contract
Per function-over-form-testing:
- Per-rule unit tests (tests/integration/merge/test_conflict_classifier.py): parametrized (file_path, hunk_text, expected_resolution) triples for each rule. Cover both happy auto-resolve and the rule's counter-example.
- Orchestrator integration tests (tests/integration/lanes/test_auto_rebase_additive.py): two-lane scenario with pyproject.toml + __init__.py adds; assert the outer merge pipeline completes; assert the resulting pyproject.toml parses as TOML and contains the union of dependencies.
- Negative integration tests: two-lane scenario with a semantic conflict; assert the orchestrator halts with the current actionable error message; assert no partial auto-resolution leaks to the lane worktree.
- Fail-safe smoke: feed the classifier a file pattern not covered by any rule; assert R-DEFAULT-MANUAL fires with the documented reason.
Open questions for the operator
(Before promoting this draft to the canonical ADR file under architecture/2.x/adr/.)
1. Should the auto-rebase commit message cite the specific lane being rebased? Yes. The message format includes lane=<id> to aid post-merge audit.
2. Should ruff --fix --select I001 be expanded to additional rule sets (e.g. --select E,F)? No. Broaden only if operator experience shows the import-only fix is insufficient. Keep minimal.
3. Should the R-URLS-LIST-UNION rule attempt to detect the file's sort convention (alphabetical vs insertion order)? Yes. Sample the unmodified prefix of the list; if sorted, sort the union; otherwise preserve relative order. This is part of the rule's implementation; document it here so reviewers know it's deliberate.
Status
PROMOTED. Canonical reference: architecture/2.x/adr/2026-05-14-1-stale-lane-auto-rebase-classifier-policy.md (status: ACCEPTED, 2026-05-14). The three open questions above were resolved per the Pedro recommendations recorded in the ADR's "Resolved questions" section. This draft is retained for historical traceability only; all future amendments belong on the ADR.
upgrade-probe-and-notifier.md
Contract — Upgrade Probe and Notifier
Mission: quality-devex-hardening-3-2-01KRJGKH
Requirement: FR-007 (#740)
Modules: src/specify_cli/core/upgrade_probe.py + src/specify_cli/core/upgrade_notifier.py (both new)
Tactic: secure-design-checklist, applied to the new external surface (PyPI probe)
Purpose
Surface "no upgrade available" / "you are on the latest supported version" / "build channel with no upgrade path" information to the user without:
- blocking the CLI on network IO,
- emitting noisy notifications on every command invocation,
- breaking the existing hard CLI/project version-mismatch error path.
External surface
PyPI probe endpoint
- URL: https://pypi.org/pypi/spec-kitty-cli/json (PyPI's standard JSON metadata endpoint).
- Method: GET.
- Auth: none.
- Timeout: 2 seconds (hard cap). Any timeout, connection error, or non-2xx response resolves to UpgradeChannel.UNKNOWN.
- User-Agent: spec-kitty-cli/<version> (https://github.com/Priivacy-ai/spec-kitty).
Response handling
PyPI's /pypi/<package>/json returns:
{
  "info": { "version": "3.1.8", ... },
  "releases": { "0.1.0": [...], ..., "3.1.8": [...] }
}
The probe:
1. Reads info.version as the latest published release.
2. Reads releases as the set of all known releases for inclusion checks.
3. Classifies channel per the table below.
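Steps 1 and 2 amount to a defensive read of two fields. A minimal sketch follows, assuming the response body is already available as text; the function name and the `None` return for malformed input (which the caller would map to `UNKNOWN`) are illustrative choices, not part of the contract.

```python
from __future__ import annotations

import json


def parse_pypi_payload(body: str) -> tuple[str, frozenset[str]] | None:
    """Extract (latest_version, release_keys); None for any malformed payload."""
    try:
        payload = json.loads(body)
        latest = payload["info"]["version"]
        releases = payload["releases"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return None
    # Guard against well-formed JSON with unexpected value types.
    if not isinstance(latest, str) or not isinstance(releases, dict):
        return None
    return latest, frozenset(releases)
```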
Channel classification
| Condition | UpgradeChannel |
|---|---|
| installed_version parses as a PEP 440 version AND equals info.version | ALREADY_CURRENT |
| installed_version parses AND is greater than info.version per PEP 440 ordering | AHEAD_OF_PYPI |
| installed_version parses AND is not present in releases keys | NO_UPGRADE_PATH |
| installed_version parses AND is present in releases keys but older than info.version | UPGRADE_AVAILABLE |
| Probe failed (timeout / HTTP error / parse error / malformed response) | UNKNOWN |
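Read top to bottom, the table can be sketched as a chain of checks. Several things here are assumptions: that rows are evaluated in table order, the lowercase enum values other than `already_current` (the only one shown in the cache schema), and above all `version_key`, which is a deliberately simplified stand-in that understands only dotted release numbers with an optional `rcN` suffix. A real implementation would need a full PEP 440 parser.

```python
from __future__ import annotations

import re
from enum import Enum
from typing import Iterable


class UpgradeChannel(Enum):
    ALREADY_CURRENT = "already_current"
    AHEAD_OF_PYPI = "ahead_of_pypi"
    NO_UPGRADE_PATH = "no_upgrade_path"
    UPGRADE_AVAILABLE = "upgrade_available"
    UNKNOWN = "unknown"


_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:rc(\d+))?$")


def version_key(text: str):
    """Simplified stand-in for PEP 440 ordering (release plus optional rcN only)."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        return None
    release = tuple(int(part) for part in match.group(1).split("."))
    while len(release) > 1 and release[-1] == 0:
        release = release[:-1]  # treat 3.2 and 3.2.0 as equal
    # A final release sorts after every release candidate of the same version.
    rc = int(match.group(2)) if match.group(2) is not None else float("inf")
    return release, rc


def classify_channel(
    installed: str, latest: str, releases: Iterable[str]
) -> UpgradeChannel:
    installed_key, latest_key = version_key(installed), version_key(latest)
    if installed_key is None or latest_key is None:
        return UpgradeChannel.UNKNOWN
    if installed_key == latest_key:
        return UpgradeChannel.ALREADY_CURRENT
    if installed_key > latest_key:
        return UpgradeChannel.AHEAD_OF_PYPI
    if installed not in set(releases):
        return UpgradeChannel.NO_UPGRADE_PATH
    return UpgradeChannel.UPGRADE_AVAILABLE
```

Order matters in this reading: a build ahead of PyPI is also absent from `releases`, so the AHEAD_OF_PYPI check has to run before the NO_UPGRADE_PATH check.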
Cache contract
Location
- POSIX: ~/.cache/spec-kitty/upgrade-check.json
- Windows: %LOCALAPPDATA%\spec-kitty\upgrade-check.json
Schema
{
  "installed_version": "3.2.0rc7",
  "latest_pypi_version": "3.2.0rc7",
  "channel": "already_current",
  "probed_at": "2026-05-14T05:50:00+00:00",
  "error": null,
  "ttl_seconds": 86400
}
TTL
- Successful probes (channel ≠ UNKNOWN): ttl_seconds = 86400 (24 h).
- Failed probes (channel = UNKNOWN): ttl_seconds = 3600 (1 h).
Cache freshness check
A cache entry is fresh iff:
now - probed_at < ttl_seconds
AND installed_version == get_cli_version()
If the installed version changed (e.g. user upgraded between invocations), the cache is treated as stale even within the TTL window.
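The freshness predicate translates almost directly into code. In this sketch the function name is hypothetical, the cache entry is assumed to be the already-decoded JSON object from the schema above, and `now` is assumed to be timezone-aware; any malformed field is treated as a cache miss.

```python
from datetime import datetime, timedelta, timezone


def is_cache_fresh(entry: dict, *, now: datetime, cli_version: str) -> bool:
    """True iff the entry is within its TTL and was written by this CLI version."""
    try:
        probed_at = datetime.fromisoformat(entry["probed_at"])
        ttl = timedelta(seconds=entry["ttl_seconds"])
        within_ttl = now - probed_at < ttl
        same_version = entry["installed_version"] == cli_version
    except (KeyError, TypeError, ValueError, OverflowError):
        return False  # unparseable entry: behave as a cache miss
    return within_ttl and same_version


# Usage with the schema example: probed at 05:50 UTC, checked one hour later.
example = {
    "installed_version": "3.2.0rc7",
    "probed_at": "2026-05-14T05:50:00+00:00",
    "ttl_seconds": 86400,
}
one_hour_later = datetime(2026, 5, 14, 6, 50, tzinfo=timezone.utc)
fresh = is_cache_fresh(example, now=one_hour_later, cli_version="3.2.0rc7")
```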
Failure modes
- Cache file missing: treat as cache miss; probe.
- Cache file unparseable: treat as cache miss; probe.
- Cache write fails (disk full, permission denied): log to debug; do not raise. The user gets a notice this invocation but not the next; behavior is not broken.
Opt-out
Environment variable SPEC_KITTY_NO_UPGRADE_CHECK=1 disables the probe entirely:
- The probe is not invoked.
- The cache is not read or written.
- No notice is emitted on any channel.
- This is separate from the hard CLI/project version mismatch error path (FR-007 AC #5), which remains unconditionally active.
Notifier contract
from datetime import datetime
from pathlib import Path

from rich.console import Console


def maybe_emit_upgrade_notice(
    cli_version: str,
    *,
    console: Console | None = None,
    now: datetime | None = None,
    cache_path: Path | None = None,
) -> bool:
    """
    Returns True if a notice was emitted; False otherwise.

    Steps:
      1. Check SPEC_KITTY_NO_UPGRADE_CHECK; if set, return False.
      2. Load cache; if fresh, use cached UpgradeProbeResult.
         Otherwise call probe_pypi(cli_version, timeout_s=2.0).
      3. If channel == ALREADY_CURRENT and previous cache entry within TTL
         was also ALREADY_CURRENT, suppress notice (return False).
      4. Render the channel-appropriate notice via the console.
      5. Persist the result to cache (best-effort).
    """
Notice messages
- ALREADY_CURRENT: [dim]spec-kitty-cli {version} — you are on the latest supported version.[/dim]
- AHEAD_OF_PYPI: [dim]spec-kitty-cli {version} — build is ahead of the latest PyPI release ({latest}). No upgrade required.[/dim]
- NO_UPGRADE_PATH: [dim]spec-kitty-cli {version} — installed from a non-PyPI build/channel. No PyPI upgrade path is available.[/dim]
- UPGRADE_AVAILABLE: no no-upgrade notice. The existing upgrade nag owns the actionable upgrade prompt.
- UNKNOWN: no notice. (The user is not blocked by inability to probe.)
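One way to keep these messages in a single place is a template table keyed by channel. The sketch below assumes lowercase channel keys (only `already_current` appears in the cache schema; the others are guesses) and a hypothetical `render_notice` helper that returns `None` for channels that emit nothing.

```python
from __future__ import annotations

# Message text copied from the list above; keys are assumed channel values.
NOTICE_TEMPLATES = {
    "already_current": (
        "[dim]spec-kitty-cli {version} — you are on the latest supported version.[/dim]"
    ),
    "ahead_of_pypi": (
        "[dim]spec-kitty-cli {version} — build is ahead of the latest PyPI release "
        "({latest}). No upgrade required.[/dim]"
    ),
    "no_upgrade_path": (
        "[dim]spec-kitty-cli {version} — installed from a non-PyPI build/channel. "
        "No PyPI upgrade path is available.[/dim]"
    ),
}


def render_notice(channel: str, *, version: str, latest: str) -> str | None:
    """Return the Rich-markup notice for a channel, or None when nothing is shown."""
    template = NOTICE_TEMPLATES.get(channel)
    if template is None:
        return None  # upgrade_available and unknown emit no notice here
    return template.format(version=version, latest=latest)
```

Tests can then assert on a stable substring of the returned text rather than on Rich's rendered output.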
Performance contract (NFR-004)
- Cache-warm path: ≤ 100 ms wall-clock from invocation to return. Measured on the dev machine; recorded in WP evidence.
- Cold-cache path: up to 300 ms permitted once per 24 h when the cache is missing or stale. Subsequent invocations within the TTL window read from cache and meet the 100 ms budget.
- Network unavailable: probe times out at 2 s, falls through to UNKNOWN, returns. The 2-second worst case occurs at most once per 1 h cache window.
- Recommendation: do not invoke the notifier from the hot startup path; gate it behind should_check_version() so non-interactive commands (e.g. CI-only commands) skip it.
Integration with existing version check
The existing version_checker.py::should_check_version(command_name) function returns True for user-facing commands and False for internal / utility commands. The notifier reuses this gate — it is not a parallel decision point.
The notifier is separate from format_version_error(). The existing hard-mismatch error path is unchanged; the notifier handles only the "no upgrade available" and "already current" cases that the existing path does not address.
Security considerations (per secure-design-checklist)
- Least Privilege: probe is a GET to a public endpoint, no auth, no PII.
- Fail-Safe Defaults: probe failure resolves to UNKNOWN, no notice. Default behavior is "no information emitted" rather than "user blocked".
- Complete Mediation: opt-out env var is checked on every invocation; not cached.
- Economy of Mechanism: two small contained modules; no new dependencies; no parallel gate to maintain.
- Open Design: source-readable; cache path is documented.
- Separation of Privilege: the existing hard-mismatch error path is unaffected; the notifier is purely additive.
- Least Common Mechanism: cache is per-user, not shared across projects.
- Psychological Acceptability: the notice is one dim line; the opt-out env var is documented in --help text.
- Defense in Depth: timeout + 2 s ceiling + try/except swallow at the call site mean a network anomaly cannot escape into the user's CLI invocation.
Data classification: probe sends User-Agent (Public) and installed version (Public via the same User-Agent). Receives latest version metadata (Public). No PII; no encryption-at-rest needed.
Testing contract
Per function-over-form-testing:
- Probe tests use requests_mock to stub PyPI responses for each channel; assert on the resulting UpgradeProbeResult.channel.
- Cache tests use freezegun to advance time; assert on cache freshness boundary, TTL behavior, and stale-install invalidation.
- Notifier tests use a captured Console; assert on the emitted message substring (stable text) rather than the full Rich-rendered output.
- No tests on requests itself; mock at the network boundary.
- Performance tests assert wall-clock budget on the cache-warm path via time.perf_counter.
Acceptance mapping
| Spec AC | Coverage in this contract |
|---|---|
| AC #1 — feedback when no upgrade is available | ALREADY_CURRENT and NO_UPGRADE_PATH notices |
| AC #2 — explains installed version + why no upgrade | Notice message templates carry both |
| AC #3 — distinguishes "already latest" vs "no upgrade path" | Two distinct channels, two distinct messages |
| AC #4 — not noisy; cache / rate-limit / suppress identical | 24 h cache + identical-channel suppression |
| AC #5 — hard CLI/project mismatch path unchanged | Notifier is separate from format_version_error() |