Implementation Plan: Event sync — preserve local events & track per-target drains
Branch: mission/event-sync-retention-delivery | Date: 2026-06-25 | Spec: spec.md
Input: Feature specification from /kitty-specs/event-sync-retention-delivery-01KVYWRG/spec.md
Summary
Split the CLI's local sync queue into two concepts that today are conflated in one deletable row: an append-only event journal (payloads, repo/install/session-local; producer identity attached when known) and per-target delivery state (a ledger keyed by event × target). First settle #2146 target authority so config/env/auth/network/status resolve one target model; then successful upload becomes a ledger update, never event destruction, so retained events can be re-drained to a fresh delivery target. An EventSyncConfig policy layer selects retention (on/off) × delivery (none / Teamspace / external-receiver); a localhost stub receiver gets fork CI off the teamspace_key dependency. Modeled as separate core domains (event_journal/ + delivery/) per Stijn's hard requirement, with migration off the current server|user|team-scoped queue.
Technical Context
Language/Version: Python 3.11+
Primary Dependencies: stdlib sqlite3 (local store), typer/rich (CLI surface), requests (delivery transport), spec_kitty_events.* (event contracts, consumed via public imports only, per the Shared Package Boundary)
Storage: local SQLite under the spec-kitty home (the sync DB is scoped today via build_queue_scope, src/specify_cli/sync/queue.py:391). New shape: target-authority diagnostics + journal + delivery_targets + delivery ledger tables; the journal is repo/install/session-local and producer-scoped when user/team is known, not server-scoped
Testing: pytest (PWHEADLESS=1 pytest tests/ -n auto --dist loadfile); assert observable CLI output + on-disk/ledger state, not call order; a stub receiver replaces any real Teamspace dependency; real-port/daemon tests run serially (-n0)
Target Platform: developer workstations + CI (Linux/macOS/Windows); offline-capable (journal works with no network)
Project Type: single (CLI library: src/specify_cli/)
Performance Goals: a sync now run over a typical journal (hundreds to low thousands of events) completes without noticeable lag; no full-table rewrite per sync (selection is index-assisted on delivery state)
Constraints: append-only journal; non-destructive sync; canonical target authority before migration/drain; separate core domain (no new event-delivery logic inside queue.py); single active delivery target for MVP (ledger shaped to grow into fan-out); event IDs unchanged
Scale/Scope: per-repo journals up to ~10k+ events before explicit GC; a small number of delivery targets per operator
Charter Check
GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.
- Terminology Canon: "Mission" not "feature"; no feature* aliases in new flags/commands/fields. PASS (design uses event/journal/target/delivery/ledger vocabulary).
- Separate-domain / anti-spaghetti (Stijn, C-001): new logic lands in event_journal/ + delivery/, not queue.py and not existing src/specify_cli/events/. PASS by construction.
- Complexity ceiling 15 (ruff C901 / Sonar S3776): dispatcher selection + migration are the at-risk functions; keep each ≤15 by extracting select/post/record phases. PASS with discipline.
- New branch/helper ⇒ tests in same PR: every selection/state/migration branch gets focused tests (NFR-001/002). PASS.
- No-direct-push / PR flow: planning artifacts committed to the mission branch, merged via PR. PASS.
No violations requiring Complexity Tracking.
Project Structure
Documentation (this mission)
kitty-specs/event-sync-retention-delivery-01KVYWRG/
├── spec.md # complete
├── plan.md # this file
├── contracts/
│ └── event-sync-delivery-contract.md
└── tasks/ # WP breakdown (/spec-kitty.tasks output)
Source Code (repository root)
src/specify_cli/
├── event_journal/ # NEW domain — the journal
│ ├── journal.py # append-only payload store (producer-scoped)
│ └── models.py # Event record + coalesce rules
├── delivery/ # NEW domain — targets, ledger, dispatch
│ ├── targets.py # Delivery Target Registry (URL+scope identity; deployment metadata)
│ ├── ledger.py # per-event/per-target delivery state
│ ├── dispatcher.py # select-undelivered → post → record; never deletes
│ ├── receivers.py # Teamspace + external-receiver target types + stub
│ └── config.py # EventSyncConfig (retention × delivery presets)
├── sync/
│ ├── target_authority.py # NEW — canonical runtime target resolver/diagnostics (#2146)
│ ├── queue.py # KEEPS body_upload_queue/body_upload_failure_log (setup-plan/
│ │ # dossier sync — NOT event queueing); only EVENT queueing moves out
│ └── migrate_journal.py # NEW — discover queue-<digest>.db scoped DBs → journal + ledger backfill
└── cli/commands/sync* # sync now / server / status / gc / archive wiring
tests/
├── sync/ # dispatcher, migration, status/gc (incl. serial daemon cases)
├── event_journal/ + delivery/ # journal, ledger, targets, config, receivers, stub
└── ... # observable-state + on-disk assertions
Structure Decision: Single-project CLI library. The two new domains (event_journal/, delivery/) carry all new event-delivery logic. The existing src/specify_cli/events/ package remains the event-log integration/decision-log surface and is not repurposed. sync/queue.py is not retired: it continues to own body_upload_queue / body_upload_failure_log (setup-plan / dossier body uploads, which are not event queueing — C-006). Only the event-queueing responsibility moves out, so existing sync CLI commands keep working with new semantics and non-event uploads are untouched.
Implementation Concern Map
> Concerns are NOT work packages. /spec-kitty.tasks translates these into WPs.
IC-00 — Canonical sync target authority (#2146)
- Purpose: Resolve the split-brain between config.toml, SPEC_KITTY_SAAS_URL, auth/readiness, WebSocket, tracker, queue scope, diagnostics, and network calls before journal migration or delivery logic uses a target. Queue scope remains a derived isolation key, not a selector.
- Relevant requirements: FR-016, FR-019, C-002, C-007, SC-008.
- Affected surfaces: src/specify_cli/sync/target_authority.py, sync/config.py, auth/config.py, sync/queue.py, sync/preflight.py, sync/owner.py, sync/runtime.py, sync/tracker_client_glue.py, saas/readiness.py, cli/commands/sync.py.
- Sequencing/depends-on: none; this must be first because all journal/ledger target identity and migration backfill depend on it.
- Risks: leaving env/config behavior ambiguous would let the new ledger record delivery for one target while network calls hit another. Tests must cover env/config disagreement, stale active_queue_scope, and sync status --check --json diagnostics.
IC-01 — Core domain scaffolding & boundaries
- Purpose: Stand up event_journal/ and delivery/ as clean domains with explicit interfaces so nothing leaks back into queue.py or the existing events/ package (the anti-spaghetti requirement).
- Relevant requirements: C-001.
- Affected surfaces: src/specify_cli/event_journal/, src/specify_cli/delivery/.
- Sequencing/depends-on: IC-00.
- Risks: getting the seam wrong forces later churn; define the journal/ledger/dispatch interfaces before filling them.
IC-02 — Event Journal (append-only, producer-scoped)
- Purpose: Durable local payload store that does not know delivery state; repo/install/session-local and producer-scoped when identity is known, not server-scoped and not blocked by auth/team/sync/network gates. Append-only with NO coalescing at this stage — coalescing is deliberately deferred to IC-02a because it needs the ledger to answer "delivered anywhere?".
- Relevant requirements: FR-001, FR-003, FR-017, C-008, SC-009.
- Affected surfaces: event_journal/journal.py, event_journal/models.py, current event producers that now gate before queueing.
- Sequencing/depends-on: IC-01.
- Risks: must not re-introduce the in-place mutation that today's _try_coalesce does; until IC-02a lands, every produced event is a distinct row. Teamspace-bound producers must not treat SPEC_KITTY_ENABLE_SAAS_SYNC=0, missing auth/team, or Private-Teamspace rejection as "do not write".
IC-02a — Coalescing with delivered-event immutability (deferred until the ledger exists)
- Purpose: Re-introduce coalescing safely — collapse only events with no terminal delivery to any target; once an event has been delivered anywhere it is immutable and a new event is a new row (mark the prior superseded).
- Relevant requirements: FR-011.
- Affected surfaces: event_journal/journal.py (coalesce path), delivery/ledger.py (delivered-anywhere query).
- Sequencing/depends-on: IC-02, IC-04 (ledger must exist first).
- Risks: the correctness trap from the review — delivered-event immutability is a required DB test (NFR-002), not prose. A coalesce attempt against a delivered event must leave it byte-for-byte unchanged.
IC-03 — Delivery Target Registry & identity
- Purpose: Canonical-URL + user/team scope identity derived from IC-00 target authority; record (not key on) deployment metadata; detect target reset when metadata exists.
- Relevant requirements: FR-002, C-002, FR-012, FR-016.
- Affected surfaces: delivery/targets.py.
- Sequencing/depends-on: IC-00, IC-01.
- Risks: deployment_id churn must not fork identity; reset-detection is advisory/follow-on, not automatic re-drain and not an MVP blocker.
IC-04 — Delivery Ledger
- Purpose: Per-event/per-target state answering "delivered to Y, when, result?"; shaped to grow into many-targets without schema break.
- Relevant requirements: FR-002, FR-004, C-003.
- Affected surfaces: delivery/ledger.py.
- Sequencing/depends-on: IC-01, IC-03.
- Risks: index design drives dispatcher selection performance.
IC-04a — DeliveryReceiver contract (Teamspace / external / stub)
- Purpose: Define one explicit DeliveryReceiver contract before dispatcher implementation: endpoint URL, auth/headers, per-event result mapping (success/duplicate/pending/rejected/terminal-failed/transient), retry semantics, and which gates apply. The current Teamspace batch path is SaaS-gated + Private-Teamspace-gated + Bearer-auth + fixed to /api/v1/events/batch/ (sync/batch.py); that becomes the Teamspace receiver. external is an operator-supplied URL/auth with no Teamspace gating; stub is a localhost receiver with no credentials.
- Relevant requirements: FR-007, FR-008, FR-014, SC-005, SC-007.
- Affected surfaces: delivery/receivers.py, delivery/dispatcher.py (consumes the contract), test fixtures, contracts/batch-api-contract.md.
- Sequencing/depends-on: IC-00, IC-03, IC-04; schedule before IC-05 dispatcher wiring.
- Risks: the stub must be a real target type implementing the same contract, not a test-only fork of the dispatch path. Gates must be expressed per receiver, not hard-coded in the dispatcher.
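The contract might be expressed as a typing.Protocol so the dispatcher depends only on the interface. The method names (gates, post) and the Outcome enum are hypothetical, and the stub here is an in-memory stand-in for the localhost receiver the plan describes, kept network-free so the sketch runs anywhere.

```python
from enum import Enum
from typing import Protocol


class Outcome(str, Enum):
    SUCCESS = "success"
    DUPLICATE = "duplicate"
    PENDING = "pending"
    REJECTED = "rejected"
    TERMINAL_FAILED = "terminal-failed"
    TRANSIENT = "transient"


class DeliveryReceiver(Protocol):
    """Hypothetical contract; gates live on the receiver, not the dispatcher."""

    name: str

    def gates(self) -> list[str]:
        """Names of the preconditions this receiver requires (may be empty)."""
        ...

    def post(self, events: list[dict]) -> dict[str, Outcome]:
        """Deliver a batch and return one outcome per event_id."""
        ...


class StubReceiver:
    """Credential-free receiver that implements the same contract."""

    name = "stub"

    def __init__(self) -> None:
        self.seen: set[str] = set()

    def gates(self) -> list[str]:
        return []  # no SaaS, auth, or Private-Teamspace gate applies

    def post(self, events: list[dict]) -> dict[str, Outcome]:
        results: dict[str, Outcome] = {}
        for event in events:
            event_id = event["event_id"]
            already = event_id in self.seen
            results[event_id] = Outcome.DUPLICATE if already else Outcome.SUCCESS
            self.seen.add(event_id)
        return results
```

A Teamspace receiver written against the same Protocol would return its own gate names from gates(), which keeps that knowledge out of the dispatcher.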
IC-05 — Sync Dispatcher
- Purpose: Select journal events lacking terminal delivery for the active target, post, update the ledger; never delete source events. Outcome mapping (today success/duplicate/failed_permanent all DELETE, queue.py:1693): success/duplicate → terminal-success ledger rows; pending/rejected/failed_transient keep their current semantics (already aligned, queue.py:1666-1678) as ledger state, not deletes. failed_permanent is NOT a delete and NOT a success (see IC-05a).
- Relevant requirements: FR-001, FR-004, FR-005.
- Affected surfaces: delivery/dispatcher.py, cli/commands/sync*.
- Sequencing/depends-on: IC-02, IC-04, IC-04a.
- Risks: complexity ceiling — split select/post/record phases.
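The select/post/record split can be sketched as three small functions so no single one approaches the complexity ceiling. Table names, ledger state names, and the choice to re-select pending/rejected rows are assumptions for illustration; the post callable stands in for whichever receiver is active.

```python
import sqlite3
from typing import Callable

# Hypothetical minimal tables for the sketch.
SCHEMA = """
CREATE TABLE journal_events (
    seq INTEGER PRIMARY KEY, event_id TEXT UNIQUE, payload TEXT);
CREATE TABLE delivery_ledger (
    event_id TEXT, target_id TEXT, state TEXT,
    PRIMARY KEY (event_id, target_id));
"""

# Outcome name -> ledger state. No outcome maps to a delete.
OUTCOME_TO_STATE = {
    "success": "delivered",
    "duplicate": "delivered",
    "pending": "pending",
    "rejected": "rejected",
    "failed_transient": "retry",
    "failed_permanent": "terminal_failed",
}

Batch = list[tuple[str, str]]  # (event_id, payload)


def select_undelivered(conn: sqlite3.Connection, target_id: str, limit: int = 100) -> Batch:
    """Events with no ledger row, or a non-terminal one, for this target."""
    return conn.execute(
        """
        SELECT j.event_id, j.payload
        FROM journal_events j
        LEFT JOIN delivery_ledger l
               ON l.event_id = j.event_id AND l.target_id = ?
        WHERE l.state IS NULL
           OR l.state NOT IN ('delivered', 'terminal_failed')
        ORDER BY j.seq
        LIMIT ?
        """,
        (target_id, limit),
    ).fetchall()


def record_outcomes(conn: sqlite3.Connection, target_id: str, outcomes: dict[str, str]) -> None:
    for event_id, outcome in outcomes.items():
        conn.execute(
            "INSERT OR REPLACE INTO delivery_ledger (event_id, target_id, state) "
            "VALUES (?, ?, ?)",
            (event_id, target_id, OUTCOME_TO_STATE[outcome]),
        )
    conn.commit()


def drain_once(
    conn: sqlite3.Connection,
    target_id: str,
    post: Callable[[Batch], dict[str, str]],
) -> int:
    """One select -> post -> record pass; the journal is only ever read."""
    batch = select_undelivered(conn, target_id)
    if not batch:
        return 0
    record_outcomes(conn, target_id, post(batch))
    return len(batch)
```

Draining the same journal against a second target_id selects every event again, which is the re-drain-to-a-fresh-target behaviour the summary calls for.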
IC-05a — Terminal-failed state machine (failed_permanent)
- Purpose: Decide what a permanent failure (e.g. oversized event) means once events are never deleted. Resolution: a terminal-failed ledger state that is selector-excluded from future drains (so the drain still progresses, as the old DELETE achieved) and stays inspectable; the payload is retained, never deleted. Re-attempt only via explicit operator action.
- Relevant requirements: FR-015.
- Affected surfaces: delivery/ledger.py (state), delivery/dispatcher.py (selection excludes terminal-failed).
- Sequencing/depends-on: IC-04, IC-05.
- Risks: forgetting to exclude terminal-failed from the selector would loop the drain on an oversized event. Tests for oversized events are required.
IC-06 — EventSyncConfig policy & modes
- Purpose: Operator/repository dial selecting retention × delivery; four presets (TEAMSPACE / EXTERNAL_RECEIVER / LOCAL_RETENTION / OPT_OUT).
- Relevant requirements: FR-006, FR-007.
- Affected surfaces: delivery/config.py, cli/commands/sync*.
- Sequencing/depends-on: IC-00, IC-05.
- Risks: clear mode semantics: LOCAL_RETENTION journals without delivery; OPT_OUT only drops local-only/discardable families and must refuse or audit Teamspace-bound discard attempts.
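The four presets can be read as points on a retention × delivery grid, roughly as sketched below. The field names and the may_discard helper are hypothetical, and the assumption that TEAMSPACE and EXTERNAL_RECEIVER both keep retention on is mine; the plan only spells out LOCAL_RETENTION and OPT_OUT.

```python
from dataclasses import dataclass
from enum import Enum


class Delivery(str, Enum):
    NONE = "none"
    TEAMSPACE = "teamspace"
    EXTERNAL_RECEIVER = "external-receiver"


@dataclass(frozen=True)
class EventSyncConfig:
    retention: bool      # keep events in the local journal
    delivery: Delivery   # where, if anywhere, events are drained to


# ASSUMPTION: both delivering presets keep retention on.
PRESETS = {
    "TEAMSPACE": EventSyncConfig(True, Delivery.TEAMSPACE),
    "EXTERNAL_RECEIVER": EventSyncConfig(True, Delivery.EXTERNAL_RECEIVER),
    "LOCAL_RETENTION": EventSyncConfig(True, Delivery.NONE),
    "OPT_OUT": EventSyncConfig(False, Delivery.NONE),
}


def may_discard(config: EventSyncConfig, *, teamspace_bound: bool) -> bool:
    """OPT_OUT may drop local-only events; Teamspace-bound ones are refused."""
    return (not config.retention) and not teamspace_bound
```

A producer would consult may_discard before skipping a write; a False result for a Teamspace-bound event under OPT_OUT is the point where the plan's refuse-or-audit rule applies.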
IC-07 — Migration off the hash-scoped queues
- Purpose: Migrate currently-queued events out of the existing queue-<digest>.db scoped DBs into the journal + ledger. The digest is a one-way hash of server|user|team, so URL/scope cannot be recovered from the filename. The migration must therefore: (a) discover all scoped DBs (glob the queue dir, not just the legacy queue.db the current migration handles), (b) attach migrated events to a best-effort or explicitly-unknown delivery target rather than fabricating identity, (c) apply duplicate collision policy, (d) skip/handle an unrecognized digest without aborting, and (e) be explicit that delivered-and-deleted events are unrecoverable (only currently-queued payloads survive).
- Duplicate policy: same event_id + identical canonical payload imports once and records all source DB provenance; same event_id + divergent canonical payload writes a migration-conflict/audit row, leaves the source DB untouched, blocks cleanup, and exits non-zero/blocked until operator resolution. It never rewrites event IDs and never silently overwrites payload bytes.
- Relevant requirements: FR-013, FR-018, NFR-005, SC-006, SC-011.
- Affected surfaces: sync/migrate_journal.py, existing queue.py scope helpers (build_queue_scope, the digest path builder).
- Sequencing/depends-on: IC-00, IC-02, IC-04.
- Risks: atomicity per DB; idempotent re-run; plural-source tests (multiple DBs, unknown scope, duplicate event_id) are required by SC-006; a single-DB happy path is insufficient.
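The duplicate policy can be sketched as a planning pass over rows already read from each source DB. The function and field names are hypothetical, and "canonical payload" is approximated here as sorted-key compact JSON, which is an assumption; the real canonical form may differ.

```python
import hashlib
import json


def canonical_hash(payload: dict) -> str:
    """ASSUMPTION: canonical form = sorted-key, compact JSON."""
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def plan_import(sources: dict[str, list[dict]]) -> tuple[dict, list[dict]]:
    """Plan a multi-DB import without touching any source.

    sources maps a source DB name to rows shaped like
    {"event_id": str, "payload": dict}.
    Returns (imported, conflicts); a non-empty conflicts list means the
    migration should report itself blocked and skip cleanup.
    """
    imported: dict[str, dict] = {}
    conflicts: list[dict] = []
    for db_name in sorted(sources):  # stable order makes re-runs repeatable
        for row in sources[db_name]:
            event_id = row["event_id"]
            digest = canonical_hash(row["payload"])
            seen = imported.get(event_id)
            if seen is None:
                imported[event_id] = {
                    "hash": digest,
                    "payload": row["payload"],
                    "provenance": [db_name],
                }
            elif seen["hash"] == digest:
                # Identical payload: import once, remember every source.
                seen["provenance"].append(db_name)
            else:
                # Divergent payload: keep the first copy, never overwrite it.
                conflicts.append(
                    {"event_id": event_id, "sources": seen["provenance"] + [db_name]}
                )
    return imported, conflicts
```

Keeping the plan separate from the write step makes the plural-source cases (multiple DBs, duplicate event_id) easy to test without any database on disk.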
IC-08 — sync status / gc / archive
- Purpose: Report retained vs per-target delivered counts; explicit-only destructive cleanup preserving ledger history; surface journal growth; extend the existing sync status --check --json contract with additive target-authority/journal/delivery/migration/body-upload fields.
- Relevant requirements: FR-009, FR-010, FR-019, NFR-004.
- Affected surfaces: cli/commands/sync*, delivery/ledger.py, event_journal/journal.py, kitty-specs/mvp-cli-sync-boundary-completion-01KRX11M/contracts/sync-status-output.md.
- Sequencing/depends-on: IC-05.
- Risks: status back-compat — existing stats consumers must keep working with clarified semantics.
IC-09 — (Sequenced follow-on) SaaS /health deployment metadata
- Purpose: Consume server_instance_id / deployment_id / environment_name / git_sha from /api/v1/sync/health/ to strengthen target provenance + reset-detection.
- Relevant requirements: C-004.
- Affected surfaces: delivery/targets.py (CLI side); a separate spec-kitty-saas change exposes the metadata.
- Sequencing/depends-on: IC-03; gated on the SaaS change, so ship the CLI with URL-only identity first. Likely a separate follow-on mission, not in this MVP's critical path.
- Risks: cross-repo coordination; don't let it block IC-00–IC-08.