Implementation Plan: Phase 4 Auth Identity-Boundary Canary Gate
Branch: main | Date: 2026-05-20 | Spec: spec.md
Input: Feature specification from kitty-specs/phase4-canary-gate-01KS1W46/spec.md
Mission: phase4-canary-gate-01KS1W46 (01KS1W46ZAR9S9RJPQQJAMCV6P)
Summary
This gate verifies that both Phase-4 blockers (#1141: OfflineQueue.queue_event silent replacement; #1182: sync now unknown-error misclassification) are closed with substantive, test-backed fixes. It then installs the post-rc15 CLI RC, runs the four-scenario auth identity-boundary canary to 4/4 consecutive passes, attaches evidence to e2e#41 and closes it, runs the Teamspace MVP canary suite four times, and posts the evidence comment to spec-kitty#1038. The entire gate is delivered as a sequence of 8 work packages executed in the spec-kitty-end-to-end-testing and spec-kitty repos inside the prepared workspace.
Technical Context
- Language/Version: Python 3.11+ (existing spec-kitty-cli and e2e testing stack); Bash (canary runner script)
- Primary Dependencies: spec-kitty-cli (pipx-managed), uv (env management), gh CLI (GitHub issue/PR operations), pytest (canary test execution), flyctl (SaaS preflight), curl (health endpoints)
- Storage: Filesystem (JSON evidence files at spec-kitty-end-to-end-testing/artifacts/sync_identity_boundary/, log files at /tmp/teamspace-canary-run-.log)
- Testing: Targeted. spec-kitty-end-to-end-testing/tests/identity_boundary/ (sync_identity_boundary_deployed_dev marker); tests/identity_boundary_*_unit_test.py for harness preflight; test_go_live_pre_connector_saas_e2e.py, test_teamspace_pulse_deployed_dev_e2e.py, and test_teamspace_sync_deployed_dev_e2e.py for the Teamspace MVP suite
- Target Platform: macOS/Linux (trusted runner at /Users/robert/.local/)
- Project Type: Operational gate (verification + evidence collection; minimal code changes)
- Performance Goals: Each canary scenario completes within SaaS timeout bounds; the sync now 5s final-sync timeout is a gate criterion, not a target to extend
- Constraints: No SaaS DB mutation, no local queue surgery, no ingress cap change, no final 3.2.0 cut; the historical 22 business-rule rows must not be modified; SPEC_KITTY_ENABLE_SAAS_SYNC=1 is required for all auth/SaaS CLI commands
Branch Contract (confirmed twice per command rules)
- Current branch at plan start: `main`
- Planning/base branch: `main`
- Final merge target: `main`
- branch_matches_target: true
Charter Check
Checked against .kittify/charter/charter.md.
| Criterion | Status | Notes |
|---|---|---|
| Python 3.11+ required | ✅ | Existing CLI stack |
| pytest with 90%+ coverage for new code | ✅ | WP01/WP02 include test coverage audit; new RC cut only if blockers are confirmed via tests |
| mypy --strict | ✅ | Not applicable to operational gate WPs; any RC cut must already pass mypy |
| Integration tests for CLI commands | ✅ | Canary itself IS the integration test surface |
| PyPI distribution via release workflow | ✅ | RC16 (if needed) uses existing release.yml workflow per CLAUDE.md |
No violations. Gate passes.
Project Structure
Documentation (this feature)
kitty-specs/phase4-canary-gate-01KS1W46/
├── plan.md # This file
├── research.md # Phase 0 output
├── data-model.md # Phase 1 output
├── contracts/ # Phase 1 output
│ ├── e2e-41-evidence-comment-template.md
│ └── issue-1038-evidence-comment-template.md
└── tasks.md # Phase 2 output (/spec-kitty.tasks — not created here)
Execution Repos (workspace)
spec-kitty-end-to-end-testing/
├── scripts/run-sync-identity-boundary-canary.sh # Canary runner
├── tests/identity_boundary/ # Scenario pytest tests
├── tests/identity_boundary_*_unit_test.py # Harness preflight tests
├── tests/test_go_live_pre_connector_saas_e2e.py # Teamspace MVP suite
├── tests/test_teamspace_pulse_deployed_dev_e2e.py
├── tests/test_teamspace_sync_deployed_dev_e2e.py
└── artifacts/sync_identity_boundary/
    ├── rc15-attempt1/ # Preserved prior evidence (read-only)
    ├── <new-rc>-attempt1/ # New evidence per attempt (written by gate)
    ├── latest.json # Symlink/copy of most recent single run
    └── runs/
        ├── run-1.json
        ├── run-2.json
        ├── run-3.json
        └── run-4.json
spec-kitty/
├── pyproject.toml # Version bump if RC16 needed
├── CHANGELOG.md # Entry if RC16 needed
└── .kittify/metadata.yaml # Version metadata if RC16 needed
Complexity Tracking
No charter violations.
Phase 0: Research
Research Tasks
1. Understand fix-substance criteria for #1141: What file(s) contain OfflineQueue.queue_event or the call site in the fire_saas_fanout → emit_wp_status_changed → OfflineQueue.queue_event chain? What does "a test that fails without the fix and passes with it" look like for this specific bug?
2. Understand RC-cut workflow: What are the exact files that need version bumps? What does release.yml require? Is there a metadata.yaml that also needs updating?
3. Canary evidence format: What is the exact JSON schema expected in run-N.json? What fields beyond "outcome" are present?
4. Teamspace MVP canary failure interpretation matrix: What does each failure mode look like and how do we triage (413, 503, materialization timeout)?
Research Findings (consolidated into research.md)
Decision 1: Fix-substance verification approach for #1141
- Rationale: The start-here.md bisect path is `fire_saas_fanout → emit_wp_status_changed → OfflineQueue.queue_event`. The fix must change behavior (not just add logging). Verification: inspect the merge commit diff; if the diff only touches `logger.*` calls without modifying the `queue_event` invocation logic or its callers, reject as diagnostic-only.
- Files to check: `spec-kitty/src/specify_cli/sync/queue.py` (OfflineQueue), `spec-kitty/src/specify_cli/status/adapters.py` (fanout), `spec-kitty/tests/sync/test_sync_*.py` (new/modified test).
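The logging-only rejection rule can be approximated mechanically before a human reads the diff. The following is a minimal sketch, not the gate's actual tooling: the helper names `changed_lines` and `is_logging_only` are hypothetical, and the regex assumes logging calls appear as single-line `logger.<level>(...)` or `self.logger.<level>(...)` statements. Multi-line logger calls and comment-only changes would be misclassified as behavior changes, so a `False` result still needs manual review.

```python
import re

# Assumption: a "logging" line starts a logger.<method>( call on one line.
_LOGGING_CALL = re.compile(r"^\s*(?:self\.)?logger\.\w+\(")


def changed_lines(diff_text: str) -> list[str]:
    """Return added/removed source lines from a unified diff, without the +/- prefix."""
    lines: list[str] = []
    for raw in diff_text.splitlines():
        if raw.startswith(("+++", "---")):
            continue  # file header lines, not content
        if raw.startswith(("+", "-")):
            body = raw[1:]
            if body.strip():  # ignore blank added/removed lines
                lines.append(body)
    return lines


def is_logging_only(diff_text: str) -> bool:
    """True when the diff changes something and every changed line is a logger call."""
    changed = changed_lines(diff_text)
    return bool(changed) and all(_LOGGING_CALL.match(line) for line in changed)
```

A `True` result corresponds to the "diagnostic-only" halt in WP01; the text of the merge commit diff would be passed in as `diff_text`.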
Decision 2: RC-cut procedure (if needed)
- Per CLAUDE.md: `git tag -a v3.2.0rc16 -m "Release v3.2.0rc16 - Fix #1141 + #1182"` then `git push origin v3.2.0rc16`. Release is triggered by tag push via `release.yml`. Monitor via `gh run list --workflow=release.yml`.
- Files to bump: `pyproject.toml` (version = "3.2.0rc16"), `CHANGELOG.md` (add entry), `.kittify/metadata.yaml` (if version field present).
Decision 3: Evidence file schema
- From `spec-kitty-end-to-end-testing/artifacts/sync_identity_boundary/rc15-attempt1/`: run-N.json structure includes `outcome` (pass/fail), `run_number`, `scenarios` (list of scenario results), `cli_version`, `timestamp`. The gate checks only `"outcome": "pass"` at top level.
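The top-level check described above could look like the sketch below. It assumes the four files are named `run-1.json` through `run-4.json` in a single directory, as in the project structure; the function names `outcome_is_pass`, `four_run_results`, and `gate_passes` are illustrative and do not come from the canary runner. A missing or unparseable file is treated as a failed run.

```python
import json
from pathlib import Path


def outcome_is_pass(doc: object) -> bool:
    """Check only the top-level "outcome" field, as the gate does."""
    return isinstance(doc, dict) and doc.get("outcome") == "pass"


def four_run_results(runs_dir: Path, expected_runs: int = 4) -> dict[int, bool]:
    """Map each run number to whether its evidence file reports a pass."""
    results: dict[int, bool] = {}
    for n in range(1, expected_runs + 1):
        path = runs_dir / f"run-{n}.json"
        try:
            doc = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            doc = None  # missing or corrupt evidence counts as a failure
        results[n] = outcome_is_pass(doc)
    return results


def gate_passes(results: dict[int, bool]) -> bool:
    """The four-run protocol requires every run to pass."""
    return bool(results) and all(results.values())
```

Nested `"outcome"` values inside `scenarios` are deliberately ignored here, mirroring the statement that only the top-level field is checked.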
Decision 4: No RC16 cut authority
- CLAUDE.md states "NEVER create releases without explicit user instruction." The gate checks if a post-rc15 RC exists; if not, it STOPS and reports rather than cutting autonomously. WP02 reflects this stop condition.
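As a rough illustration of the "post-rc15" test, the sketch below compares a release-candidate version string against the rc15 baseline. It is an assumption-laden example: the names `parse_rc` and `is_post_baseline` are hypothetical, only strings of the form `X.Y.ZrcN` (optionally prefixed with `v`) are accepted, and how the installed version string is obtained is left out. Anything else, including a final `3.2.0`, raises rather than being silently accepted.

```python
import re

# Assumption: versions under consideration look like "3.2.0rc15" or "v3.2.0rc16".
_RC_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)rc(\d+)$")

BASELINE = (3, 2, 0, 15)  # 3.2.0rc15, the RC the gate must move past


def parse_rc(version: str) -> tuple[int, ...]:
    """Parse 'X.Y.ZrcN' into (X, Y, Z, N); reject any other shape."""
    match = _RC_VERSION.match(version.strip().removeprefix("v"))
    if match is None:
        raise ValueError(f"not a release-candidate version: {version!r}")
    return tuple(int(group) for group in match.groups())


def is_post_baseline(version: str, baseline: tuple[int, ...] = BASELINE) -> bool:
    """True when the version is strictly newer than the baseline RC."""
    return parse_rc(version) > baseline
```

When `is_post_baseline` returns `False`, the WP02 stop condition applies: report and wait for operator instruction instead of cutting a release.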
Phase 1: Design & Contracts
WP Execution Sequence
WP01 [serial, hard gate]
└── Verify #1141 + #1182 CLOSED; audit fix substance
WP02 [depends WP01, hard gate]
└── Determine latest RC; if still rc15 → STOP; install + verify imports
WP03 [independent, can parallel with WP02]
└── SaaS preflight: /health/, /health/ready/, drain counts
WP04 [depends WP02 + WP03, hard gate]
└── Single-run canary (--single); all 4 scenarios must pass
WP05 [depends WP04]
└── Four-run canary protocol; all 4 × "outcome":"pass"
WP06 [depends WP05]
└── Bundle evidence; comment on e2e#41; close e2e#41
WP07 [depends WP06]
└── Teamspace MVP canary suite ×4; preserve logs
WP08 [depends WP07]
└── Evidence comment on spec-kitty#1038; do NOT close
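The dependency structure above can be expressed as a graph and ordered with the standard-library `graphlib` module. This is only a sketch of the ordering logic: the `WP_DEPENDENCIES` mapping is transcribed from the sequence above, while `execution_batches` is a hypothetical helper and not part of spec-kitty.

```python
from graphlib import TopologicalSorter

# Each work package maps to the set of work packages it depends on.
WP_DEPENDENCIES: dict[str, set[str]] = {
    "WP01": set(),
    "WP02": {"WP01"},
    "WP03": set(),  # independent SaaS preflight
    "WP04": {"WP02", "WP03"},
    "WP05": {"WP04"},
    "WP06": {"WP05"},
    "WP07": {"WP06"},
    "WP08": {"WP07"},
}


def execution_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group work packages into batches whose members have no unmet dependencies."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)  # mark the batch complete to unlock dependents
    return batches
```

Because WP03 has no dependencies, this ordering makes it available in the first batch alongside WP01; running it in parallel with WP02 instead, as the plan describes, is equally valid since WP04 waits for both.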
Contracts
Contract: e2e#41 Evidence Comment Template
(→ contracts/e2e-41-evidence-comment-template.md)
Required fields:
- CLI: `spec-kitty-cli==<version>`, tag `<tag>`, commit `<sha>`
- SaaS: Fly image `<image>`, git SHA `<sha>`
- `/health/ready/` status: 200, events `<version>`
- Drain counts: `terminal_failed_count=0`, `business_rule_rejected_count=22`
- Four-run result: 4/4 pass
- Evidence path: `artifacts/sync_identity_boundary/runs/run-{1..4}.json`
- Explicit statement: "No manual SaaS queue mutation, Fly DB edits, daemon record surgery, local queue deletion, or ingress-cap override was used."
Contract: #1038 Evidence Comment Template
(→ contracts/issue-1038-evidence-comment-template.md)
## Canary evidence - post planning#16 auth boundary hardening
- CLI: spec-kitty-cli==<version>, tag <tag>, commit <sha>.
- Events: spec-kitty-events==5.1.0 or newer, tag <tag>, commit <sha>.
- SaaS: Fly image <image>, git SHA <sha>, `/health/` 200, `/health/ready/` 200.
- Drain counts: terminal_failed_infra=<N>, terminal_failed_business_rule=22.
- Identity-boundary canary: 4/4 pass, evidence attached on e2e#41.
- Teamspace MVP canary suite: 4/4 pass.
- Evidence bundle: <path or uploaded link>.
No manual SaaS queue mutation, event replay, DB cleanup, or ingress-cap override was used.
Data Model: Canary Run Result
(→ data-model.md)
CanaryRunResult {
  outcome: "pass" | "fail"
  run_number: int
  timestamp: ISO8601
  cli_version: str
  scenarios: [
    {
      id: int (1-4)
      name: str
      status: "pass" | "fail"
      failure_mode: str | null
      evidence_files: [str]
    }
  ]
}
Invariants:
- `outcome = "pass"` iff ALL scenarios have `status = "pass"`
- Evidence files for each attempt are immutable post-run
- `run_number` is 1-indexed; the four-run protocol requires run_number ∈ {1, 2, 3, 4}
Stop Conditions (Design Constraints)
Each WP has a stop-and-report condition. The gate must NEVER proceed past a stop condition:
| WP | Stop Condition | Action |
|---|---|---|
| WP01 | Either issue still OPEN | Report which issues remain; halt |
| WP01 | #1141 diff is logging-only | Report fix is diagnostic-only; halt |
| WP02 | Latest RC is still rc15 | Report and halt; cut rc16 only on explicit operator instruction |
| WP03 | /health/ready/ ≠ 200 | Report SaaS degradation; halt |
| WP04 | Any scenario fails | Re-open issue(s); preserve evidence; halt |
| WP05 | Any run ≠ "pass" | Re-open issue(s); preserve evidence; halt |
| WP07 | Any Teamspace run fails | Root-cause before retrying; halt on stop conditions |
Branch Contract (second statement — final report)
- Current branch at plan start: `main`
- Intended planning/base branch: `main`
- Final merge target for completed changes: `main`
- branch_matches_target: true
Next step: Run /spec-kitty.tasks to generate the 8 work packages.