Spec Kitty

└─ kitty-specs
   └─ Local Mission Dossier Authority & Parity Export

Mission Run:

📚 Docs ↗

feature_slug: 042-local-mission-dossier-authority-parity-export feature_title: Local Mission Dossier Authority & Parity Export total_subtasks: 58 total_work_packages: 10

Work Packages: Local Mission Dossier Authority & Parity Export

Summary

Feature 042 implements a mission artifact dossier system that indexes all spec-kitty feature artifacts, computes deterministic content hashes and parity signatures, and emits canonical dossier events to sync infrastructure. The system enables local parity-drift detection (offline) while exporting events for SaaS backend integration.

Phases:

Phase 1 (WP01-WP05): Core models, manifests, indexing, events, snapshots
Phase 2 (WP06-WP08): Dashboard API, UI integration, parity detection
Phase 3 (WP09-WP10): Testing, edge cases, integration

Total Effort: ~58 subtasks across 10 work packages | Delivery Mode: API + local dashboard UI parity slice

Phase 1: Core Projection & Events (WP01-WP05)

WP01: ArtifactRef Model & Deterministic Hashing

Goal: Define immutable artifact reference model and deterministic SHA256 hashing utilities

Priority: P1 (Foundation for all downstream WPs)

Status: planned

Subtasks: T001-T005 (5 subtasks)

✅ T001: Create ArtifactRef pydantic model (identity, location, content hash, provenance, state fields)
✅ T002: Implement SHA256 deterministic hashing utility (file → hash)
✅ T003: Build Hasher class with order-independent parity (sorted hashes, combined hash)
✅ T004: UTF-8 validation & error handling (explicit failure, no silent corruption)
✅ T005: Unit tests for hashing determinism (same file → same hash, reproducibility)

Implementation Notes:

ArtifactRef: ~25 fields covering identity, location, metadata, provenance, state (see data-model.md for full spec)
Hasher: Standalone utility class, no dependencies beyond hashlib (stdlib)
Order-independence: Sort artifact hashes lexicographically before concatenation
UTF-8: Read file as bytes, hash bytes directly; if invalid UTF-8 detected, emit error_reason and continue

Test Criteria:

ArtifactRef validates all required fields, rejects malformed artifact_key
Same file hashed 10x produces identical SHA256
Different files produce different hashes
Parity hash unchanged by artifact order

Dependencies: None

Estimated Scope: ~300 lines (models.py + hasher.py + tests)

WP02: Expected Artifact Manifests

Goal: Define manifest schema and load v1 manifests for software-dev, research, documentation missions

Priority: P1 (Foundation for completeness checking)

Status: planned

Subtasks: T006-T011 (6 subtasks)

✅ T006: Design manifest schema (required_always, required_by_step, optional_always)
✅ T007: Implement ManifestRegistry loader (reads mission.yaml states, loads YAML manifests)
✅ T008: Create software-dev expected-artifacts.yaml (step-aware, using existing mission artifacts)
✅ T009: Create research expected-artifacts.yaml (step-aware)
✅ T010: Create documentation expected-artifacts.yaml (step-aware)
✅ T011: Manifest validation (verify steps exist in mission.yaml, path patterns valid)

Implementation Notes:

Manifest schema: YAML per mission, with schema_version, mission_type, manifest_version fields
Step-aware: required_by_step keys are state IDs from mission.yaml (e.g., "planning", "implementation")
V1 scope: Only software-dev, research, documentation; other missions degrade gracefully (index only, no completeness check)
ManifestRegistry: Singleton loader, cache manifests in memory
Path patterns: Support wildcards (e.g., "tasks/*.md")

Test Criteria:

Manifest loads from YAML without errors
Registry correctly maps step names to artifact lists
Unknown missions handled gracefully (return None, no exception)

Dependencies: WP01 (ArtifactRef model context)

Estimated Scope: ~400 lines (manifest.py + 3 YAML files + tests)

WP03: Indexing & Missing Detection

Goal: Implement artifact scanner that indexes feature directory, classifies artifacts, detects missing required artifacts

Priority: P1 (Core dossier functionality)

Status: planned

Subtasks: T012-T017 (6 subtasks)

✅ T012: Implement Indexer.index_feature() scanning logic (walk feature directory)
✅ T013: Artifact classification (derive from filename patterns, manifest definitions)
✅ T014: Missing artifact detection (required vs optional, per manifest)
✅ T015: Unreadable artifact handling (permission denied, encoding errors, deleted mid-scan)
✅ T016: MissionDossier builder from indexed artifacts (create complete inventory)
✅ T017: Step-aware completeness checking (given mission step, which artifacts required?)

Implementation Notes:

Indexer: Recursive directory scan, yield artifacts as discovered
Classification: 6 deterministic classes (input, workflow, output, evidence, policy, runtime)
Missing detection: Compare filesystem scan against manifest requirements; record reason_code (not_found, unreadable, invalid_format)
Error handling: Catch permission errors, UTF-8 decode failures; emit error_reason, continue scan (no silent failures)
MissionDossier: Aggregates artifacts, links to manifest, computes completeness_status

Test Criteria:

Scans 30+ artifacts without errors
Correctly identifies required vs optional
Missing required artifacts flagged with reason codes
Optional artifacts ignored if absent (no missing events)

Dependencies: WP01 (ArtifactRef), WP02 (manifest)

Estimated Scope: ~400 lines (indexer.py + tests)

WP04: Dossier Event Types & Emission

Goal: Define 4 canonical dossier event schemas and integrate emission with sync infrastructure

Priority: P1 (Required for SaaS integration)

Status: planned

Subtasks: T018-T022 (5 subtasks)

✅ T018: Define 4 dossier event schemas (ArtifactIndexed, ArtifactMissing, SnapshotComputed, ParityDriftDetected)
✅ T019: Add schemas to spec-kitty-events contracts (JSON Schema)
✅ T020: Event emitters (async, integrate with OfflineQueue)
✅ T021: Payload validation (Pydantic models, fail on invalid data)
✅ T022: Event routing to sync infrastructure (enqueue, webhook dispatch)

Implementation Notes:

Event types: See data-model.md for complete payload schemas
MissionDossierArtifactIndexed: Emitted once per artifact indexed
MissionDossierArtifactMissing: Emitted only if required artifact missing
MissionDossierSnapshotComputed: Emitted after all artifacts indexed
MissionDossierParityDriftDetected: Emitted only if parity hash differs from baseline
Emission: Via spec_kitty_events.OfflineQueue.emit(), respects offline mode
Validation: Pydantic BaseModel for each payload; raise on schema violation

Test Criteria:

All 4 event types emit with valid schemas
Events enqueue to offline queue (no SaaS call during local scan)
Conditional events (missing, drift) emit only when conditions met
Envelope metadata (event_id, timestamp, actor) auto-populated

Dependencies: WP03 (indexing produces artifacts), spec_kitty_events (existing)

Estimated Scope: ~350 lines (events.py + JSON schemas + tests)

WP05: Snapshot Computation & Parity Hash

Goal: Compute deterministic snapshots from dossier state and compute reproducible parity hashes

Priority: P1 (Core for determinism requirement)

Status: planned

Subtasks: T023-T027 (5 subtasks)

✅ T023: Deterministic snapshot computation (sort artifacts, count by status)
✅ T024: Parity hash algorithm (sorted artifact hashes, combined hash)
✅ T025: Snapshot persistence (JSON storage to .kittify/dossiers/{feature_slug}/snapshot-latest.json)
✅ T026: Snapshot validation (reproducibility: same content → same hash)
✅ T027: Snapshot equality comparison (parity_hash_sha256 as source of truth)

Implementation Notes:

Snapshot: Point-in-time projection with artifact counts, completeness status, parity hash
Parity hash: SHA256 of concatenated sorted artifact hashes (order-independent)
Persistence: Store as JSON, include timestamp and artifact summaries for audit
Reproducibility: Identical artifact content (on any machine, any timezone) produces identical parity hash
SC-002 validation: Run twice on unchanged content, verify parity_hash_sha256 identical

Test Criteria:

Snapshot computes without errors for 30+ artifacts
Parity hash deterministic (same content → same hash, multiple runs)
Snapshot reproducible on different machines/timezones (if artifact content identical)
Completeness status correctly reflects required artifacts present/missing

Dependencies: WP01 (ArtifactRef, hasher), WP03 (MissionDossier)

Estimated Scope: ~350 lines (store.py + snapshot logic + tests)

Phase 2: API & Dashboard Integration (WP06-WP08)

WP06: Dashboard API Endpoints

Goal: Expose 4 REST endpoints for dossier access, implement adapter pattern for future FastAPI migration

Priority: P1 (Enables dashboard UI)

Status: planned

Subtasks: T028-T033 (6 subtasks)

✅ T028: Implement GET /api/dossier/overview endpoint (returns DossierOverviewResponse)
✅ T029: Implement GET /api/dossier/artifacts endpoint (list, filtering by class/wp_id/step_id)
✅ T030: Implement GET /api/dossier/artifacts/{artifact_key} endpoint (detail, full text)
✅ T031: Implement GET /api/dossier/snapshots/export endpoint (returns snapshot JSON for SaaS import)
✅ T032: Router dispatch rules (add dossier routes to dashboard handler)
✅ T033: Define adapter interface for future FastAPI migration (handler protocol spec)

Implementation Notes:

Overview: Returns completeness_status, parity_hash, artifact counts, last_scanned_at
Artifacts list: Supports filtering by class, wp_id, step_id; stable ordering (by artifact_key)
Detail: Returns full text content (if <5MB), media_type_hint (markdown, json, yaml), truncation notice if >5MB
Export: Returns snapshot JSON (importable by SaaS backend)
Handlers: Integrate with existing dashboard handler pattern (HTTPServer + dispatch)
Adapter interface: Define protocol/interface for handler methods (enables future FastAPI port)

Test Criteria:

All endpoints return valid JSON with correct schema
Filtering works (class=output returns only output artifacts)
Detail endpoint truncates large files (>5MB) gracefully
Export returns snapshot JSON compatible with SaaS import

Dependencies: WP05 (snapshot), WP04 (event models)

Estimated Scope: ~400 lines (api.py handlers + adapter interface + tests)

WP07: Dashboard UI Integration

Goal: Render dossier overview, artifact list, filtering, and detail views in local dashboard

Priority: P1 (User-facing feature)

Status: planned

Subtasks: T034-T039 (6 subtasks)

✅ T034: Create dossier-panel.js (vanilla JS, fetch wrappers, no Vue framework)
✅ T035: Add dossier tab to dashboard HTML (tabs: overview, artifacts, detail)
✅ T036: Render artifact list with filtering UI (checkboxes for class, wp_id, step_id)
✅ T037: Implement artifact detail view (full-text display, syntax highlighting hints)
✅ T038: Add truncation notice for large artifacts (>5MB, with download link if needed)
✅ T039: Media type hints (markdown badge, json icon, yaml label)

Implementation Notes:

Vanilla JS: No Vue/SPA framework; fetch API + DOM manipulation
Panel: Integrated as new tab in existing dashboard
Filtering: Client-side filtering (load full list, filter locally) or server-side (add query params to /api/dossier/artifacts)
Detail view: Modal or side panel showing full artifact content
Truncation: If size_bytes > 5242880 (5MB), show notice + link to raw file
Syntax hints: Add CSS classes (markdown-content, json-content) for browser-native highlighting

Test Criteria:

Dashboard renders dossier panel without errors
Artifact list loads and displays 30+ artifacts
Filtering works (click class=output, list updates)
Detail view displays full text for small artifacts, truncation notice for large

Dependencies: WP06 (API endpoints)

Estimated Scope: ~300 lines (dossier-panel.js + HTML/CSS + tests)

WP08: Local Parity-Drift Detector

Goal: Implement local baseline management and parity drift detection, emit drift events

Priority: P1 (Core parity feature, offline-capable)

Status: planned

Subtasks: T040-T045 (6 subtasks)

✅ T040: Baseline key computation (identity tuple: project_uuid, node_id, feature_slug, target_branch, mission_key, manifest_version)
✅ T041: Baseline persistence (JSON file: .kittify/dossiers/{feature_slug}/parity-baseline.json)
✅ T042: Baseline acceptance logic (key match validation, prevent false positives)
✅ T043: Drift detection (compare current parity_hash_sha256 vs cached baseline)
✅ T044: ParityDriftDetected event emission (severity, missing_in_local, missing_in_saas)
✅ T045: Baseline update logic (capture new baseline when accepted)

Implementation Notes:

Baseline key: Fully namespaced to prevent false positives (branch switches, manifest updates, multi-user/machine)
Key components: project_uuid, node_id (from sync/project_identity.py), feature_slug, target_branch, mission_key, manifest_version
Acceptance: Compare current key hash vs baseline key hash; accept only if match (else treat as "no baseline")
Drift detection: Works offline, compares hashes only (no SaaS call)
Event: Emitted only if baseline accepted AND hash differs; includes severity (info/warning/error)

Test Criteria:

Baseline computes and persists without errors
Acceptance correctly validates key match
Drift detection identifies hash changes
False positives prevented (branch switch triggers "no baseline", not drift event)
Event emitted only on true drift (hash change with matching baseline)

Dependencies: WP05 (snapshot/parity hash), WP04 (event emission), sync/project_identity (existing)

Estimated Scope: ~350 lines (drift_detector.py + tests)

Phase 3: Testing & Hardening (WP09-WP10)

WP09: Determinism Test Suite

Goal: Comprehensive tests for hash reproducibility, ordering stability, encoding robustness

Priority: P1 (Critical for SC-002, SC-006, SC-007 success criteria)

Status: planned

Subtasks: T046-T050 (5 subtasks)

✅ T046: Hash reproducibility (same file → same hash, tested 10 runs)
✅ T047: Order independence (artifact order irrelevant to parity hash)
✅ T048: UTF-8 handling (special chars, BOM, CJK, surrogates)
✅ T049: CRLF vs LF consistency (Windows/Unix line endings)
✅ T050: Parity hash stability (machines, timezones, different Python versions)

Implementation Notes:

Reproducibility: Hash same file multiple times, verify identical SHA256
Order independence: Create dossier with artifacts in random order, verify parity hash unchanged
UTF-8: Test BOM-prefixed files, CJK characters (Chinese/Japanese), UTF-8 surrogates, invalid sequences (should fail explicitly)
CRLF: Test files with Windows (CRLF) and Unix (LF) line endings; hash should be order-independent
Cross-machine: If possible, run same test on different machines (Linux/macOS/Windows), verify hash identical

Test Criteria:

All SC-002, SC-006, SC-007 success criteria passed
Zero hash mismatches across 10+ runs, multiple machines
UTF-8 edge cases handled explicitly (error_reason, no corruption)
Line-ending differences don't cause hash mismatches

Dependencies: WP01 (hasher), WP05 (snapshot)

Estimated Scope: ~300 lines (test_determinism.py)

WP10: Integration & Edge Cases

Goal: Integration tests covering full scan workflow, encoding errors, large artifacts, edge cases

Priority: P1 (Hardening, final validation)

Status: planned

Subtasks: T051-T058 (8 subtasks)

✅ T051: Missing required artifact detection (edge case: multiple missing, blocking)
✅ T052: Optional artifact handling (edge case: optional missing, non-blocking)
✅ T053: Unreadable artifact handling (permission denied, disk I/O errors)
✅ T054: Large artifact handling (>5MB, dashboard truncation, no memory issues)
✅ T055: Full scan workflow integration test (specify → plan → tasks → index → snapshot → events)
✅ T056: SaaS webhook simulator (mock endpoint receives all 4 event types)
✅ T057: Concurrent file modification edge case (file changed during scan)
✅ T058: Manifest version mismatch edge case (manifest updated, baseline key mismatch)

Implementation Notes:

Missing detection: Create feature with spec.md missing, plan.md present; verify MissionDossierArtifactMissing event
Optional handling: Create feature with research.md (optional) missing; verify no missing event
Unreadable: Create artifact with no read permissions; verify unreadable error_reason, continue scan
Large artifacts: Create 10MB artifact; verify API truncates, shows notice, no memory explosion
Full workflow: Create feature with all artifacts, run scan, verify all events emitted in order
Webhook simulator: Mock HTTP endpoint that accepts POST /webhook/dossier, logs all 4 event types
Concurrent modification: Modify file mid-scan (if possible); verify hash captures pre-modification state
Manifest mismatch: Change manifest_version, re-scan; verify baseline rejected (no drift event, informational instead)

Test Criteria:

All edge cases handled gracefully (no silent failures, no crashes)
SaaS webhook simulator receives all expected events
Large artifacts processed without memory issues
Concurrent modification handled (scan completes, hash captures point-in-time state)
Manifest version changes prevent false positive drift events

Dependencies: All WP01-WP09

Estimated Scope: ~450 lines (test_integration.py, test_encoding.py, fixtures)

Dependency Graph

WP01 (ArtifactRef, Hasher)
  ├─→ WP02 (Manifests)
  ├─→ WP03 (Indexing)
  ├─→ WP05 (Snapshot, Parity)
  └─→ WP09 (Determinism Tests)

WP02 (Manifests)
  └─→ WP03 (Indexing)

WP03 (Indexing)
  ├─→ WP04 (Events)
  └─→ WP05 (Snapshot)

WP04 (Events)
  ├─→ WP06 (API Endpoints)
  └─→ WP08 (Drift Detection)

WP05 (Snapshot, Parity)
  ├─→ WP06 (API Endpoints)
  ├─→ WP08 (Drift Detection)
  └─→ WP09 (Determinism Tests)

WP06 (API Endpoints)
  ├─→ WP07 (Dashboard UI)
  └─→ WP10 (Integration Tests)

WP07 (Dashboard UI)
  └─→ WP10 (Integration Tests)

WP08 (Drift Detection)
  └─→ WP10 (Integration Tests)

WP09 (Determinism Tests)
  └─→ WP10 (Integration Tests)

Critical Path: WP01 → WP02 → WP03 → (WP04 + WP05) → WP06 → WP07 → WP10

Parallelizable Groups:

Phase 1: WP01, then WP02, then WP03, then WP04 + WP05 in parallel
Phase 2: WP06 (after WP04 + WP05), then WP07 (after WP06), and WP08 in parallel (after WP04 + WP05)
Phase 3: WP09 (after WP01 + WP05), WP10 (after WP06 + WP07 + WP08 + WP09)

Success Criteria Summary

Success Criteria	Coverage
SC-001 (API <500ms)	WP06, WP10 testing
SC-002 (Deterministic snapshots)	WP05, WP09
SC-003 (Event emission)	WP04, WP10
SC-004 (Missing detection)	WP03, WP10
SC-005 (Scale to 1000 artifacts)	WP10
SC-006 (Parity reproducible)	WP05, WP09
SC-007 (UTF-8 robustness)	WP09, WP10

All work packages collectively deliver complete feature 042 specification.

Notes

Dependencies finalized: WP frontmatter dependency lists are now aligned with this graph for deterministic implement validation.
Test-driven: Each WP includes unit tests + integration tests (WP10 comprehensive)
Scope containment: No FastAPI migration, no manifest versioning, no cross-org federation (deferred to post-042)
Quality bar: Zero silent failures, explicit error handling, deterministic hashing, full spec compliance

Canonical Status (Generated)

WP01: done
WP02: done
WP03: done
WP04: done
WP05: done
WP06: done
WP07: done
WP08: done
WP09: done
WP10: done