Phase 1 Data Model: Mission Dossier Entities
Date: 2026-02-21 | Feature: 042-local-mission-dossier-authority-parity-export
Dossier Entities & Relationships
ArtifactRef (WP01)
Purpose: Immutable reference to a single indexed artifact
from pydantic import BaseModel, Field
from datetime import datetime
from typing import Optional
class ArtifactRef(BaseModel):
"""Single artifact indexed in a dossier."""
# Identity
artifact_key: str = Field(
...,
description="Stable, unique key for this artifact (e.g., 'input.spec.main', 'output.tasks.per_wp')"
)
artifact_class: str = Field(
...,
description="Classification: input | workflow | output | evidence | policy | runtime | other"
)
# Location & Content
relative_path: str = Field(
...,
description="Relative path from feature directory (e.g., 'spec.md', 'tasks/WP01.md')"
)
content_hash_sha256: str = Field(
...,
description="SHA256 hash of artifact bytes (deterministic)"
)
size_bytes: int = Field(
...,
description="File size in bytes"
)
# Metadata
wp_id: Optional[str] = Field(
None,
description="Work package ID if linked (e.g., 'WP01', None if not WP-specific)"
)
step_id: Optional[str] = Field(
None,
description="Mission step (e.g., 'planning', 'implementation', None if not step-specific)"
)
required_status: str = Field(
default="optional",
description="'required' | 'optional' (from manifest)"
)
# Provenance
provenance: Optional[dict] = Field(
default=None,
description="Source info: {source_kind: 'git'|'runtime'|'generated'|'manual', actor_id, captured_at}"
)
# State
is_present: bool = Field(
default=True,
description="True if file currently exists and readable"
)
error_reason: Optional[str] = Field(
None,
description="If not present: 'not_found' | 'unreadable' | 'invalid_format' | 'deleted_after_scan'"
)
# Timestamps
indexed_at: datetime = Field(
default_factory=datetime.utcnow,
description="When this artifact was indexed"
)
class Config:
json_encoders = {
datetime: lambda v: v.isoformat()
}
Uniqueness Constraint: (feature_slug, artifact_key) is unique per dossier.
ExpectedArtifactManifest (WP02)
Purpose: Registry of required/optional artifacts per mission type and step
from typing import List, Dict
from enum import Enum
class ArtifactClassEnum(str, Enum):
"""6 artifact classes (deterministic, no fallback)."""
INPUT = "input"
WORKFLOW = "workflow"
OUTPUT = "output"
EVIDENCE = "evidence"
POLICY = "policy"
RUNTIME = "runtime"
class ExpectedArtifactSpec(BaseModel):
"""Single expected artifact definition."""
artifact_key: str = Field(..., description="e.g., 'input.spec.main'")
artifact_class: ArtifactClassEnum
path_pattern: str = Field(..., description="e.g., 'spec.md' or 'tasks/*.md'")
blocking: bool = Field(default=False, description="True if missing blocks completeness")
class ExpectedArtifactManifest(BaseModel):
"""Registry for a mission type, step-aware."""
schema_version: str = "1.0"
mission_type: str = Field(..., description="e.g., 'software-dev', 'research', 'documentation'")
manifest_version: str = Field(default="1", description="Manifest version for compatibility/evolution")
required_always: List[ExpectedArtifactSpec] = Field(
default_factory=list,
description="Artifacts required regardless of workflow step"
)
required_by_step: Dict[str, List[ExpectedArtifactSpec]] = Field(
default_factory=dict,
description="Step ID (from mission.yaml states) → list of required specs for that step"
)
optional_always: List[ExpectedArtifactSpec] = Field(
default_factory=list,
description="Optional artifacts checked if present, but missing is non-blocking"
)
@classmethod
def from_yaml_file(cls, path: Path) -> "ExpectedArtifactManifest":
"""Load manifest from YAML."""
import ruamel.yaml
yaml = ruamel.yaml.YAML()
with open(path) as f:
data = yaml.load(f)
return cls(**data)
Manifests Shipped in V1:
src/specify_cli/missions/software-dev/expected-artifacts.yamlsrc/specify_cli/missions/research/expected-artifacts.yamlsrc/specify_cli/missions/documentation/expected-artifacts.yaml
Registry Access:
# src/specify_cli/dossier/manifest.py
class ManifestRegistry:
"""Load and query expected artifact manifests."""
@staticmethod
def load_manifest(mission_type: str) -> Optional[ExpectedArtifactManifest]:
"""Load manifest for mission, return None if not found."""
# Check missions/*/expected-artifacts.yaml
@staticmethod
def get_required_artifacts(manifest: ExpectedArtifactManifest, step_id: str) -> List[ExpectedArtifactSpec]:
"""Get required specs for mission step (from mission.yaml state machine).
Returns: required_always + required_by_step[step_id]
"""
base = manifest.required_always
step_specific = manifest.required_by_step.get(step_id, [])
return base + step_specific
MissionDossier (WP01)
Purpose: Collection of indexed artifacts for a single feature
class MissionDossier(BaseModel):
"""Complete artifact inventory for a mission/feature."""
# Identity
mission_slug: str = Field(..., description="e.g., 'software-dev'")
mission_run_id: str = Field(..., description="UUID or feature run identifier")
feature_slug: str = Field(..., description="e.g., '042-local-mission-dossier'")
feature_dir: str = Field(..., description="Absolute path to feature directory")
# Artifacts
artifacts: List[ArtifactRef] = Field(default_factory=list, description="All indexed artifacts")
# Completeness
manifest: Optional[ExpectedArtifactManifest] = Field(
None,
description="Loaded manifest for this mission type (None if not found)"
)
# Snapshot
latest_snapshot: Optional["MissionDossierSnapshot"] = Field(
None,
description="Most recent snapshot (after all artifacts indexed)"
)
# Timestamps
dossier_created_at: datetime = Field(default_factory=datetime.utcnow)
dossier_updated_at: datetime = Field(default_factory=datetime.utcnow)
def get_required_artifacts(self, step_id: Optional[str] = None) -> List[ArtifactRef]:
"""Return required artifacts for step (or all required if step_id=None)."""
if not self.manifest:
return []
# Match artifacts against manifest requirements
def get_missing_required_artifacts(self, step_id: Optional[str] = None) -> List[ArtifactRef]:
"""Return required artifacts that are not present."""
required = self.get_required_artifacts(step_id)
return [a for a in required if not a.is_present]
@property
def completeness_status(self) -> str:
"""'complete' iff all required artifacts present, else 'incomplete'."""
if not self.manifest:
return "unknown" # No manifest, can't judge completeness
missing = self.get_missing_required_artifacts()
return "complete" if not missing else "incomplete"
MissionDossierSnapshot (WP05)
Purpose: Point-in-time projection of dossier state + parity hash
class MissionDossierSnapshot(BaseModel):
"""Deterministic snapshot of dossier state."""
# Identity
feature_slug: str
snapshot_id: str = Field(default_factory=lambda: str(uuid.uuid4()))
# Artifact Counts
total_artifacts: int = Field(..., description="Total indexed artifacts")
required_artifacts: int = Field(..., description="Count of required artifacts")
required_present: int = Field(..., description="Count of required artifacts present")
required_missing: int = Field(..., description="Count of required artifacts missing (blocking)")
optional_artifacts: int = Field(..., description="Count of optional artifacts")
optional_present: int = Field(..., description="Count of optional present")
# Completeness
completeness_status: str = Field(
...,
description="'complete' | 'incomplete' | 'unknown' (no manifest)"
)
# Parity Hash (Core Determinism)
parity_hash_sha256: str = Field(
...,
description="SHA256 of sorted artifact content hashes. Deterministic, reproducible."
)
parity_hash_components: List[str] = Field(
...,
description="Sorted list of individual artifact hashes (for audit)"
)
# Artifacts Summary
artifact_summaries: List[dict] = Field(
default_factory=list,
description="[{artifact_key, class, wp_id, step_id, is_present, error_reason}]"
)
# Timestamps
computed_at: datetime = Field(default_factory=datetime.utcnow)
class Config:
json_encoders = {
datetime: lambda v: v.isoformat()
}
Computation Algorithm:
def compute_snapshot(dossier: MissionDossier) -> MissionDossierSnapshot:
"""Deterministically compute snapshot from dossier."""
# 1. Sort artifacts by artifact_key
sorted_artifacts = sorted(dossier.artifacts, key=lambda a: a.artifact_key)
# 2. Extract hashes (present artifacts only)
present_hashes = [a.content_hash_sha256 for a in sorted_artifacts if a.is_present]
# 3. Sort hashes lexicographically (for order-independence)
sorted_hashes = sorted(present_hashes)
# 4. Concatenate and hash
combined = "".join(sorted_hashes)
parity_hash = hashlib.sha256(combined.encode()).hexdigest()
# 5. Count artifacts by status
required = [a for a in sorted_artifacts if a.required_status == "required"]
optional = [a for a in sorted_artifacts if a.required_status == "optional"]
completeness_status = (
"complete" if all(a.is_present for a in required)
else "incomplete" if any(r for r in required if not r.is_present)
else "unknown"
)
return MissionDossierSnapshot(
feature_slug=dossier.feature_slug,
total_artifacts=len(sorted_artifacts),
required_artifacts=len(required),
required_present=sum(1 for a in required if a.is_present),
required_missing=sum(1 for a in required if not a.is_present),
optional_artifacts=len(optional),
optional_present=sum(1 for a in optional if a.is_present),
completeness_status=completeness_status,
parity_hash_sha256=parity_hash,
parity_hash_components=sorted_hashes,
artifact_summaries=[...], # Summarize each artifact
)
Dossier Event Schemas (WP04)
MissionDossierArtifactIndexed
Purpose: Emitted when artifact successfully indexed
class MissionDossierArtifactIndexedPayload(BaseModel):
feature_slug: str
artifact_key: str
artifact_class: str
relative_path: str
content_hash_sha256: str
size_bytes: int
wp_id: Optional[str]
step_id: Optional[str]
required_status: str # "required" | "optional"
class MissionDossierArtifactIndexed(BaseModel):
"""Event emitted for each successfully indexed artifact."""
event_type: str = "mission_dossier_artifact_indexed"
payload: MissionDossierArtifactIndexedPayload
MissionDossierArtifactMissing
Purpose: Emitted when required artifact missing or unreadable
class MissionDossierArtifactMissingPayload(BaseModel):
feature_slug: str
artifact_key: str
artifact_class: str
expected_path_pattern: str
reason_code: str # "not_found" | "unreadable" | "invalid_format" | "deleted_after_scan"
reason_detail: Optional[str]
blocking: bool # True if this blocks completeness
class MissionDossierArtifactMissing(BaseModel):
"""Anomaly event: required artifact missing."""
event_type: str = "mission_dossier_artifact_missing"
payload: MissionDossierArtifactMissingPayload
MissionDossierSnapshotComputed
Purpose: Emitted when snapshot computed (after all artifacts indexed)
class MissionDossierSnapshotComputedPayload(BaseModel):
feature_slug: str
parity_hash_sha256: str
artifact_counts: dict # {total, required, required_present, required_missing, optional, optional_present}
completeness_status: str # "complete" | "incomplete" | "unknown"
snapshot_id: str
class MissionDossierSnapshotComputed(BaseModel):
"""Event: snapshot computed after dossier finalized."""
event_type: str = "mission_dossier_snapshot_computed"
payload: MissionDossierSnapshotComputedPayload
MissionDossierParityDriftDetected
Purpose: Emitted when local snapshot differs from cached baseline
class MissionDossierParityDriftDetectedPayload(BaseModel):
feature_slug: str
local_parity_hash: str
baseline_parity_hash: str
missing_in_local: List[str] # artifact_keys
missing_in_baseline: List[str] # artifact_keys (new artifacts)
severity: str # "info" | "warning" | "error"
class MissionDossierParityDriftDetected(BaseModel):
"""Anomaly event: parity hash differs from baseline."""
event_type: str = "mission_dossier_parity_drift_detected"
payload: MissionDossierParityDriftDetectedPayload
Dashboard API Response Models
DossierOverviewResponse
class DossierOverviewResponse(BaseModel):
feature_slug: str
completeness_status: str
parity_hash_sha256: str
artifact_counts: dict
missing_required_count: int
last_scanned_at: Optional[datetime]
ArtifactListResponse
class ArtifactListItem(BaseModel):
artifact_key: str
artifact_class: str
relative_path: str
size_bytes: int
wp_id: Optional[str]
step_id: Optional[str]
is_present: bool
error_reason: Optional[str]
class ArtifactListResponse(BaseModel):
total_count: int
filtered_count: int
artifacts: List[ArtifactItem] # Filtered & sorted
filters_applied: dict # {class, wp_id, step_id, required_only}
ArtifactDetailResponse
class ArtifactDetailResponse(BaseModel):
artifact_key: str
artifact_class: str
relative_path: str
content_hash_sha256: str
size_bytes: int
wp_id: Optional[str]
step_id: Optional[str]
required_status: str
is_present: bool
error_reason: Optional[str]
# Full content (or truncation notice)
content: Optional[str] # Full text if <5MB, else None
content_truncated: bool
truncation_notice: Optional[str] # "File >5MB, content not included"
media_type_hint: str # "markdown" | "json" | "yaml" | "text"
indexed_at: datetime
Storage Format
Dossier Snapshot Persistence
File: .kittify/dossiers/{feature_slug}/snapshot-latest.json
Contains serialized MissionDossierSnapshot (immutable point-in-time projection).
{
"feature_slug": "042-local-mission-dossier",
"snapshot_id": "abc123...",
"total_artifacts": 15,
"required_artifacts": 10,
"required_present": 10,
"required_missing": 0,
"completeness_status": "complete",
"parity_hash_sha256": "abc456def789...",
"computed_at": "2026-02-21T10:00:00Z"
}
Parity Baseline Cache (Robustly Namespaced)
File: .kittify/dossiers/{feature_slug}/parity-baseline.json
Stores point-in-time parity hash with identity tuple for robust drift detection (prevents false positives):
{
"baseline_key": {
"project_uuid": "550e8400-e29b-41d4-a716-446655440000",
"node_id": "abcdef123456",
"feature_slug": "042-local-mission-dossier",
"target_branch": "2.x",
"mission_key": "software-dev",
"manifest_version": "1"
},
"baseline_key_hash": "baseline_key_sha256_hash",
"parity_hash_sha256": "abc456def789...",
"captured_at": "2026-02-21T09:00:00Z",
"captured_by": "node_id_at_capture"
}
Baseline Key Components (prevent false positives):
project_uuid: Local project identity (from sync/project_identity.py)node_id: Stable machine identifier (from sync/project_identity.py)feature_slug: Feature identifiertarget_branch: Git branch where feature is based (main, 2.x, etc.)mission_key: Mission type (software-dev, research, documentation)manifest_version: Manifest schema version
Acceptance Logic:
- On parity detection, compute current key
- Compare current key vs baseline key (via hash)
- Accept baseline only if keys match; else treat as "no baseline" (informational drift event)
- This prevents false positives from branch switches, manifest updates, multi-user/multi-machine scenarios
Integration Points
With Sync Infrastructure
- Dossier events register in
src/specify_cli/sync/events.py - Emitted via
OfflineQueue.emit()(async-safe) - Routed to webhook (mock SaaS in tests)
With Dashboard API
- Endpoints in
src/specify_cli/dashboard/api.py - Import
MissionDossier,DossierSnapshot, filter/query logic - Vue components fetch via
/api/dossier/*
With Mission System
- Load
ExpectedArtifactManifestfor feature's mission type - Registry extensible for new missions (graceful degradation if no manifest)
References
- Pydantic Documentation: https://docs.pydantic.dev/
- datetime Handling: Use
datetime.utcnow()(UTC-aware, no timezone ambiguity) - SHA256 Hashing: Use
hashlib.sha256()from stdlib - YAML Parsing: Use
ruamel.yaml(existing spec-kitty dependency)