Spec Kitty

└─ kitty-specs
   └─ Refactor mission system.

Mission Run:

📚 Docs ↗

Research: Mission System Technical Decisions

Feature: 005-refactor-mission-system Date: 2025-01-16 Status: Complete

Research Questions

This research addresses three critical technical decisions for the mission system refactoring:

1. Schema Validation Library: Which library should validate mission.yaml structure? 2. Citation Format Validation: How should we validate citations in research mission? 3. Dashboard Integration: How should dashboard display and update active mission?

R1: Schema Validation Library Comparison

Research Question

Which schema validation library provides the best balance of error messages, zero dependencies, type safety, and Python 3.11+ compatibility for validating mission.yaml?

Options Evaluated

Option A: Pydantic v2

Pros:

Industry-standard for data validation in Python
Excellent error messages with field-level details
Full type hint integration
Auto-generates schema documentation
Coercion support (e.g., int ’ str)
Fast (Rust core in v2)
Active development and community

Cons:

External dependency (adds ~5MB to install)
Heavier than alternatives
May be overkill for simple YAML validation

Error Message Quality (A+):

# Missing required field
ValidationError: 1 validation error for MissionConfig
name
  Field required [type=missing, input_value={'domain': 'software'}, input_type=dict]

# Typo in field name
ValidationError: 1 validation error for MissionConfig
validaton
  Extra inputs are not permitted [type=extra_forbidden, input_value='git_clean', input_type=list]

Example Usage:

from pydantic import BaseModel, Field

class MissionConfig(BaseModel):
    name: str
    domain: Literal["software", "research", "writing"]
    version: str
    workflow: WorkflowConfig

mission = MissionConfig(**yaml.safe_load(config_file))

Verdict: Best-in-class validation with clear errors. Worth the dependency for production use.

Option B: attrs + cattrs

Pros:

Lighter weight than Pydantic (~400KB)
Type hint support
Good performance
Mature and stable
Composable validation

Cons:

Two packages required (attrs + cattrs)
Error messages less polished than Pydantic
Less community adoption
Manual schema documentation

Error Message Quality (B+):

# Missing required field
cattrs.errors.ClassValidationError: While structuring MissionConfig (1 sub-exception)
  + Exception Group Traceback (most recent call last):
  |   File "<stdin>", line 1, in <module>
  | cattrs.errors.ClassValidationError: While structuring MissionConfig (1 sub-exception)
  +-+---------------- 1 ----------------
    | TypeError: __init__() missing 1 required positional argument: 'name'

Example Usage:

import attrs
import cattrs

@attrs.define
class MissionConfig:
    name: str
    domain: str
    version: str

converter = cattrs.Converter()
mission = converter.structure(yaml_data, MissionConfig)

Verdict: Good middle ground. Lighter than Pydantic but requires two packages.

Option C: jsonschema

Pros:

Standard JSON Schema format
Language-agnostic schema
Good error messages
Well-documented
Widely used

Cons:

External dependency
No type hint integration
Schema separate from code
Verbose schema definitions
No auto-completion in IDEs

Error Message Quality (B):

# Missing required field
jsonschema.exceptions.ValidationError: 'name' is a required property
Failed validating 'required' in schema:
    {'properties': {'name': {'type': 'string'}, ...}, 'required': ['name', 'domain']}
On instance:
    {'domain': 'software'}

Example Usage:

import jsonschema

schema = {
    "type": "object",
    "required": ["name", "domain"],
    "properties": {
        "name": {"type": "string"},
        "domain": {"enum": ["software", "research"]}
    }
}

jsonschema.validate(yaml_data, schema)

Verdict: Decent but schema definitions are verbose and separate from code.

Option D: Dataclasses + Manual Validation

Pros:

Zero external dependencies (Python 3.11+ stdlib)
Full type hint integration
IDE auto-completion
Lightweight
Complete control over error messages

Cons:

Manual validation code required
Error messages only as good as we write them
More code to maintain
No automatic coercion

Error Message Quality (C+ with effort, F without):

# With custom validation
class MissionConfigError(Exception):
    pass

# Example validation
if "name" not in data:
    raise MissionConfigError(
        "Mission config missing required field: 'name'\n"
        f"Available fields: {list(data.keys())}\n"
        f"Required fields: name, domain, version, workflow, artifacts"
    )

Example Usage:

from dataclasses import dataclass
from typing import List, Dict

@dataclass
class MissionConfig:
    name: str
    domain: str
    version: str
    workflow: Dict

    @classmethod
    def from_dict(cls, data: Dict) -> 'MissionConfig':
        # Manual validation
        required = ["name", "domain", "version"]
        missing = [f for f in required if f not in data]
        if missing:
            raise MissionConfigError(f"Missing required fields: {missing}")
        return cls(**{k: data[k] for k in required})

Verdict: Zero dependencies but significant manual work required for quality errors.

Decision Matrix

Criterion	Pydantic	attrs+cattrs	jsonschema	dataclasses
Error Quality	A+	B+	B	C+ (manual)
Dependencies	1 (5MB)	2 (400KB)	1 (300KB)	0
Type Hints	Full	Full	L None	Full
IDE Support	Excellent	Good	L Limited	Excellent
Maintenance	Low	Low	=á Medium	L High
Performance	Fast	Fast	=á Medium	Fastest
Learning Curve	=á Medium	=á Medium	Low	Low
Community	Huge	=á Small	Large	Stdlib

Recommended Decision

Primary Recommendation: Pydantic v2

Rationale: 1. Error Quality: Critical for FR-007 (helpful error messages). Pydantic's field-level errors with suggestions are far superior 2. User Experience: Custom mission creators need excellent feedback. 5MB dependency is acceptable for this quality 3. Future-Proofing: If spec-kitty grows (API server, web dashboard), Pydantic is already integrated 4. Development Speed: Less manual validation code = faster implementation 5. Type Safety: Full type checking prevents bugs in mission.py refactoring

Alternative if Zero Dependencies Required: Dataclasses + Manual Validation

If adding Pydantic is blocked for dependency reasons:

Use dataclasses with comprehensive manual validation
Invest 2-3 extra days building quality error messages
Create validation helpers in src/specify_cli/validation_utils.py
Trade-off: More code to maintain, but zero external dependencies

Rejected Options:

attrs+cattrs: Two dependencies for marginal benefit over Pydantic
jsonschema: Poor IDE experience, no type hints

R2: Citation Format Validation

Research Question

How should we validate citations in evidence-log.csv and source-register.csv for the research mission?

Citation Formats to Support

BibTeX Format

@article{key2025,
  author = {Last, First},
  title = {Title of Paper},
  journal = {Journal Name},
  year = {2025}
}

Regex Pattern:

BIBTEX_PATTERN = r'@\w+\{[\w-]+,[\s\S]+?\}'

APA 7th Edition

Last, F. (2025). Title of paper. Journal Name, 10(2), 123-145. https://doi.org/...

Regex Pattern:

APA_PATTERN = r'^[\w\s,\.]+\(\d{4}\)\.[\s\S]+\.$'

Simple Citation (Fallback)

Author (Year). Title. Source. URL

Regex Pattern:

SIMPLE_PATTERN = r'^.+\(\d{4}\)\..+\..+\.'

Validation Approach

Progressive Validation Strategy:

1. Level 1 - Completeness (Always enforced):

Citation field is non-empty
Source type is one of: journal, conference, book, web, preprint
Year is 4-digit number

2. Level 2 - Format (Warning only):

Citation matches one of the supported patterns (BibTeX, APA, Simple)
If no match, warn: "Citation format not recognized. Consider using BibTeX or APA format."

3. Level 3 - Quality (Optional, for strict mode):

Check for DOI/URL presence
Validate journal/conference names against known lists
Check year is reasonable (1900-2030)

Implementation:

# src/specify_cli/validators/research.py

import csv
import re
from pathlib import Path
from typing import List, Tuple

class CitationError(Exception):
    pass

def validate_citations(evidence_log_path: Path) -> List[str]:
    """Validate citations in evidence-log.csv.

    Returns:
        List of validation issues (empty if all valid)
    """
    issues = []

    with open(evidence_log_path) as f:
        reader = csv.DictReader(f)
        for i, row in enumerate(reader, start=2):  # Start at 2 (header is line 1)
            citation = row.get('citation', '').strip()
            source_type = row.get('source_type', '').strip()

            # Level 1 - Completeness
            if not citation:
                issues.append(f"Line {i}: Citation is empty")
                continue

            valid_types = ['journal', 'conference', 'book', 'web', 'preprint']
            if source_type not in valid_types:
                issues.append(
                    f"Line {i}: Invalid source_type '{source_type}'. "
                    f"Must be one of: {', '.join(valid_types)}"
                )

            # Level 2 - Format (warning only)
            patterns = [
                (r'@\w+\{[\w-]+,', "BibTeX"),
                (r'^[\w\s,\.]+\(\d{4}\)\.', "APA"),
                (r'^.+\(\d{4}\)\..+\..+', "Simple")
            ]

            matched = False
            for pattern, fmt in patterns:
                if re.match(pattern, citation):
                    matched = True
                    break

            if not matched:
                issues.append(
                    f"Line {i}: Citation format not recognized. "
                    f"Consider using BibTeX or APA format."
                )

    return issues

Decision: Progressive validation with helpful error messages. Don't block on format (warning only), but enforce completeness.

Rationale:

Researchers use varied citation styles
Format enforcement would be too rigid
Completeness is what matters for research integrity
Warnings educate users without blocking workflow

R3: Dashboard Integration for Active Mission Display

Research Question

How should the dashboard display and update the active mission?

Options Evaluated

Option A: Server-Side Rendering (Simplest)

Approach: Include active mission in initial page load context.

Implementation:

# src/specify_cli/dashboard/server.py

from specify_cli.mission import get_active_mission

@app.get("/")
def index():
    mission = get_active_mission(project_root)
    return templates.TemplateResponse("index.html", {
        "project": project_root.name,
        "active_mission": {
            "name": mission.name,
            "domain": mission.domain
        },
        # ... other context
    })

UI Update After Switch: User must refresh page manually

Pros:

Zero complexity
No JavaScript required
Immediate implementation

Cons:

Manual refresh required
Doesn't feel "real-time"

Option B: Polling (Simple Real-Time)

Approach: Frontend polls /api/mission/current every 5-10 seconds.

Implementation:

# Backend
@app.get("/api/mission/current")
def get_current_mission():
    mission = get_active_mission(project_root)
    return {
        "name": mission.name,
        "domain": mission.domain,
        "updated_at": datetime.now().isoformat()
    }

// Frontend
setInterval(async () => {
    const response = await fetch('/api/mission/current');
    const data = await response.json();
    updateMissionDisplay(data);
}, 5000);  // Poll every 5 seconds

Pros:

Simple to implement
No WebSocket complexity
Updates automatically

Cons:

Polling overhead (5-10 req/minute)
5-10 second delay before update shows

Option C: WebSocket (True Real-Time)

Approach: Server pushes mission change events via WebSocket.

Implementation:

Requires FastAPI WebSocket support or socket.io
File watcher on .kittify/active-mission
Push event to connected clients immediately

Pros:

Instant updates (<1 second)
No polling overhead
Professional UX

Cons:

Significant complexity
Requires WebSocket library
File watching mechanism needed
Connection management

Option D: Hybrid (Pragmatic)

Approach: Server-side rendering + manual refresh with prominent indicator.

Implementation:

Mission shown on initial load (Option A)
Add "Refresh" button near mission display
Optionally: Detect mission change via localStorage timestamp on focus

Pros:

Simple like Option A
User-controlled refresh
No dependencies
Clear UX (button makes refresh obvious)

Cons:

Not automatic
Requires user action

Decision Matrix

Criterion	Server-Side	Polling	WebSocket	Hybrid
Complexity	Minimal	=á Low	L High	Minimal
Dependencies	Zero	Zero	L New libs	Zero
Update Speed	L Manual	=á 5-10sec	<1sec	L Manual
User Experience	=á Acceptable	Good	Excellent	Good
Maintenance	Low	Low	L Medium	Low

Recommended Decision

Primary Recommendation: Option D (Hybrid)

Rationale: 1. User Constraint: "Resist the urge to complicate the dashboard unless necessary" 2. Usage Pattern: Mission switching is infrequent (per user feedback) 3. Zero Dependencies: Aligns with preference for no new runtime dependencies 4. Clear UX: Refresh button makes the action explicit 5. Implementation Time: 1-2 hours vs 1-2 days for WebSocket

Implementation Details:

Add mission info to initial server context
Display in header: <div>Mission: {mission.name} <button>Refresh</button></div>
Optionally: Check timestamp on page focus and suggest refresh if stale

Future Enhancement Path: If mission switching becomes frequent, upgrade to Option B (Polling) with minimal changes.

Alternative if Real-Time Required: Option B (Polling)

If automatic updates are critical:

Use polling with 10-second interval
Low overhead (6 requests/minute)
Simple to implement
Upgrade from Option D requires minimal changes

Rejected Options:

Option A: No visual feedback when mission changes
Option C: Over-engineered for infrequent operation

Summary of Decisions

Schema Validation Library

Decision: Pydantic v2

Rationale: Superior error messages (A+ quality) justify the 5MB dependency. Critical for SC-003 (immediate error feedback) and SC-005 (clear feedback within 5 seconds). Industry-standard choice with excellent type safety.

Alternatives Considered:

attrs+cattrs (lighter but worse errors)
jsonschema (no type hints)
dataclasses (zero dependencies but high maintenance)

Impact:

Add pydantic>=2.0 to pyproject.toml or requirements.txt
Mission loading gains automatic validation
Custom mission creators get professional-grade feedback

Citation Format Validation

Decision: Progressive validation with completeness enforcement, format warnings

Rationale: Research citation styles vary widely. Enforcing specific format would be too rigid. Focus on completeness (non-empty, valid source type) with helpful format suggestions.

Implementation:

Python stdlib only (csv + re)
Three-level validation: completeness (error), format (warning), quality (optional strict mode)
Support BibTeX, APA, Simple citation patterns

Alternatives Considered:

Strict format enforcement (rejected - too rigid)
No validation (rejected - defeats purpose of research mission)
External citation library integration (rejected - over-engineered)

Impact:

Create src/specify_cli/validators/research.py
Add validation to research mission review workflow
Users get feedback on citation quality without workflow blocking

Dashboard Mission Display

Decision: Hybrid (server-side rendering + refresh button)

Rationale: Aligns with user guidance to "resist complication." Mission switching is infrequent. Manual refresh with clear button provides good UX without complexity.

Implementation:

Add mission to server context on page load
Display in header with refresh button
Optional: Check for staleness on page focus

Alternatives Considered:

Polling (adds unnecessary overhead for infrequent operation)
WebSocket (over-engineered for mission switching frequency)
Server-side only (no feedback mechanism)

Impact:

Minimal dashboard code changes
Zero new dependencies
Clear user experience
Easy upgrade path to polling if needed

Implementation Risks

Risk 1: Pydantic Dependency Size

Concern: 5MB dependency may be unwanted Mitigation: Document in assumptions. If rejected, fall back to dataclasses with 2-3 days additional work Likelihood: Low (Pydantic is widely accepted)

Risk 2: Citation Validation Too Loose

Concern: Warning-only format validation may allow poor citations Mitigation: Provide clear examples in docs. Add strict mode flag for future if needed Likelihood: Medium (depends on research mission adoption)

Risk 3: Dashboard Refresh UX

Concern: Manual refresh may frustrate users Mitigation: Make button prominent. Track user feedback. Upgrade to polling if complaints arise Likelihood: Low (mission switching is infrequent per user)

Next Steps

Phase 0 research complete. Proceed to Phase 1 (Design & Contracts):

1. data-model.md: Define Pydantic models for mission schema 2. quickstart.md: Developer guide for new features 3. Update agent context: Run agent script with new technologies