User Journey: Formalization at Session End (Experiment → Repeatable Approach)

Status: DRAFT
Date: 2026-02-17
Primary Contexts: Ad-Hoc Sessions, Specialist Handoffs, Approach Discovery, Formalization
Supporting Contexts: Doctrine Packs, Artifact Capture, Architecture Documentation, Test Suite Evolution
Related Spec: (none yet — design exploration artefact)

Scenario

A user runs an ad-hoc specialist session to experiment with a novel approach: a specialist agent reads only the test suite and reconstructs the expected behavior as functional requirements (FRs) and non-functional requirements (NFRs). An architect specialist then checks the inferred requirements against the actual system design and writes an analysis. At the end of the session, the user decides the approach is valuable and explicitly requests formalization (for example, “write down what we did” or “formalize so it’s repeatable”). The system produces a repeatable description of the approach and optionally proposes follow-up handoffs: updating architecture documentation, passing the analysis to development or QA specialists to adjust the test suite, or both.

System Boundaries

ID	Boundary Name	Description
B1	User Workspace	Local repo: codebase, test suite, architecture docs, working tree changes
B2	Spec Kitty Runtime	Session orchestration, specialist selection, capture, and formalization tooling
B3	LLM Execution Context	Specialist reasoning: test inference, architecture validation, analysis synthesis
B4	Event/Artifact Sink	Where events, memory-dumps, analyses, and formalized approaches are stored
B5	Documentation Surface	Architecture documents / knowledge base location (could be repo docs, ADRs, etc.)

Actors

#	Actor	Type	Persona	Role in Journey
1	Developer/Contributor	`human`	(TBD)	Runs experiment, approves handoffs, triggers formalization, decides what to update
2	Spec Kitty CLI	`system`	(TBD)	Starts session, routes to specialists, captures artefacts, executes formalization
3	Test-Inference Specialist	`llm`	(TBD)	Reads only tests, reconstructs FRs/NFRs, states assumptions and confidence
4	Architect Specialist	`llm`	(TBD)	Validates inferred requirements against architecture/design, writes analysis and gaps
5	Dev Specialist	`llm`	(TBD)	Uses analysis to update implementation guidance or refactor targets (optional)
6	QA Specialist	`llm`	(TBD)	Uses analysis to update tests: coverage gaps, intent clarity, missing constraints (optional)
7	Event/Artifact Store	`system`	(TBD)	Persists memory-dumps, analysis report, formalized approach, and key events

Preconditions

Repo contains a test suite that meaningfully encodes behavior.
Architecture documentation exists (even if incomplete) OR there is a known design source (ADRs, diagrams, docs).
Specialist profiles exist for: test-inference, architect, dev, QA.
Session capture is enabled (memory-dump + artefact write-out).
The user agrees to the “tests-only” constraint for the inference phase: the specialist reads nothing but the test suite (explicit boundary).

Journey Map

Phase	Driver	Active Actor(s)	System Boundary	Key Events
1. Start Ad-Hoc Session	Human	1, 2	B2	`AdHocSessionStarted`
2. Set Experiment Constraints	Human	1, 2	B2	`ExperimentConstraintSet`
3. Run Test-Only Inference	Human	1, 3	B1, B3	`TestSuiteScanned`, `RequirementsInferred`
4. Capture Inferred FRs/NFRs	Spec Kitty	2, 7	B2, B4	`InferenceArtifactWritten`
5. Architect Validates vs Design	Human	1, 4	B1, B3, B5	`ArchitectureReviewed`, `GapAnalysisProduced`
6. Capture Architect Analysis	Spec Kitty	2, 7	B2, B4	`AnalysisArtifactWritten`
7. Decide Follow-Ups (Docs vs Tests vs Both)	Human	1, 4	B2, B3	`FollowUpDecided`
8. Optional Handoff to Dev / QA	Human	1, 5 and/or 6, 2	B2, B3	`HandoffSuggested`, `HandoffApproved`
9. Apply Updates (Docs and/or Tests)	Human	1, 2, (5/6)	B1, B5	`DocsUpdated` and/or `TestsUpdated`
10. End Session + Request Formalization	Human	1, 2	B2	`FormalizationRequested`
11. Formalize Approach into Repeatable Recipe	Spec Kitty	2, 7, (3/4)	B2, B3, B4	`ApproachFormalized`, `FormalizationWritten`
12. Close Session	Human	1, 2	B2	`AdHocSessionClosed`

Coordination Rules

Default posture: Advisory

Experiment constraints (for example “tests-only inference”) must be explicitly declared before inference begins.
Specialist handoffs are suggestions only; the user must approve before switching or engaging additional specialists.
Inference and validation outputs must clearly separate:
- Observed evidence (from tests / docs)
- Assumptions
- Confidence level
Updates to docs or tests are human-approved actions; specialists can propose diffs but do not apply changes unilaterally.
Formalization happens only upon explicit user instruction at session end.
Formalization captures what was done as a repeatable approach without implying mission-grade tracing was present.
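
The advisory posture for handoffs can be sketched as a small approval gate. This is a minimal illustration, not Spec Kitty's implementation: the `HandoffGate` class and its method names are hypothetical, and it assumes the user only approves handoffs that a specialist has already suggested.

```python
from dataclasses import dataclass, field


@dataclass
class HandoffGate:
    """Tracks suggested handoffs and the ones the user approved (hypothetical)."""

    suggested: set[str] = field(default_factory=set)
    approved: set[str] = field(default_factory=set)

    def suggest(self, specialist: str) -> None:
        # A suggestion never engages a specialist on its own.
        self.suggested.add(specialist)

    def approve(self, specialist: str) -> None:
        # Assumption: approval is only meaningful for a prior suggestion.
        if specialist not in self.suggested:
            raise ValueError(f"{specialist!r} was never suggested")
        self.approved.add(specialist)

    def may_engage(self, specialist: str) -> bool:
        return specialist in self.approved


gate = HandoffGate()
gate.suggest("qa")
blocked_before_approval = not gate.may_engage("qa")
gate.approve("qa")
allowed_after_approval = gate.may_engage("qa")
```

Keeping suggestion and approval as separate steps mirrors the `HandoffSuggested` and `HandoffApproved` events.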

Responsibilities

Boundary B1 — User Workspace

Provide test suite access and relevant repo context.
Accept or reject proposed updates to tests and code.
Maintain working tree consistency (branching/staging decisions).

Boundary B2 — Spec Kitty Runtime

Maintain session lifecycle, capture points, and artefact output.
Enforce experiment constraints at the tool level where feasible (for example, scope file access to tests directory).
Handle the explicit “formalize” command and generate the resulting artefact.
Record handoff approvals and follow-up decisions.
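
Tool-level scoping of file access to the tests directory could look roughly like the check below. It is a best-effort sketch under stated assumptions: the function name `is_allowed`, the default `tests` directory, and the repo layout are hypothetical, and it does not amount to hard enforcement across environments (which this journey defers).

```python
from pathlib import Path


def is_allowed(path: str, repo_root: str, allowed_dirs: tuple[str, ...] = ("tests",)) -> bool:
    """Return True if `path` resolves to a location inside an allowed directory."""
    root = Path(repo_root).resolve()
    # Resolving collapses ".." segments, so "tests/../src" is not mistaken for a test path.
    candidate = (root / path).resolve()
    return any(candidate.is_relative_to((root / d).resolve()) for d in allowed_dirs)
```

A runtime could call this before serving any file read requested by the test-inference specialist and refuse reads for which it returns `False`. `Path.is_relative_to` requires Python 3.9 or later.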

Boundary B3 — LLM Execution Context

Test-inference specialist: infer requirements from tests under the declared constraints; state assumptions and uncertainty.
Architect specialist: validate inferred requirements against architecture/design artefacts; produce a gap analysis and recommendations.
Dev/QA specialists: translate the analysis into actionable doc/test update proposals.
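
One way to keep observed evidence, assumptions, and confidence separate in specialist output is a record type like the one below. The field names, the `Confidence` levels, and the sample values are illustrative assumptions, not a defined Spec Kitty schema.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class InferredRequirement:
    """One FR or NFR reconstructed from the test suite (hypothetical shape)."""

    kind: str                          # "FR" or "NFR"
    statement: str
    evidence: tuple[str, ...]          # tests actually observed
    assumptions: tuple[str, ...] = ()  # anything not directly shown by a test
    confidence: Confidence = Confidence.LOW

    def __post_init__(self) -> None:
        if self.kind not in ("FR", "NFR"):
            raise ValueError(f"kind must be 'FR' or 'NFR', got {self.kind!r}")
        if not self.evidence:
            raise ValueError("a requirement needs at least one piece of observed evidence")


# Made-up example values, for illustration only.
req = InferredRequirement(
    kind="NFR",
    statement="Login responds within the latency budget asserted by the tests",
    evidence=("tests/test_login.py::test_latency_budget",),
    assumptions=("The asserted budget reflects a real service-level target",),
    confidence=Confidence.MEDIUM,
)
```

Rejecting a requirement with no evidence keeps pure speculation out of the inference artefact.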

Boundary B4 — Event/Artifact Sink

Persist:
- inference output (FRs/NFRs)
- architect analysis
- session memory-dump
- formalized approach artefact
Provide discoverability by session id/timestamp.
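
Discoverability by session id and timestamp could be achieved with a predictable path layout. The sketch below assumes a hypothetical layout of `<base>/<session id>/<UTC timestamp>-<kind>.md`; the directory names, the `.md` extension, and the `artefact_path` helper are not taken from Spec Kitty.

```python
from datetime import datetime, timezone
from pathlib import Path

# The four artefact kinds this boundary persists.
ARTEFACT_KINDS = ("inference", "analysis", "memory-dump", "formalization")


def artefact_path(base: Path, session_id: str, kind: str, when: datetime) -> Path:
    """Build a path that sorts chronologically within a session directory."""
    if kind not in ARTEFACT_KINDS:
        raise ValueError(f"unknown artefact kind: {kind!r}")
    if when.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return base / session_id / f"{stamp}-{kind}.md"
```

Because the timestamp prefix is fixed-width and in UTC, listing a session directory in name order also lists its artefacts in the order they were written.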

Boundary B5 — Documentation Surface

Accept updates to architecture documents and/or ADRs.
Provide a stable reference target for analysis links and follow-up work.

Observability Guarantees

Event Logging

Session lifecycle and formalization events are always emitted.
Key experiment milestones are emitted:
- constraints set
- inference produced
- analysis produced
- follow-up decision made
Handoff suggestions and approvals are captured as events.
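
Event emission could be as simple as appending one JSON record per line to a sink. This is a sketch only: the record fields (`event`, `session_id`, `emitted_by`, `at`, `payload`) and the `emit_event` helper are assumptions, since this document does not define an event format.

```python
import json
from datetime import datetime, timezone
from typing import TextIO


def emit_event(sink: TextIO, name: str, session_id: str, emitted_by: str, **payload) -> dict:
    """Append one event as a JSON line to `sink` and return the record."""
    record = {
        "event": name,
        "session_id": session_id,
        "emitted_by": emitted_by,
        "at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }
    sink.write(json.dumps(record) + "\n")
    return record
```

For example, `emit_event(log_file, "HandoffApproved", "sess-001", "Developer/Contributor", target="qa")` would record an approval together with the approved target, where `log_file` is any open text file.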

State Visibility

The active specialist and active constraint set are visible during the session.
The user can see where inference and analysis artefacts were written.
The user can see whether formalization has occurred and where it was stored.

Presence & Coordination Signals

The system surfaces when constraints are active (e.g., “tests-only mode”).
The system surfaces when proposed updates touch either of the following, so the user understands the impact:
- docs boundary (B5)
- tests boundary (B1)

Audit Guarantees

Outputs explicitly distinguish evidence vs assumption.
Formalized approach references the produced artefacts (inference + analysis) as examples.
This journey does not require mission-grade tracing, but guarantees:
- memory-dump exists
- final formalization artefact exists
- key milestone events exist
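
The three minimum guarantees can be checked mechanically at session close. The sketch below assumes that the "key milestone events" correspond to the four milestones listed under Event Logging and maps them to event names from this document; that mapping and the `audit_gaps` helper are assumptions.

```python
# Artefacts that must exist even without mission-grade tracing.
REQUIRED_ARTEFACTS = {"memory-dump", "formalization"}

# Assumed mapping of milestones (constraints set, inference produced,
# analysis produced, follow-up decided) to event names.
MILESTONE_EVENTS = {
    "ExperimentConstraintSet",
    "RequirementsInferred",
    "GapAnalysisProduced",
    "FollowUpDecided",
}


def audit_gaps(artefact_kinds: set[str], event_names: set[str]) -> list[str]:
    """Return a human-readable list of unmet audit guarantees (empty if none)."""
    gaps = [f"missing artefact: {k}" for k in sorted(REQUIRED_ARTEFACTS - artefact_kinds)]
    gaps += [f"missing milestone event: {e}" for e in sorted(MILESTONE_EVENTS - event_names)]
    return gaps
```

An empty result means the session meets the minimum audit bar described above; it says nothing about the quality of the artefacts themselves.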

Scope: MVP (Formalization at Session End)

In Scope

Observe:
- Active constraint set (“tests-only”)
- Inference artefact location
- Architect analysis artefact location
- Follow-up decision (docs/tests/both)
- Formalization artefact location
Decide:
- Approve constraints
- Approve handoffs
- Choose follow-up targets (docs/tests/both)
- Trigger formalization at session end

Out of Scope (Deferred)

Automatic conversion into a Spec Kitty Mission recipe without review
Guaranteed reproducibility without human editing of the formalized approach
Hard enforcement of file-access constraints across all environments/tools
Rich structured tracing comparable to missions (step-by-step provenance)

Required Event Set

#	Event	Emitted By	Boundary	Phase
1	`AdHocSessionStarted`	Spec Kitty CLI	B2	1
2	`ExperimentConstraintSet`	Developer/Contributor	B2	2
3	`TestSuiteScanned`	Test-Inference Specialist	B3	3
4	`RequirementsInferred`	Test-Inference Specialist	B3	3
5	`InferenceArtifactWritten`	Spec Kitty CLI	B2/B4	4
6	`ArchitectureReviewed`	Architect Specialist	B3	5
7	`GapAnalysisProduced`	Architect Specialist	B3	5
8	`AnalysisArtifactWritten`	Spec Kitty CLI	B2/B4	6
9	`FollowUpDecided`	Developer/Contributor	B2	7
10	`HandoffSuggested`	Specialist Agent	B3	8
11	`HandoffApproved`	Developer/Contributor	B2	8
12	`DocsUpdated`	Developer/Contributor	B5	9
13	`TestsUpdated`	Developer/Contributor	B1	9
14	`FormalizationRequested`	Developer/Contributor	B2	10
15	`ApproachFormalized`	Spec Kitty CLI	B2/B3	11
16	`FormalizationWritten`	Spec Kitty CLI	B2/B4	11
17	`AdHocSessionClosed`	Spec Kitty CLI	B2	12
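
The table above can be carried as data and used to sanity-check a recorded event sequence. The sketch assumes that events arrive in non-decreasing phase order, which the journey map suggests but this document does not state as a rule; the `ordering_problems` helper is hypothetical.

```python
# Event name -> journey phase, copied from the Required Event Set table.
EVENT_PHASE = {
    "AdHocSessionStarted": 1,
    "ExperimentConstraintSet": 2,
    "TestSuiteScanned": 3,
    "RequirementsInferred": 3,
    "InferenceArtifactWritten": 4,
    "ArchitectureReviewed": 5,
    "GapAnalysisProduced": 5,
    "AnalysisArtifactWritten": 6,
    "FollowUpDecided": 7,
    "HandoffSuggested": 8,
    "HandoffApproved": 8,
    "DocsUpdated": 9,
    "TestsUpdated": 9,
    "FormalizationRequested": 10,
    "ApproachFormalized": 11,
    "FormalizationWritten": 11,
    "AdHocSessionClosed": 12,
}


def ordering_problems(names: list[str]) -> list[str]:
    """Report unknown events and events that arrive after a later phase."""
    problems = []
    last_name, last_phase = None, 0
    for name in names:
        phase = EVENT_PHASE.get(name)
        if phase is None:
            problems.append(f"unknown event: {name}")
            continue
        if phase < last_phase:
            problems.append(
                f"{name} (phase {phase}) arrived after {last_name} (phase {last_phase})"
            )
        else:
            last_name, last_phase = name, phase
    return problems
```

The check deliberately does not require every event to be present, since the handoff and update phases are optional.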

Acceptance Scenarios

Constraint-Limited Inference Produces Requirements Draft: Given a repo with a test suite and an active “tests-only” constraint, when the user runs the test-inference specialist, then FRs and NFRs are inferred, assumptions are stated, and an inference artefact is written.
Architect Validation Produces Gap Analysis: Given inferred FRs/NFRs from tests, when the architect specialist validates them against architecture/design artefacts, then a gap analysis is produced and captured as an analysis artefact.
Formalization at Session End Generates a Repeatable Approach: Given inference and analysis artefacts exist for the session, when the user requests “formalize” at session end, then the system writes a repeatable approach artefact that references the session outputs as exemplars.
Follow-Up Targets Are Human-Decided: Given the gap analysis suggests changes to docs and tests, when the system suggests handoffs to dev/QA specialists, then no updates occur without explicit human approval and the chosen targets are recorded.
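
The formalization scenario can be illustrated with a small renderer that refuses to run unless both session artefacts exist and then lists them as exemplars. The output layout, the `render_approach` function, and its parameters are hypothetical; the actual formalized artefact format is not defined here.

```python
def render_approach(
    title: str,
    constraints: list[str],
    steps: list[str],
    exemplars: dict[str, str],
) -> str:
    """Render a repeatable-approach description that cites session artefacts."""
    missing = {"inference", "analysis"} - exemplars.keys()
    if missing:
        raise ValueError(f"cannot formalize without artefacts: {sorted(missing)}")
    lines = [f"Approach: {title}", "", "Constraints"]
    lines += [f"- {c}" for c in constraints]
    lines += ["", "Steps"]
    lines += [f"{i}. {s}" for i, s in enumerate(steps, start=1)]
    lines += ["", "Exemplars"]
    lines += [f"- {kind}: {path}" for kind, path in sorted(exemplars.items())]
    return "\n".join(lines) + "\n"
```

The rendered text is a starting point for human editing; guaranteed reproducibility without review is out of scope for the MVP.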

Design Decisions

Decision	Rationale	ADR
Ad-hoc sessions support explicit experiment constraints	Enables deliberate technique testing and supports the “tests-only” inference posture	pending
Formalization is a session-end, user-triggered operation	Preserves human intent; avoids accidental workflow creation	pending
Output artefacts (inference + analysis) are captured by default	Enables later formalization and knowledge reuse without mission overhead	pending
Follow-up actions are handoff-based and human-approved	Keeps the human in charge and keeps ad-hoc mode non-invasive	pending

Product Alignment

Encourages disciplined experimentation without mission overhead.
Converts emergent practice into repeatable technique via explicit formalization.
Maintains doctrine consistency through specialist profiles and structured artefacts, while keeping human control central.