name: empirica-framework description: "This skill should be used when the user asks to 'assess my knowledge state', 'run preflight', 'do a postflight', 'use CASCADE workflow', 'track what I know', 'measure learning', 'check epistemic drift', 'spawn investigation agents', 'create handoff', or mentions epistemic vectors, calibration, noetic/praxic phases, functional self-awareness, or structured investigation before coding tasks." version: 2.1.0

Empirica: Epistemic Framework Reference

Measure what you know. Track what you learn. Prevent overconfidence.

v2.1.0: Dual-track calibration (grounded verification), 4-phase CASCADE with POST-TEST. See CLAUDE.md for canonical terms (noetic/praxic/epistemic/context).

CASCADE Workflow

Every significant task follows: PREFLIGHT → CHECK → POSTFLIGHT → POST-TEST

PREFLIGHT ──► CHECK ──► POSTFLIGHT ──► POST-TEST
    │           │            │              │
 Baseline    Sentinel     Learning      Grounded
 Assessment    Gate        Delta       Verification

PREFLIGHT (Measure baseline)

Submit your honest epistemic state BEFORE starting work:

empirica preflight-submit - << 'EOF'
{
  "session_id": "<ID>",
  "task_context": "What you're about to do",
  "vectors": {
    "know": 0.6, "uncertainty": 0.4,
    "context": 0.7, "clarity": 0.8
  },
  "reasoning": "Honest assessment of current state"
}
EOF

CHECK (Sentinel gate)

Submit when ready to transition from noetic to praxic:

empirica check-submit - << 'EOF'
{
  "session_id": "<ID>",
  "vectors": {
    "know": 0.75, "uncertainty": 0.3,
    "context": 0.8, "clarity": 0.85
  },
  "reasoning": "Why ready (or not)"
}
EOF

Returns proceed or investigate based on readiness gate: know >= 0.70 AND uncertainty <= 0.35 (after bias correction).

When to CHECK:

Uncertainty > 0.5 (too uncertain)
Scope > 0.6 (high-impact changes)
Post-compact (context reduced)
Before irreversible actions

POSTFLIGHT (Measure delta + trigger grounded verification)

Submit AFTER completing work — the delta between PREFLIGHT and POSTFLIGHT is your learning measurement:

empirica postflight-submit - << 'EOF'
{
  "session_id": "<ID>",
  "vectors": {
    "know": 0.85, "uncertainty": 0.2,
    "context": 0.9, "clarity": 0.9
  },
  "reasoning": "Compare to PREFLIGHT - this is your learning delta"
}
EOF

POST-TEST (automatic): POSTFLIGHT automatically triggers grounded verification — objective evidence (tests, artifacts, git, goals) is collected and compared to your self-assessed vectors. The gap = real calibration error. See Dual-Track Calibration.

The 13 Epistemic Vectors

Rate each 0.0 to 1.0 with honest reasoning:

Foundation

Vector	Question
engagement	How invested am I in this task?
know	What do I understand about the domain?
do	Can I execute the required actions?
context	Do I have enough surrounding information?

Comprehension

Vector	Question
clarity	Do I understand what's being asked?
coherence	Does my understanding fit together?
signal	Am I detecting relevant patterns?
density	How information-rich is my current state?

Execution

Vector	Question
state	Do I understand the current system state?
change	How much has changed since I last assessed?
completion	How complete is this phase? (phase-aware)
impact	How significant is this work?

Vector	Question
uncertainty	How unsure am I? (higher = more uncertain)

Dual-Track Calibration

Empirica uses two parallel calibration tracks:

Track 1: Self-Referential (PREFLIGHT → POSTFLIGHT)

Measures learning trajectory — how vectors change during work. Updated automatically on each POSTFLIGHT via Bayesian update.

Example bias corrections (exact values injected from .breadcrumbs.yaml):

Completion: ~+0.52 (underestimate progress)
Impact: ~+0.29 (underestimate significance)
Density/Signal/Change: ~+0.10 to +0.13

Track 2: Grounded Verification (POSTFLIGHT → Objective Evidence)

Measures calibration accuracy — does your self-assessment match reality? Triggered automatically after each POSTFLIGHT.

Evidence sources (collected automatically):

Source	Quality	Vectors Grounded
pytest results	OBJECTIVE	know, do, clarity
Git metrics	OBJECTIVE	do, change, state
Goal completion	SEMI_OBJECTIVE	completion, do, know
Artifact counts	SEMI_OBJECTIVE	know, uncertainty, signal
Issue tracking	SEMI_OBJECTIVE	impact, signal
Sentinel decisions	SEMI_OBJECTIVE	context, uncertainty

Ungroundable vectors: engagement, coherence, density — no objective signal exists.

When tracks disagree: Track 2 (grounded) is more trustworthy. The grounded_calibration.divergence section in .breadcrumbs.yaml shows the gap per vector.

# Self-referential calibration (Track 1)
empirica calibration-report

# Grounded calibration (Track 2) — compare self-assessment vs evidence
empirica calibration-report --grounded

# Trajectory — is calibration improving over time?
empirica calibration-report --trajectory

Exact values injected from .breadcrumbs.yaml at session start.

Noetic Artifacts (Breadcrumbs)

Log as you work — these link to your active goal automatically:

# Findings — what was learned
empirica finding-log --session-id <ID> --finding "Auth uses JWT not sessions" --impact 0.7

# Unknowns — what remains unclear
empirica unknown-log --session-id <ID> --unknown "How does rate limiting work here?"

# Dead-ends — approaches that failed (prevents re-exploration)
empirica deadend-log --session-id <ID> --approach "Tried monkey-patching" --why-failed "Breaks in prod"

# Resolve unknowns when answered
empirica unknown-resolve --unknown-id <UUID> --resolved-by "Found in docs"

Impact scale: 0.1–0.3 trivial | 0.4–0.6 important | 0.7–0.9 critical | 1.0 transformative

Praxic Artifacts (Goals + Subtasks)

For complex work, create goals to track progress:

# Create goal
empirica goals-create --session-id <ID> --objective "Implement OAuth flow" \
  --scope-breadth 0.6 --scope-duration 0.5 --output json

# Add subtasks
empirica goals-add-subtask --goal-id <GOAL_ID> --description "Research OAuth providers"

# Complete subtasks with evidence
empirica goals-complete-subtask --subtask-id <TASK_ID> --evidence "commit abc123"

# Complete whole goal
empirica goals-complete --goal-id <GOAL_ID> --reason "Implementation verified"

# Check progress
empirica goals-progress --goal-id <GOAL_ID>

Note: Subtasks use --evidence, goals use --reason.

Memory Operations

Semantic Search (Qdrant)

# Focused search (eidetic facts + episodic arcs)
empirica project-search --project-id <ID> --task "authentication patterns"

# Full search (all 4 collections: docs, memory, eidetic, episodic)
empirica project-search --project-id <ID> --task "query" --type all

# Include cross-project learnings (ecosystem scope)
empirica project-search --project-id <ID> --task "query" --global

# Sync project memory to Qdrant
empirica project-embed --project-id <ID> --output json

Automatic ingestion (when Qdrant available):

finding-log → eidetic facts + immune decay on lessons
postflight-submit → episodic narratives + auto-embed + grounded verification (post-test evidence)
SessionStart hook → retrieves relevant memories post-compact

Pattern retrieval (auto-triggered):

PREFLIGHT: Returns lessons, dead-ends, relevant findings
CHECK: Validates against dead-ends, triggers mistake risk warnings

Optional setup: export EMPIRICA_QDRANT_URL="http://localhost:6333"

Empirica works fully without Qdrant — core CASCADE, goals, and calibration use SQLite.

Search Triggers

Use project search during noetic phases:

Session start — prior learnings for current task
Before logging unknown — check if already resolved
Pre-CHECK — similar decision patterns
Pre-self-improvement — conflicting guidance

Multi-Agent Operations

Spawn Investigation Agents

# Single agent
empirica agent-spawn --session-id <ID> \
  --task "Investigate authentication patterns" \
  --persona researcher --cascade-style exploratory

# Parallel agents with attention budget
empirica agent-parallel --session-id <ID> \
  --task "Analyze security and architecture" \
  --budget 20 --max-agents 5

Budget allocates by information gain: high-uncertainty domains get more resources. SubagentStop hook auto-gates rollup: scores by confidence x novelty x relevance.

Handoff Types

Type	When	Contains
Investigation	After CHECK	Noetic artifacts, ready for praxic
Complete	After POSTFLIGHT	Full learning cycle + calibration
Planning	Any time	Documentation-only, no CASCADE required

empirica handoff-create --session-id <ID> \
  --task-summary "Investigated auth patterns" \
  --key-findings '["JWT with RS256", "Refresh in httpOnly cookies"]' \
  --next-session-context "Ready to implement token rotation"

Sentinel Safety Gates

Sentinel controls praxic actions (Edit, Write, NotebookEdit):

Readiness gate: know >= 0.70 AND uncertainty <= 0.35 (after bias correction)

Core features (always on):

PREFLIGHT requirement before acting
Decision parsing (blocks if CHECK returned "investigate")
Vector threshold validation
Anti-gaming: minimum noetic duration (30s) with evidence check

Configuration (user-only, DO NOT execute):

Variable	Values	Default	Effect
`EMPIRICA_SENTINEL_LOOPING`	`true`, `false`	`true`	When `false`, disables Sentinel gating entirely
`EMPIRICA_SENTINEL_MODE`	`observer`, `controller`	`controller`	`observer` = log only, `controller` = actively block
`EMPIRICA_SENTINEL_CHECK_EXPIRY`	`true`, `false`	`false`	30-min CHECK expiry
`EMPIRICA_SENTINEL_REQUIRE_BOOTSTRAP`	`true`, `false`	`false`	Require bootstrap before proceed

Common Patterns

Quick Task

PREFLIGHT → [praxic work] → POSTFLIGHT → POST-TEST

Investigation → Implementation

PREFLIGHT → [noetic: explore] → CHECK → [praxic: implement] → POSTFLIGHT → POST-TEST

Complex Feature

PREFLIGHT → Goal + Subtasks → [CHECK at each gate] → POSTFLIGHT → POST-TEST

Parallel Investigation

PREFLIGHT → agent-spawn (×N) → agent-aggregate → CHECK → POSTFLIGHT → POST-TEST

POST-TEST is automatic — triggered by POSTFLIGHT. No manual step needed.

Hook Integration

Hooks enforce CASCADE automatically:

Hook	Event	Action
`sentinel-gate.py`	PreToolUse	Gates Edit/Write until valid CHECK
`session-init.py`	SessionStart:new	Auto-creates session + bootstrap
`post-compact.py`	SessionStart:compact	Auto-recovers session, prompts CHECK
`session-end-postflight.py`	SessionEnd	Auto-captures POSTFLIGHT
`tool-router.py`	UserPromptSubmit	Vector-aware tool/agent routing

MCP Server Restart: After updating empirica-mcp code, restart the server:

pkill -f empirica-mcp  # Kill running server
# Then use /mcp in Claude Code to reconnect

Full Command Reference

empirica --help                          # All commands
empirica session-create --ai-id <name>   # Start session
empirica project-bootstrap --session-id <ID>  # Load context
empirica preflight-submit -              # PREFLIGHT (stdin JSON)
empirica check-submit -                  # CHECK gate (stdin JSON)
empirica postflight-submit -             # POSTFLIGHT (stdin JSON)
empirica finding-log --finding "..."     # Log noetic artifact (finding)
empirica unknown-log --unknown "..."     # Log noetic artifact (unknown)
empirica deadend-log --approach "..."    # Log noetic artifact (dead-end)
empirica goals-create --objective "..."  # Create praxic artifact (goal)
empirica goals-list                      # Show active goals
empirica calibration-report              # Self-referential calibration (Track 1)
empirica calibration-report --grounded   # Grounded calibration (Track 2)
empirica calibration-report --trajectory # Calibration trend over time
empirica agent-spawn --task "..."        # Spawn domain agent
empirica agent-parallel --task "..."     # Parallel investigation
empirica handoff-create ...              # Create handoff
empirica project-search --task "..."     # Semantic memory search

Best Practices

DO:

Apply bias corrections from .breadcrumbs.yaml (both self-ref and grounded)
Be honest about uncertainty (it's data, not failure)
Log noetic artifacts as you discover them (also anti-gaming evidence)
Use CHECK before major praxic actions
Compare POSTFLIGHT to PREFLIGHT (Track 1: learning delta)
Check calibration-report --grounded to see if self-assessment matches evidence (Track 2)
Use calibration-report --trajectory to see if calibration is improving

DON'T:

Inflate scores (Sentinel detects rushed assessments)
Skip PREFLIGHT (lose baseline AND get blocked)
Ignore high uncertainty signals
Proceed without CHECK when uncertainty > 0.5
Rush PREFLIGHT→CHECK without actual noetic work
Trust Track 1 over Track 2 when they diverge (grounded evidence wins)

Remember: When uncertain, say so. That's genuine metacognition.

empirica-framework

SKILL.md