Agent Skill
2/7/2026

empirica-framework

This skill should be used when the user asks to 'assess my knowledge state', 'run preflight', 'do a postflight', 'use CASCADE workflow', 'track what I know', 'measure learning', 'check epistemic drift', 'spawn investigation agents', 'create handoff', or mentions epistemic vectors, calibration, noetic/praxic phases, functional self-awareness, or structured investigation before coding tasks.

nubaeon
npx skills add Nubaeon/empirica

SKILL.md


name: empirica-framework
description: "This skill should be used when the user asks to 'assess my knowledge state', 'run preflight', 'do a postflight', 'use CASCADE workflow', 'track what I know', 'measure learning', 'check epistemic drift', 'spawn investigation agents', 'create handoff', or mentions epistemic vectors, calibration, noetic/praxic phases, functional self-awareness, or structured investigation before coding tasks."
version: 2.1.0

Empirica: Epistemic Framework Reference

Measure what you know. Track what you learn. Prevent overconfidence.

v2.1.0: Dual-track calibration (grounded verification), 4-phase CASCADE with POST-TEST. See CLAUDE.md for canonical terms (noetic/praxic/epistemic/context).


CASCADE Workflow

Every significant task follows: PREFLIGHT → CHECK → POSTFLIGHT → POST-TEST

PREFLIGHT ──► CHECK ──► POSTFLIGHT ──► POST-TEST
    │           │            │              │
 Baseline    Sentinel     Learning      Grounded
 Assessment    Gate        Delta       Verification

PREFLIGHT (Measure baseline)

Submit your honest epistemic state BEFORE starting work:

empirica preflight-submit - << 'EOF'
{
  "session_id": "<ID>",
  "task_context": "What you're about to do",
  "vectors": {
    "know": 0.6, "uncertainty": 0.4,
    "context": 0.7, "clarity": 0.8
  },
  "reasoning": "Honest assessment of current state"
}
EOF
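
When PREFLIGHT is submitted from a script rather than a shell heredoc, the same JSON can be built and piped on stdin. This is a minimal sketch: the helper names `build_preflight_payload` and `submit_preflight` are hypothetical, and only the `empirica preflight-submit -` invocation and the JSON field names come from the example above.

```python
import json
import subprocess


def build_preflight_payload(session_id, task_context, vectors, reasoning):
    """Return the PREFLIGHT JSON document as a string (hypothetical helper)."""
    for name, value in vectors.items():
        # Vectors are rated 0.0 to 1.0; reject anything outside that range early.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"vector {name!r} must be in [0.0, 1.0], got {value}")
    return json.dumps(
        {
            "session_id": session_id,
            "task_context": task_context,
            "vectors": vectors,
            "reasoning": reasoning,
        },
        indent=2,
    )


def submit_preflight(payload):
    """Pipe the payload to `empirica preflight-submit -` on stdin."""
    return subprocess.run(
        ["empirica", "preflight-submit", "-"],
        input=payload,
        text=True,
        capture_output=True,
        check=False,
    )


payload = build_preflight_payload(
    session_id="<ID>",
    task_context="What you're about to do",
    vectors={"know": 0.6, "uncertainty": 0.4, "context": 0.7, "clarity": 0.8},
    reasoning="Honest assessment of current state",
)
```

Calling `submit_preflight(payload)` requires the `empirica` CLI on PATH; the sketch does not call it at import time.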

CHECK (Sentinel gate)

Submit when ready to transition from noetic to praxic:

empirica check-submit - << 'EOF'
{
  "session_id": "<ID>",
  "vectors": {
    "know": 0.75, "uncertainty": 0.3,
    "context": 0.8, "clarity": 0.85
  },
  "reasoning": "Why ready (or not)"
}
EOF

Returns "proceed" or "investigate" based on the readiness gate: know >= 0.70 AND uncertainty <= 0.35 (after bias correction).
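
The readiness gate can be sketched as a small predicate. This is an illustration, not Empirica's implementation: the function name `readiness_decision` is hypothetical, and the inputs are assumed to be already bias-corrected.

```python
KNOW_MIN = 0.70          # threshold stated for the readiness gate
UNCERTAINTY_MAX = 0.35   # threshold stated for the readiness gate


def readiness_decision(know, uncertainty):
    """Return "proceed" or "investigate" for bias-corrected vector values."""
    if know >= KNOW_MIN and uncertainty <= UNCERTAINTY_MAX:
        return "proceed"
    return "investigate"
```

With the CHECK example above (know 0.75, uncertainty 0.3) this sketch returns "proceed"; with the PREFLIGHT example (know 0.6, uncertainty 0.4) it returns "investigate".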

When to CHECK:

  • Uncertainty > 0.5 (too uncertain)
  • Scope > 0.6 (high-impact changes)
  • Post-compact (context reduced)
  • Before irreversible actions
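
The four triggers above can be read as a single "any of" condition. A minimal sketch, assuming `scope` is a 0.0 to 1.0 estimate supplied by the caller; the function name `should_check` and its parameters are hypothetical.

```python
def should_check(uncertainty, scope, post_compact=False, irreversible=False):
    """Return True when any listed CHECK trigger applies."""
    return (
        uncertainty > 0.5   # too uncertain
        or scope > 0.6      # high-impact changes
        or post_compact     # context was reduced by compaction
        or irreversible     # about to take an irreversible action
    )
```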

POSTFLIGHT (Measure delta + trigger grounded verification)

Submit AFTER completing work — the delta between PREFLIGHT and POSTFLIGHT is your learning measurement:

empirica postflight-submit - << 'EOF'
{
  "session_id": "<ID>",
  "vectors": {
    "know": 0.85, "uncertainty": 0.2,
    "context": 0.9, "clarity": 0.9
  },
  "reasoning": "Compare to PREFLIGHT - this is your learning delta"
}
EOF
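
The learning delta is a per-vector difference between the two submissions. A minimal sketch using the example values from this section; the function name `learning_delta` is hypothetical, and rounding to two decimals is a presentation choice, not something the source specifies.

```python
def learning_delta(preflight, postflight):
    """Return POSTFLIGHT minus PREFLIGHT for every vector present in both."""
    return {
        name: round(postflight[name] - preflight[name], 2)
        for name in preflight
        if name in postflight
    }


preflight = {"know": 0.6, "uncertainty": 0.4, "context": 0.7, "clarity": 0.8}
postflight = {"know": 0.85, "uncertainty": 0.2, "context": 0.9, "clarity": 0.9}
delta = learning_delta(preflight, postflight)
# delta == {"know": 0.25, "uncertainty": -0.2, "context": 0.2, "clarity": 0.1}
```

A negative delta on uncertainty is the desired direction: it means uncertainty fell during the work.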

POST-TEST (automatic): POSTFLIGHT automatically triggers grounded verification — objective evidence (tests, artifacts, git, goals) is collected and compared to your self-assessed vectors. The gap = real calibration error. See Dual-Track Calibration.


The 13 Epistemic Vectors

Rate each 0.0 to 1.0 with honest reasoning:

Foundation

Vector       Question
engagement   How invested am I in this task?
know         What do I understand about the domain?
do           Can I execute the required actions?
context      Do I have enough surrounding information?
contextDo I have enough surrounding information?

Comprehension

Vector       Question
clarity      Do I understand what's being asked?
coherence    Does my understanding fit together?
signal       Am I detecting relevant patterns?
density      How information-rich is my current state?

Execution

Vector       Question
state        Do I understand the current system state?
change       How much has changed since I last assessed?
completion   How complete is this phase? (phase-aware)
impact       How significant is this work?

Meta

Vector       Question
uncertainty  How unsure am I? (higher = more uncertain)

Key principle: Be ACCURATE, not optimistic. High uncertainty is valid data.
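
A vector set can be sanity-checked before submission. The sketch below encodes the 13 names and four groups listed above; the `validate_vectors` helper and its message format are hypothetical, and partial vector sets are allowed because the CLI examples in this document submit only four vectors.

```python
VECTOR_GROUPS = {
    "foundation": ("engagement", "know", "do", "context"),
    "comprehension": ("clarity", "coherence", "signal", "density"),
    "execution": ("state", "change", "completion", "impact"),
    "meta": ("uncertainty",),
}
ALL_VECTORS = tuple(name for group in VECTOR_GROUPS.values() for name in group)


def validate_vectors(vectors):
    """Return a list of problems; an empty list means the set looks valid."""
    problems = []
    for name, value in vectors.items():
        if name not in ALL_VECTORS:
            problems.append(f"unknown vector: {name}")
        elif not 0.0 <= value <= 1.0:
            problems.append(f"{name} out of range: {value}")
    return problems
```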


Dual-Track Calibration

Empirica uses two parallel calibration tracks:

Track 1: Self-Referential (PREFLIGHT → POSTFLIGHT)

Measures learning trajectory — how vectors change during work. Updated automatically on each POSTFLIGHT via Bayesian update.

Example bias corrections (exact values injected from .breadcrumbs.yaml):

  • Completion: ~+0.52 (progress tends to be underestimated)
  • Impact: ~+0.29 (significance tends to be underestimated)
  • Density/Signal/Change: ~+0.10 to +0.13
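
One way to read these corrections is as additive offsets applied to the raw self-assessment. That reading is an assumption: the source gives signed values but does not say how they are combined, so the additive form, the clamping to [0.0, 1.0], and the name `apply_bias_corrections` below are all illustrative.

```python
def apply_bias_corrections(vectors, corrections):
    """Add a per-vector offset and clamp the result to [0.0, 1.0] (assumed form)."""
    corrected = {}
    for name, value in vectors.items():
        adjusted = value + corrections.get(name, 0.0)
        corrected[name] = round(min(1.0, max(0.0, adjusted)), 2)
    return corrected


# Approximate example values from the list above; real values come from
# .breadcrumbs.yaml.
example_corrections = {"completion": 0.52, "impact": 0.29}
raw = {"completion": 0.3, "impact": 0.5, "know": 0.6}
corrected = apply_bias_corrections(raw, example_corrections)
# corrected == {"completion": 0.82, "impact": 0.79, "know": 0.6}
```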

Track 2: Grounded Verification (POSTFLIGHT → Objective Evidence)

Measures calibration accuracy — does your self-assessment match reality? Triggered automatically after each POSTFLIGHT.

Evidence sources (collected automatically):

Source              Quality          Vectors Grounded
pytest results      OBJECTIVE        know, do, clarity
Git metrics         OBJECTIVE        do, change, state
Goal completion     SEMI_OBJECTIVE   completion, do, know
Artifact counts     SEMI_OBJECTIVE   know, uncertainty, signal
Issue tracking      SEMI_OBJECTIVE   impact, signal
Sentinel decisions  SEMI_OBJECTIVE   context, uncertainty

Ungroundable vectors: engagement, coherence, density — no objective signal exists.

When tracks disagree: Track 2 (grounded) is more trustworthy. The grounded_calibration.divergence section in .breadcrumbs.yaml shows the gap per vector.
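
The per-vector gap between the two tracks can be sketched as a simple difference. This is an illustration only: the function name `calibration_gaps`, the sign convention (positive means the self-assessment was higher than the evidence), and the assumption that grounded evidence is already expressed on the same 0.0 to 1.0 scale are not taken from the source.

```python
# Vectors the source lists as having no objective signal.
UNGROUNDABLE = {"engagement", "coherence", "density"}


def calibration_gaps(self_assessed, grounded):
    """Return self-assessed minus grounded value for each groundable vector."""
    gaps = {}
    for name, claimed in self_assessed.items():
        if name in UNGROUNDABLE or name not in grounded:
            continue  # nothing objective to compare against
        gaps[name] = round(claimed - grounded[name], 2)
    return gaps


gaps = calibration_gaps(
    {"know": 0.85, "do": 0.8, "coherence": 0.9},
    {"know": 0.7, "do": 0.9, "coherence": 0.5},
)
# gaps == {"know": 0.15, "do": -0.1}
```

In this made-up example, know was overestimated by 0.15 and do was underestimated by 0.1; coherence is skipped because it is ungroundable.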

# Self-referential calibration (Track 1)
empirica calibration-report

# Grounded calibration (Track 2) — compare self-assessment vs evidence
empirica calibration-report --grounded

# Trajectory — is calibration improving over time?
empirica calibration-report --trajectory

Exact calibration values are injected from .breadcrumbs.yaml at session start.


Noetic Artifacts (Breadcrumbs)

Log as you work — these link to your active goal automatically:

# Findings — what was learned
empirica finding-log --session-id <ID> --finding "Auth uses JWT not sessions" --impact 0.7

# Unknowns — what remains unclear
empirica unknown-log --session-id <ID> --unknown "How does rate limiting work here?"

# Dead-ends — approaches that failed (prevents re-exploration)
empirica deadend-log --session-id <ID> --approach "Tried monkey-patching" --why-failed "Breaks in prod"

# Resolve unknowns when answered
empirica unknown-resolve --unknown-id <UUID> --resolved-by "Found in docs"

Impact scale: 0.1–0.3 trivial | 0.4–0.6 important | 0.7–0.9 critical | 1.0 transformative
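
The impact scale can be turned into a lookup when findings are summarized. A rough sketch: the function name `impact_label` is hypothetical, scores are rounded to one decimal so they fall on the bands above, and treating values below 0.1 as trivial is an assumption.

```python
def impact_label(impact):
    """Map an impact score in [0.0, 1.0] to the band names used above."""
    if not 0.0 <= impact <= 1.0:
        raise ValueError(f"impact must be in [0.0, 1.0], got {impact}")
    score = round(impact, 1)  # the scale is stated in steps of 0.1
    if score >= 1.0:
        return "transformative"
    if score >= 0.7:
        return "critical"
    if score >= 0.4:
        return "important"
    return "trivial"
```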


Praxic Artifacts (Goals + Subtasks)

For complex work, create goals to track progress:

# Create goal
empirica goals-create --session-id <ID> --objective "Implement OAuth flow" \
  --scope-breadth 0.6 --scope-duration 0.5 --output json

# Add subtasks
empirica goals-add-subtask --goal-id <GOAL_ID> --description "Research OAuth providers"

# Complete subtasks with evidence
empirica goals-complete-subtask --subtask-id <TASK_ID> --evidence "commit abc123"

# Complete whole goal
empirica goals-complete --goal-id <GOAL_ID> --reason "Implementation verified"

# Check progress
empirica goals-progress --goal-id <GOAL_ID>

Note: Subtasks use --evidence, goals use --reason.
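
Because the two completion commands take different justification flags, a wrapper script can pick the flag from the item kind. A minimal sketch that only builds the argument list; the function name `completion_command` is hypothetical, while the subcommands and flags are the ones shown above.

```python
def completion_command(kind, item_id, justification):
    """Build the argv for completing a subtask or a goal."""
    if kind == "subtask":
        return [
            "empirica", "goals-complete-subtask",
            "--subtask-id", item_id,
            "--evidence", justification,  # subtasks use --evidence
        ]
    if kind == "goal":
        return [
            "empirica", "goals-complete",
            "--goal-id", item_id,
            "--reason", justification,    # goals use --reason
        ]
    raise ValueError(f"kind must be 'subtask' or 'goal', got {kind!r}")
```

The returned list can be passed to `subprocess.run` unchanged, which avoids shell quoting problems in the justification text.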


Memory Operations

Semantic Search (Qdrant)

# Focused search (eidetic facts + episodic arcs)
empirica project-search --project-id <ID> --task "authentication patterns"

# Full search (all 4 collections: docs, memory, eidetic, episodic)
empirica project-search --project-id <ID> --task "query" --type all

# Include cross-project learnings (ecosystem scope)
empirica project-search --project-id <ID> --task "query" --global

# Sync project memory to Qdrant
empirica project-embed --project-id <ID> --output json

Automatic ingestion (when Qdrant available):

  • finding-log → eidetic facts + immune decay on lessons
  • postflight-submit → episodic narratives + auto-embed + grounded verification (post-test evidence)
  • SessionStart hook → retrieves relevant memories post-compact

Pattern retrieval (auto-triggered):

  • PREFLIGHT: Returns lessons, dead-ends, relevant findings
  • CHECK: Validates against dead-ends, triggers mistake risk warnings

Optional setup: export EMPIRICA_QDRANT_URL="http://localhost:6333"

Empirica works fully without Qdrant — core CASCADE, goals, and calibration use SQLite.

Search Triggers

Use project search during noetic phases:

  1. Session start — prior learnings for current task
  2. Before logging unknown — check if already resolved
  3. Pre-CHECK — similar decision patterns
  4. Pre-self-improvement — conflicting guidance

Multi-Agent Operations

Spawn Investigation Agents

# Single agent
empirica agent-spawn --session-id <ID> \
  --task "Investigate authentication patterns" \
  --persona researcher --cascade-style exploratory

# Parallel agents with attention budget
empirica agent-parallel --session-id <ID> \
  --task "Analyze security and architecture" \
  --budget 20 --max-agents 5

The budget is allocated by information gain: high-uncertainty domains get more resources. The SubagentStop hook auto-gates rollup, scoring by confidence × novelty × relevance.
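
The two ideas in this paragraph can be sketched numerically. Both functions are illustrative: the source says only that high-uncertainty domains get more resources and that rollup is scored by confidence × novelty × relevance, so the proportional split in `allocate_budget`, the domain names, and both function names are assumptions.

```python
def allocate_budget(total_budget, domain_uncertainty):
    """Split a budget across domains in proportion to their uncertainty."""
    if not domain_uncertainty:
        return {}
    total = sum(domain_uncertainty.values())
    if total == 0:
        # No uncertainty anywhere: fall back to an even split.
        share = round(total_budget / len(domain_uncertainty), 2)
        return {domain: share for domain in domain_uncertainty}
    return {
        domain: round(total_budget * uncertainty / total, 2)
        for domain, uncertainty in domain_uncertainty.items()
    }


def rollup_score(confidence, novelty, relevance):
    """Score a sub-agent result as the product of the three factors."""
    return confidence * novelty * relevance


shares = allocate_budget(20, {"security": 0.6, "architecture": 0.2})
# shares == {"security": 15.0, "architecture": 5.0}
```

A product score means a result that is weak on any one factor scores low overall, even if the other two are high.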

Handoff Types

Type           When              Contains
Investigation  After CHECK       Noetic artifacts, ready for praxic
Complete       After POSTFLIGHT  Full learning cycle + calibration
Planning       Any time          Documentation-only, no CASCADE required

empirica handoff-create --session-id <ID> \
  --task-summary "Investigated auth patterns" \
  --key-findings '["JWT with RS256", "Refresh in httpOnly cookies"]' \
  --next-session-context "Ready to implement token rotation"

Sentinel Safety Gates

Sentinel controls praxic actions (Edit, Write, NotebookEdit):

Readiness gate: know >= 0.70 AND uncertainty <= 0.35 (after bias correction)

Core features (always on):

  • PREFLIGHT requirement before acting
  • Decision parsing (blocks if CHECK returned "investigate")
  • Vector threshold validation
  • Anti-gaming: minimum noetic duration (30s) with evidence check
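
Taken together, the always-on checks amount to a list of reasons an action may be blocked. The sketch below is an assumption-heavy illustration, not Sentinel's code: the function name `sentinel_blockers`, its parameters, the message strings, and the reading of "evidence check" as "at least one noetic artifact logged" are all hypothetical.

```python
MIN_NOETIC_SECONDS = 30  # minimum noetic duration stated above


def sentinel_blockers(has_preflight, check_decision, know, uncertainty,
                      noetic_seconds, artifacts_logged):
    """Return the reasons a praxic action would be blocked (empty = allowed)."""
    blockers = []
    if not has_preflight:
        blockers.append("no PREFLIGHT")
    if check_decision == "investigate":
        blockers.append("CHECK returned investigate")
    if know < 0.70 or uncertainty > 0.35:
        blockers.append("vectors below readiness gate")
    # Assumed form of the anti-gaming rule: enough time AND some evidence.
    if noetic_seconds < MIN_NOETIC_SECONDS or artifacts_logged == 0:
        blockers.append("insufficient noetic work")
    return blockers
```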

Configuration (user-only, DO NOT execute):

Variable                             Values                Default     Effect
EMPIRICA_SENTINEL_LOOPING            true, false           true        When false, disables Sentinel gating entirely
EMPIRICA_SENTINEL_MODE               observer, controller  controller  observer = log only, controller = actively block
EMPIRICA_SENTINEL_CHECK_EXPIRY       true, false           false       30-min CHECK expiry
EMPIRICA_SENTINEL_REQUIRE_BOOTSTRAP  true, false           false       Require bootstrap before proceed
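
How these variables and defaults combine can be sketched as a read-only lookup. The helper `read_sentinel_config`, the result keys, and the case-insensitive parsing are hypothetical; only the variable names, allowed values, and defaults come from the table. The sketch reads the environment and never sets it, in keeping with the user-only note above.

```python
import os

# Defaults as listed in the configuration table.
DEFAULTS = {
    "EMPIRICA_SENTINEL_LOOPING": "true",
    "EMPIRICA_SENTINEL_MODE": "controller",
    "EMPIRICA_SENTINEL_CHECK_EXPIRY": "false",
    "EMPIRICA_SENTINEL_REQUIRE_BOOTSTRAP": "false",
}


def read_sentinel_config(environ=None):
    """Resolve the Sentinel settings from an environment mapping."""
    environ = os.environ if environ is None else environ
    raw = {
        key: environ.get(key, default).strip().lower()
        for key, default in DEFAULTS.items()
    }
    mode = raw["EMPIRICA_SENTINEL_MODE"]
    if mode not in ("observer", "controller"):
        raise ValueError(f"unsupported EMPIRICA_SENTINEL_MODE: {mode!r}")
    gating_enabled = raw["EMPIRICA_SENTINEL_LOOPING"] == "true"
    return {
        "gating_enabled": gating_enabled,
        "mode": mode,
        # Only controller mode actively blocks; observer logs only.
        "blocks_actions": gating_enabled and mode == "controller",
        "check_expiry": raw["EMPIRICA_SENTINEL_CHECK_EXPIRY"] == "true",
        "require_bootstrap": raw["EMPIRICA_SENTINEL_REQUIRE_BOOTSTRAP"] == "true",
    }
```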

Common Patterns

Quick Task

PREFLIGHT → [praxic work] → POSTFLIGHT → POST-TEST

Investigation → Implementation

PREFLIGHT → [noetic: explore] → CHECK → [praxic: implement] → POSTFLIGHT → POST-TEST

Complex Feature

PREFLIGHT → Goal + Subtasks → [CHECK at each gate] → POSTFLIGHT → POST-TEST

Parallel Investigation

PREFLIGHT → agent-spawn (×N) → agent-aggregate → CHECK → POSTFLIGHT → POST-TEST

POST-TEST is automatic — triggered by POSTFLIGHT. No manual step needed.


Hook Integration

Hooks enforce CASCADE automatically:

Hook                       Event                 Action
sentinel-gate.py           PreToolUse            Gates Edit/Write until valid CHECK
session-init.py            SessionStart:new      Auto-creates session + bootstrap
post-compact.py            SessionStart:compact  Auto-recovers session, prompts CHECK
session-end-postflight.py  SessionEnd            Auto-captures POSTFLIGHT
tool-router.py             UserPromptSubmit      Vector-aware tool/agent routing

MCP Server Restart: After updating empirica-mcp code, restart the server:

pkill -f empirica-mcp  # Kill running server
# Then use /mcp in Claude Code to reconnect

Full Command Reference

empirica --help                          # All commands
empirica session-create --ai-id <name>   # Start session
empirica project-bootstrap --session-id <ID>  # Load context
empirica preflight-submit -              # PREFLIGHT (stdin JSON)
empirica check-submit -                  # CHECK gate (stdin JSON)
empirica postflight-submit -             # POSTFLIGHT (stdin JSON)
empirica finding-log --finding "..."     # Log noetic artifact (finding)
empirica unknown-log --unknown "..."     # Log noetic artifact (unknown)
empirica deadend-log --approach "..."    # Log noetic artifact (dead-end)
empirica goals-create --objective "..."  # Create praxic artifact (goal)
empirica goals-list                      # Show active goals
empirica calibration-report              # Self-referential calibration (Track 1)
empirica calibration-report --grounded   # Grounded calibration (Track 2)
empirica calibration-report --trajectory # Calibration trend over time
empirica agent-spawn --task "..."        # Spawn domain agent
empirica agent-parallel --task "..."     # Parallel investigation
empirica handoff-create ...              # Create handoff
empirica project-search --task "..."     # Semantic memory search

Best Practices

DO:

  • Apply bias corrections from .breadcrumbs.yaml (both self-ref and grounded)
  • Be honest about uncertainty (it's data, not failure)
  • Log noetic artifacts as you discover them (also anti-gaming evidence)
  • Use CHECK before major praxic actions
  • Compare POSTFLIGHT to PREFLIGHT (Track 1: learning delta)
  • Check calibration-report --grounded to see if self-assessment matches evidence (Track 2)
  • Use calibration-report --trajectory to see if calibration is improving

DON'T:

  • Inflate scores (Sentinel detects rushed assessments)
  • Skip PREFLIGHT (lose baseline AND get blocked)
  • Ignore high uncertainty signals
  • Proceed without CHECK when uncertainty > 0.5
  • Rush PREFLIGHT→CHECK without actual noetic work
  • Trust Track 1 over Track 2 when they diverge (grounded evidence wins)

Remember: When uncertain, say so. That's genuine metacognition.
