Skilled Agent Orchestration w/ Custom Spec Kit

Core layer	What it adds
📋 Spec Kit Framework	Structured plans, task tracking, validation gates, and handover docs
🧠 Cognitive Memory	Local-first project memory for decisions, context, and continuity
⚛️ Hybrid RAG + Smart Graph	Retrieval that blends semantic search with graph-aware project context
🔍 Code Index + Graph	Callers, imports, impact paths, and concept-based code discovery
🤖 11 Specialized Agents	Focused roles for implementation, review, research, docs, git, and more
🎯 22 On-Demand Skills	Skill Advisor routing for the right workflow at the right time

Reasons to try it

Works with OpenCode, Codex, Claude Code, Gemini, and Devin CLI
Supports external CLI agent orchestration without unnecessary MCPs or proxies
Designed to be modular, readable, and easy to adapt to your own stack

Don't buy me unwanted coffee: https://buymeacoffee.com/michelkerkmeester

OVERVIEW
QUICK START
FEATURES
CONFIGURATION
FAQ
RELATED DOCUMENTS

1. OVERVIEW

What This Framework Does

AI coding assistants have amnesia. Every session starts from zero. You explain your architecture Monday. By Wednesday, it is gone. Decisions, trade-offs, the carefully reasoned choices behind them, all lost the moment the conversation window closes. This framework fixes that.

The framework adds four layers on top of the base platform:

Structured documentation (Spec Kit) - every file change gets a spec folder recording what changed, why and how. Like a lab notebook for software.
Cognitive memory (MCP server) - a local-first memory engine storing decisions, context and project history in a searchable database. Like a personal librarian who remembers every conversation.
Code intelligence (Code Graph + CocoIndex) - structural graph indexing handles callers, imports and impact analysis, while semantic search finds code by concept.
Coordinated agents and skills - 11 specialized agents routed by a gate system that loads the right skills at the right time.


🤖 11 Agents	11 custom specialists, multi-runtime
🎯 22 Skills	Code, docs, git, prompts, MCP, research, review, council, improvement, cross-AI, small-model sentinel, and standalone system packages
⌨️ 24 Commands	4 speckit + 4 memory + 7 create + 4 deep + 3 doctor + 2 root utilities
🔧 69 MCP Tools	mk-spec-memory (39), mk_skill_advisor (9), mk_code_index (11), code mode (7), CocoIndex (2), sequential thinking (1). See canonical count in FAQ
🔍 CocoIndex Code	Forked from cocoindex-io/cocoindex-code (Apache 2.0) - semantic code search via vector embeddings and natural-language discovery across 28+ languages
🏗️ Code Graph	First-class skill at `.opencode/skills/system-code-graph/` with standalone MCP server identity `mk_code_index` and client namespace `mcp__mk_code_index__*`
⚡ Runtime Coverage	OpenCode, Codex CLI (requires `[features].codex_hooks = true` opt-in for native hooks), Claude Code, Gemini CLI, plus Copilot CLI startup-surface support (file-based custom instructions)

How It All Connects

                         YOUR REQUEST
                              │
                              ▼
         ┌──────────────────────────────────────────┐
         │       GATE SYSTEM (3 mandatory gates)    │
         │                                          │
         │  Gate 1: Context     Gate 2: Skills      │
         │  Surface relevant    Auto-load the right │
         │  prior memory        domain expertise    │
         │                                          │
         │  Gate 3: Spec Folder (HARD BLOCK)        │
         │  Every file change needs documentation   │
         └──────────────────────┬───────────────────┘
                                │
                 ┌──────────────┴──────────────┐
                 ▼                             ▼
         ┌───────────────┐          ┌──────────────────┐
         │ AGENT NETWORK │          │  SKILLS LIBRARY  │
         │ 11 specialized│          │ 22 domain skills │
         │ agents with   │◄────────►│ auto-loaded by   │
         │ routing logic │          │ task keywords    │
         └───────┬───────┘          └────────┬─────────┘
                 │                           │
                 ▼                           ▼
         ┌──────────────────────────────────────────┐
         │          NATIVE MCP TOPOLOGY             │
         │  6 native servers - each one a separate  │
         │  process and MCP boundary                │
         │                                          │
         │  mk-spec-memory       context + memory   │
         │  mk_skill_advisor     skill routing      │
         │  mk_code_index        structural graph   │
         │  cocoindex_code       semantic search    │
         │  code_mode            external tools     │
         │  sequential_thinking  reasoning helper   │
         │                                          │
         │  Shared contract: hybrid retrieval +     │
         │  startup payload via runtime hooks       │
         └──────────────────────┬───────────────────┘
                                │
                                ▼
         ┌──────────────────────────────────────────┐
         │     SPEC KIT (documentation framework)   │
         │  specs/###-feature/ - scratch/           │
         │  4 levels - template set - 20 rules      │
         │  nomic-v1.5 (Ollama) │ HF Local │ Voyage │
         └──────────────────────────────────────────┘

What's Shipped Recently

The code graph now lives in .opencode/skills/system-code-graph/ with its own MCP boundary. A follow-on rename established mk_code_index as the standalone server identity and mcp__mk_code_index__* as the live documentation namespace.

Recent work also tightened the public surface without turning this README into a changelog: CocoIndex now has a canonical feature inventory, stress coverage is aligned, and the local llama-cpp path has stronger embedding failure reporting plus token-aware truncation.

Embedder Architecture

Both embedding-backed MCP servers, mk-spec-memory and CocoIndex, are pluggable out of the box: swapping an embedder requires no code change.

mk-spec-memory defaults to nomic-embed-text-v1.5 (768 dim) through the Ollama -> hf-local Nomic cascade. SPECKIT_CROSS_ENCODER remains default-off; the configured opt-in reranker is cross-encoder/ms-marco-MiniLM-L-6-v2.

CocoIndex ships a two-stage pipeline:

Stage 1 - Embedder, default sbert/nomic-ai/CodeRankEmbed (768 dim, MIT, code-tuned bi-encoder, MPS auto-detect on Apple Silicon; promoted 2026-05-19).
Stage 2 - Cross-encoder reranker, default Qwen/Qwen3-Reranker-0.6B (Apache-2.0; promoted 2026-05-20 per ADR-027 after a head-to-head bench against jina-reranker-v3, which is kept as an opt-in fallback).

The Code Graph rides on CocoIndex's embedder choice via a shared bridge. See the canonical narrative at embedder-pluggability.md and the swap runbook in CocoIndex INSTALL_GUIDE §4 "Choosing an embedder".

2. QUICK START

Installation

Prerequisites: Node.js 18+ with npm, git and a POSIX shell. The launcher binaries vendor their own dependencies on first run, so you do not need TypeScript or tsc installed globally.

# 1. Clone the repository
git clone https://github.com/MichelKerkmeester/opencode--spec-kit-skilled-agent-orchestration.git
cd opencode--spec-kit-skilled-agent-orchestration

# 2. Install root dependencies (file watcher + shared HTTP utilities)
npm install

# 3. Boot the native MCP servers via their committed launchers
# Each launcher is a self-contained .cjs that vendors its own deps on first run.
node .opencode/bin/mk-spec-memory-launcher.cjs --help
node .opencode/bin/mk-skill-advisor-launcher.cjs --help
node .opencode/bin/mk-code-index-launcher.cjs --help

# 4. (Optional) Install the CocoIndex Code soft-fork for semantic code search
bash .opencode/skills/mcp-coco-index/scripts/install.sh

The native MCP servers (mk-spec-memory, mk_skill_advisor, mk_code_index) ship as committed launcher binaries under .opencode/bin/. They self-vendor their dependencies on first invocation and the checked-in runtime configs already point at them. There is no separate build step.

Runtime lifecycle guardrails are part of the native MCP stack. The servers share SPECKIT_LAUNCHER_IDLE_TIMEOUT_MIN for idle self-exit, and the repo ships a dry-run-first orphan process sweeper plus a LaunchAgent template under .opencode/scripts/. The LaunchAgent is not installed or loaded by default; activation is a separate operator-approved rollout. See Repo Scripts Runbook and the 022 orphan MCP leak prevention packet.

Set Up Embedding Provider

Choose an embedding provider:

# Default when no cloud keys are set: nomic-embed-text-v1.5 (768 dim)
# served by a local Ollama HTTP endpoint. Pull the model once:
#   ollama pull nomic-embed-text:v1.5
# (jina-embeddings-v3 is the second-priority fallback; pull via:
#   ollama pull hf.co/gaianet/jina-embeddings-v3-GGUF:Q4_K_M)

# Option A: Voyage AI (cloud, requires API key, opt-in only)
export VOYAGE_API_KEY="your-key-here"

# Option B: OpenAI embeddings (cloud, requires API key)
export OPENAI_API_KEY="your-key-here"

# Option C: HuggingFace Local (free, CPU/ONNX fallback when Ollama is unavailable)
# Auto-detected when the Ollama probe fails and no cloud keys are set

Verify Installation

# Confirm the launcher binaries respond
node .opencode/bin/mk-spec-memory-launcher.cjs --help
node .opencode/bin/mk-skill-advisor-launcher.cjs --help
node .opencode/bin/mk-code-index-launcher.cjs --help

# Confirm the active runtime's MCP config references the launchers
# (only the config for the runtime you use needs to exist; .codex/config.toml ships in the repo)
grep -lE 'mk-spec-memory|mk_skill_advisor|mk_code_index' \
  opencode.json .claude/mcp.json .codex/config.toml .gemini/settings.json 2>/dev/null

First Use

Open OpenCode in your project directory. The framework is active. Try:

/speckit:complete Build a user authentication system

This creates a spec folder, runs research, builds a plan and begins implementation - all with memory saved automatically. When you come back tomorrow, the memory engine remembers everything.

Adapting to Your Stack

This repo ships as a public template. Of the shipped skills, sk-code carries the stack-specific patterns (frontend framework, animation library, CMS, backend language). Start there when forking. The other shipped skills (system-spec-kit, sk-doc, sk-git, sk-code-review, mcp-coco-index, the deep-research/deep-review loops, deep-loop-runtime, the cli-* orchestrators) are codebase-agnostic out of the box and work for any project without modification. Most teams will also add their own skills on top. Drop them into .opencode/skills/<your-skill>/ and they'll be picked up automatically.

See §4 Customizing for Your Stack for the full customization map and step-by-step adaptation guide.

Code-Graph Indexing

The standalone mk_code_index MCP server indexes your project's production code by default, not the framework backend. End users inherit this behavior automatically through the committed config defaults. See §4 Maintainer-Mode Code-Graph Flags only if you're contributing upstream.

3. FEATURES

📋 Spec Kit Documentation

The Spec Kit enforces structured spec folders for every file-modifying conversation. Gate 3 requires a spec folder answer before any file modification begins (only typo/whitespace fixes under 5 characters are exempt).

Documentation Levels

Documentation depth scales with task complexity.

Level	LOC Guidance	Required Files	When to Use
1	< 100	spec.md, plan.md, tasks.md, implementation-summary.md	Small features, bug fixes, single-file changes
2	100 - 499	Level 1 + checklist.md	Features needing QA verification, multi-file changes
3	500+	Level 2 + decision-record.md	Architecture changes, complex refactors
3+	Complexity 80+	Level 3 + approval workflow, compliance checkpoints, stakeholder matrix	High-complexity work needing review tracking and workstream coordination

The LOC ranges are guidance, not hard rules. Risk, complexity and the number of affected files can push a task to a higher level. When in doubt, choose the higher level.

The implementation-summary.md file is required at all levels, but it is created after implementation completes, not at spec folder creation time.
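The guidance in the table above can be read as a small decision function. The following is a minimal sketch under stated assumptions, not the logic of recommend-level.sh: the name suggestDocLevel, the numeric complexity score and the riskBump flag are hypothetical and exist only for this example.

```typescript
// Hypothetical helper: maps the LOC and complexity guidance from the table
// onto a documentation level. Not the actual recommend-level.sh logic.
type DocLevel = "1" | "2" | "3" | "3+";

function suggestDocLevel(
  loc: number,
  complexity: number = 0,
  riskBump: boolean = false
): DocLevel {
  const ordered: DocLevel[] = ["1", "2", "3", "3+"];
  let idx = 0;                    // < 100 LOC
  if (loc >= 100) idx = 1;        // 100 - 499 LOC
  if (loc >= 500) idx = 2;        // 500+ LOC
  if (complexity >= 80) idx = 3;  // "Complexity 80+" row
  // "When in doubt, choose the higher level": risk pushes the result up one step.
  if (riskBump && idx < ordered.length - 1) idx += 1;
  return ordered[idx];
}

const smallFix = suggestDocLevel(40);                // "1"
const multiFile = suggestDocLevel(250);              // "2"
const riskyRefactor = suggestDocLevel(250, 0, true); // "3"
const bigProgram = suggestDocLevel(900, 85);         // "3+"
```

Because the LOC ranges are guidance rather than hard rules, a real recommendation would weigh more signals than this sketch does.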

Spec Folder Structure

specs/<###-feature-name>/
├── description.json             # Spec identity and memory tracking metadata
├── spec.md                      # What the feature is and why it exists
├── plan.md                      # How to implement it
├── tasks.md                     # Step-by-step task breakdown
├── checklist.md                 # QA validation gates (Level 2+)
├── decision-record.md           # Architecture decisions (Level 3+)
├── implementation-summary.md    # Post-implementation summary (all levels)
├── resource-map.md              # Optional path ledger of resources the packet touched
├── graph-metadata.json          # Packet-level graph metadata (auto-refreshed on save)
└── scratch/                     # Temporary workspace files

resource-map.md is optional at any level. Render it from .opencode/skills/system-spec-kit/templates/manifest/resource-map.md.tmpl when a packet wants a lean, central listing of the files, scripts and external resources it interacts with. Deep-research and deep-review loops emit it automatically next to review-report.md.

Checklist Priority System

Checklists use a priority system so reviewers know what blocks shipping and what can wait:

P0 - Hard blocker. Cannot ship without this. Cannot defer.
P1 - Required. Must complete or get explicit user approval to defer.
P2 - Optional. Nice to have. Can defer without approval.
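The three priorities translate directly into a "what still blocks shipping" check. This is a sketch under assumptions: the ChecklistEntry shape and the deferralApproved flag are hypothetical, since the README defines the P0/P1/P2 semantics but not a data model.

```typescript
// Hypothetical data model; only the P0/P1/P2 rules come from the list above.
type CheckPriority = "P0" | "P1" | "P2";

interface ChecklistEntry {
  label: string;
  priority: CheckPriority;
  done: boolean;
  deferralApproved?: boolean; // only meaningful for P1 items
}

// Returns the entries that still block shipping.
function shippingBlockers(entries: ChecklistEntry[]): ChecklistEntry[] {
  return entries.filter((entry) => {
    if (entry.done) return false;
    if (entry.priority === "P0") return true;                    // cannot defer
    if (entry.priority === "P1") return !entry.deferralApproved; // needs explicit approval
    return false;                                                // P2 can wait
  });
}

const sampleChecklist: ChecklistEntry[] = [
  { label: "Auth tests pass", priority: "P0", done: true },
  { label: "Docs updated", priority: "P1", done: false, deferralApproved: true },
  { label: "Perf benchmark", priority: "P1", done: false },
  { label: "Polish copy", priority: "P2", done: false },
];

// Only "Perf benchmark" remains: an open P1 without an approved deferral.
const openBlockers = shippingBlockers(sampleChecklist);
```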

Phase Decomposition

Phase decomposition splits large features into a parent spec folder (overall specification) and child folders (one per phase).

specs/022-big-feature/             # Parent spec folder
├── spec.md                        # Overall specification
├── 001-data-model/                # Phase 1 child
│   ├── spec.md
│   └── ...
├── 002-api-endpoints/             # Phase 2 child
│   ├── spec.md
│   └── ...
└── 003-frontend/                  # Phase 3 child
    ├── spec.md
    └── ...

Use create.sh --phase to create a parent with its first child in one step. Run validate.sh --recursive to validate the parent and all children together.

Validation

The validate.sh script runs 20 rules against a spec folder and reports what passes and what needs fixing. Rules check for required files, template compliance, placeholder detection, anchor markers and cross-reference consistency.

Exit 0 - All rules pass. Ready to proceed.
Exit 1 - User error (bad flags or invalid input).
Exit 2 - Validation error. Must fix before claiming completion.
Exit 3 - System error (file I/O failure, missing manifest or other environment problem).

Run with --verbose to see details behind each rule or --recursive to validate a parent and all child phase folders. Strict validation of a Level 3 packet runs in ~108 ms via a single-orchestrator design. The default scaffold path skips post-create validation. Set SPECKIT_POST_VALIDATE=1 to enable it for strict CI workflows. Path traversal inputs (e.g. --path "../etc/passwd") are rejected before any filesystem write. Parallel /memory:save calls for the same packet are serialized by an advisory lock on description.json and graph-metadata.json.
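A wrapper script or CI step can branch on the exit codes listed above. The sketch below only encodes that documented mapping; the function and type names are illustrative and not part of the Spec Kit scripts.

```typescript
// Exit-code meanings as documented for validate.sh; names are illustrative.
type ValidateVerdict =
  | "pass"
  | "user-error"
  | "validation-error"
  | "system-error"
  | "unknown";

function classifyValidateExit(code: number): ValidateVerdict {
  switch (code) {
    case 0: return "pass";              // all rules pass
    case 1: return "user-error";        // bad flags or invalid input
    case 2: return "validation-error";  // must fix before claiming completion
    case 3: return "system-error";      // file I/O failure, missing manifest, ...
    default: return "unknown";          // outside the documented contract
  }
}

// Only a clean exit 0 should let a workflow claim completion.
function mayClaimCompletion(code: number): boolean {
  return classifyValidateExit(code) === "pass";
}
```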

Scripts and Validation

Spec Management Scripts (in .opencode/skills/system-spec-kit/scripts/spec/):

create.sh - Create spec folders with level-appropriate templates. Use --phase for parent + child
validate.sh - Run 20 validation rules. Use --recursive for phase folders
upgrade-level.sh - Upgrade a spec folder to a higher level by injecting new sections
recommend-level.sh - Analyze scope and risk to recommend the right documentation level
calculate-completeness.sh - Calculate spec folder completeness as a percentage
check-completion.sh - Verify all completion criteria are met
check-placeholders.sh - Find remaining [PLACEHOLDER] values after level upgrade

Memory Scripts (in .opencode/skills/system-spec-kit/scripts/memory/):

generate-context.ts - Primary workflow for updating packet continuity and supporting generated context artifacts
backfill-frontmatter.ts - Add missing frontmatter to existing generated context artifacts and indexed spec docs
reindex-embeddings.ts - Rebuild embedding vectors for stored memories
cleanup-orphaned-vectors.ts - Remove vector entries with no matching memory
rebuild-auto-entities.ts - Regenerate auto-extracted entity catalog
validate-memory-quality.ts - Run quality checks on stored memory content

TypeScript sources compile to .opencode/skills/system-spec-kit/scripts/dist/. The runtime entry point for memory saves is .opencode/skills/system-spec-kit/scripts/dist/memory/generate-context.js.

Gate System

3 mandatory gates run before any file change. Every request passes through the same sequence.

  User message arrives
         │
         ▼
  ┌─────────────────────────────────────────────┐
  │  Gate 1: Understanding (SOFT BLOCK)         │
  │  memory_match_triggers() surfaces context   │
  │  Classify intent: Research / Implementation │
  │  confidence >= 0.70, uncertainty <= 0.35    │
  └──────────────────┬──────────────────────────┘
                     │
                     ▼
  ┌─────────────────────────────────────────────┐
  │  Gate 2: Skill Routing (REQUIRED)           │
  │  advisor_recommend recommends skill         │
  │  confidence >= 0.8 ─► MUST load skill       │
  └──────────────────┬──────────────────────────┘
                     │
                     ▼
  ┌─────────────────────────────────────────────┐
  │  Gate 3: Spec Folder (HARD BLOCK)           │
  │  Only if file modification detected         │
  │  A) Existing  B) New  C) Update             │
  │  D) Skip      E) Phase folder               │
  └──────────────────┬──────────────────────────┘
                     │
                     ▼
              EXECUTION
                     │
                     ▼
  ┌─────────────────────────────────────────────┐
  │  Post-Rules                                 │
  │  Memory Save ─ must use generate-context.js │
  │  Completion ─ verify checklist.md items     │
  └─────────────────────────────────────────────┘
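The thresholds in the diagram can be sketched as a pure function. The numbers (0.70, 0.35, 0.8) and the A-E options come from the diagram; the input and result shapes are assumptions made for illustration, and the real gate logic lives in the agent instructions rather than in code like this.

```typescript
// Thresholds are taken from the gate diagram; field names are hypothetical.
interface GateInput {
  confidence: number;       // Gate 1 intent confidence
  uncertainty: number;      // Gate 1 uncertainty
  skillConfidence: number;  // Gate 2 advisor confidence
  modifiesFiles: boolean;   // Gate 3 trigger
  specFolderChoice?: "A" | "B" | "C" | "D" | "E";
}

interface GateResult {
  understood: boolean;    // Gate 1 (soft block)
  mustLoadSkill: boolean; // Gate 2
  blocked: boolean;       // Gate 3 (hard block)
}

function evaluateGates(input: GateInput): GateResult {
  const understood = input.confidence >= 0.7 && input.uncertainty <= 0.35;
  const mustLoadSkill = input.skillConfidence >= 0.8;
  // Gate 3 applies only when a file modification is detected, and then
  // blocks until one of the options A-E has been chosen.
  const blocked = input.modifiesFiles && input.specFolderChoice === undefined;
  return { understood, mustLoadSkill, blocked };
}

// A read-only question passes straight through Gate 3.
const readOnlyQuestion = evaluateGates({
  confidence: 0.9, uncertainty: 0.1, skillConfidence: 0.5, modifiesFiles: false,
});

// A file edit stays blocked until a spec folder option is chosen.
const unansweredEdit = evaluateGates({
  confidence: 0.9, uncertainty: 0.1, skillConfidence: 0.85, modifiesFiles: true,
});
```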

Analysis Lenses - applied silently on every request:

CLARITY - Is this the simplest solution? Are abstractions earned?
SYSTEMS - What does this touch? What are the side effects?
BIAS - Is the user solving a symptom? Is the framing correct?
SUSTAINABILITY - Will future developers understand this?
VALUE - Does this change behavior or just refactor?
SCOPE - Does solution complexity match problem size?

For the full spec folder workflow, Level contract template architecture, gate definitions and anti-pattern detection rules, see the Spec Kit README and AGENTS.md.

🧠 Memory Engine

The Memory Engine is a local-first cognitive memory system built as an MCP server. generate-context.js updates canonical packet continuity and may emit supporting generated context artifacts inside the spec folder. Canonical continuity lives in the spec packet itself: use /speckit:resume as the recovery surface, then rebuild context in this order: handover.md -> _memory.continuity -> canonical spec docs. The MCP server indexes those packet-local sources with vector embeddings, BM25 and FTS5 full-text search. memory_match_triggers() can still surface relevant prior context automatically when deeper retrieval is needed.

/memory:save refreshes packet metadata on every invocation. session_resume binds args.sessionId to transport caller context by default. Set MCP_SESSION_RESUME_AUTH_MODE=permissive for rollout canaries. Copilot, Claude and Gemini all share the same compact-cache provenance path.

The memory engine works with session lifecycle surfaces and hybrid retrieval. Structural code indexing now lives in the standalone system-code-graph skill and MCP server.

Expired ephemeral rows are cleaned by a retention sweep on startup and hourly by default. Use memory_retention_sweep for manual or dry-run cleanup. The handler is defined at memory-retention-sweep.ts, with SPECKIT_RETENTION_SWEEP and SPECKIT_RETENTION_SWEEP_INTERVAL_MS controlling the background interval.

The full MCP API reference is in the MCP Server README.

Layered MCP Surface

The mk-spec-memory tools are organized into a layered architecture. Code graph and skill-advisor tools moved to standalone MCP servers, so this table covers memory-owned tools only:

Layer	Name	Tools	Token Budget	Purpose
L1	Orchestration	3	2,000	Unified context, resume and bootstrap entry points
L2	Core	3	1,500	Search, trigger matching, save
L3	Discovery	4	800	List, stats, health checks and session readiness
L4	Mutation	5	500	Delete, update, validate, bulk cleanup, retention sweep
L5	Lifecycle	4	600	Checkpoints and lifecycle state
L6	Analysis	7	1,200	Causal graph (link/stats/drift_why), epistemic baselines, evaluations, dashboards
L7	Maintenance	5	1,000	Memory index scans, async ingest and learning history
L8	Moved Surfaces	0	-	Code graph lives in `mk_code_index`. Advisor and skill graph live in `mk_skill_advisor`
L9	Coverage Graph	4	700	Deep-loop coverage graph operations
L9	Council Graph	4	700	AI Council graph operations
	Total	39	~10,180

Lower layers load only when needed. L1 is always available. L2 loads for any search. L3-L7 load based on the specific command being used.

Hybrid Search

A full search checks five core channels in parallel, with CocoIndex available as a semantic code search bridge. Simpler queries use fewer channels (see Query Intelligence):

Vector - Semantic similarity via embeddings. Finds related content when words differ.
FTS5 - Full-text search on exact words and phrases.
BM25 - Keyword relevance scoring.
Causal Graph - Follows cause-and-effect links between memories.
Degree - Scores by graph connectivity, weighted by edge type.

Reciprocal Rank Fusion (RRF) combines results across channels so memories scoring well in multiple channels rise to the top. Graph-first routing dispatches structural queries to the standalone Code Graph first, then CocoIndex for semantic code discovery, then the memory pipeline. A 3-tier FTS fallback activates when graph and semantic channels miss: FTS5 full-text, BM25 keyword scoring, then Grep/Glob filesystem search. The system truncates weak results and ensures every active channel is represented.
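For readers unfamiliar with RRF, here is a minimal sketch of the standard formula, score(d) = sum over channels of 1 / (k + rank). It assumes k = 60, the value commonly used in the literature; this README does not state the constant the memory engine uses, and the function name is illustrative.

```typescript
// Standard Reciprocal Rank Fusion over several ranked id lists.
// k = 60 is an assumed default, not a value taken from this framework.
function fuseByReciprocalRank(rankings: string[][], k: number = 60): string[] {
  const scores: Record<string, number> = Object.create(null);
  rankings.forEach((ranking) => {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores[id] = (scores[id] || 0) + 1 / (k + rank);
    });
  });
  // Highest fused score first.
  return Object.keys(scores).sort((a, b) => scores[b] - scores[a]);
}

const vectorHits = ["m1", "m2", "m3"];
const ftsHits = ["m2", "m4"];
const bm25Hits = ["m2", "m1"];

// "m2" appears in all three channels, so it rises to the top:
// ["m2", "m1", "m4", "m3"]
const fusedOrder = fuseByReciprocalRank([vectorHits, ftsHits, bm25Hits]);
```

The example shows the property the paragraph describes: a memory that ranks reasonably in several channels beats one that ranks first in only one.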

Search Pipeline

Every search passes through 4 stages:

Stage 1: Candidate generation - Parallel retrieval from the active channels plus constitutional injection where applicable.
Stage 2: Fusion - RRF-based scoring with post-fusion signals such as co-activation, FSRS decay, interference control, intent weights and graph/session boosts when enabled.
Stage 3: Rerank - Cross-encoder reranking with chunk reassembly, a minimum Stage 3 gate of 4 candidates and compatibility-only length-penalty wiring that resolves to a neutral 1.0 multiplier. getRerankerStatus() exposes latency plus cache hits, misses, stale hits and evictions. If the reranker is unavailable, Stage 2 order is preserved with degraded metadata.
Stage 4: Filtering - State/quality filtering, confidence annotation, token-budget enforcement and final response shaping without mutating post-rerank scores.
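The rerank fallback behaviour can be sketched as follows. Only two facts come from the text above: the minimum of 4 candidates and the rule that Stage 2 order is preserved with degraded metadata when the reranker is unavailable. Reading the minimum as "skip reranking below 4 candidates" is an interpretation, and all type and function names are hypothetical.

```typescript
// Hypothetical types for illustration only.
interface SearchCandidate { id: string; fusedScore: number; }
interface RerankOutcome { ordered: SearchCandidate[]; degraded: boolean; }
type RerankFn = (items: SearchCandidate[]) => SearchCandidate[];

const MIN_RERANK_CANDIDATES = 4;

function rerankStage(stage2: SearchCandidate[], reranker: RerankFn | null): RerankOutcome {
  if (stage2.length < MIN_RERANK_CANDIDATES) {
    // Assumed reading of the gate: too few candidates to be worth reranking.
    return { ordered: stage2.slice(), degraded: false };
  }
  if (reranker === null) {
    // Reranker unavailable: keep the Stage 2 order and flag the result.
    return { ordered: stage2.slice(), degraded: true };
  }
  return { ordered: reranker(stage2.slice()), degraded: false };
}

const stage2Order: SearchCandidate[] = [
  { id: "a", fusedScore: 0.9 },
  { id: "b", fusedScore: 0.8 },
  { id: "c", fusedScore: 0.7 },
  { id: "d", fusedScore: 0.6 },
];

// Stand-in reranker that simply reverses the list.
const reverseReranker: RerankFn = (items) => items.slice().reverse();

const rerankedOk = rerankStage(stage2Order, reverseReranker); // d, c, b, a
const rerankedDown = rerankStage(stage2Order, null);          // a, b, c, d (degraded)
```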

Query Intelligence

Complexity routing - Simple (2 channels), moderate (4), complex (all 5)
Intent classification - 7 public types (add_feature, fix_bug, refactor, security_audit, understand, find_spec, find_decision) plus an internal continuity profile for resume-oriented retrieval (semantic 0.52, keyword 0.18, recency 0.07, graph 0.23; Stage 3 MMR lambda 0.65)
Query decomposition - Multi-topic queries split into sub-queries, expanded with related terms
Context pressure - Downgrades search mode at 60% and 80% window usage
Fallback strategies - LLM reformulation or HyDE for low-confidence searches

Four response modes: quick (top answer only), focused (one-topic), deep (full evidence trails), resume (state summary + next-steps).
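To make the routing numbers above concrete, the sketch below combines the continuity-profile weights into one score and maps query complexity to a channel count. The weights and channel counts are the ones quoted in the list; treating each signal as a value between 0 and 1 and combining them as a plain weighted sum are assumptions, as are all names.

```typescript
// Weights quoted for the internal continuity profile; they sum to 1.0.
interface ChannelSignals {
  semantic: number;
  keyword: number;
  recency: number;
  graph: number;
}

const CONTINUITY_WEIGHTS: ChannelSignals = {
  semantic: 0.52,
  keyword: 0.18,
  recency: 0.07,
  graph: 0.23,
};

// Assumed combination rule: a weighted sum of signals normalised to 0..1.
function continuityScore(
  signals: ChannelSignals,
  weights: ChannelSignals = CONTINUITY_WEIGHTS
): number {
  return (
    signals.semantic * weights.semantic +
    signals.keyword * weights.keyword +
    signals.recency * weights.recency +
    signals.graph * weights.graph
  );
}

// Complexity routing: simple (2 channels), moderate (4), complex (all 5).
type QueryComplexity = "simple" | "moderate" | "complex";

function channelBudget(complexity: QueryComplexity): number {
  if (complexity === "simple") return 2;
  if (complexity === "moderate") return 4;
  return 5;
}

const semanticOnly = continuityScore({ semantic: 1, keyword: 0, recency: 0, graph: 0 }); // 0.52
```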

Memory Lifecycle

Memories fade using FSRS (Free Spaced Repetition Scheduler). Decay speed varies by content type and importance tier. Critical decisions never fade. Temporary debugging notes fade within days.

Cold-start boost - Fresh memories (under 48h) receive a temporary scoring lift
Interference penalty - Suppresses near-duplicate clusters
Auto-promotion - Memories earn higher tiers through positive validation
Negative feedback - 30-day decay prevents permanent blacklisting

Four active cognitive states drive normal retrieval weighting: HOT >> WARM >> COLD >> DORMANT.
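As a rough illustration of FSRS-style decay, the sketch below uses the FSRS-4.5 power-law forgetting curve, R(t, S) = (1 + F * t / S) ^ D with D = -0.5 and F = 19/81, which gives R = 0.9 when the elapsed time equals the stability. Which FSRS variant and parameters the memory engine uses is not stated here, and the state cut-offs are hypothetical values chosen only to show the HOT to DORMANT ordering.

```typescript
// FSRS-4.5 forgetting curve constants (assumed variant).
const FSRS_DECAY = -0.5;
const FSRS_FACTOR = 19 / 81;

// Probability-like retrievability after `elapsedDays` for a memory whose
// stability is `stabilityDays`. Equals 1 at t = 0 and 0.9 at t = stability.
function retrievability(elapsedDays: number, stabilityDays: number): number {
  return Math.pow(1 + (FSRS_FACTOR * elapsedDays) / stabilityDays, FSRS_DECAY);
}

type CognitiveState = "HOT" | "WARM" | "COLD" | "DORMANT";

// Hypothetical cut-offs; the real thresholds are not given in this README.
function stateFor(r: number): CognitiveState {
  if (r >= 0.9) return "HOT";
  if (r >= 0.7) return "WARM";
  if (r >= 0.4) return "COLD";
  return "DORMANT";
}

// A higher stability (for example, a critical decision) decays more slowly
// than a low one (for example, a temporary debugging note).
const decisionAfter30d = retrievability(30, 365);
const debugNoteAfter30d = retrievability(30, 2);
```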

Causal Graph

Six relationship types: caused, enabled, supersedes, contradicts, derived_from, supports

Typed traversal - Prioritizes connection types based on query intent
Community detection - Louvain clustering with neighbor boosting
Co-activation spreading - Fan-effect dampening prevents hub bias
Temporal contiguity - Same-session grouping
Graph momentum - Trending knowledge surfaces higher
LLM backfill - Background discovery of missed causal links

Trust Badges on Search Results

Every search result ships with a small trustBadges block that tells you how reliable the hit is at a glance. The badges are display-only: they read existing causal links and don't add new storage.

Badge	What it tells you
`confidence`	How strong the strongest causal link to this result is
`extractionAge`	How long ago the supporting evidence was extracted
`lastAccessAge`	How recently anything in the chain was used
`orphan`	True when nothing else in the graph points at this result
`weightHistoryChanged`	True when the underlying edge weight has been re-tuned

If the database is unreachable the formatter quietly skips badges instead of failing. Caller-provided badges pass through untouched. Every response profile (quick, research, resume) keeps the badges on the top result and the result list.

Save Intelligence

When you save new knowledge, Prediction Error gating compares it against existing memories and picks one of four outcomes:

CREATE - No similar memory exists. Stored as new knowledge.
REINFORCE - Similar exists, new one adds value. Both kept, old one boosted.
UPDATE - Similar exists, new one is better. Old version replaced.
SUPERSEDE - New knowledge contradicts the old. Old one demoted to deprecated.
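The four outcomes can be pictured as a small decision function. This is a sketch, not the engine's Prediction Error implementation: the comparison inputs, the order of the checks and the 0.85 similarity threshold are all assumptions introduced for the example.

```typescript
// The four outcomes come from the list above; everything else is hypothetical.
type SaveOutcome = "CREATE" | "REINFORCE" | "UPDATE" | "SUPERSEDE";

interface SaveComparison {
  similarity: number;      // 0..1 similarity to the closest existing memory
  contradicts: boolean;    // new knowledge conflicts with the old
  strictlyBetter: boolean; // new version makes the old one redundant
}

function arbitrateSave(cmp: SaveComparison, threshold: number = 0.85): SaveOutcome {
  if (cmp.similarity < threshold) return "CREATE"; // nothing similar exists
  if (cmp.contradicts) return "SUPERSEDE";         // old one demoted to deprecated
  if (cmp.strictlyBetter) return "UPDATE";         // old version replaced
  return "REINFORCE";                              // both kept, old one boosted
}

const brandNew = arbitrateSave({ similarity: 0.2, contradicts: false, strictlyBetter: false });
const reversal = arbitrateSave({ similarity: 0.95, contradicts: true, strictlyBetter: false });
```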

Additional save-time processing:

Semantic sufficiency gating - Rejects content too thin to be useful
Verify-fix-verify - Auto-fixes quality issues before storing
Content normalization - Strips formatting clutter for cleaner embeddings
Auto-entity extraction - Spots tool/project/concept names for cross-linking
SHA-256 deduplication - Skips unchanged files instantly
Correction tracking - Records how knowledge evolves across versions

Session Awareness

Working memory - Tracks current session findings with attention decay
Session deduplication - Suppresses already-seen results in follow-up queries
Context pressure - Downgrades search mode as the context window fills

Quality Gates

Three layered checks before storage:

Structure gate - Format, headings, metadata validation
Semantic sufficiency - Enough real content to be useful
Duplicate detection - Triggers Prediction Error arbitration if similar content exists

Preview all checks without saving using dryRun: true. Learned relevance feedback boosts helpful results with safeguards against noise. Two-tier explainability shows plain-language reasons or exact channel contributions.

Retrieval Enhancements

Constitutional injection - Always-surfaced rules appear without asking
Hierarchy awareness - Searches parent and sibling spec folders
Entity linking - Connects memories referencing the same concepts
ANCHOR retrieval - Per-section indexing (~93% token savings)
Auto-surfacing - Triggers on tool use and context compression events
Provenance traces - Shows exactly how each result was found

Indexing and Infrastructure

Real-time watching - Filesystem monitoring via chokidar
Incremental indexing - Content hashes skip unchanged files
Embedding retry - Background worker retries failed embeddings
Lexical fallback - Text-searchable when embedding services are down
Atomic writes - Crash-safe with pending-file recovery on startup

Evaluation

12-metric computation - MRR, NDCG, MAP and more
Ground truth corpus - 110 test questions with known correct answers
Ablation studies - Per-channel quality impact measurement
Offline scoring checks - Test ranking changes before deployment

Embedding Providers

The embedder layer is pluggable. Swap defaults via env vars without touching code. Canonical narrative: embedder-pluggability.md.

Ollama (nomic-embed-text-v1.5) - Default since 2026-05-19 (ADR-013/014). Free, local, 768d retrieval-tuned. Pull once with ollama pull nomic-embed-text:v1.5. The cascade falls back to jina-embeddings-v3 (1024d Q4_K_M) when nomic isn't pulled.
HuggingFace Local - Fallback when the Ollama probe fails. Free, local, 768d q8 ONNX.
Voyage AI - Cloud opt-in. Set VOYAGE_API_KEY. 1024d. Gated by egress guard.
OpenAI - Cloud opt-in. Set OPENAI_API_KEY. 1536d.
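The provider list above implies a selection order, which the sketch below spells out. It is an illustration under assumptions: the function name and the ollamaReachable flag are hypothetical, the jina fallback inside the Ollama cascade is left out, and Voyage taking precedence over OpenAI when both keys are set is a guess, since this README does not say.

```typescript
type EmbeddingProvider = "voyage" | "openai" | "ollama" | "hf-local";

interface ProviderEnv {
  VOYAGE_API_KEY?: string;
  OPENAI_API_KEY?: string;
}

// Assumed order: cloud providers are opt-in through their API keys, Ollama is
// the default when no cloud key is set, HuggingFace Local is the fallback.
function pickEmbeddingProvider(env: ProviderEnv, ollamaReachable: boolean): EmbeddingProvider {
  if (env.VOYAGE_API_KEY) return "voyage"; // cloud opt-in (precedence is an assumption)
  if (env.OPENAI_API_KEY) return "openai"; // cloud opt-in
  if (ollamaReachable) return "ollama";    // default: nomic-embed-text-v1.5
  return "hf-local";                       // used when the Ollama probe fails
}

const localDefault = pickEmbeddingProvider({}, true);   // "ollama"
const offlineLaptop = pickEmbeddingProvider({}, false); // "hf-local"
```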

🔍 CocoIndex + Code Graph

The framework uses two different code-understanding systems on purpose. CocoIndex handles semantic discovery, so the assistant can answer "find code that does X" or "how is Y implemented?" without knowing exact symbols first. The Code Graph handles structural expansion, so the assistant can answer questions like "what calls this?", "what imports this?" or "what breaks if we change it?" using an indexed relationship graph.

The intended routing order is graph-first: the code graph resolves structural queries first, CocoIndex finds semantic candidates when structural resolution misses and Memory supports session decisions and active-task context after the packet-local recovery sources have been checked. A 3-tier FTS fallback escalates automatically when results are weak.

Default Scope (End-User Code Only)

By default, code-graph scans your repo code only. Five .opencode/ folders are excluded so end-user search results stay signal-rich:

.opencode/skills/**
.opencode/agents/**
.opencode/commands/**
<active-spec-folder>/**
.opencode/plugins/**

Maintainers can opt folders back in process-wide with env vars:

SPECKIT_CODE_GRAPH_INDEX_SKILLS=true       # all skills
SPECKIT_CODE_GRAPH_INDEX_SKILLS=sk-x,sk-y  # only listed skills (csv)
SPECKIT_CODE_GRAPH_INDEX_AGENTS=true
SPECKIT_CODE_GRAPH_INDEX_COMMANDS=true
SPECKIT_CODE_GRAPH_INDEX_SPECS=true
SPECKIT_CODE_GRAPH_INDEX_PLUGINS=true
SPECKIT_CODE_GRAPH_DB_DIR=/path/to/code-graph-db # optional DB-dir override

Per-call args override env vars when provided. Env vars apply only for fields omitted from the scan call:

code_graph_scan({
  includeSkills: ['sk-code-review', 'sk-doc'], // granular: only these skills
  includeAgents: true,                         // all .opencode/agents/**
})

Existing v1 scans trigger a blocked read with requiredAction:"code_graph_scan" until you re-run the scan. See system-code-graph README §8 SCAN SCOPE for the full scan-scope rules and precedence details.
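The precedence rule above (per-call args first, then env vars, then the default exclusion) can be sketched for the skills scope. The env var formats, true or a csv list, are the documented ones; the function names and the handling of an explicit "false" value are assumptions.

```typescript
// true = all skills, false = none, string[] = only the listed skills.
type SkillScope = boolean | string[];

// Parses a SPECKIT_CODE_GRAPH_INDEX_SKILLS-style value.
function parseSkillsEnv(raw: string | undefined): SkillScope {
  if (raw === undefined) return false;      // default: skills are excluded
  const value = raw.trim();
  if (value === "") return false;
  if (value.toLowerCase() === "true") return true;
  if (value.toLowerCase() === "false") return false; // assumed, not documented
  return value
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

// Per-call args override env vars; env vars apply only to omitted fields.
function resolveSkillScope(
  callArg: SkillScope | undefined,
  envValue: string | undefined
): SkillScope {
  return callArg !== undefined ? callArg : parseSkillsEnv(envValue);
}

const fromEnvOnly = resolveSkillScope(undefined, "sk-x,sk-y"); // ["sk-x", "sk-y"]
const argWins = resolveSkillScope(["sk-doc"], "true");         // ["sk-doc"]
```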

Our CocoIndex is forked. The Python wrapper that powers semantic search is a soft-fork at version 0.2.3+spec-kit-fork.0.2.0, vendored alongside the skill so it ships with this repo. The Rust engine underneath stays on PyPI. The fork adds four things the upstream wrapper doesn't: duplicate suppression, so mirror copies of the same file don't crowd results; canonical path identity per chunk, so dedup works across symlinks; a path-class taxonomy that nudges "find me the implementation of X" toward implementation files first; and ranking telemetry that surfaces why each result ranked where it did. Responses from the MCP tool or ccc search CLI carry seven fork-specific fields (source_realpath, content_hash, path_class, dedupedAliases, uniqueResultCount, raw_score, rankingSignals) that vanilla cocoindex output does not include. Schema, attribution and per-release patch list all live under .opencode/skills/mcp-coco-index/.
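To illustrate what duplicate suppression means for a result list, the sketch below collapses hits that share a content_hash and records the dropped paths. It borrows four of the field names listed above, but the field types, the grouping rule and the function itself are assumptions; the fork's real algorithm and schema live under the skill folder and may differ.

```typescript
// Field names follow the fork-specific fields; their types are assumed.
interface RawHit {
  source_realpath: string;
  content_hash: string;
  raw_score: number;
}

interface DedupedHit extends RawHit {
  dedupedAliases: string[]; // paths of suppressed copies
}

function suppressDuplicates(hits: RawHit[]): { results: DedupedHit[]; uniqueResultCount: number } {
  const byHash: Record<string, DedupedHit> = Object.create(null);
  const results: DedupedHit[] = [];
  // Visit the best-scoring hits first so the strongest copy is the one kept.
  hits
    .slice()
    .sort((a, b) => b.raw_score - a.raw_score)
    .forEach((hit) => {
      const kept = byHash[hit.content_hash];
      if (kept) {
        kept.dedupedAliases.push(hit.source_realpath);
        return;
      }
      const entry: DedupedHit = {
        source_realpath: hit.source_realpath,
        content_hash: hit.content_hash,
        raw_score: hit.raw_score,
        dedupedAliases: [],
      };
      byHash[hit.content_hash] = entry;
      results.push(entry);
    });
  return { results, uniqueResultCount: results.length };
}

const dedupedSample = suppressDuplicates([
  { source_realpath: "mirror/a.ts", content_hash: "h1", raw_score: 0.5 },
  { source_realpath: "src/a.ts", content_hash: "h1", raw_score: 0.9 },
  { source_realpath: "src/c.ts", content_hash: "h2", raw_score: 0.7 },
]);
// Two unique results: src/a.ts (alias mirror/a.ts) and src/c.ts.
```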

How the Code Graph Works

The Code Graph is a SQLite-backed structural index owned by .opencode/skills/system-code-graph/ and registered as the standalone mk_code_index MCP server. MCP callers use the mcp__mk_code_index__* namespace. Runtime config parity is mixed across clients during the rename transition, so docs use the canonical mk_code_index surface while follow-on config work handles remaining legacy bindings.

Startup injection. When the MCP server starts, it initializes the code-graph.sqlite database, runs a non-blocking startup scan and activates a file watcher. Three supported runtimes (Claude Code, Gemini CLI, Codex CLI) transport the same compact startup shared-payload through their runtime hooks (session-prime.ts on Claude/Gemini, session-start.ts on Codex). Codex requires [features].codex_hooks = true opt-in for native hooks. Copilot CLI uses file-based custom instructions with a limited cache and writer path. It refreshes a managed block but does not inject model-visible context during the precompute phase. The payload includes a one-line health summary, graphQualitySummary (detector provenance + edge-enrichment summary) and the sharedPayloadTransport envelope so downstream consumers receive identical structural context regardless of runtime. session_bootstrap() remains available as a manual recovery surface when native hooks are disabled.

Auto-indexing. The graph stays current through three mechanisms:

Startup scan - indexes on server boot (async, non-blocking)
File watcher - Chokidar monitors spec and source folders with a 2-second debounce, reindexing changed files in real time
Lazy refresh - code_graph_query calls ensureCodeGraphReady() which detects staleness and triggers a bounded inline refresh before returning results

The indexer uses tree-sitter to parse source files and extract functions, classes, imports and call relationships. It tracks per-file content hashes to skip unchanged files, making incremental scans fast.
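A minimal sketch of the hash-based skip, assuming an in-memory path-to-hash map. The names `fnv1a` and `changedFiles` are hypothetical, and FNV-1a is only a stand-in: the README does not say which hash function the indexer uses.

```typescript
// Stand-in hash (FNV-1a, 32-bit). The real indexer's hash function is not stated.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

type FileContents = Record<string, string>; // path -> source text
type HashIndex = Record<string, string>;    // path -> last indexed hash

// Returns the paths whose content changed since the last scan and
// records their new hashes, so unchanged files are skipped next time.
function changedFiles(files: FileContents, known: HashIndex): string[] {
  const changed: string[] = [];
  for (const path of Object.keys(files)) {
    const content = files[path];
    if (content === undefined) continue;
    const hash = fnv1a(content);
    if (known[path] !== hash) {
      changed.push(path);
      known[path] = hash;
    }
  }
  return changed;
}

const index: HashIndex = {};
const firstScan = changedFiles({ "a.ts": "export const a = 1;", "b.ts": "export const b = 2;" }, index);
const secondScan = changedFiles({ "a.ts": "export const a = 1;", "b.ts": "export const b = 3;" }, index);
console.log(firstScan, secondScan);
```

Only the files returned by `changedFiles` would need to go back through the parser.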

Readiness & Response Contract

code_graph_query and code_graph_context share a readiness-aware response contract. When the graph is fresh enough, both return status: "ok" with resolved results plus a readiness / canonicalReadiness / trustState block. When readiness requires a full scan that cannot run inline, both return an explicit status: "blocked" payload naming requiredAction: "code_graph_scan", blockReason: "full_scan_required", degraded and graphAnswersOmitted instead of silently returning empty results. Callers should run code_graph_scan before retrying.
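A caller-side sketch of that contract, assuming the tool calls are wrapped in async functions. The `queryWithRecovery` helper and the `results` field are hypothetical. Only `status`, `requiredAction` and `blockReason` are names taken from the text above.

```typescript
// Hypothetical client-side shapes for the readiness-aware response.
type OkResponse = { status: "ok"; results: string[] };
type BlockedResponse = {
  status: "blocked";
  requiredAction: "code_graph_scan";
  blockReason: string;
};
type GraphResponse = OkResponse | BlockedResponse;

// `query` and `scan` stand in for the real MCP tool calls.
async function queryWithRecovery(
  query: () => Promise<GraphResponse>,
  scan: () => Promise<void>,
): Promise<GraphResponse> {
  const first = await query();
  if (first.status === "blocked" && first.requiredAction === "code_graph_scan") {
    await scan();   // run the required full scan once
    return query(); // then retry exactly once
  }
  return first;
}
```

Retrying only once keeps a persistently blocked graph from turning into a loop. The second response is returned as-is so the caller still sees a blocked payload if the scan did not help.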

Success payloads of code_graph_context carry structured data.metadata.partialOutput (isPartial, reasons, omittedSections, omittedAnchors, truncatedText) and an explicit deadlineMs field so callers can distinguish a complete answer from one trimmed by deadline or budget pressure. code_graph_status exposes graphQualitySummary (detector provenance + edge-enrichment confidence). CALLS queries on ambiguous subjects (e.g. handle*) prefer callable implementation nodes over wrapper-shadow candidates and return ambiguity / selected-candidate metadata so callers can audit the choice.

Edge Explanations and Better Blast Radius

Relationship answers from code_graph_query include short reason and step fields alongside confidence and provenance, so you can see why an edge is there instead of just that it exists. code_graph_context carries those same fields through to structured edges and text briefs.

blast_radius keeps the prior payload (affected files, source files, hot files, multi-file union, depth) and adds:

depthGroups: affected nodes bucketed by how far they sit from the change
riskLevel: high when the subject is ambiguous or fans out to more than 10 things at depth one, medium for 4–10, low otherwise
minConfidence: filter that drops traversals below a confidence floor
ambiguityCandidates: list of plausible matches when the subject can't be resolved
failureFallback: structured info instead of a bare error string when resolution can't continue

All of this rides inside the existing code_edges.metadata JSON blob, so no SQLite schema changes are needed.
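The riskLevel rule above is small enough to sketch directly. The function name and parameters here are hypothetical. The thresholds are the ones stated in the list.

```typescript
type RiskLevel = "high" | "medium" | "low";

// Thresholds follow the README: an ambiguous subject or more than 10
// depth-one dependents is high, 4 to 10 is medium, anything else is low.
function classifyRisk(isAmbiguous: boolean, depthOneFanOut: number): RiskLevel {
  if (isAmbiguous || depthOneFanOut > 10) return "high";
  if (depthOneFanOut >= 4) return "medium";
  return "low";
}

console.log(classifyRisk(false, 11), classifyRisk(false, 4), classifyRisk(false, 3));
```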

`detect_changes`: Preflight Impact Check

detect_changes is a read-only Code Graph tool that takes a diff and tells you which symbols and files it touches. It runs alongside code_graph_scan, code_graph_query, code_graph_status and code_graph_context.

You hand it { diff: string, rootDir?: string }. It walks each diff hunk, overlaps the line ranges with stored symbols and returns { status, affectedSymbols[], affectedFiles[], blockedReason?, timestamp, readiness }.

Safety is non-negotiable: the tool checks the graph is fresh before parsing the diff. If the graph is stale or unavailable, it returns status: 'blocked' immediately, so an out-of-date index never produces a false "nothing impacted" answer. Inline indexing is explicitly disabled here, so the read-only contract is enforced.
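The overlap step and the freshness gate can be sketched like this. It is an illustration under assumptions, not the tool's code: hunks are taken as already parsed into line ranges, the freshness check is reduced to a boolean, and the `graph_stale` reason string is hypothetical.

```typescript
type Hunk = { file: string; startLine: number; endLine: number };
type StoredSymbol = { name: string; file: string; startLine: number; endLine: number };
type DetectResult =
  | { status: "ok"; affectedSymbols: string[]; affectedFiles: string[] }
  | { status: "blocked"; blockedReason: string };

function detectChanges(hunks: Hunk[], symbols: StoredSymbol[], graphIsFresh: boolean): DetectResult {
  // Fail closed: a stale graph never yields a "nothing impacted" answer.
  if (!graphIsFresh) return { status: "blocked", blockedReason: "graph_stale" };

  // Two inclusive line ranges overlap when each starts before the other ends.
  const affected = symbols.filter((symbol) =>
    hunks.some(
      (hunk) =>
        hunk.file === symbol.file &&
        hunk.startLine <= symbol.endLine &&
        symbol.startLine <= hunk.endLine,
    ),
  );

  const files = hunks.map((hunk) => hunk.file);
  return {
    status: "ok",
    affectedSymbols: affected.map((symbol) => symbol.name),
    affectedFiles: files.filter((file, i) => files.indexOf(file) === i), // de-duplicate
  };
}
```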

Under the hood the scan runner is split into four declared phases (find-candidates → parse-candidates → finalize → emit-metrics) for clearer instrumentation, with no SQLite schema changes.

The code graph runtime has its own feature catalog and operator playbook under system-code-graph/feature_catalog and system-code-graph/manual_testing_playbook. They document runtime features and manual scenarios for freshness, scan/verify/status, detect_changes, context retrieval, coverage graph, CCC and doctor-code-graph behavior.

What Each System Does

System	Best for	Primary surface
CocoIndex	Concept search, similar implementations, unfamiliar modules	`mcp__cocoindex_code__search`
Code Graph	Callers, imports, symbol outlines, impact analysis, neighborhood expansion	`mcp__mk_code_index__code_graph_*`, `mcp__mk_code_index__detect_changes`, `mcp__mk_code_index__ccc_*`
Session bridge tools	Session bootstrap, resume and health checks around graph availability	`session_bootstrap`, `session_resume`, `session_health`
CCC utilities	CocoIndex availability, reindexing, result feedback	`ccc_status`, `ccc_reindex`, `ccc_feedback`

How Query Routing Works (Graph-First)

The default routing order is: Code Graph (structural) -> CocoIndex (semantic code) -> Memory (session/decision context). This graph-first approach tries structural resolution before semantic similarity, with a 3-tier FTS fallback when earlier stages miss.

Use the Code Graph first for structural questions: callers, callees, imports, hierarchy, file outlines and reverse impact.
Use CocoIndex for semantic and intent-based questions: "find code that validates memory quality", "show similar routing patterns", "where is the logic for X?"
Use session tools when recovering or checking environment readiness, but treat /speckit:resume as the canonical operator-facing recovery surface.
Rebuild task continuity in this order: handover.md -> _memory.continuity -> canonical spec docs.
Use Memory after those packet-local sources when the question is about prior decisions, spec history, handovers or task continuity that still needs deeper retrieval.
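The default routing order can be pictured as a simple first-hit chain. This sketch assumes each system is wrapped in a lookup function that returns an empty list on a miss. The `routeQuery` helper and the source labels are hypothetical, and the 3-tier FTS fallback is left out.

```typescript
type Source = "code_graph" | "cocoindex" | "memory";
type Lookup = (question: string) => string[];

// Try structural resolution first, then semantic code search, then memory.
function routeQuery(
  question: string,
  lookups: Record<Source, Lookup>,
): { source: Source | null; results: string[] } {
  const order: Source[] = ["code_graph", "cocoindex", "memory"];
  for (const source of order) {
    const results = lookups[source](question);
    if (results.length > 0) return { source, results };
  }
  return { source: null, results: [] };
}

const answer = routeQuery("where is the logic for X?", {
  code_graph: () => [],                    // no structural match
  cocoindex: () => ["lib/routing.ts"],     // semantic hit
  memory: () => ["decision record"],       // never reached in this example
});
console.log(answer.source);
```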

Why It Matters

This split avoids forcing one search system to do everything poorly. Semantic search is good at resemblance. Structural search is good at relationships. Keeping both lets the framework move from "this code looks relevant" to "this is how it connects" without collapsing those concerns into a single noisy result set.

For the full code-graph tool and architecture reference, see the system-code-graph skill and system-code-graph README. Shared memory and lifecycle details stay in .opencode/skills/system-spec-kit/README.md.

🎯 Skill Advisor

The Skill Advisor matches what you type to the right skill before any tool runs. It is now a standalone MCP server named mk_skill_advisor, packaged under .opencode/skills/system-skill-advisor/mcp_server/. The server registers nine tools: eight on the public surface (four advisor_* tools for routing, freshness, rebuild and validation, plus four skill_graph_* tools for scan, query, status and graph validation), plus one internal propagation tool. A small Python compatibility shim still works as a fallback when the native path is unavailable.

How It Works

  YOU TYPE: "use chrome-devtools to inspect a page"
                      │
                      ▼
           ┌──────────────────────┐
      1.   │  NORMALIZE           │  Clean up the prompt, never store
           │                      │  the raw text
           └──────────┬───────────┘
                      ▼
           ┌──────────────────────┐
      2.   │  5-LANE FUSION       │  Explicit author signals 0.42
           │                      │  Lexical match 0.28
           │                      │  Causal graph 0.13
           │                      │  Derived hints 0.12
           │                      │  Semantic evidence 0.05
           └──────────┬───────────┘
                      ▼
      ┌───────────────────────────────┐
      │  3. FRESHNESS + LIFECYCLE     │  Is each candidate still alive?
      │                               │  live / stale / absent / archived
      │  Reads SQLite skill graph     │  with redirect metadata
      │  + generated metadata         │  Falls open on errors
      └───────────────┬───────────────┘
                      ▼
           ┌──────────────────────┐
      4.   │  VALIDATE + FILTER   │  Apply confidence + uncertainty
           │                      │  thresholds, cache the trust
           │                      │  envelope
           └──────────┬───────────┘
                      ▼
           ┌──────────────────────┐
      5.   │  RENDER              │  Either a one-line hook brief
           │                      │  or a JSON recommendation list
           └──────────┬───────────┘
                      ▼
                RESULT:
           advisor_recommend -> list of skill recommendations
           hook adapter -> "Advisor: live, use ..."
           shim fallback -> legacy JSON
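A minimal sketch of step 2, assuming the five lanes are combined as a plain weighted sum of per-lane scores in the range 0 to 1. The diagram gives only the weights, so the combination rule, the lane key names and the `fuse` function are assumptions.

```typescript
// Lane weights from the diagram above. They sum to 1.00.
const LANE_WEIGHTS = {
  explicit: 0.42,     // explicit author signals
  lexical: 0.28,      // lexical match
  causalGraph: 0.13,  // causal graph
  derivedHints: 0.12, // derived hints
  semantic: 0.05,     // semantic evidence
} as const;

type Lane = keyof typeof LANE_WEIGHTS;
type LaneScores = Record<Lane, number>; // assumed to be normalized to 0..1

// Assumption: fusion is a weighted sum of the per-lane scores.
function fuse(scores: LaneScores): number {
  let total = 0;
  for (const lane of Object.keys(LANE_WEIGHTS) as Lane[]) {
    total += LANE_WEIGHTS[lane] * scores[lane];
  }
  return total;
}

console.log(fuse({ explicit: 1, lexical: 0.5, causalGraph: 0, derivedHints: 0, semantic: 0 }));
```

Under this reading, an explicit author signal alone contributes more than a perfect semantic and derived-hints match combined.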

Native Package Layout

.opencode/skills/system-skill-advisor/mcp_server/
├── bench/      benchmarks
├── compat/     stable compatibility entry for runtimes
├── handlers/   the nine MCP tool handlers (8 public + 1 internal)
├── lib/        scorer, normalizer, freshness, cache
├── schemas/    JSON + Zod schemas
├── tests/      test suite
└── tools/      tool registration

Tool	What it does
`advisor_recommend`	Recommends skills for a prompt with lane breakdown, lifecycle redirects and a freshness trust signal. Returns the workspace root and the effective thresholds it used.
`advisor_rebuild`	Rebuilds the advisor skill graph when `advisor_status` reports stale, absent or unavailable state. `force:true` rebuilds even when live.
`advisor_status`	Reports freshness, generation, trust state, lane weights, skill count, last scan time and background daemon status.
`advisor_validate`	Runs measurement slices: corpus accuracy, holdout, parity, safety, latency. Surfaces the workspace root, effective thresholds, threshold semantics (aggregate vs runtime) and prompt-safe outcome counts (accepted / corrected / ignored).
`skill_graph_scan`	Indexes skill metadata into the advisor-owned skill graph surface.
`skill_graph_query`	Queries skill graph relationships such as dependencies, families, hubs, conflicts and subgraphs.
`skill_graph_status`	Reports graph counts, families, categories, staleness, validation and database status.
`skill_graph_validate`	Validates schema drift, broken edges, reciprocal symmetry and dependency-cycle issues.

How Runtimes Talk To It

Claude Code, Gemini CLI, Codex CLI: call prompt-time hook adapters under .opencode/skills/system-spec-kit/mcp_server/hooks/. Codex CLI requires [features].codex_hooks = true opt-in for native hooks. Copilot CLI uses file-based custom instructions for the startup-surface path only.
OpenCode: uses .opencode/plugins/spec-kit-skill-advisor.js with spec-kit-skill-advisor-bridge.mjs, which imports the stable compat entry under .opencode/skills/system-skill-advisor/mcp_server/compat/index.ts.
Codex cold starts: the Codex prompt hook emits a prompt-safe stale advisory plus {"stale":true,"reason":"timeout-fallback"} when startup context times out. The smoke helper lives at freshness-smoke-check.ts.
Disable everywhere: set SPECKIT_SKILL_ADVISOR_HOOK_DISABLED=1 to turn off all prompt-time advisor surfaces.
Threshold contract at the prompt: confidence ≥ 0.8 and uncertainty ≤ 0.35 by default.
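The default threshold contract amounts to a two-sided check. The `Candidate` shape and function name below are hypothetical. The two default values are the ones listed above.

```typescript
type Candidate = { skill: string; confidence: number; uncertainty: number };
type Thresholds = { minConfidence: number; maxUncertainty: number };

// Defaults from the README: confidence >= 0.8 and uncertainty <= 0.35.
const DEFAULT_THRESHOLDS: Thresholds = { minConfidence: 0.8, maxUncertainty: 0.35 };

function passesThresholds(candidate: Candidate, thresholds: Thresholds = DEFAULT_THRESHOLDS): boolean {
  return (
    candidate.confidence >= thresholds.minConfidence &&
    candidate.uncertainty <= thresholds.maxUncertainty
  );
}

console.log(passesThresholds({ skill: "mcp-chrome-devtools", confidence: 0.91, uncertainty: 0.2 }));
```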

Validation and Testing

advisor_validate({"skillSlug":null}) returns measured corpus / holdout / parity / safety / latency slices plus prompt-safe outcome totals.
Python compatibility regression harness: checked-in dataset and pass/fail totals are reported by skill_advisor_regression.py.
Native package: 23 advisor test files, 167 tests.
Manual testing playbook: 42 scenario files spanning native MCP tools, runtime hooks, the OpenCode plugin, compatibility controls, auto-indexing, lifecycle routing, scorer fusion and operator-state edge cases.
Hook diagnostics write to bounded JSONL sinks under the temp metrics root. The validator reads those sinks back across processes.

Affordance Evidence

Callers can pass structured tool and resource hints as affordance evidence: skillId, name, triggers[], category, dependsOn[], enhances[], siblings[], prerequisiteFor[] and conflictsWith[]. A normalizer strips URLs, emails, token-shaped fragments, control characters and instruction-shaped strings before the scorer sees anything. Free-form description text is ignored on purpose. Sanitized triggers feed the existing derived-hints lane at reduced weight. Normalized relations become temporary edges in the existing causal-graph lane, reusing the standard relation multipliers (depends_on, enhances, siblings, prerequisite_for, conflicts_with). This adds no new scoring lane, no new entity kind and no raw matched phrases in recommendation payloads. Evidence labels stay as stable affordance:<skillId>:<index> identifiers.
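A partial sketch of the trigger sanitization, assuming simple regular expressions. The function names and patterns are illustrative, not the normalizer's actual rules. It covers URLs, emails and control characters only. Token-shaped and instruction-shaped filtering is left out because the README does not define those shapes.

```typescript
// Illustrative patterns only; the real normalizer's rules are not documented here.
const URL_PATTERN = /https?:\/\/\S+/gi;
const EMAIL_PATTERN = /[^\s@]+@[^\s@]+\.[^\s@]+/g;
const CONTROL_PATTERN = /[\u0000-\u001f\u007f]/g;

function sanitizeTrigger(raw: string): string {
  return raw
    .replace(URL_PATTERN, " ")     // strip URLs before emails so URL userinfo is not misread
    .replace(EMAIL_PATTERN, " ")
    .replace(CONTROL_PATTERN, " ")
    .replace(/\s+/g, " ")          // collapse the gaps left behind
    .trim();
}

// Triggers that sanitize to nothing are dropped entirely.
function sanitizeTriggers(triggers: string[]): string[] {
  return triggers.map(sanitizeTrigger).filter((trigger) => trigger.length > 0);
}

console.log(sanitizeTriggers(["inspect page https://example.com/x", "https://only.example"]));
```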

For details, see the Skill Advisor README.

🎯 Skills Library

22 skills in .opencode/skills/, loaded on demand when Gate 2 matches a task (confidence >= 0.8 means the skill must be loaded).

DOCUMENTATION

system-spec-kit

Mandatory orchestrator for all file modifications - activates automatically for any code file change
Creates numbered spec folders with manifest templates rendered through Level contracts across 4 levels (1-3+)
Integrates the 39-tool memory surface with constitutional-tier support, session bootstrap and hybrid 5-channel retrieval
Manages the manifest template source, 20 validation rules, the spec-kit script suite and the feature-catalog / testing-playbook documentation surfaces

sk-doc

Unified markdown specialist with DQI quality scoring (Structure 40%, Content 35%, Style 25%)
HVR v0.210 compliance checking and component creation workflows (skills, agents, commands)
Handles README templates, frontmatter validation, feature catalog authoring, install guide generation

CODE WORKFLOW

sk-code

Multi-stack coding standards, references and assets: surface-aware patterns, checklists and verification recipes loaded per stack
WEBFLOW stack: Webflow / vanilla HTML/CSS/JS animation projects (motion.dev, GSAP, Lenis, HLS, Swiper, FilePond), CDN deployment, Lighthouse/TBT/INP targets, browser verification
OPENCODE stack: .opencode/ system code across JavaScript/CommonJS, TypeScript, Python, Shell, JSON/JSONC, MCP server code, agents, commands, skill files
Smart-routing internals auto-detect the active stack from CWD/target paths and library markers. Unsupported stacks (Go, React/Next.js, generic Node.js, React Native, Swift) trigger a disambiguation question
3 mandatory phases: implementation → testing/debugging → verification

sk-code-review

Stack-agnostic code review baseline using sk-code surface evidence where applicable
Baseline always runs first: security checklist, correctness checklist, SOLID checklist, threat model
Security and correctness minimums are mandatory and NEVER relaxed by surface-specific evidence. Findings are ranked P0/P1/P2.

sk-git

Git workflow orchestrator coordinating 3 sub-skills
git-worktree: workspace isolation, branch creation, parallel development
git-commit: conventional commit format, staged change analysis, scope detection
git-finish: PR creation via gh pr create, branch cleanup, integration workflows

deep-research

Autonomous research investigation system with iterative LEAF cycles
Fresh context per iteration, externalized JSONL state, 3-signal convergence detection (Rolling Average + MAD Noise Floor + Coverage/Age)
Semantic coverage graph with 7 relation types, question coverage tracking, sourceDiversity and evidenceDepth guards
Progressive synthesis, negative knowledge preservation, quality guards (source diversity, focus alignment, weak-source checks)
Fail-closed corruption handling, graph convergence fallback scoring, terminal stop metadata parsing
Lifecycle modes: new, resume, restart. Dispatched by /deep:start-research-loop command

deep-review

Autonomous code quality auditing system with iterative LEAF cycles
P0/P1/P2 severity-weighted findings across 4 dimensions (Correctness, Security, Traceability, Maintainability)
3-signal convergence model, P0 override blocks stop, adversarial self-check (Hunter/Skeptic/Referee)
Binary quality gates (evidence, scope, coverage), graph-aware legal-stop checks, semantic coverage graph
9-section review report with PASS/CONDITIONAL/FAIL verdict
Fail-closed corruption, claim-adjudication finalSeverity, stale STOP veto auto-clearing
Lifecycle modes: new, resume, restart. Dispatched by /deep:start-review-loop command

deep-loop-runtime

Shared runtime infrastructure for deep-review + deep-research loop workflows (post-arc-118)
Owns executor config, state safety, scoring, fallback routing, coverage-graph scripts, and storage/deep-loop-graph.sqlite

CROSS-AI CLI

These skills let you run cross-CLI agent teams from any starting CLI. Whichever assistant you're talking to (Claude Code, Codex, Copilot, Gemini, OpenCode, raw shell), it can dispatch the other AI CLIs as specialist sub-tools, each one a one-shot non-interactive call that streams structured output back to the caller. The conducting AI stays in charge. The dispatched CLI handles the part it's best at and returns. Use this to compose a Gemini web search + Codex implementation + Claude review pipeline from inside any one of them.

Self-invocation guard: every skill refuses to call itself. A Claude Code session never dispatches cli-claude-code, an OpenCode session never dispatches cli-opencode, etc. Cross-AI delegation only, no cycles.
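The guard reduces to "any CLI skill except your own". In this sketch the skill names are the five cli-* skills described in this section. The helper name is hypothetical, and `null` is an assumed way to represent a caller with no matching skill (Copilot or a raw shell).

```typescript
const CLI_SKILLS = ["cli-gemini", "cli-codex", "cli-claude-code", "cli-opencode", "cli-devin"] as const;
type CliSkill = (typeof CLI_SKILLS)[number];

// `current` is the skill that matches the running CLI, or null when
// the caller has no cli-* skill of its own.
function allowedTargets(current: CliSkill | null): CliSkill[] {
  return CLI_SKILLS.filter((skill) => skill !== current);
}

console.log(allowedTargets("cli-claude-code"));
```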

cli-gemini

Gemini CLI orchestrator. Use it for real-time web search via Google Search grounding (no other CLI skill has this) and for analyzing very large codebases (1M+ token context).
Single model: gemini-3.1-pro-preview.

cli-codex

OpenAI Codex CLI orchestrator. Use it for code generation, diff-aware review (/review), web browsing (--search) and screenshot analysis (--image). Supports session resume/fork, agent profiles and cost control via --max-budget-usd.
Default model: gpt-5.5 at medium reasoning, fast service tier. gpt-5.3-codex and other GPT-5.x variants available via override.

cli-claude-code

Claude Code CLI orchestrator. Use it for extended thinking (chain-of-thought), surgical diff-based edits and JSON-schema-validated structured output. Ships with 9 built-in agents and session continuity.
Three models: claude-opus-4-6 (deep reasoning), claude-sonnet-4-6 (default, balanced), claude-haiku-4-5 (fast/cheap).

cli-opencode

OpenCode CLI orchestrator. Use it when the dispatched task needs the project's full plugin / skill / MCP / Spec Kit Memory runtime: a one-shot opencode run boots every plugin in opencode.json, every skill under .opencode/skills/, every MCP server and the memory database. Also handles parallel detached sessions (--share --port N for ablation suites, worker farms) and cross-repo dispatch (--dir <path>).
Three providers: github-copilot (default, with gpt-5.4 default + claude-sonnet-4.6 alternative), opencode-go (DeepSeek + GLM/Kimi/Qwen via gateway), deepseek (direct DeepSeek API).

cli-devin

Devin CLI orchestrator. Use it to dispatch Cognition AI's autonomous-agent binary devin from any sibling CLI session, with the family's only local-to-cloud handoff (the live session can migrate to a Cognition cloud VM that keeps working asynchronously and returns a PR).
Four-model preset: swe-1.6 default for context gathering, tool use and simple-to-medium well-defined code tasks. deepseek-v4 primary for complex tasks. glm-5.1 and kimi-k2.6 as complex-task fallbacks (agentic / large-context shape respectively).

MCP INTEGRATION

system-code-graph

First-class code-graph subsystem at .opencode/skills/system-code-graph/
Owns structural AST indexing, SQLite graph storage, readiness contracts, detect_changes and CocoIndex bridge helpers
Current MCP server name: mk_code_index. Client namespace: mcp__mk_code_index__*

system-skill-advisor

Standalone Gate 2 skill routing package at .opencode/skills/system-skill-advisor/
Exposes advisor tools plus skill_graph_* structural routing tools through the mk_skill_advisor MCP server
Keeps advisor and skill-graph storage out of the memory server

mcp-code-mode

MCP orchestration engine providing access to 200+ external tools through a single TypeScript interface
Reduces context overhead by 98.7% by loading external tool schemas on demand
Progressive tool loading - zero upfront cost, tools load on first use. Type-safe with autocomplete.

mcp-coco-index

Semantic code search via vector embeddings (sbert/nomic-ai/CodeRankEmbed 768d code-tuned default, MIT, MPS auto-detect on Apple Silicon, promoted 2026-05-19). Stage 2 cross-encoder rerank via Qwen/Qwen3-Reranker-0.6B (Apache-2.0 default since 2026-05-20 per ADR-027; jina-reranker-v3 kept as opt-in fallback). Alternative embedders and rerankers are registered at embedders/registered_embedders.py; the swap runbook is in INSTALL_GUIDE §4.
Natural-language discovery of code patterns and implementations across 28+ languages
Two access modes: CLI (ccc) for direct terminal use, MCP server for AI agent integration

mcp-chrome-devtools

Chrome DevTools orchestrator with intelligent 2-mode routing
CLI mode (bdg) prioritized for speed - runs in terminal, supports Unix pipes, composable in CI/CD
MCP mode as fallback for multi-tool integration scenarios

OTHER

sk-prompt

Prompt engineering specialist auto-selecting from 7 proven frameworks (RCAF, COSTAR, RACE, CIDI, TIDD-EC, CRISPE, CRAFT)
DEPTH thinking methodology with 3-10 iteration rounds of progressive refinement
CLEAR quality scoring: Clarity, Logic, Expression, Reliability (40+/50 pass threshold)

deep-ai-council

Multi-seat planning council dispatching diverse AI reasoning seats for strategic decisions
Cross-seat critique and convergence checks produce evidence-backed recommendations
Packet-local artifact persistence via ai-council/** output directory
Planning-only scope. Agent counterpart listed in the Agent Network section below

sk-prompt-small-model

Sentinel skill for small-model optimization patterns. Discovery anchor only — routes operators to executor-owned pattern files instead of hosting logic
Active dispatch matrix: SWE-1.6, DeepSeek-v4-pro, Kimi-k2.6, Qwen3.6, GLM-5.1 across cli-devin + cli-opencode (DeepSeek API direct + opencode-go pool). Optional stubs for Claude Haiku and Gemini Flash
references/pattern-index.md maps each pattern (context budget, output verification, permissions matrix, quota fallback, model profiles, tool scoring) to its canonical executor-owned location
Pool-aware quota fallback (different-pool target only; no same-pool retries). Frontier models (Opus, Sonnet, gpt-5.5) explicitly out of scope. Adopting Haiku/Gemini Flash is metadata-first via sk-prompt/assets/model-profiles.json

deep-agent-improvement

Evaluator-first 5-dimension scoring: structural, ruleCoherence, integration, outputQuality, systemFitness — with integration scanner that discovers every surface an agent touches (canonical, mirrors, commands, YAML, skills)
Dynamic profile generator derives the scoring rubric from each agent's own rules; no hardcoded profiles needed
Proposal-first: candidates written to packet-local runtime; canonical target untouched until guarded promotion (scoring + benchmark + repeatability + operator approval gates, with rollback support)
Deterministic scoring (regex/string/file-existence; no LLM-as-judge) and plateau detection (3+ identical scores triggers stop)
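Plateau detection can be sketched as a check on the tail of the score history. This assumes "3+ identical scores" means the three most recent scores in a row. The function name and the `window` parameter are hypothetical.

```typescript
// Assumption: a plateau is `window` consecutive identical scores at the end of the history.
function hasPlateaued(scores: number[], window: number = 3): boolean {
  if (scores.length < window) return false;
  const tail = scores.slice(-window);
  return tail.every((score) => score === tail[0]);
}

console.log(hasPlateaued([62, 70, 74, 74, 74]));
```

Exact equality is reasonable here because the scoring is deterministic, so an unchanged candidate produces an unchanged score.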

🤖 Agent Network

11 custom specialist agents. Defined in .opencode/agents/ (source of truth), mirrored for Claude Code (.claude/agents/), Codex CLI (.codex/agents/) and Gemini CLI (.gemini/agents/) runtime surfaces. OpenCode and Copilot CLI use runtime-specific MCP and startup integration rather than a dedicated agent mirror.

Orchestrate

Senior task commander with full authority over decomposition, delegation and quality evaluation
Merges sub-agent outputs into one unified response with conflict resolution
Read-only permissions - delegates implementation to other agents
Single-hop delegation only (depth 2 max) to prevent runaway chains

Code

Surface-aware code implementation specialist (write-capable LEAF, mode: subagent, task: deny)
Delegates code surface detection to sk-code. The agent body itself stays stack-agnostic and reads sk-code's emitted surface evidence at dispatch time
7 dispatch modes: full implementation / surgical fix / refactor only / test add / scaffold new file / rename-move / dependency bump
5-dimension acceptance rubric (100 pts total): Correctness 30, Scope-Adherence 20, Verification-Evidence 20, Stack-Pattern-Compliance 15, Integration 15
Builder → Critic → Verifier adversarial self-check on every completion claim (challenges DONE, opposite axis from @review's Hunter/Skeptic/Referee which challenges findings)
Iron Law: NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE FROM THE ACTUAL STACK. LOW confidence strictly blocks DONE
Fail-closed verification: failure returns to the orchestrator with no internal retry. BLOCKED-count circuit breaker (3× BLOCKED → orchestrator offers @debug)
Compact RETURN line + structured body with escalation classifier (NONE / UNKNOWN_STACK / SCOPE_CONFLICT / LOW_CONFIDENCE / LOGIC_SYNC / VERIFY_FAIL)
Dispatched ONLY by @orchestrate via convention-floor caller-restriction (description prose + body §0 dispatch gate + orchestrate.md routing entry. Not harness-enforced)

Context

Memory-first retrieval specialist - always checks memory before codebase
Search order: match_triggers → memory_context → memory_search → grep/glob
Returns structured Context Packages combining memory findings with codebase evidence
Uses both CocoIndex semantic search and the 5-channel memory system. Read-only.

Review

Code quality guardian with strict read-only permissions (cannot write or edit any file)
Loads sk-code-review baseline first, then uses sk-code surface evidence for stack-specific standards (whatever surface sk-code detected)
Security and correctness minimums are mandatory and never relaxed by surface-specific evidence
Produces findings-first severity analysis with quality scoring and pattern validation

Debug

Fresh-perspective debugger that receives structured context handoff (not conversation history)
Avoids inherited bias from failed prior attempts - use after 3+ failed debugging tries
Systematic 5-phase methodology: Observe → Analyze → Hypothesize → Validate → Fix
Writes debug-delegation.md with root cause analysis and findings

Markdown

Dedicated LEAF executor for the /create:* command family (agent, sk-skill, feature-catalog, testing-playbook, folder_readme, changelog) plus scoped spec-doc and markdown authoring
Scope-gated by convention-level Phase 0 check. Refuses unscoped writes and nested delegation with canonical REFUSE wording
Loads sk-doc skill on every invocation. Reads the per-command or document-appropriate template before writing
Deterministic 3-state output contract: STATUS=OK PATH=<file> / STATUS=FAIL ERROR=<reason> / STATUS=CANCELLED
DQI score >=75 minimum reported in completion claim. HVR (Human Voice Rules) compliance enforced
Runtime lifecycle docs should stay HVR-concise: link to the scripts runbook and active spec packet instead of duplicating long process-cleanup instructions in every README.
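A caller might consume the 3-state output contract listed above with a parser like this. It is a sketch under assumptions: the `AgentResult` type and `parseStatusLine` name are hypothetical, and the status is assumed to arrive as a single line in exactly the forms shown.

```typescript
type AgentResult =
  | { status: "OK"; path: string }
  | { status: "FAIL"; error: string }
  | { status: "CANCELLED" };

// Returns null for anything outside the three documented forms.
function parseStatusLine(line: string): AgentResult | null {
  const trimmed = line.trim();
  if (trimmed === "STATUS=CANCELLED") return { status: "CANCELLED" };

  const ok = /^STATUS=OK PATH=(.+)$/.exec(trimmed);
  if (ok) {
    const [, path = ""] = ok;
    return { status: "OK", path };
  }

  const fail = /^STATUS=FAIL ERROR=(.+)$/.exec(trimmed);
  if (fail) {
    const [, error = ""] = fail;
    return { status: "FAIL", error };
  }

  return null;
}

console.log(parseStatusLine("STATUS=OK PATH=docs/README.md"));
```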

Prompt-Improver

Prompt-escalation specialist for high-stakes external CLI invocations and other sensitive AI prompt work
Selects the best-fit framework from sk-prompt, applies DEPTH at the right energy level and validates the result with CLEAR
Returns a structured prompt package with FRAMEWORK, CLEAR_SCORE, RATIONALE, ENHANCED_PROMPT and ESCALATION_NOTES
Used by the CLI mirror-card pipeline and /prompt agent mode when complexity, compliance or stakeholder spread makes inline prompting too weak

AI Council

Multi-strategy planning architect dispatching diverse AI vantage points and strategy lenses
Seeks distinct reasoning strategies across multiple AIs (cli-codex, cli-gemini, cli-claude-code + native)
Multi-round deliberation before recommending a plan. Planning-only (never modifies files)
5-dimension scoring rubric for strategy quality

Deep Research

Autonomous research agent executing single LEAF (Loop, Externalize, Analyze, Finish) iterations
State externalized via JSONL + strategy.md for pause/resume across sessions
Loop orchestration managed by /deep:start-research-loop command, not this agent
Has permission to write research.md and scratch/ inside spec folders
3-signal convergence model: Rolling Average (0.45), MAD Noise Floor (0.30), Coverage/Age (0.25) with 0.60 threshold
Semantic coverage graph: each iteration emits graphEvents with relation types (ANSWERS, SUPPORTS, CONTRADICTS, SUPERSEDES, DERIVED_FROM, COVERS, CITES)
Graph convergence guards: sourceDiversity (>= 0.4) and evidenceDepth (>= 1.5) block premature STOP
Question coverage tracking computes answerCoverage ratio from ANSWERS edges
Quality guards: source diversity, focus alignment and weak-source checks must pass before STOP
Progressive synthesis: research.md updated incrementally and finalized during synthesis
Negative knowledge: ruled-out directions and dead ends preserved as first-class outputs
Lifecycle modes: new, resume, restart (fork and completed-continue are deferred)
Fail-closed corruption handling: throws structured error before writing derived files when JSONL is corrupt
Graph convergence fallback: scoring uses a numeric fallback when blendedScore is absent
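The stop decision described in the list above can be sketched as a weighted blend plus guard checks. This assumes each signal is normalized to the range 0 to 1 with higher meaning "more converged", and that the blend is compared to the threshold with `>=`. Neither detail is stated in the README, and the type and function names are hypothetical.

```typescript
type Signals = { rollingAverage: number; madNoiseFloor: number; coverageAge: number };
type Guards = { sourceDiversity: number; evidenceDepth: number };

// Weights and threshold from the README: 0.45 / 0.30 / 0.25, stop at 0.60.
function blendedScore(signals: Signals): number {
  return 0.45 * signals.rollingAverage + 0.30 * signals.madNoiseFloor + 0.25 * signals.coverageAge;
}

// Guards block a premature STOP even when the blend clears the threshold.
function shouldStop(signals: Signals, guards: Guards): boolean {
  const guardsPass = guards.sourceDiversity >= 0.4 && guards.evidenceDepth >= 1.5;
  return blendedScore(signals) >= 0.6 && guardsPass;
}

console.log(
  shouldStop(
    { rollingAverage: 1, madNoiseFloor: 1, coverageAge: 0 },
    { sourceDiversity: 0.5, evidenceDepth: 2 },
  ),
);
```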

Deep Review

Autonomous code quality auditor using LEAF architecture for single review iterations
Reviews code but NEVER modifies target files (read-only on code)
Loop orchestration managed by /deep:start-review-loop command, not this agent
Produces P0/P1/P2 severity-ranked findings with file:line evidence across 4 review dimensions (Correctness, Security, Traceability, Maintainability)
Severity-weighted convergence: P0 contributes weight 10.0, P1 contributes 5.0, P2 contributes 1.0. Refinements contribute 0.5x those weights
3-signal convergence model: Rolling Average (0.45), MAD Noise Floor (0.30), Dimension Coverage (0.25)
P0 override: any new P0 finding forces at least one more iteration regardless of convergence math
Adversarial self-check on P0 findings: Hunter/Skeptic/Referee triad before admission
Binary quality gates: evidence (file:line backed), scope (stays inside declared target), coverage (all dimensions and cross-reference protocols complete)
Graph-aware legal-stop checks using structural graph signals from graphEvents
Semantic coverage graph with review-specific node types (DIMENSION, FILE, FINDING, EVIDENCE, REMEDIATION) and edge types (COVERS, EVIDENCE_FOR, IN_DIMENSION, CONTRADICTS, RESOLVES, CONFIRMS)
9-section review report with PASS/CONDITIONAL/FAIL verdict (FAIL on P0, CONDITIONAL on P1, PASS otherwise)
Claim-adjudication finalSeverity overrides original severity in the findings registry
Fail-closed corruption handling: reducer refuses to write derived files when JSONL corruption is detected
Lifecycle modes: new, resume, restart with typed JSONL lineage events
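The severity weighting and verdict rules in the list above can be sketched as two small functions. The `Finding` shape and function names are hypothetical. The weights, the 0.5x refinement factor and the verdict mapping are the ones stated in the list.

```typescript
type Severity = "P0" | "P1" | "P2";
type Finding = { severity: Severity; isRefinement: boolean };
type Verdict = "PASS" | "CONDITIONAL" | "FAIL";

// Weights from the README: P0 = 10.0, P1 = 5.0, P2 = 1.0.
const SEVERITY_WEIGHTS: Record<Severity, number> = { P0: 10.0, P1: 5.0, P2: 1.0 };

// Refinements of earlier findings count at half weight.
function weightedFindingScore(findings: Finding[]): number {
  return findings.reduce(
    (sum, finding) => sum + SEVERITY_WEIGHTS[finding.severity] * (finding.isRefinement ? 0.5 : 1),
    0,
  );
}

// FAIL on any P0, CONDITIONAL on any P1, PASS otherwise.
function verdictFor(findings: Finding[]): Verdict {
  if (findings.some((finding) => finding.severity === "P0")) return "FAIL";
  if (findings.some((finding) => finding.severity === "P1")) return "CONDITIONAL";
  return "PASS";
}

const sample: Finding[] = [
  { severity: "P0", isRefinement: false },
  { severity: "P1", isRefinement: true },
  { severity: "P2", isRefinement: false },
];
console.log(weightedFindingScore(sample), verdictFor(sample));
```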

Agent Improver

Proposal-only mutator for bounded agent improvement experiments
Reads the target agent's charter, manifest and integration surface, then writes ONE candidate to a packet-local runtime area
Never scores, promotes, benchmarks or edits canonical targets. The /deep:start-agent-improvement-loop command loop handles those.
Loop orchestration: scan integration surfaces, generate dynamic profile, dispatch this agent, score candidate across 5 dimensions (structural, ruleCoherence, integration, outputQuality, systemFitness), reduce state, check stop conditions

⌨️ Commands

24 command entry points across 5 command groups plus root utilities. Each command is a Markdown entry point under .opencode/commands/**/*.md backed by a behavioral execution spec.

SPEC KIT

Plan --intake-only

Standalone intake workflow that publishes spec.md, description.json and graph-metadata.json
Used directly for new packet setup and paired with /speckit:plan or /speckit:complete when folder_state is no-spec, partial-folder, repair-mode or placeholder-upgrade
Modes: :auto, :confirm

Complete

End-to-end workflow: intake/delegate → research → plan → implement → verify → save memory
Smart-detects missing or unhealthy packet state and reuses the shared intake contract from /speckit:plan --intake-only. Healthy folders continue without extra setup prompts
Modes: :auto (fully autonomous), :confirm (pause at each step), :with-research (adds deep research)
After 3 failed implementation attempts, surface diagnostics and let the user dispatch @debug via the Task tool

Plan

Planning-only workflow that authors spec.md, plan.md and tasks.md without implementing
Reuses the shared intake contract from /speckit:plan --intake-only when the packet is no-spec, partial-folder, repair-mode or placeholder-upgrade
Dispatches up to 4 parallel context agents for codebase exploration during planning
Use when you need stakeholder review before coding. Modes: :auto, :confirm

Implement

Executes an existing plan - requires plan.md to already exist
9-step workflow covering task breakdown, implementation, testing and verification
Modes: :auto, :confirm

Resume

Continues a previous session by auto-loading memory from the spec folder
Presents session summary, shows progress against tasks.md
Works after crashes, compactions or new sessions

Spec-first command chains

/speckit:plan --intake-only
  ├─► /speckit:plan -> /speckit:implement
  ├─► /deep:start-research-loop -> /speckit:plan
  └─► /speckit:complete
       └─► reuses the shared intake contract from /speckit:plan --intake-only when folder_state still needs intake

/deep:start-research-loop only enters that chain after a real spec.md exists. It follows spec_check_protocol.md for advisory-lock handling, folder_state classification and bounded generated-fence sync.

MEMORY

Save

Updates packet continuity and supporting generated context artifacts via generate-context.js
AI composes structured JSON with session summary, key decisions and findings
Indexes immediately for future retrieval via memory_save() or memory_index_scan()

Search

Unified retrieval and analysis entry point with intent-aware routing
Supports epistemic baselines, causal graph traversal, ablation studies and dashboards
Routes by intent: add_feature, fix_bug, refactor, security_audit, understand, find_spec, find_decision

Learn

/memory:learn constitutional memory manager for always-surface rules
Constitutional memories carry a 3.0x boost and never decay
Lifecycle operations: create, list, edit, remove, budget
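
To make the "3.0x boost and never decay" rule concrete, here is a small sketch of how a ranking score could treat constitutional memories differently from ordinary ones. Only the 3.0 multiplier and the no-decay behavior come from the text above. The exponential decay curve, the 30-day half-life and all names are assumptions for illustration; the memory engine's real scoring formula is not shown in this README.

```typescript
interface MemoryHit {
  similarity: number; // raw retrieval score, assumed to be in 0..1
  ageDays: number;
  constitutional: boolean;
}

const CONSTITUTIONAL_BOOST = 3.0; // from the bullet above
const HALF_LIFE_DAYS = 30; // ASSUMPTION: hypothetical decay half-life

function rankingScore(hit: MemoryHit): number {
  // Constitutional memories get the boost and skip the decay term entirely.
  if (hit.constitutional) return hit.similarity * CONSTITUTIONAL_BOOST;
  // ASSUMPTION: ordinary memories decay exponentially with age.
  const decay = Math.pow(0.5, hit.ageDays / HALF_LIFE_DAYS);
  return hit.similarity * decay;
}
```

Under these assumptions a year-old constitutional rule still outranks a fresh ordinary memory with the same similarity.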

Manage

Database admin: stats (memory counts, index health), health checks, cleanup (orphaned vectors)
Checkpoint management: create, list, restore, delete
Bulk operations and ingestion (start/status/cancel)

CREATE

Skill

Unified skill creation and update workflow
Creates SKILL.md with 8-section structure, README.md, references and assets directories
Registers in skill catalog. Modes: :auto, :confirm

Agent

Scaffolds a new agent definition with proper frontmatter, behavioral rules and tool permissions
Creates source-of-truth file in .opencode/agents/ and mirrors for Claude, Codex, Gemini runtimes
Modes: :auto, :confirm

Readme

Unified README and install guide creation using sk-doc quality standards
Auto-detects folder type, loads appropriate template, validates via DQI scoring
DQI weights: Structure 40%, Content 35%, Style 25%. Modes: :auto, :confirm
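
As a worked example of the 40/35/25 split, the sketch below combines three sub-scores into one weighted total. It assumes each sub-score is on a 0..100 scale and that the weights apply as a simple weighted sum; the function and field names are hypothetical and not taken from sk-doc.

```typescript
interface DqiParts {
  structure: number; // ASSUMPTION: each part is scored 0..100
  content: number;
  style: number;
}

// Weighted sum using the percentages quoted above.
function dqiScore(p: DqiParts): number {
  return 0.4 * p.structure + 0.35 * p.content + 0.25 * p.style;
}

// 0.40 * 80 + 0.35 * 60 + 0.25 * 40 = 32 + 21 + 10 = 63
const exampleDqi = dqiScore({ structure: 80, content: 60, style: 40 });
```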

Changelog

Auto-detects recent work from spec folder artifacts or git history
Resolves correct component folder, calculates next version number
Generates formatted changelog file matching 370+ existing entries. Modes: :auto, :confirm

Feature Catalog

Creates or updates feature catalog packages with category routing
Generates both technical reference entries and simple-terms companion entries
Validates against the 290-entry catalog structure across 22 categories

Testing Playbook

Creates or updates manual testing playbook packages
Generates scenario files with test steps, expected results and verification evidence fields
Validates against established playbook format

The MCP server also ships explicit stress and matrix execution surfaces. Run npm run stress from mcp_server/ for the dedicated stress_test/ suite, which covers search-quality, memory, skill-advisor, code-graph, session and matrix subsystems. matrix_runners/ provides four per-CLI adapters plus a manifest and meta-runner for the F1-F14 feature matrix across cli-codex, cli-gemini, cli-claude-code and cli-opencode.

DEEP

AI Council

Multi-seat planning and strategy workflow for complex decisions
Produces packet-local ai-council/** artifacts, critique rounds and convergence evidence
Planning-only: never modifies implementation files directly
Modes: :auto, :confirm

Deep Research

Autonomous research loop dispatching deep-research agents iteratively until convergence
Anchors every run to a real spec.md under spec_check_protocol.md, with advisory lock handling, folder_state detection and bounded BEGIN/END GENERATED write-back
Externalized JSONL state enables pause/resume across sessions
Reducer parses terminal synthesis_complete events for authoritative stop metadata
Graph convergence guards block premature STOP when sourceDiversity or evidenceDepth thresholds fail
Lifecycle modes: new, resume, restart with lineage tracking across generations
Modes: :auto, :confirm

Deep Review

Autonomous code review loop dispatching deep-review agents iteratively until convergence
Severity-weighted findings (P0/P1/P2) across 4 dimensions with release readiness verdicts (PASS/CONDITIONAL/FAIL)
Claim-adjudication packets with finalSeverity override, stale STOP veto auto-clearing
Binary quality gates (evidence, scope, coverage) checked after convergence math before allowing stop
Adversarial self-check on P0 findings using Hunter/Skeptic/Referee triad
Lifecycle modes: new, resume, restart with typed JSONL lineage events
Modes: :auto, :confirm

Agent Improvement

Evaluates and improves any agent across 5 integration-aware dimensions with deterministic scoring
Runs a bounded loop: scan integration surfaces, generate dynamic profile, dispatch @deep-agent-improvement, score candidate, reduce state, check stop conditions
Integration scanner discovers all surfaces an agent touches: canonical definition, runtime mirrors, command dispatch, YAML workflows, skill references
Dynamic profiling: derives scoring rubric from any agent's own rules, no hardcoded profiles needed
Proposal-first: candidates written to packet-local runtime areas, canonical target untouched until guarded promotion
Guarded promotion requires passing scoring, benchmark status, repeatability evidence and operator approval. Rollback restores pre-promotion backup.
Dimensional progress tracking detects plateau (3+ identical scores across all dimensions) and triggers stop
All scoring is regex/string/file-existence based (no LLM-as-judge) for promotion gate reliability
Emits legal_stop_evaluated and blocked_stop events to the JSONL ledger matching the deep-loop runtime-truth contract
Session-boundary gate enforces fresh-session isolation before initialization
Modes: :auto, :confirm. Supports any agent in .opencode/agents/ as target
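
The plateau rule above (3+ identical scores across all dimensions) can be sketched as a check over the most recent scorecards. The dimension names come from the Agent Improver description; the `ScoreCard` shape, the function name and the choice to compare by strict equality are assumptions for illustration.

```typescript
type Dimension = "structural" | "ruleCoherence" | "integration" | "outputQuality" | "systemFitness";
type ScoreCard = Record<Dimension, number>;

const DIMENSIONS: Dimension[] = [
  "structural",
  "ruleCoherence",
  "integration",
  "outputQuality",
  "systemFitness",
];

// True when the last `window` scorecards are identical in every dimension.
function hasPlateaued(history: ScoreCard[], window = 3): boolean {
  if (history.length < window) return false;
  const recent = history.slice(-window);
  const first = recent[0];
  if (first === undefined) return false;
  return recent.every((card) => DIMENSIONS.every((d) => card[d] === first[d]));
}
```

Because the scoring is deterministic (regex, string and file-existence checks), exact equality is a reasonable plateau test here; a noisy scorer would need a tolerance instead.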

DOCTOR

Three commands cover every spec-kit diagnostic surface. Run /doctor with no target to see the interactive menu. Upgrade users see "Update everything to match latest release" as option 1.

/doctor <target> (router)

Single entry point for 7 subsystems: memory, causal-graph, code-graph, deep-loop, cocoindex, skill-advisor, skill-budget
Argv-positional dispatch via .opencode/commands/doctor/_routes.yaml manifest (canonical per-target metadata: setup vars, allowed flags, mutation class, MCP tools, advisor trigger phrases)
Each target loads its own self-contained YAML workflow under assets/doctor_<target>.yaml
Interactive menu when no target supplied. Tier 2 per-target prompt when a required flag is missing
Examples: /doctor memory --dry-run, /doctor causal-graph --confidence-threshold=0.8, /doctor code-graph --scope=stale
--target=<name> is preserved as a compatibility alias for flag-only invocation
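
The dispatch rules above (positional target, `--target=<name>` alias, interactive menu when no target is given) can be sketched as a small argument parser. This is a hedged illustration: the real router reads `_routes.yaml`, whereas this sketch hardcodes the seven target names, and it assumes the positional target wins when both forms are supplied. Function and type names are hypothetical.

```typescript
const DOCTOR_TARGETS = [
  "memory",
  "causal-graph",
  "code-graph",
  "deep-loop",
  "cocoindex",
  "skill-advisor",
  "skill-budget",
] as const;

type DoctorTarget = (typeof DOCTOR_TARGETS)[number];

interface DoctorRoute {
  target: DoctorTarget | null; // null means: show the interactive menu
  flags: string[];
}

function isDoctorTarget(value: string): value is DoctorTarget {
  return (DOCTOR_TARGETS as readonly string[]).includes(value);
}

function routeDoctor(argv: string[]): DoctorRoute {
  let positional: DoctorTarget | null = null;
  let alias: DoctorTarget | null = null;
  const flags: string[] = [];

  for (const arg of argv) {
    if (arg.startsWith("--target=")) {
      // Compatibility alias for flag-only invocation.
      const name = arg.slice("--target=".length);
      if (!isDoctorTarget(name)) throw new Error(`unknown doctor target: ${name}`);
      alias = name;
    } else if (arg.startsWith("--")) {
      flags.push(arg); // per-target flags pass through untouched
    } else if (positional === null) {
      if (!isDoctorTarget(arg)) throw new Error(`unknown doctor target: ${arg}`);
      positional = arg;
    } else {
      throw new Error(`unexpected extra positional argument: ${arg}`);
    }
  }

  // ASSUMPTION: the positional form takes precedence over the alias.
  return { target: positional !== null ? positional : alias, flags };
}
```

For example, `routeDoctor(["memory", "--dry-run"])` and `routeDoctor(["--target=memory", "--dry-run"])` resolve to the same route in this sketch.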

/doctor:mcp install|debug

MCP infrastructure repair (replaces the standalone /doctor:mcp_install and /doctor:mcp_debug from v3.4.0.0)
install - Fresh install or reinstall of the native MCP servers from their install guides. Handles old-conflicting-with-new (clean reinstall with venv/node_modules removal)
debug - Diagnoses the native MCP servers (Spec Kit Memory, System Skill Advisor, System Code Graph, CocoIndex Code, Code Mode, Sequential Thinking) with PASS/WARN/FAIL per check. Supports --fix for guided repair

/doctor:update

Multi-subsystem orchestrator: dependency-safe rebuild across code-graph → context-index + vector-index → causal-edges → skill-graph → advisor → deep-loop → cocoindex → eval
One lock (mcp_server/database/.doctor-update.flock), one pre-mutation snapshot set, one dependency DAG, one rollback policy, one state log (.doctor-update.last-run.json)
Tier-aware mid-run prompts: SHORT steps auto-acknowledge. MEDIUM steps share one combined prompt (Q-MED). LONG-POLE memory_index_scan gets explicit ETA prompt (Q-LONG, 5-15 min)
Additional gates: Q-PROBE (active MCP clients warning, NOT suppressed by --force), Q-LEGACY (per-file cleanup with --cleanup-legacy), Q-FAIL (step-failure recovery)
Use after upgrading spec-kit, after large packet moves or when multiple subsystem doctors would otherwise need to run by hand. Pass --migrate to handle schema migration (e.g. v3.3.0.0 → v3.4.1.0). Wall-clock 8-25 min
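
The "dependency-safe rebuild" idea can be illustrated with a tiny topological ordering over the steps named above. The step names come from the rebuild chain in the first bullet; the exact edges are an assumption (each stage depends on the one before it, with context-index and vector-index treated as siblings), and `rebuildOrder` is a hypothetical helper, not the orchestrator's real code.

```typescript
// ASSUMPTION: edges inferred from the arrow chain in the bullet above.
const DOCTOR_UPDATE_DEPS: Record<string, string[]> = {
  "code-graph": [],
  "context-index": ["code-graph"],
  "vector-index": ["code-graph"],
  "causal-edges": ["context-index", "vector-index"],
  "skill-graph": ["causal-edges"],
  "advisor": ["skill-graph"],
  "deep-loop": ["advisor"],
  "cocoindex": ["deep-loop"],
  "eval": ["cocoindex"],
};

// Returns the steps in an order where every step runs after its dependencies.
function rebuildOrder(deps: Record<string, string[]>): string[] {
  const steps = Object.keys(deps);
  const done = new Set<string>();
  const order: string[] = [];

  while (order.length < steps.length) {
    // A step is ready once all of its dependencies have completed.
    const ready = steps.filter(
      (step) => !done.has(step) && (deps[step] ?? []).every((dep) => done.has(dep)),
    );
    if (ready.length === 0) throw new Error("dependency cycle or missing step");
    for (const step of ready) {
      done.add(step);
      order.push(step);
    }
  }
  return order;
}
```

Failing loudly on a cycle or an undeclared dependency matches the one-DAG, one-rollback-policy design: the run should stop before any mutation rather than rebuild in an unsafe order.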

The 10 underlying YAML workflows in .opencode/commands/doctor/assets/ are self-sufficient. Each declares its own role/purpose/action/operating_mode/invariants/upstream_assets/user_inputs/field_handling block plus phased execution. The route-validate.{sh,py} CI script enforces internal consistency on the route manifest.

UTILITY

Agent Router

Routes requests to external AI systems (Gemini CLI, Codex CLI, Claude Code, Copilot CLI)
The receiving AI operates under its own system prompt - full identity adoption
Use for cross-AI delegation where the target AI needs to behave as itself

Prompt

Refines prompts and prompt packages through /prompt using 7 proven frameworks (RCAF, COSTAR, RACE, CIDI, TIDD-EC, CRISPE, CRAFT)
Applies DEPTH thinking methodology with CLEAR quality scoring
Can return inline improvements or route to @prompt-improver for higher-stakes prompt packages

🔌 Code Mode MCP

Code Mode MCP gives the AI access to external tools (Figma, GitHub, Chrome DevTools, ClickUp, Webflow) through a single TypeScript execution interface. Instead of loading large external tool definitions into context, Code Mode loads them on demand through one interface (1.6k tokens) - a 98.7% reduction.

Native MCP Servers

Canonical native server set:

Server	Tools	Purpose
`mk-spec-memory`	39	Cognitive memory, session recovery, causal/eval tools and graph loops
`mk_skill_advisor`	9	Gate 2 advisor routing plus skill-graph scan/query/status/validation
`mk_code_index`	11	Structural code graph, `detect_changes` and CocoIndex bridge helpers
`code_mode`	7	External tool orchestration via TypeScript execution
`cocoindex_code`	2	Semantic code search via vector embeddings
`sequential_thinking`	1	Structured multi-step reasoning for complex problems
Total	69

Lifecycle guardrails: mk-spec-memory, mk_skill_advisor, and mk_code_index use the shared idle-timeout knob SPECKIT_LAUNCHER_IDLE_TIMEOUT_MIN. Orphan cleanup is documented in .opencode/scripts/README.md; the checked-in LaunchAgent is only a template until an operator copies and loads it.

Code Mode Tools (7)

search_tools - Discover relevant tools by task description
tool_info - Get complete tool parameters and TypeScript interface
call_tool_chain - Execute TypeScript code with access to all registered tools
list_tools - List all currently registered tool names
register_manual - Register a new tool provider
deregister_manual - Remove a tool provider
get_required_keys_for_tool - Check required environment variables for a tool

External Integrations (via `.utcp_config.json`)

chrome_devtools_1 (MCP/stdio) - Browser automation (instance 1). No env var needed.
chrome_devtools_2 (MCP/stdio) - Browser automation (instance 2). No env var needed.
clickup (MCP/stdio) - Task management, goals, docs. Requires CLICKUP_API_KEY.
figma (MCP/stdio) - Design files, components, exports. Requires FIGMA_API_KEY.
github (MCP/stdio) - Issues, pull requests, commits. Requires GITHUB_PERSONAL_ACCESS_TOKEN.
webflow (MCP/remote) - Sites, CMS collections. Requires Webflow auth.

Performance

Metric	Without Code Mode	With Code Mode
Context tokens	Large external tool schemas loaded upfront	1.6k (on-demand)
Round trips	15+ for chained operations	1 (TypeScript chain)
Type safety	None	Full TypeScript
Context reduction	-	98.7%
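
The two figures in the table imply a baseline that the README does not state directly. As a quick arithmetic check, assuming the 1.6k and 98.7% values are both exact (they are likely rounded), the implied upfront schema cost works out to roughly 123k tokens. The helper name is hypothetical.

```typescript
// If `tokensAfter` is what remains after a fractional `reduction`,
// the original size is tokensAfter / (1 - reduction).
function impliedBaselineTokens(tokensAfter: number, reduction: number): number {
  if (reduction < 0 || reduction >= 1) throw new Error("reduction must be in [0, 1)");
  return tokensAfter / (1 - reduction);
}

// 1600 / (1 - 0.987) = 1600 / 0.013, about 123,077 tokens
const impliedBaseline = Math.round(impliedBaselineTokens(1600, 0.987));
```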

To call a Code Mode tool: call_tool_chain({ typescript: "const result = await figma.figma_get_file({fileKey: 'abc123'}); return result;" })

For more on the mcp-code-mode skill and TypeScript execution patterns, see the skill at .opencode/skills/mcp-code-mode/SKILL.md.

4. CONFIGURATION

🎯 Customizing for Your Stack: Start with `sk-code`

This repo ships as a public template. Of the skills it ships with, only one carries stack-specific content. Start there:

Skill / Surface	Out-of-the-box	Notes
`sk-code`	🎨 Stack-specific (the customization point)	Surface-aware code-quality patterns. Replace the shipped Webflow + OpenCode + Motion.dev surfaces with your own (e.g., Next.js + Tailwind + Postgres or React Native + Reanimated or Go + sqlc, etc.).
`sk-doc`	✅ Codebase-agnostic	Markdown quality + component creation. Works for any project.
`sk-git`	✅ Codebase-agnostic	Worktree + commit + PR workflow. Works for any project.
`sk-code-review`	✅ Codebase-agnostic baseline	Pulls surface evidence FROM `sk-code`. Customize `sk-code` and the review baseline auto-adapts.
`system-spec-kit`	✅ Codebase-agnostic	Spec folder workflow + validator + memory. Works for any project.
`mcp-coco-index`	✅ Codebase-agnostic	Semantic code search via embeddings. Works for any project.
`mcp-code-mode`	✅ Codebase-agnostic	Multi-tool MCP orchestration. Works for any project.
`deep-loop-runtime` / `deep-research` / `deep-review`	✅ Codebase-agnostic	Shared runtime plus iterative loop protocols. Work for any topic / target.
`sk-prompt` / `deep-agent-improvement`	✅ Codebase-agnostic	Prompt + agent improvement frameworks. Work for any project.
`cli-*` (codex/copilot/gemini/claude-code/opencode)	✅ Codebase-agnostic	External CLI orchestrators. Stack-independent.
`mcp-chrome-devtools`	✅ Codebase-agnostic	Browser tooling. Stack-independent.

Adding your own skills: the shipped set is intentionally minimal. Most teams will add their own skills (project-specific workflows, ops runbooks, domain-specific reviewers, etc.). That's expected and supported. Just drop them into .opencode/skills/<your-skill>/ and they'll be picked up by the advisor. The shipped skills above are kept agnostic so upstream updates apply cleanly to your fork.

What "adapting sk-code" looks like:

Replace references/webflow/, references/opencode/, references/motion_dev/ with your stack's references (e.g., references/nextjs/, references/postgres/).
Replace assets/webflow/, assets/opencode/, assets/motion_dev/ with your stack's assets (checklists, recipes, snippets).
In SKILL.md §2 Smart Routing, update the STACK_FOLDERS dict and the bash detection block to match your stack's marker files and CWD signals.
Update the RESOURCE_MAP intent → file paths to point at your renamed references/assets.
Bump sk-code version + ship a changelog. Use the assets/opencode/checklists/skill_authoring.md checklist as your guide.

The other shipped skills will continue working unchanged: sk-doc will still validate your markdown, sk-git will still manage your branches, system-spec-kit will still spec your work and sk-code-review will surface YOUR sk-code evidence at review time.

Core Configuration Files

CLAUDE.md - Gate definitions, behavior rules, coding anti-patterns. Used by Claude Code (primary runtime).
AGENTS.md - Agent routing, capability reference, gate documentation. Used by all runtimes.
opencode.json - MCP server bindings, model configuration and launcher notes. Used by OpenCode platform.
.utcp_config.json - Code Mode external tool registrations. Used by mcp-code-mode skill.
.claude/mcp.json - Claude Code MCP configuration. Claude Code only.
.codex/config.toml - Codex CLI MCP configuration and profile definitions.
.gemini/settings.json - Gemini CLI configuration. Gemini CLI only.
.vscode/mcp.json - VS Code / Copilot MCP configuration wrapper.

Memory Engine Configuration

The memory server reads configuration from environment variables:

VOYAGE_API_KEY (optional) - Voyage AI cloud embeddings (opt-in only, gated by egress guard)
SPECKIT_EMBEDDER (optional) - Override the default embedder id (default: ollama-nomic-v1.5 since ADR-013/014 2026-05-19; was previously ollama-jina-v3). See embedder-pluggability.md for the registered list.
SPECKIT_RERANK_LAYER (optional) - Retrieval-rescue layer toggle, default true per ADR-011. Set to false to disable.
HF_EMBEDDINGS_DTYPE (optional) - hf-local fallback dtype (default: q8. Also: fp32, fp16, q4, int8, uint8, bnb4)
OPENAI_API_KEY (optional) - OpenAI embeddings (alternative)
MEMORY_DB_PATH (optional) - Override default database path

Default repo-local database path: .opencode/skills/system-spec-kit/mcp_server/database/context-index__ollama__nomic-embed-text-v1.5__768.sqlite (default since ADR-013/014 2026-05-19; previously __jina-embeddings-v3__1024__q4_k_m.sqlite). The filename encodes provider, model, dimension and dtype so multiple backends can coexist on disk without mixing vectors.
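
The naming scheme can be sketched as a build/parse pair. This is an illustration inferred from the two filenames quoted above, not the server's real code: it assumes segments are joined with a double underscore, that the prefix is always `context-index`, and that the dtype segment is optional (the nomic default has none, the jina fallback has `q4_k_m`). All names are hypothetical.

```typescript
interface DbFileParts {
  provider: string;
  model: string;
  dimension: number;
  dtype: string | null; // null when the filename carries no dtype segment
}

const DB_PREFIX = "context-index";
const DB_SUFFIX = ".sqlite";

function buildDbFilename(p: DbFileParts): string {
  const segments = [DB_PREFIX, p.provider, p.model, String(p.dimension)];
  if (p.dtype !== null) segments.push(p.dtype);
  return segments.join("__") + DB_SUFFIX;
}

function parseDbFilename(name: string): DbFileParts {
  if (!name.endsWith(DB_SUFFIX)) throw new Error(`not a ${DB_SUFFIX} file: ${name}`);
  const segments = name.slice(0, -DB_SUFFIX.length).split("__");
  const [prefix, provider, model, dim, dtype] = segments;
  if (
    prefix !== DB_PREFIX ||
    provider === undefined ||
    model === undefined ||
    dim === undefined ||
    segments.length > 5
  ) {
    throw new Error(`unexpected database filename: ${name}`);
  }
  const dimension = Number(dim);
  if (!Number.isInteger(dimension)) throw new Error(`dimension is not an integer: ${dim}`);
  return { provider, model, dimension, dtype: dtype === undefined ? null : dtype };
}
```

Because provider, model and dimension are all part of the name, two backends with different vector sizes can never resolve to the same file.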

[!TIP] If no API key is set, the memory engine auto-detects the local Ollama endpoint serving nomic-embed-text-v1.5 (current default per ADR-013/014), falls back to jina-embeddings-v3 if nomic isn't pulled, then to HuggingFace Local embeddings.
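
The fallback order in the tip can be sketched as a pure function. This covers only the no-API-key path described above. It assumes the caller has already probed the Ollama endpoint and listed the pulled models; the model tags, the return labels and the function name are hypothetical.

```typescript
type LocalEmbedder =
  | "ollama/nomic-embed-text-v1.5"
  | "ollama/jina-embeddings-v3"
  | "hf-local";

// ASSUMPTION: `pulledModels` holds model names exactly as written in this README.
function resolveLocalEmbedder(ollamaReachable: boolean, pulledModels: string[]): LocalEmbedder {
  if (ollamaReachable) {
    // Current default first, then the previous default as a fallback.
    if (pulledModels.includes("nomic-embed-text-v1.5")) return "ollama/nomic-embed-text-v1.5";
    if (pulledModels.includes("jina-embeddings-v3")) return "ollama/jina-embeddings-v3";
  }
  // Last resort: HuggingFace Local embeddings.
  return "hf-local";
}
```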

Memory Feature Flags

Feature flags control search channels, scoring signals, save-time enforcement and evaluation behavior. The important retrieval/runtime flags are resolved at call time, so long-lived MCP processes do not depend on frozen import-time snapshots.

Search Pipeline - 5-channel retrieval, fallback routing, reranking, graph-walk rollout, confidence and token-budget policies.
Session/Cache - Working memory, cache invalidation on DB rebind, session deduplication, recovery helpers.
Memory/Storage - Save quality gate, reconsolidation, governed scopes, causal graph maintenance, projection cleanup.
Runtime Lifecycle - MCP idle self-exit through SPECKIT_LAUNCHER_IDLE_TIMEOUT_MIN; orphan sweeper rollout remains dry-run-first until explicitly installed.
Embedding/API - Startup provider resolution, fail-fast dimension checks, structured fallback metadata for effective vs requested provider.
Evaluation/Debug - Trace mode, eval logging, ablation/reporting guardrails, feedback evaluation and proposal diagnostics that observe candidates without reordering live results.

For the complete flag reference with per-flag defaults, see ENV_REFERENCE.md and the MCP Server runtime guardrail notes.
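
The difference between call-time resolution and an import-time snapshot can be shown with one flag. The sketch uses SPECKIT_RERANK_LAYER, which defaults to true and is disabled by setting it to false. The environment is passed in as a plain record so the example stays runtime-agnostic; treating only the literal string "false" as disabling is an assumption, and both function names are hypothetical.

```typescript
type FlagEnv = Record<string, string | undefined>;

// Anti-pattern: reads the flag once and keeps the answer for the process lifetime.
function makeFrozenRerankReader(env: FlagEnv): () => boolean {
  const snapshot = env["SPECKIT_RERANK_LAYER"] !== "false";
  return () => snapshot;
}

// Call-time resolution: re-reads the environment on every call.
// ASSUMPTION: only the literal string "false" disables the layer.
function rerankEnabled(env: FlagEnv): boolean {
  return env["SPECKIT_RERANK_LAYER"] !== "false";
}

const liveEnv: FlagEnv = {};
const frozenReader = makeFrozenRerankReader(liveEnv);
liveEnv["SPECKIT_RERANK_LAYER"] = "false"; // operator flips the flag mid-session
const frozenAnswer = frozenReader(); // still true: stale snapshot
const liveAnswer = rerankEnabled(liveEnv); // false: sees the change
```

This is why a long-lived MCP process can pick up a changed flag without a restart when flags are resolved per call.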

Database Schema

The runtime centers on a SQLite memory_index table with 56 columns plus companion FTS5/vector, lineage, checkpoint, working-memory and eval tables.

Primary store - memory_index holds the searchable memory rows plus governance, quality, chunking and retrieval metadata.
Search companions - FTS5 and vector tables support lexical and embedding retrieval alongside BM25 rebuild/index data.
Graph/lifecycle - Causal edges, lineage projection, checkpoints, working memory and access tracking support decision tracing and session continuity.
Evaluation - Separate eval tables persist ablation/reporting metrics, with guards for missing query IDs and synthetic token-usage markers.
Paths - The checked-in configs default to the provider-keyed database path under .opencode/skills/system-spec-kit/mcp_server/database/. The filename encodes provider, model, dimension and dtype (current default since ADR-013/014: context-index__ollama__nomic-embed-text-v1.5__768.sqlite; jina-v3 fallback would produce context-index__ollama__jina-embeddings-v3__1024__q4_k_m.sqlite). If a runtime cannot write inside the repo, override MEMORY_DB_PATH (and, when relevant, SPEC_KIT_DB_DIR) to a writable location.

MCP Config Shape

Abbreviated shape. Runtime config files can temporarily differ while the mk_code_index rename is being rolled out across clients. The canonical code-graph identity is mk_code_index / mcp__mk_code_index__*.

{
  "mcp": {
    "mk-spec-memory": {
      "type": "local"
    },
    "mk_skill_advisor": {
      "type": "local"
    },
    "mk_code_index": {
      "type": "local"
    },
    "code_mode": {
      "type": "local"
    },
    "cocoindex_code": {
      "type": "local"
    },
    "sequential_thinking": {
      "type": "local"
    }
  }
}

Maintainer-Mode Code-Graph Flags (already disabled for end users)

All 5 runtime MCP configs (opencode.json, .claude/mcp.json, .codex/config.toml, .gemini/settings.json, .vscode/mcp.json) carry five opt-in maintainer flags, plus one optional directory override:

SPECKIT_CODE_GRAPH_INDEX_SKILLS    (covers .opencode/skills/**)
SPECKIT_CODE_GRAPH_INDEX_AGENTS    (covers .opencode/agents/**)
SPECKIT_CODE_GRAPH_INDEX_COMMANDS  (covers .opencode/commands/**)
SPECKIT_CODE_GRAPH_INDEX_SPECS     (covers <active-spec-folder>/**)
SPECKIT_CODE_GRAPH_INDEX_PLUGINS   (covers .opencode/plugins/**)
SPECKIT_CODE_GRAPH_DB_DIR          (optional code-graph SQLite directory override)

End users see all 5 as "false" thanks to the git clean filter. That's the framework default and what you want: the code graph indexes your project code, not the framework backend.

Maintainers (us) have all 5 as "true" locally because we navigate .opencode/ to iterate on the framework. The smudge filter restores "true" on checkout/pull/clone after running ./scripts/setup-maintainer-filters.sh.

Per-call override: the same five flags exist as includeSkills / includeAgents / includeCommands / includeSpecs / includePlugins arguments on code_graph_scan. Per-call args always override env defaults, so you can flip behavior for one scan without editing config.
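
The precedence rule (per-call argument first, environment default second) can be sketched as below. The argument and flag names come from the text above; the mapping table, the `resolveIncludes` helper and the rule that only the string "true" enables a flag are assumptions for illustration, not the actual code_graph_scan implementation.

```typescript
interface ScanArgs {
  includeSkills?: boolean;
  includeAgents?: boolean;
  includeCommands?: boolean;
  includeSpecs?: boolean;
  includePlugins?: boolean;
}

const ENV_FOR_ARG = {
  includeSkills: "SPECKIT_CODE_GRAPH_INDEX_SKILLS",
  includeAgents: "SPECKIT_CODE_GRAPH_INDEX_AGENTS",
  includeCommands: "SPECKIT_CODE_GRAPH_INDEX_COMMANDS",
  includeSpecs: "SPECKIT_CODE_GRAPH_INDEX_SPECS",
  includePlugins: "SPECKIT_CODE_GRAPH_INDEX_PLUGINS",
} as const;

type IncludeKey = keyof typeof ENV_FOR_ARG;
type ScanEnv = Record<string, string | undefined>;

function resolveIncludes(args: ScanArgs, env: ScanEnv): Record<IncludeKey, boolean> {
  const resolved = {} as Record<IncludeKey, boolean>;
  for (const key of Object.keys(ENV_FOR_ARG) as IncludeKey[]) {
    const fromArg = args[key];
    // An explicit per-call value always wins, including an explicit false.
    // ASSUMPTION: the env default is enabled only by the string "true".
    resolved[key] = fromArg !== undefined ? fromArg : env[ENV_FOR_ARG[key]] === "true";
  }
  return resolved;
}
```

Checking for `undefined` rather than truthiness matters: passing `includeSkills: false` must be able to switch off a flag that the environment enables.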

Git clean filter: maintainer mode stays local

The repo ships a .gitattributes rule that runs an idempotent sed-based clean filter on the 5 config files: every "true" for these flags is rewritten to "false" when the file enters the git index. The smudge filter rewrites "false" → "true" on checkout/pull/clone for installed maintainers. Net effect:

End users cloning the template → all 5 configs show "false" (framework default, correct out of box)
Maintainers after running ./scripts/setup-maintainer-filters.sh → all 5 configs show "true" locally. Commits + pushes still ship "false" to the remote
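
The clean/smudge behavior can be modeled in a few lines. The real filter is sed-based and wired up through .gitattributes; this TypeScript model only illustrates the rewrite and its idempotence. The regular expression is an assumption about how the flags appear in the JSON and TOML configs (a quoted "true" or "false" after `:` or `=`), and the function names are hypothetical.

```typescript
// Matches one of the five maintainer flags followed by a quoted boolean value,
// in either JSON ("FLAG": "true") or TOML (FLAG = "true") form.
const MAINTAINER_FLAG_PATTERN =
  /(SPECKIT_CODE_GRAPH_INDEX_(?:SKILLS|AGENTS|COMMANDS|SPECS|PLUGINS)"?\s*[:=]\s*")(true|false)(")/g;

// Clean: applied when a file enters the git index. Always writes "false".
function cleanMaintainerFlags(text: string): string {
  return text.replace(MAINTAINER_FLAG_PATTERN, "$1false$3");
}

// Smudge: applied on checkout for installed maintainers. Always writes "true".
function smudgeMaintainerFlags(text: string): string {
  return text.replace(MAINTAINER_FLAG_PATTERN, "$1true$3");
}
```

Because each filter writes a fixed value instead of toggling, running it twice gives the same result as running it once, and unrelated settings that happen to be "true" are left alone.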

To opt into maintainer mode on a fresh clone (only relevant if you're contributing upstream):

./scripts/setup-maintainer-filters.sh
git rm --cached opencode.json .claude/mcp.json .gemini/settings.json .vscode/mcp.json .codex/config.toml
git checkout HEAD -- opencode.json .claude/mcp.json .gemini/settings.json .vscode/mcp.json .codex/config.toml

After that, cat opencode.json shows "true". git show HEAD:opencode.json shows "false" (what the remote sees).

5. FAQ

Q: Do I need all 22 skills installed to use the framework?

A: No. Skills are loaded on demand by Gate 2. You only need the ones relevant to your work. The two core documentation skills - system-spec-kit and sk-doc - cover most documentation workflows. The MCP and cross-AI CLI skills require additional local tooling or API keys depending on the surface.

Q: Is this only for OpenCode or does it work with other runtimes?

A: It works with OpenCode, Codex CLI, Claude Code and Gemini CLI. The repo also includes Copilot CLI-oriented startup-surface integration. Agent definitions are mirrored in the checked-in Claude, Codex and Gemini runtime directories. OpenCode and Copilot CLI use runtime-specific MCP or startup integration rather than a dedicated agent mirror.

Q: What happens if I do not use a spec folder?

A: Gate 3 blocks file modifications until a spec folder answer is provided. You can skip it with option D, but skipped sessions are undocumented and will not be recoverable via memory search. For trivial changes under 5 characters in a single file, Gate 3 does not trigger.

Q: How does the memory system know what is relevant to my current task?

A: Packet continuity and any supporting generated context artifacts use structured frontmatter and anchored markdown so the memory engine can classify, index and retrieve them reliably. For recovery, start with /speckit:resume and the packet-local continuity ladder handover.md -> _memory.continuity -> canonical spec docs. After that, memory_match_triggers() runs a fast trigger/cognitive pass, while memory_context() and memory_search() handle deeper retrieval with intent routing, reranking and filtering.

Q: Can I use this framework without the cognitive memory features?

A: Yes. The Spec Kit documentation workflow (Gate 3, spec folders, templates) works independently of the memory MCP server. You lose cross-session memory retrieval, but structured documentation, agent routing and skill loading all still work.

Q: How do I add a new skill to the framework?

A: Use /create:skill to scaffold the skill structure. The command creates the SKILL.md, references and assets directories following the sk-doc template. Then register the skill in .opencode/skills/README.md.

Q: What does "local-first" mean for the memory system?

A: The memory database is a SQLite file on your local machine. No session data, code or context is sent to any external service unless you configure a cloud embedding provider (Voyage AI or OpenAI). HuggingFace Local embeddings run entirely on-device.

Q: How do I contribute a new agent definition?

A: Define the agent in .opencode/agents/ (the source of truth), then mirror the adapter into .claude/agents/, .codex/agents/ and .gemini/agents/. Use /create:agent to scaffold the file from the agent template.

Q: How many MCP tools are there and where are they defined?

A: 69 total across 6 native MCP servers, sourced from registered MCP-dispatched tools only. Breakdown: 39 mk-spec-memory tools from .opencode/skills/system-spec-kit/mcp_server/tool-schemas.ts, 9 mk_skill_advisor tools from .opencode/skills/system-skill-advisor/mcp_server/advisor-server.ts, 11 mk_code_index tools from .opencode/skills/system-code-graph/mcp_server/tool-schemas.ts, 7 code mode tools, 2 cocoindex_code tools and 1 sequential thinking tool. Canonical advisor/skill-graph docs use mk_skill_advisor / mcp__mk_skill_advisor__*. Canonical code-graph docs use mk_code_index / mcp__mk_code_index__*.

Q: What is the feature catalog?

A: The feature catalog is the current technical reference documenting the memory system's live capabilities. It lives at .opencode/skills/system-spec-kit/feature_catalog/feature_catalog.md. The code graph runtime adds package-local docs at .opencode/skills/system-code-graph/feature_catalog/.

6. RELATED DOCUMENTS

Internal Documentation:

→ AGENTS.md - Agent routing, gate definitions, behavior rules
→ Spec Kit README - Spec folder workflow, Level contract template set, validation rules
→ MCP Server README - Memory API reference and runtime support docs
→ Repo Scripts Runbook - Dry-run orphan MCP sweeper, Claude cleanup, and LaunchAgent template guidance
→ Orphan MCP Leak Prevention Packet - Canonical implementation summary and rollout state
→ System Code Graph Skill - First-class structural graph skill and MCP routing rules
→ Skill Advisor README - Standalone mk_skill_advisor server, nine advisor/skill-graph tools and routing docs
→ Install Guide - MCP server setup, embedding providers
→ Deployment Notes - Docker anti-patterns, Copilot notes and session-resume auth flag
→ Architecture - API boundary contract
→ sk-doc Skill - Documentation standards, DQI scoring
→ Skills Index - Skills library and invocation patterns
→ Feature Catalog - Current technical reference
→ Manual Testing Playbook - Operator validation scenarios, including runtime lifecycle checks
→ Code Graph Runtime Catalog - Package-local code graph runtime inventory
→ Code Graph Manual Playbook - Operator scenarios for code graph validation
→ Latest System Spec-Kit Release Notes - Most recent shipped release notes

External Resources:

→ OpenCode - The underlying AI coding platform
→ Voyage AI - Cloud embedding provider (opt-in)
→ HuggingFace - Free local embedding alternative

Documentation version: 4.13 | Last updated: 2026-05-24 | Framework: 11 agents, 22 skills, 24 commands, 69 MCP tools (39 mk-spec-memory + 9 mk_skill_advisor + 11 mk_code_index + 7 code mode + 2 CocoIndex + 1 sequential thinking. Deferred / internal-only handlers do NOT count).

Name	mcp-code-mode
Description	MCP orchestration via TypeScript execution for efficient multi-tool workflows. Use Code Mode for ALL MCP tool calls (ClickUp, Notion, Figma, Webflow, Chrome DevTools, etc.). Provides 98.7% context reduction, 60% faster execution, and type-safe invocation. Mandatory for external tool integration.

mcp-code-mode

SKILL.md