mcp-code-mode
MCP orchestration via TypeScript execution for efficient multi-tool workflows. Use Code Mode for ALL MCP tool calls (ClickUp, Notion, Figma, Webflow, Chrome DevTools, etc.). Provides 98.7% context reduction, 60% faster execution, and type-safe invocation. Mandatory for external tool integration.
Skilled Agent Orchestration w/ Custom Spec Kit
| Core layer | What it adds |
|---|---|
| 📋 Spec Kit Framework | Structured plans, task tracking, validation gates, and handover docs |
| 🧠 Cognitive Memory | Local-first project memory for decisions, context, and continuity |
| ⚛️ Hybrid RAG + Smart Graph | Retrieval that blends semantic search with graph-aware project context |
| 🔍 Code Index + Graph | Callers, imports, impact paths, and concept-based code discovery |
| 🤖 11 Specialized Agents | Focused roles for implementation, review, research, docs, git, and more |
| 🎯 22 On-Demand Skills | Skill Advisor routing for the right workflow at the right time |
Reasons to try it
- Works with OpenCode, Codex, Claude Code, Gemini, and Devin CLI
- Supports external CLI agent orchestration without unnecessary MCPs or proxies
- Designed to be modular, readable, and easy to adapt to your own stack
Don't buy me unwanted coffee: https://buymeacoffee.com/michelkerkmeester
<!-- ANCHOR:table-of-contents -->
TABLE OF CONTENTS
<!-- /ANCHOR:table-of-contents --> <!-- ANCHOR:overview -->
1. OVERVIEW
What This Framework Does
AI coding assistants have amnesia. Every session starts from zero. You explain your architecture Monday. By Wednesday, it is gone. Decisions, trade-offs, the carefully reasoned choices behind them, all lost the moment the conversation window closes. This framework fixes that.
The framework adds four layers on top of the base platform:
- Structured documentation (Spec Kit) - every file change gets a spec folder recording what changed, why and how. Like a lab notebook for software.
- Cognitive memory (MCP server) - a local-first memory engine storing decisions, context and project history in a searchable database. Like a personal librarian who remembers every conversation.
- Code intelligence (Code Graph + CocoIndex) - structural graph indexing handles callers, imports and impact analysis, while semantic search finds code by concept.
- Coordinated agents and skills - 11 specialized agents routed by a gate system that loads the right skills at the right time.
| Component | Details |
|---|---|
| 🤖 11 Agents | 11 custom specialists, multi-runtime |
| 🎯 22 Skills | Code, docs, git, prompts, MCP, research, review, council, improvement, cross-AI, small-model sentinel, and standalone system packages |
| ⌨️ 24 Commands | 4 speckit + 4 memory + 7 create + 4 deep + 3 doctor + 2 root utilities |
| 🔧 69 MCP Tools | mk-spec-memory (39), mk_skill_advisor (9), mk_code_index (11), code mode (7), CocoIndex (2), sequential thinking (1). See canonical count in FAQ |
| 🔍 CocoIndex Code | Forked from cocoindex-io/cocoindex-code (Apache 2.0) - semantic code search via vector embeddings and natural-language discovery across 28+ languages |
| 🏗️ Code Graph | First-class skill at .opencode/skills/system-code-graph/ with standalone MCP server identity mk_code_index and client namespace mcp__mk_code_index__* |
| ⚡ Runtime Coverage | OpenCode, Codex CLI (requires [features].codex_hooks = true opt-in for native hooks), Claude Code, Gemini CLI, plus Copilot CLI startup-surface support (file-based custom instructions) |
How It All Connects
YOUR REQUEST
│
▼
┌──────────────────────────────────────────┐
│ GATE SYSTEM (3 mandatory gates) │
│ │
│ Gate 1: Context Gate 2: Skills │
│ Surface relevant Auto-load the right │
│ prior memory domain expertise │
│ │
│ Gate 3: Spec Folder (HARD BLOCK) │
│ Every file change needs documentation │
└──────────────────────┬───────────────────┘
│
┌──────────────┴──────────────┐
▼ ▼
┌───────────────┐ ┌──────────────────┐
│ AGENT NETWORK │ │ SKILLS LIBRARY │
│ 11 specialized│ │ 22 domain skills │
│ agents with │◄────────►│ auto-loaded by │
│ routing logic │ │ task keywords │
└───────┬───────┘ └────────┬─────────┘
│ │
▼ ▼
┌──────────────────────────────────────────┐
│ NATIVE MCP TOPOLOGY │
│ 6 native servers - each one a separate │
│ process and MCP boundary │
│ │
│ mk-spec-memory context + memory │
│ mk_skill_advisor skill routing │
│ mk_code_index structural graph │
│ cocoindex_code semantic search │
│ code_mode external tools │
│ sequential_thinking reasoning helper │
│ │
│ Shared contract: hybrid retrieval + │
│ startup payload via runtime hooks │
└──────────────────────┬───────────────────┘
│
▼
┌──────────────────────────────────────────┐
│ SPEC KIT (documentation framework) │
│ specs/###-feature/ - scratch/ │
│ 4 levels - template set - 20 rules │
│ nomic-v1.5 (Ollama) │ HF Local │ Voyage │
└──────────────────────────────────────────┘
What's Shipped Recently
The code graph now lives in .opencode/skills/system-code-graph/ with its own MCP boundary. A follow-on rename established mk_code_index as the standalone server identity and mcp__mk_code_index__* as the live documentation namespace.
Recent work also tightened the public surface without turning this README into a changelog: CocoIndex now has a canonical feature inventory, stress coverage is aligned, and the local llama-cpp path has stronger embedding failure reporting plus token-aware truncation.
Embedder Architecture
Both native MCPs are pluggable out of the box, so swapping an embedder needs no code change.
mk-spec-memory defaults to sbert/nomic-ai/CodeRankEmbed (768 dim, MIT) through the Ollama -> hf-local Nomic cascade. SPECKIT_CROSS_ENCODER remains default-off; when you opt in, the configured reranker is cross-encoder/ms-marco-MiniLM-L-6-v2.
CocoIndex ships a two-stage pipeline:
- Stage 1 embedder - Default sbert/nomic-ai/CodeRankEmbed (768 dim, MIT, code-tuned bi-encoder, MPS auto-detect on Apple Silicon; promoted 2026-05-19).
- Stage 2 cross-encoder reranker - Default Qwen/Qwen3-Reranker-0.6B (Apache-2.0; promoted 2026-05-20 per ADR-027 after a head-to-head bench against jina-reranker-v3, which is kept as an opt-in fallback).
The Code Graph rides on CocoIndex's embedder choice via a shared bridge. See the canonical narrative at embedder-pluggability.md and the swap runbook in CocoIndex INSTALL_GUIDE §4 "Choosing an embedder".
2. QUICK START
Installation
Prerequisites: Node.js 18+ with npm, git and a POSIX shell. The launcher binaries vendor their own dependencies on first run, so you do not need TypeScript or tsc installed globally.
# 1. Clone the repository
git clone https://github.com/MichelKerkmeester/opencode--spec-kit-skilled-agent-orchestration.git
cd opencode--spec-kit-skilled-agent-orchestration
# 2. Install root dependencies (file watcher + shared HTTP utilities)
npm install
# 3. Boot the native MCP servers via their committed launchers
# Each launcher is a self-contained .cjs that vendors its own deps on first run.
node .opencode/bin/mk-spec-memory-launcher.cjs --help
node .opencode/bin/mk-skill-advisor-launcher.cjs --help
node .opencode/bin/mk-code-index-launcher.cjs --help
# 4. (Optional) Install the CocoIndex Code soft-fork for semantic code search
bash .opencode/skills/mcp-coco-index/scripts/install.sh
The native MCP servers (mk-spec-memory, mk_skill_advisor, mk_code_index) ship as committed launcher binaries under .opencode/bin/. They self-vendor their dependencies on first invocation and the checked-in runtime configs already point at them. There is no separate build step.
Runtime lifecycle guardrails are part of the native MCP stack. The servers share SPECKIT_LAUNCHER_IDLE_TIMEOUT_MIN for idle self-exit, and the repo ships a dry-run-first orphan process sweeper plus a LaunchAgent template under .opencode/scripts/. The LaunchAgent is not installed or loaded by default; activation is a separate operator-approved rollout. See Repo Scripts Runbook and the 022 orphan MCP leak prevention packet.
Set Up Embedding Provider
Choose an embedding provider:
# Default when no cloud keys are set: nomic-embed-text-v1.5 (768 dim)
# served by a local Ollama HTTP endpoint. Pull the model once:
# ollama pull nomic-embed-text:v1.5
# (jina-embeddings-v3 is the second-priority fallback; pull via:
# ollama pull hf.co/gaianet/jina-embeddings-v3-GGUF:Q4_K_M)
# Option A: Voyage AI (cloud, requires API key, opt-in only)
export VOYAGE_API_KEY="your-key-here"
# Option B: OpenAI embeddings (cloud, requires API key)
export OPENAI_API_KEY="your-key-here"
# Option C: HuggingFace Local (free, CPU/ONNX fallback when Ollama is unavailable)
# Auto-detected when the Ollama probe fails and no cloud keys are set
Verify Installation
# Confirm the launcher binaries respond
node .opencode/bin/mk-spec-memory-launcher.cjs --help
node .opencode/bin/mk-skill-advisor-launcher.cjs --help
node .opencode/bin/mk-code-index-launcher.cjs --help
# Confirm the active runtime's MCP config references the launchers
# (only the runtime you use needs to exist. .codex/config.toml ships in the repo)
grep -l 'mk-spec-memory\|mk_skill_advisor\|mk_code_index' \
opencode.json .claude/mcp.json .codex/config.toml .gemini/settings.json 2>/dev/null
First Use
Open OpenCode in your project directory. The framework is active. Try:
/speckit:complete Build a user authentication system
This creates a spec folder, runs research, builds a plan and begins implementation - all with memory saved automatically. When you come back tomorrow, the memory engine remembers everything.
Adapting to Your Stack
This repo ships as a public template. Of the shipped skills, sk-code carries the stack-specific patterns (frontend framework, animation library, CMS, backend language). Start there when forking. The other shipped skills (system-spec-kit, sk-doc, sk-git, sk-code-review, mcp-coco-index, the deep-research/deep-review loops, deep-loop-runtime, the cli-* orchestrators) are codebase-agnostic out of the box and work for any project without modification. Most teams will also add their own skills on top. Drop them into .opencode/skills/<your-skill>/ and they'll be picked up automatically.
See §4 Customizing for Your Stack for the full customization map and step-by-step adaptation guide.
Code-Graph Indexing
The standalone mk_code_index MCP server indexes your project's production code by default, not the framework backend. End users inherit this behavior automatically through the committed config defaults. See §4 Maintainer-Mode Code-Graph Flags only if you're contributing upstream.
3. FEATURES
📋 Spec Kit Documentation
The Spec Kit enforces structured spec folders for every file-modifying conversation. Gate 3 requires a spec folder answer before any file modification begins (only typo/whitespace fixes under 5 characters are exempt).
Documentation Levels
Documentation depth scales with task complexity.
| Level | LOC Guidance | Required Files | When to Use |
|---|---|---|---|
| 1 | < 100 | spec.md, plan.md, tasks.md, implementation-summary.md | Small features, bug fixes, single-file changes |
| 2 | 100 - 499 | Level 1 + checklist.md | Features needing QA verification, multi-file changes |
| 3 | 500+ | Level 2 + decision-record.md | Architecture changes, complex refactors |
| 3+ | Complexity 80+ | Level 3 + approval workflow, compliance checkpoints, stakeholder matrix | High-complexity work needing review tracking and workstream coordination |
The LOC ranges are guidance, not hard rules. Risk, complexity and the number of affected files can push a task to a higher level. When in doubt, choose the higher level.
implementation-summary.md is required at all levels, but it is created after implementation completes, not at spec folder creation time.
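As a rough illustration of the table above, the sketch below maps LOC and a complexity score to a level and lists the files that level requires. The function names and the `complexity` parameter are hypothetical; this is not the logic of recommend-level.sh, which also weighs risk and affected files.

```typescript
// Hypothetical helper: applies only the LOC / complexity guidance from the table.
type DocLevel = "1" | "2" | "3" | "3+";

function recommendLevel(loc: number, complexity = 0): DocLevel {
  if (complexity >= 80) return "3+"; // "Complexity 80+" row
  if (loc >= 500) return "3";        // "500+" row
  if (loc >= 100) return "2";        // "100 - 499" row
  return "1";                        // "< 100" row
}

// Named files required by each level. Level 3+ adds workflow sections
// (approval, compliance, stakeholder matrix) that the table does not name as files.
function requiredFiles(level: DocLevel): string[] {
  const files = ["spec.md", "plan.md", "tasks.md", "implementation-summary.md"];
  if (level !== "1") files.push("checklist.md");
  if (level === "3" || level === "3+") files.push("decision-record.md");
  return files;
}
```

Because the ranges are guidance only, a result from a helper like this would be a floor: risk or a large number of affected files can still justify a higher level.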
Spec Folder Structure
specs/<###-feature-name>/
├── description.json # Spec identity and memory tracking metadata
├── spec.md # What the feature is and why it exists
├── plan.md # How to implement it
├── tasks.md # Step-by-step task breakdown
├── checklist.md # QA validation gates (Level 2 and above)
├── decision-record.md # Architecture decisions (Level 3 and above)
├── implementation-summary.md # Post-implementation summary (all levels)
├── resource-map.md # Optional path ledger of resources the packet touched
├── graph-metadata.json # Packet-level graph metadata (auto-refreshed on save)
└── scratch/ # Temporary workspace files
resource-map.md is optional at any level. Render it from .opencode/skills/system-spec-kit/templates/manifest/resource-map.md.tmpl when a packet wants a lean, central listing of the files, scripts and external resources it interacts with. Deep-research and deep-review loops emit it automatically next to review-report.md.
Checklist Priority System
Checklists use a priority system so reviewers know what blocks shipping and what can wait:
- P0 - Hard blocker. Cannot ship without this. Cannot defer.
- P1 - Required. Must complete or get explicit user approval to defer.
- P2 - Optional. Nice to have. Can defer without approval.
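The priority rules above can be sketched as a small filter that returns the items still blocking a release. The `ChecklistItem` shape and the `deferralApproved` flag are assumptions made for illustration, not the checklist.md format.

```typescript
type Priority = "P0" | "P1" | "P2";

// Hypothetical in-memory shape of a checklist entry.
interface ChecklistItem {
  id: string;
  priority: Priority;
  done: boolean;
  deferralApproved?: boolean; // assumed flag: the user explicitly approved deferring
}

// Returns the unfinished items that still block shipping.
function blockingItems(items: ChecklistItem[]): ChecklistItem[] {
  return items.filter((item) => {
    if (item.done) return false;
    if (item.priority === "P0") return true;                   // hard blocker, cannot defer
    if (item.priority === "P1") return !item.deferralApproved; // needs approval to defer
    return false;                                              // P2 can wait
  });
}
```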
Phase Decomposition
Phase decomposition splits large features into a parent spec folder (overall specification) and child folders (one per phase).
specs/022-big-feature/ # Parent spec folder
├── spec.md # Overall specification
├── 001-data-model/ # Phase 1 child
│ ├── spec.md
│ └── ...
├── 002-api-endpoints/ # Phase 2 child
│ ├── spec.md
│ └── ...
└── 003-frontend/ # Phase 3 child
├── spec.md
└── ...
Use create.sh --phase to create a parent with its first child in one step. Run validate.sh --recursive to validate the parent and all children together.
Validation
The validate.sh script runs 20 rules against a spec folder and reports what passes and what needs fixing. Rules check for required files, template compliance, placeholder detection, anchor markers and cross-reference consistency.
- Exit 0 - All rules pass. Ready to proceed.
- Exit 1 - User error (bad flags or invalid input).
- Exit 2 - Validation error. Must fix before claiming completion.
- Exit 3 - System error (file I/O failure, missing manifest or other environment problem).
Run with --verbose to see details behind each rule or --recursive to validate a parent and all child phase folders. Strict validation of a Level 3 packet runs in ~108 ms via a single-orchestrator design. The default scaffold path skips post-create validation. Set SPECKIT_POST_VALIDATE=1 to enable it for strict CI workflows. Path traversal inputs (e.g. --path "../etc/passwd") are rejected before any filesystem write. Parallel /memory:save calls for the same packet are serialized by an advisory lock on description.json and graph-metadata.json.
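For callers that wrap validate.sh in tooling, the exit codes listed above could be interpreted along these lines. The type and function names are hypothetical; only the code meanings come from the list.

```typescript
type ValidationOutcome =
  | "pass"
  | "user-error"
  | "validation-error"
  | "system-error"
  | "unknown";

// Maps a validate.sh exit code to the meaning documented above.
function interpretExitCode(code: number): ValidationOutcome {
  switch (code) {
    case 0: return "pass";             // all rules pass
    case 1: return "user-error";       // bad flags or invalid input
    case 2: return "validation-error"; // must fix before claiming completion
    case 3: return "system-error";     // file I/O failure, missing manifest, etc.
    default: return "unknown";         // not documented in this README
  }
}

// Completion may only be claimed on a clean pass.
function canClaimCompletion(code: number): boolean {
  return interpretExitCode(code) === "pass";
}
```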
Scripts and Validation
Spec Management Scripts (in .opencode/skills/system-spec-kit/scripts/spec/):
- create.sh - Create spec folders with level-appropriate templates. Use --phase for parent + child
- validate.sh - Run 20 validation rules. Use --recursive for phase folders
- upgrade-level.sh - Upgrade a spec folder to a higher level by injecting new sections
- recommend-level.sh - Analyze scope and risk to recommend the right documentation level
- calculate-completeness.sh - Calculate spec folder completeness as a percentage
- check-completion.sh - Verify all completion criteria are met
- check-placeholders.sh - Find remaining [PLACEHOLDER] values after level upgrade
Memory Scripts (in .opencode/skills/system-spec-kit/scripts/memory/):
- generate-context.ts - Primary workflow for updating packet continuity and supporting generated context artifacts
- backfill-frontmatter.ts - Add missing frontmatter to existing generated context artifacts and indexed spec docs
- reindex-embeddings.ts - Rebuild embedding vectors for stored memories
- cleanup-orphaned-vectors.ts - Remove vector entries with no matching memory
- rebuild-auto-entities.ts - Regenerate auto-extracted entity catalog
- validate-memory-quality.ts - Run quality checks on stored memory content
TypeScript sources compile to .opencode/skills/system-spec-kit/scripts/dist/. The runtime entry point for memory saves is .opencode/skills/system-spec-kit/scripts/dist/memory/generate-context.js.
Gate System
3 mandatory gates run before any file change. Every request passes through the same sequence.
User message arrives
│
▼
┌─────────────────────────────────────────────┐
│ Gate 1: Understanding (SOFT BLOCK) │
│ memory_match_triggers() surfaces context │
│ Classify intent: Research / Implementation │
│ confidence >= 0.70, uncertainty <= 0.35 │
└──────────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Gate 2: Skill Routing (REQUIRED) │
│ advisor_recommend recommends skill │
│ confidence >= 0.8 ─► MUST load skill │
└──────────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Gate 3: Spec Folder (HARD BLOCK) │
│ Only if file modification detected │
│ A) Existing B) New C) Update │
│ D) Skip E) Phase folder │
└──────────────────┬──────────────────────────┘
│
▼
EXECUTION
│
▼
┌─────────────────────────────────────────────┐
│ Post-Rules │
│ Memory Save ─ must use generate-context.js │
│ Completion ─ verify checklist.md items │
└─────────────────────────────────────────────┘
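The thresholds in the diagram above can be read as three simple predicates. This is a minimal sketch: the names are hypothetical, and treating the two Gate 1 thresholds as a joint (AND) condition is an assumption.

```typescript
// Hypothetical shape for the Gate 1 intent estimate.
interface IntentEstimate {
  confidence: number;  // 0..1
  uncertainty: number; // 0..1
}

// Gate 1 (soft block): assumed to require both thresholds from the diagram.
function passesGate1(estimate: IntentEstimate): boolean {
  return estimate.confidence >= 0.70 && estimate.uncertainty <= 0.35;
}

// Gate 2: a recommendation at or above 0.8 means the skill must be loaded.
function mustLoadSkill(advisorConfidence: number): boolean {
  return advisorConfidence >= 0.8;
}

// Gate 3 (hard block): only applies when the request modifies files.
function needsSpecFolderAnswer(modifiesFiles: boolean): boolean {
  return modifiesFiles;
}
```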
Analysis Lenses - applied silently on every request:
- CLARITY - Is this the simplest solution? Are abstractions earned?
- SYSTEMS - What does this touch? What are the side effects?
- BIAS - Is the user solving a symptom? Is the framing correct?
- SUSTAINABILITY - Will future developers understand this?
- VALUE - Does this change behavior or just refactor?
- SCOPE - Does solution complexity match problem size?
For the full spec folder workflow, level-contract template architecture, gate definitions and anti-pattern detection rules, see the Spec Kit README and AGENTS.md.
🧠 Memory Engine
The Memory Engine is a local-first cognitive memory system built as an MCP server. generate-context.js updates canonical packet continuity and may emit supporting generated context artifacts inside the spec folder. Canonical continuity lives in the spec packet itself: use /speckit:resume as the recovery surface, then rebuild context in this order: handover.md -> _memory.continuity -> canonical spec docs. The MCP server indexes those packet-local sources with vector embeddings, BM25 and FTS5 full-text search. memory_match_triggers() can still surface relevant prior context automatically when deeper retrieval is needed.
/memory:save refreshes packet metadata on every invocation. session_resume binds args.sessionId to transport caller context by default. Set MCP_SESSION_RESUME_AUTH_MODE=permissive for rollout canaries. Copilot, Claude and Gemini all share the same compact-cache provenance path.
The memory engine works with session lifecycle surfaces and hybrid retrieval. Structural code indexing now lives in the standalone system-code-graph skill and MCP server.
Expired ephemeral rows are cleaned by a retention sweep on startup and hourly by default. Use memory_retention_sweep for manual or dry-run cleanup. The handler is defined at memory-retention-sweep.ts, with SPECKIT_RETENTION_SWEEP and SPECKIT_RETENTION_SWEEP_INTERVAL_MS controlling the background interval.
The full MCP API reference is in the MCP Server README.
Layered MCP Surface
The mk-spec-memory tools are organized into a layered architecture. Code graph and skill-advisor tools moved to standalone MCP servers, so this table covers memory-owned tools only:
| Layer | Name | Tools | Token Budget | Purpose |
|---|---|---|---|---|
| L1 | Orchestration | 3 | 2,000 | Unified context, resume and bootstrap entry points |
| L2 | Core | 3 | 1,500 | Search, trigger matching, save |
| L3 | Discovery | 4 | 800 | List, stats, health checks and session readiness |
| L4 | Mutation | 5 | 500 | Delete, update, validate, bulk cleanup, retention sweep |
| L5 | Lifecycle | 4 | 600 | Checkpoints and lifecycle state |
| L6 | Analysis | 7 | 1,200 | Causal graph (link/stats/drift_why), epistemic baselines, evaluations, dashboards |
| L7 | Maintenance | 5 | 1,000 | Memory index scans, async ingest and learning history |
| L8 | Moved Surfaces | 0 | - | Code graph lives in mk_code_index. Advisor and skill graph live in mk_skill_advisor |
| L9 | Coverage Graph | 4 | 700 | Deep-loop coverage graph operations |
| L9 | Council Graph | 4 | 700 | AI Council graph operations |
| Total | | 39 | ~10,180 | |
Lower layers load only when needed. L1 is always available. L2 loads for any search. L3-L7 load based on the specific command being used.
Hybrid Search
Every search checks five core channels at once, with CocoIndex available as a semantic code search bridge:
- Vector - Semantic similarity via embeddings. Finds related content when words differ.
- FTS5 - Full-text search on exact words and phrases.
- BM25 - Keyword relevance scoring.
- Causal Graph - Follows cause-and-effect links between memories.
- Degree - Scores by graph connectivity, weighted by edge type.
Reciprocal Rank Fusion (RRF) combines results across channels so memories scoring well in multiple channels rise to the top. Graph-first routing dispatches structural queries to the standalone Code Graph first, then CocoIndex for semantic code discovery, then the memory pipeline. A 3-tier FTS fallback activates when graph and semantic channels miss: FTS5 full-text, BM25 keyword scoring, then Grep/Glob filesystem search. The system truncates weak results and ensures every active channel is represented.
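To make the fusion step concrete, here is a minimal sketch of standard Reciprocal Rank Fusion. The constant `k = 60` is the value commonly used in the literature and is an assumption here, as is the function name; the memory engine's actual constants and post-fusion signals are not shown.

```typescript
// rankings: one ordered list of result ids per channel, best result first.
// Returns [id, fusedScore] pairs sorted by descending fused score.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60, // assumed smoothing constant, not taken from this README
): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}

// Example: "b" appears in both channels, so it outranks "a",
// which is first in only one channel.
const fused = reciprocalRankFusion([
  ["a", "b", "c"], // e.g. vector channel
  ["b", "d"],      // e.g. BM25 channel
]);
```

An item that ranks moderately well in several channels accumulates more score than one that tops a single channel, which is the behavior described above.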
Search Pipeline
Every search passes through 4 stages:
- Candidate generation - Parallel retrieval from the active channels plus constitutional injection where applicable.
- Fusion - RRF-based scoring with post-fusion signals such as co-activation, FSRS decay, interference control, intent weights and graph/session boosts when enabled.
- Rerank - Cross-encoder reranking with chunk reassembly, a minimum Stage 3 gate of 4 candidates and compatibility-only length-penalty wiring that resolves to a neutral 1.0 multiplier. getRerankerStatus() exposes latency plus cache hits, misses, stale hits and evictions. If the reranker is unavailable, Stage 2 order is preserved with degraded metadata.
- Filtering - State/quality filtering, confidence annotation, token-budget enforcement and final response shaping without mutating post-rerank scores.
Query Intelligence
- Complexity routing - Simple (2 channels), moderate (4), complex (all 5)
- Intent classification - 7 public types (add_feature, fix_bug, refactor, security_audit, understand, find_spec, find_decision) plus an internal continuity profile for resume-oriented retrieval (semantic 0.52, keyword 0.18, recency 0.07, graph 0.23; Stage 3 MMR lambda 0.65)
- Query decomposition - Multi-topic queries split into sub-queries, expanded with related terms
- Context pressure - Downgrades search mode at 60% and 80% window usage
- Fallback strategies - LLM reformulation or HyDE for low-confidence searches
Four response modes: quick (top answer only), focused (one-topic), deep (full evidence trails), resume (state summary + next-steps).
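The continuity profile weights listed under intent classification sum to 1.0, so one plausible reading is a linear blend of per-channel scores. The sketch below assumes that reading; the type and function names are hypothetical, and the real pipeline may combine signals differently.

```typescript
interface ChannelScores {
  semantic: number;
  keyword: number;
  recency: number;
  graph: number;
}

// Weights quoted for the internal continuity profile.
const CONTINUITY_WEIGHTS: ChannelScores = {
  semantic: 0.52,
  keyword: 0.18,
  recency: 0.07,
  graph: 0.23,
};

// Assumed combination rule: a weighted sum of normalized (0..1) channel scores.
function blendScore(
  scores: ChannelScores,
  weights: ChannelScores = CONTINUITY_WEIGHTS,
): number {
  return (
    scores.semantic * weights.semantic +
    scores.keyword * weights.keyword +
    scores.recency * weights.recency +
    scores.graph * weights.graph
  );
}
```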
Memory Lifecycle
Memories fade using FSRS (Free Spaced Repetition Scheduler). Decay speed varies by content type and importance tier. Critical decisions never fade. Temporary debugging notes fade within days.
- Cold-start boost - Fresh memories (under 48h) receive a temporary scoring lift
- Interference penalty - Suppresses near-duplicate clusters
- Auto-promotion - Memories earn higher tiers through positive validation
- Negative feedback - 30-day decay prevents permanent blacklisting
Four active cognitive states drive normal retrieval weighting: HOT >> WARM >> COLD >> DORMANT.
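For intuition about the decay, the sketch below implements the power-law forgetting curve published with FSRS-4.5, where retrievability is 0.9 once the elapsed time equals the memory's stability. Whether the memory engine uses these exact constants, and how it maps retrievability to HOT/WARM/COLD/DORMANT, is not stated here, so treat both as assumptions.

```typescript
// FSRS-4.5 forgetting-curve constants (assumed to apply; not confirmed by this README).
const FSRS_DECAY = -0.5;
const FSRS_FACTOR = 19 / 81;

// Probability-like recall estimate after `elapsedDays`, for a memory whose
// stability is `stabilityDays`. Higher stability means slower fading.
function retrievability(elapsedDays: number, stabilityDays: number): number {
  return Math.pow(1 + FSRS_FACTOR * (elapsedDays / stabilityDays), FSRS_DECAY);
}
```

Under this curve, a critical decision would be modeled with a very large stability, so its retrievability stays near 1, while a temporary debugging note with a stability of a few days drops off quickly.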
Causal Graph
Six relationship types: caused, enabled, supersedes, contradicts, derived_from, supports
- Typed traversal - Prioritizes connection types based on query intent
- Community detection - Louvain clustering with neighbor boosting
- Co-activation spreading - Fan-effect dampening prevents hub bias
- Temporal contiguity - Same-session grouping
- Graph momentum - Trending knowledge surfaces higher
- LLM backfill - Background discovery of missed causal links
Trust Badges on Search Results
Every search result ships with a small trustBadges block that tells you how reliable the hit is at a glance. The badges are display-only: they read existing causal links and add no new storage.
| Badge | What it tells you |
|---|---|
| confidence | How strong the strongest causal link to this result is |
| extractionAge | How long ago the supporting evidence was extracted |
| lastAccessAge | How recently anything in the chain was used |
| orphan | True when nothing else in the graph points at this result |
| weightHistoryChanged | True when the underlying edge weight has been re-tuned |
If the database is unreachable the formatter quietly skips badges instead of failing. Caller-provided badges pass through untouched. Every response profile (quick, research, resume) keeps the badges on the top result and the result list.
Save Intelligence
When you save new knowledge, Prediction Error gating compares it against existing memories and picks one of four outcomes:
- CREATE - No similar memory exists. Stored as new knowledge.
- REINFORCE - Similar exists, new one adds value. Both kept, old one boosted.
- UPDATE - Similar exists, new one is better. Old version replaced.
- SUPERSEDE - New knowledge contradicts the old. Old one demoted to deprecated.
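The four outcomes above can be sketched as a small decision function. The input shape, the `0.85` similarity threshold and the order of the checks are all assumptions made for illustration; the real gate derives these signals from the stored memories.

```typescript
type SaveOutcome = "CREATE" | "REINFORCE" | "UPDATE" | "SUPERSEDE";

// Hypothetical comparison of new content against its closest existing memory.
interface Comparison {
  similarity: number;   // 0..1 similarity to the closest existing memory
  contradicts: boolean; // new content conflicts with the old
  isBetter: boolean;    // new content is a strict improvement on the old
}

function decideSave(c: Comparison, similarThreshold = 0.85): SaveOutcome {
  if (c.similarity < similarThreshold) return "CREATE"; // nothing similar exists
  if (c.contradicts) return "SUPERSEDE";                // old one gets deprecated
  if (c.isBetter) return "UPDATE";                      // old version replaced
  return "REINFORCE";                                   // both kept, old one boosted
}
```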
Additional save-time processing:
- Semantic sufficiency gating - Rejects content too thin to be useful
- Verify-fix-verify - Auto-fixes quality issues before storing
- Content normalization - Strips formatting clutter for cleaner embeddings
- Auto-entity extraction - Spots tool/project/concept names for cross-linking
- SHA-256 deduplication - Skips unchanged files instantly
- Correction tracking - Records how knowledge evolves across versions
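SHA-256 deduplication from the list above boils down to comparing a content hash with the last one seen for the same path. A minimal sketch using Node's built-in crypto module follows; the `HashIndex` class is hypothetical and keeps state in memory only, whereas the real engine persists its hashes.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 digest of the file content.
function contentHash(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Hypothetical in-memory index: path -> last seen content hash.
class HashIndex {
  private readonly seen = new Map<string, string>();

  // Returns true when the content is new or changed and should be processed.
  shouldProcess(path: string, content: string): boolean {
    const hash = contentHash(content);
    if (this.seen.get(path) === hash) return false; // unchanged: skip
    this.seen.set(path, hash);
    return true;
  }
}
```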
Session Awareness
- Working memory - Tracks current session findings with attention decay
- Session deduplication - Suppresses already-seen results in follow-up queries
- Context pressure - Downgrades search mode as the context window fills
Quality Gates
Three layered checks before storage:
- Structure gate - Format, headings, metadata validation
- Semantic sufficiency - Enough real content to be useful
- Duplicate detection - Triggers Prediction Error arbitration if similar content exists
Preview all checks without saving using dryRun: true. Learned relevance feedback boosts helpful results with safeguards against noise. Two-tier explainability shows plain-language reasons or exact channel contributions.
Retrieval Enhancements
- Constitutional injection - Always-surfaced rules appear without asking
- Hierarchy awareness - Searches parent and sibling spec folders
- Entity linking - Connects memories referencing the same concepts
- ANCHOR retrieval - Per-section indexing (~93% token savings)
- Auto-surfacing - Triggers on tool use and context compression events
- Provenance traces - Shows exactly how each result was found
Indexing and Infrastructure
- Real-time watching - Filesystem monitoring via chokidar
- Incremental indexing - Content hashes skip unchanged files
- Embedding retry - Background worker retries failed embeddings
- Lexical fallback - Text-searchable when embedding services are down
- Atomic writes - Crash-safe with pending-file recovery on startup
Evaluation
- 12-metric computation - MRR, NDCG, MAP and more
- Ground truth corpus - 110 test questions with known correct answers
- Ablation studies - Per-channel quality impact measurement
- Offline scoring checks - Test ranking changes before deployment
Embedding Providers
The embedder layer is pluggable. Swap defaults via env vars without touching code. Canonical narrative: embedder-pluggability.md.
- Ollama (nomic-embed-text-v1.5) - Default since 2026-05-19 (ADR-013/014). Free, local, 768d retrieval-tuned. Pull once with ollama pull nomic-embed-text:v1.5. The cascade falls back to jina-embeddings-v3 (1024d Q4_K_M) when nomic isn't pulled.
- HuggingFace Local - Fallback when the Ollama probe fails. Free, local, 768d q8 ONNX.
- Voyage AI - Cloud opt-in. Set VOYAGE_API_KEY. 1024d. Gated by egress guard.
- OpenAI - Cloud opt-in. Set OPENAI_API_KEY. 1536d.
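The provider list above implies a selection order: cloud providers only when their key is set, otherwise Ollama, otherwise HuggingFace Local. The sketch below captures that order. The function name is hypothetical, preferring Voyage over OpenAI when both keys are set is an assumption, and the egress guard is not modeled.

```typescript
type Provider = "voyage" | "openai" | "ollama" | "hf-local";

// env: a snapshot of environment variables (for example process.env).
// ollamaReachable: result of probing the local Ollama endpoint.
function selectProvider(
  env: Record<string, string | undefined>,
  ollamaReachable: boolean,
): Provider {
  if (env.VOYAGE_API_KEY) return "voyage"; // assumed to win when both keys are set
  if (env.OPENAI_API_KEY) return "openai";
  return ollamaReachable ? "ollama" : "hf-local"; // local default, then fallback
}
```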
🔍 CocoIndex + Code Graph
The framework uses two different code-understanding systems on purpose. CocoIndex handles semantic discovery, so the assistant can answer "find code that does X" or "how is Y implemented?" without knowing exact symbols first. The Code Graph handles structural expansion, so the assistant can answer questions like "what calls this?", "what imports this?" or "what breaks if we change it?" using an indexed relationship graph.
The intended routing order is graph-first: the code graph resolves structural queries first, CocoIndex finds semantic candidates when structural resolution misses and Memory supports session decisions and active-task context after the packet-local recovery sources have been checked. A 3-tier FTS fallback escalates automatically when results are weak.
Default Scope (End-User Code Only)
By default, code-graph scans your repo code only. Five .opencode/ folders are excluded so end-user search results stay signal-rich:
- .opencode/skills/**
- .opencode/agents/**
- .opencode/commands/**
- <active-spec-folder>/**
- .opencode/plugins/**
Maintainers can opt folders back in process-wide with env vars:
SPECKIT_CODE_GRAPH_INDEX_SKILLS=true # all skills
SPECKIT_CODE_GRAPH_INDEX_SKILLS=sk-x,sk-y # only listed skills (csv)
SPECKIT_CODE_GRAPH_INDEX_AGENTS=true
SPECKIT_CODE_GRAPH_INDEX_COMMANDS=true
SPECKIT_CODE_GRAPH_INDEX_SPECS=true
SPECKIT_CODE_GRAPH_INDEX_PLUGINS=true
SPECKIT_CODE_GRAPH_DB_DIR=/path/to/code-graph-db # optional DB-dir override
Per-call args override env vars when provided. Env vars apply only for fields omitted from the scan call:
code_graph_scan({
includeSkills: ['sk-code-review', 'sk-doc'], // granular: only these skills
includeAgents: true, // all .opencode/agents/**
})
Existing v1 scans trigger a blocked read with requiredAction:"code_graph_scan" until you re-run the scan. See system-code-graph README §8 SCAN SCOPE for the full scan-scope rules and precedence details.
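The precedence rule (per-call args first, env vars only for omitted fields) can be sketched as follows. The resolver and its types are hypothetical, only two of the scope fields are shown, and treating any value other than "true" or a csv list as disabled is an assumption.

```typescript
type SkillScope = boolean | string[];

// Parses SPECKIT_CODE_GRAPH_INDEX_SKILLS: "true" means all skills,
// a csv value means only the listed skills.
function parseSkillsEnv(value: string | undefined): SkillScope {
  if (!value || value === "false") return false; // assumed: unset or "false" disables
  if (value === "true") return true;
  return value
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
}

interface ScanArgs {
  includeSkills?: SkillScope;
  includeAgents?: boolean;
}

interface ResolvedScope {
  includeSkills: SkillScope;
  includeAgents: boolean;
}

// Per-call args win; env vars fill in only the fields the call omitted.
function resolveScope(
  args: ScanArgs,
  env: Record<string, string | undefined>,
): ResolvedScope {
  return {
    includeSkills:
      args.includeSkills ?? parseSkillsEnv(env.SPECKIT_CODE_GRAPH_INDEX_SKILLS),
    includeAgents:
      args.includeAgents ?? env.SPECKIT_CODE_GRAPH_INDEX_AGENTS === "true",
  };
}
```

Using `??` rather than `||` matters here: an explicit `includeAgents: false` in the call must override an env var set to true.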
Our CocoIndex is forked. The Python wrapper that powers semantic search is a soft-fork at version 0.2.3+spec-kit-fork.0.2.0, vendored alongside the skill so it ships with this repo. The Rust engine underneath stays on PyPI. The fork adds four things the upstream wrapper doesn't:
- Duplicate suppression, so mirror copies of the same file don't crowd results
- Canonical path identity per chunk, so dedup works across symlinks
- A path-class taxonomy that nudges "find me the implementation of X" toward implementation files first
- Ranking telemetry that surfaces why each result ranked where it did
Responses from the MCP tool or ccc search CLI carry seven fork-specific fields that vanilla cocoindex output does not include: source_realpath, content_hash, path_class, dedupedAliases, uniqueResultCount, raw_score and rankingSignals. Schema, attribution and per-release patch list all live under .opencode/skills/mcp-coco-index/.
How the Code Graph Works
The Code Graph is a SQLite-backed structural index owned by .opencode/skills/system-code-graph/ and registered as the standalone mk_code_index MCP server. MCP callers use the mcp__mk_code_index__* namespace. Runtime config parity is mixed across clients during the rename transition, so docs use the canonical mk_code_index surface while follow-on config work handles remaining legacy bindings.
Startup injection. When the MCP server starts, it initializes the code-graph.sqlite database, runs a non-blocking startup scan and activates a file watcher. Three supported runtimes (Claude Code, Gemini CLI, Codex CLI) transport the same compact startup shared-payload through their runtime hooks (session-prime.ts on Claude/Gemini, session-start.ts on Codex). Codex requires [features].codex_hooks = true opt-in for native hooks. Copilot CLI uses file-based custom instructions with a limited cache and writer path. It refreshes a managed block but does not inject model-visible context during the precompute phase. The payload includes a one-line health summary, graphQualitySummary (detector provenance + edge-enrichment summary) and the sharedPayloadTransport envelope so downstream consumers receive identical structural context regardless of runtime. session_bootstrap() remains available as a manual recovery surface when native hooks are disabled.
Auto-indexing. The graph stays current through three mechanisms:
- Startup scan - indexes on server boot (async, non-blocking)
- File watcher - Chokidar monitors spec and source folders with a 2-second debounce, reindexing changed files in real time
- Lazy refresh - `code_graph_query` calls `ensureCodeGraphReady()`, which detects staleness and triggers a bounded inline refresh before returning results
The indexer uses tree-sitter to parse source files and extract functions, classes, imports and call relationships. It tracks per-file content hashes to skip unchanged files, making incremental scans fast.
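The hash-based skip described above can be pictured with a small sketch. It is illustrative only: `planIncrementalScan`, `HashIndex` and `ScanPlan` are hypothetical names, and the hashes are taken as given rather than computed, since this README does not say which hash function the indexer uses.

```typescript
// Hypothetical sketch: decide which files need re-parsing by comparing content hashes.
type HashIndex = Record<string, string>; // file path -> content hash

interface ScanPlan {
  toParse: string[];   // new files, or files whose hash changed
  unchanged: string[]; // stored hash still matches, safe to skip
  removed: string[];   // present in the stored index but gone from disk
}

function planIncrementalScan(current: HashIndex, stored: HashIndex): ScanPlan {
  const plan: ScanPlan = { toParse: [], unchanged: [], removed: [] };
  for (const path of Object.keys(current)) {
    if (stored[path] === current[path]) {
      plan.unchanged.push(path);
    } else {
      plan.toParse.push(path);
    }
  }
  for (const path of Object.keys(stored)) {
    if (!(path in current)) {
      plan.removed.push(path);
    }
  }
  return plan;
}
```

Only the `toParse` bucket would reach the parser, which is why a scan over a mostly unchanged tree stays cheap.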
Readiness & Response Contract
code_graph_query and code_graph_context share a readiness-aware response contract. When the graph is fresh enough, both return status: "ok" with resolved results plus a readiness / canonicalReadiness / trustState block. When readiness requires a full scan that cannot run inline, both return an explicit status: "blocked" payload naming requiredAction: "code_graph_scan", blockReason: "full_scan_required", degraded and graphAnswersOmitted instead of silently returning empty results. Callers should run code_graph_scan before retrying.
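A caller-side sketch of that contract follows. The `status`, `requiredAction` and `blockReason` fields come from the paragraph above. The response types, `queryWithScanRetry` and the injected `runQuery` / `runScan` functions are hypothetical stand-ins (and synchronous for brevity), not the real MCP client API.

```typescript
// Hypothetical, trimmed-down response shapes for illustration only.
interface OkResponse {
  status: "ok";
  results: unknown[];
}

interface BlockedResponse {
  status: "blocked";
  requiredAction: string; // e.g. "code_graph_scan"
  blockReason: string;    // e.g. "full_scan_required"
}

type CodeGraphResponse = OkResponse | BlockedResponse;

function queryWithScanRetry(
  runQuery: () => CodeGraphResponse,
  runScan: () => void,
): CodeGraphResponse {
  const first = runQuery();
  if (first.status === "blocked" && first.requiredAction === "code_graph_scan") {
    runScan();         // perform the scan the payload asked for
    return runQuery(); // retry once; a second block is handed back unchanged
  }
  return first;
}
```

Retrying only once keeps the caller from looping if the scan itself cannot bring the graph to a ready state.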
Success payloads of code_graph_context carry structured data.metadata.partialOutput (isPartial, reasons, omittedSections, omittedAnchors, truncatedText) and an explicit deadlineMs field so callers can distinguish a complete answer from one trimmed by deadline or budget pressure. code_graph_status exposes graphQualitySummary (detector provenance + edge-enrichment confidence). CALLS queries on ambiguous subjects (e.g. handle*) prefer callable implementation nodes over wrapper-shadow candidates and return ambiguity / selected-candidate metadata so callers can audit the choice.
Edge Explanations and Better Blast Radius
Relationship answers from code_graph_query include short reason and step fields alongside confidence and provenance, so you can see why an edge is there instead of just that it exists. code_graph_context carries those same fields through to structured edges and text briefs.
blast_radius keeps the prior payload (affected files, source files, hot files, multi-file union, depth) and adds:
- `depthGroups`: affected nodes bucketed by how far they sit from the change
- `riskLevel`: `high` when the subject is ambiguous or fans out to more than 10 things at depth one, `medium` for 4–10, `low` otherwise
- `minConfidence` filter, drop traversals below a confidence floor
- `ambiguityCandidates`: list of plausible matches when the subject can't be resolved
- `failureFallback`: structured info instead of a bare error string when resolution can't continue
All of this rides inside the existing code_edges.metadata JSON blob, no SQLite schema changes.
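The `riskLevel` rule and the `depthGroups` bucketing can be sketched as two small functions. The thresholds are the ones stated above. The function names and the `{ id, depth }` node shape are assumptions made for the example.

```typescript
type RiskLevel = "high" | "medium" | "low";

// high: ambiguous subject or more than 10 dependents at depth one,
// medium: 4 to 10 dependents, low: everything else.
function classifyRisk(depthOneFanOut: number, subjectAmbiguous: boolean): RiskLevel {
  if (subjectAmbiguous || depthOneFanOut > 10) return "high";
  if (depthOneFanOut >= 4) return "medium";
  return "low";
}

// Bucket affected nodes by their distance from the change.
function groupByDepth(nodes: { id: string; depth: number }[]): Record<number, string[]> {
  const groups: Record<number, string[]> = {};
  for (const node of nodes) {
    const bucket = groups[node.depth] || [];
    bucket.push(node.id);
    groups[node.depth] = bucket;
  }
  return groups;
}
```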
detect_changes: Preflight Impact Check
detect_changes is a read-only Code Graph tool that takes a diff and tells you which symbols and files it touches. It runs alongside code_graph_scan, code_graph_query, code_graph_status and code_graph_context.
You hand it { diff: string, rootDir?: string }. It walks each diff hunk, overlaps the line ranges with stored symbols and returns { status, affectedSymbols[], affectedFiles[], blockedReason?, timestamp, readiness }.
Safety is non-negotiable: the tool checks the graph is fresh before parsing the diff. If the graph is stale or unavailable, it returns status: 'blocked' immediately, so an out-of-date index never produces a false "nothing impacted" answer. Inline indexing is explicitly disabled here, so the read-only contract is enforced.
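The overlap step and the freshness gate can be illustrated as follows. This sketch assumes the diff has already been parsed into hunks with line ranges. `detectChangesSketch`, the `Hunk` and `StoredSymbol` shapes and the `"graph_not_fresh"` reason string are all hypothetical, not the tool's actual internals.

```typescript
interface Hunk {
  file: string;
  startLine: number;
  endLine: number;
}

interface StoredSymbol {
  name: string;
  file: string;
  startLine: number;
  endLine: number;
}

interface DetectResult {
  status: "ok" | "blocked";
  affectedSymbols: string[];
  affectedFiles: string[];
  blockedReason?: string;
}

function detectChangesSketch(hunks: Hunk[], symbols: StoredSymbol[], graphFresh: boolean): DetectResult {
  // Refuse to answer from a stale index instead of reporting "nothing impacted".
  if (!graphFresh) {
    return { status: "blocked", affectedSymbols: [], affectedFiles: [], blockedReason: "graph_not_fresh" };
  }
  const affectedSymbols: string[] = [];
  const affectedFiles: string[] = [];
  for (const hunk of hunks) {
    if (affectedFiles.indexOf(hunk.file) === -1) affectedFiles.push(hunk.file);
    for (const sym of symbols) {
      // Two inclusive line ranges overlap when each starts before the other ends.
      const overlaps =
        sym.file === hunk.file && sym.startLine <= hunk.endLine && hunk.startLine <= sym.endLine;
      if (overlaps && affectedSymbols.indexOf(sym.name) === -1) affectedSymbols.push(sym.name);
    }
  }
  return { status: "ok", affectedSymbols, affectedFiles };
}
```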
Under the hood the scan runner is split into four declared phases (find-candidates → parse-candidates → finalize → emit-metrics) for clearer instrumentation, with no SQLite schema changes.
The code graph runtime has its own feature catalog and operator playbook under system-code-graph/feature_catalog and system-code-graph/manual_testing_playbook. They document runtime features and manual scenarios for freshness, scan/verify/status, detect_changes, context retrieval, coverage graph, CCC and doctor-code-graph behavior.
What Each System Does
| System | Best for | Primary surface |
|---|---|---|
| CocoIndex | Concept search, similar implementations, unfamiliar modules | mcp__cocoindex_code__search |
| Code Graph | Callers, imports, symbol outlines, impact analysis, neighborhood expansion | mcp__mk_code_index__code_graph_*, mcp__mk_code_index__detect_changes, mcp__mk_code_index__ccc_* |
| Session bridge tools | Session bootstrap, resume and health checks around graph availability | session_bootstrap, session_resume, session_health |
| CCC utilities | CocoIndex availability, reindexing, result feedback | ccc_status, ccc_reindex, ccc_feedback |
How Query Routing Works (Graph-First)
The default routing order is: Code Graph (structural) -> CocoIndex (semantic code) -> Memory (session/decision context). This graph-first approach tries structural resolution before semantic similarity, with a 3-tier FTS fallback when earlier stages miss.
- Use the Code Graph first for structural questions: callers, callees, imports, hierarchy, file outlines and reverse impact.
- Use CocoIndex for semantic and intent-based questions: "find code that validates memory quality", "show similar routing patterns", "where is the logic for X?"
- Use session tools when recovering or checking environment readiness, but treat `/speckit:resume` as the canonical operator-facing recovery surface.
- Rebuild task continuity in this order: `handover.md` -> `_memory.continuity` -> canonical spec docs.
- Use Memory after those packet-local sources when the question is about prior decisions, spec history, handovers or task continuity that still needs deeper retrieval.
Why It Matters
This split avoids forcing one search system to do everything poorly. Semantic search is good at resemblance. Structural search is good at relationships. Keeping both lets the framework move from "this code looks relevant" to "this is how it connects" without collapsing those concerns into a single noisy result set.
For the full code-graph tool and architecture reference, see the system-code-graph skill and system-code-graph README. Shared memory and lifecycle details stay in .opencode/skills/system-spec-kit/README.md.
🎯 Skill Advisor
The Skill Advisor matches what you type to the right skill before any tool runs. It is now a standalone MCP server named mk_skill_advisor, packaged under .opencode/skills/system-skill-advisor/mcp_server/. The server registers nine tools: eight on the public surface (four advisor_* tools for routing, freshness, rebuild and validation, plus four skill_graph_* tools for scan, query, status and graph validation), plus one internal propagation tool. A small Python compatibility shim still works as a fallback when the native path is unavailable.
How It Works
YOU TYPE: "use chrome-devtools to inspect a page"
│
▼
┌──────────────────────┐
1. │ NORMALIZE │ Clean up the prompt, never store
│ │ the raw text
└──────────┬───────────┘
▼
┌──────────────────────┐
2. │ 5-LANE FUSION │ Explicit author signals 0.42
│ │ Lexical match 0.28
│ │ Causal graph 0.13
│ │ Derived hints 0.12
│ │ Semantic evidence 0.05
└──────────┬───────────┘
▼
┌───────────────────────────────┐
│ 3. FRESHNESS + LIFECYCLE │ Is each candidate still alive?
│ │ live / stale / absent / archived
│ Reads SQLite skill graph │ with redirect metadata
│ + generated metadata │ Falls open on errors
└───────────────┬───────────────┘
▼
┌──────────────────────┐
4. │ VALIDATE + FILTER │ Apply confidence + uncertainty
│ │ thresholds, cache the trust
│ │ envelope
└──────────┬───────────┘
▼
┌──────────────────────┐
5. │ RENDER │ Either a one-line hook brief
│ │ or a JSON recommendation list
└──────────┬───────────┘
▼
RESULT:
advisor_recommend -> list of skill recommendations
hook adapter -> "Advisor: live, use ..."
shim fallback -> legacy JSON
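A minimal sketch of step 2, assuming the five lanes combine as a plain weighted sum of per-lane scores in the range 0 to 1. The weights are the ones shown in the diagram. The summation rule, `fuseLanes` and the lane key names are assumptions, since the diagram lists weights only.

```typescript
// Lane weights from the diagram above (they sum to 1.00).
const LANE_WEIGHTS = {
  explicitAuthor: 0.42,
  lexical: 0.28,
  causalGraph: 0.13,
  derivedHints: 0.12,
  semantic: 0.05,
} as const;

type Lane = keyof typeof LANE_WEIGHTS;
type LaneScores = Record<Lane, number>; // assumed: each lane score lies in [0, 1]

// Hypothetical fusion rule: weighted sum across lanes.
function fuseLanes(scores: LaneScores): number {
  let total = 0;
  for (const lane of Object.keys(LANE_WEIGHTS) as Lane[]) {
    total += LANE_WEIGHTS[lane] * scores[lane];
  }
  return total;
}
```

Under this reading, a perfect lexical match with no other evidence contributes 0.28 on its own, so the default confidence threshold of 0.8 cannot be reached without explicit author signals.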
Native Package Layout
.opencode/skills/system-skill-advisor/mcp_server/
├── bench/ benchmarks
├── compat/ stable compatibility entry for runtimes
├── handlers/ the nine MCP tool handlers (8 public + 1 internal)
├── lib/ scorer, normalizer, freshness, cache
├── schemas/ JSON + Zod schemas
├── tests/ test suite
└── tools/ tool registration
| Tool | What it does |
|---|---|
| advisor_recommend | Recommends skills for a prompt with lane breakdown, lifecycle redirects and a freshness trust signal. Returns the workspace root and the effective thresholds it used. |
| advisor_rebuild | Rebuilds the advisor skill graph when advisor_status reports stale, absent or unavailable state. force:true rebuilds even when live. |
| advisor_status | Reports freshness, generation, trust state, lane weights, skill count, last scan time and background daemon status. |
| advisor_validate | Runs measurement slices: corpus accuracy, holdout, parity, safety, latency. Surfaces the workspace root, effective thresholds, threshold semantics (aggregate vs runtime) and prompt-safe outcome counts (accepted / corrected / ignored). |
| skill_graph_scan | Indexes skill metadata into the advisor-owned skill graph surface. |
| skill_graph_query | Queries skill graph relationships such as dependencies, families, hubs, conflicts and subgraphs. |
| skill_graph_status | Reports graph counts, families, categories, staleness, validation and database status. |
| skill_graph_validate | Validates schema drift, broken edges, reciprocal symmetry and dependency-cycle issues. |
How Runtimes Talk To It
- Claude Code, Gemini CLI, Codex CLI: call prompt-time hook adapters under `.opencode/skills/system-spec-kit/mcp_server/hooks/`. Codex CLI requires `[features].codex_hooks = true` opt-in for native hooks. Copilot CLI uses file-based custom instructions for the startup-surface path only.
- OpenCode: uses `.opencode/plugins/spec-kit-skill-advisor.js` with `spec-kit-skill-advisor-bridge.mjs`, which imports the stable compat entry under `.opencode/skills/system-skill-advisor/mcp_server/compat/index.ts`.
- Codex cold starts: the Codex prompt hook emits a prompt-safe stale advisory plus `{"stale":true,"reason":"timeout-fallback"}` when startup context times out. The smoke helper lives at freshness-smoke-check.ts.
- Disable everywhere: set `SPECKIT_SKILL_ADVISOR_HOOK_DISABLED=1` to turn off all prompt-time advisor surfaces.
- Threshold contract at the prompt: confidence ≥ 0.8 and uncertainty ≤ 0.35 by default.
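The threshold contract and the kill switch from the list above can be sketched as a single filter. The default values and the environment variable name come from this section. `filterForPrompt`, the `Recommendation` shape and passing the environment in as a plain object are assumptions for illustration.

```typescript
interface Recommendation {
  skillId: string;
  confidence: number;
  uncertainty: number;
}

interface Thresholds {
  minConfidence: number;
  maxUncertainty: number;
}

// Defaults stated in the threshold contract.
const DEFAULT_THRESHOLDS: Thresholds = { minConfidence: 0.8, maxUncertainty: 0.35 };

function filterForPrompt(
  candidates: Recommendation[],
  env: Record<string, string | undefined>,
  thresholds: Thresholds = DEFAULT_THRESHOLDS,
): Recommendation[] {
  // The disable flag switches off every prompt-time surface.
  if (env["SPECKIT_SKILL_ADVISOR_HOOK_DISABLED"] === "1") return [];
  return candidates.filter(
    (c) => c.confidence >= thresholds.minConfidence && c.uncertainty <= thresholds.maxUncertainty,
  );
}
```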
Validation and Testing
- `advisor_validate({"skillSlug":null})` returns measured corpus / holdout / parity / safety / latency slices plus prompt-safe outcome totals.
- Python compatibility regression harness: checked-in dataset and pass/fail totals are reported by `skill_advisor_regression.py`.
- Native package: 23 advisor test files, 167 tests.
- Manual testing playbook: 42 scenario files spanning native MCP tools, runtime hooks, the OpenCode plugin, compatibility controls, auto-indexing, lifecycle routing, scorer fusion and operator-state edge cases.
- Hook diagnostics write to bounded JSONL sinks under the temp metrics root. The validator reads those sinks back across processes.
Affordance Evidence
Callers can pass structured tool and resource hints, skillId, name, triggers[], category, dependsOn[], enhances[], siblings[], prerequisiteFor[], conflictsWith[], as affordance evidence. A normalizer strips URLs, emails, token-shaped fragments, control characters and instruction-shaped strings before the scorer sees anything. Free-form description text is ignored on purpose. Sanitized triggers feed the existing derived-hints lane at reduced weight. Normalized relations become temporary edges in the existing causal-graph lane reusing the standard relation multipliers (depends_on, enhances, siblings, prerequisite_for, conflicts_with). No new scoring lane, no new entity kind, no raw matched phrases in recommendation payloads, evidence labels stay as stable affordance:<skillId>:<index> identifiers.
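A partial sketch of the trigger normalizer, covering only the URL, email and control-character rules. The regular expressions are rough approximations chosen for the example. Token-shaped and instruction-shaped filtering is left out because this README does not define those patterns, and `sanitizeTrigger` / `sanitizeTriggers` are hypothetical names.

```typescript
const URL_PATTERN = /https?:\/\/\S+/gi;         // rough URL matcher (assumption)
const EMAIL_PATTERN = /\S+@\S+\.\S+/g;          // rough email matcher (assumption)
const CONTROL_CHARS = /[\u0000-\u001f\u007f]/g; // ASCII control characters

function sanitizeTrigger(raw: string): string {
  return raw
    .replace(URL_PATTERN, " ")
    .replace(EMAIL_PATTERN, " ")
    .replace(CONTROL_CHARS, " ")
    .replace(/\s+/g, " ") // collapse the gaps left by removed fragments
    .trim();
}

// Triggers that are empty after sanitizing never reach the scorer.
function sanitizeTriggers(rawTriggers: string[]): string[] {
  return rawTriggers.map(sanitizeTrigger).filter((t) => t.length > 0);
}
```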
For details, see the Skill Advisor README.
🎯 Skills Library
22 skills in .opencode/skills/, loaded on demand when Gate 2 matches a task (confidence >= 0.8 means the skill must be loaded).
DOCUMENTATION
system-spec-kit
- Mandatory orchestrator for all file modifications - activates automatically for any code file change
- Creates numbered spec folders with manifest templates rendered through Level contracts across 4 levels (1-3+)
- Integrates the 39-tool memory surface with constitutional-tier support, session bootstrap and hybrid 5-channel retrieval
- Manages the manifest template source, 20 validation rules, the spec-kit script suite and the feature-catalog / testing-playbook documentation surfaces
sk-doc
- Unified markdown specialist with DQI quality scoring (Structure 40%, Content 35%, Style 25%)
- HVR v0.210 compliance checking and component creation workflows (skills, agents, commands)
- Handles README templates, frontmatter validation, feature catalog authoring, install guide generation
CODE WORKFLOW
sk-code
- Multi-stack coding standards, references and assets: surface-aware patterns, checklists and verification recipes loaded per stack
- WEBFLOW stack: Webflow / vanilla HTML/CSS/JS animation projects (motion.dev, GSAP, Lenis, HLS, Swiper, FilePond), CDN deployment, Lighthouse/TBT/INP targets, browser verification
- OPENCODE stack: `.opencode/` system code across JavaScript/CommonJS, TypeScript, Python, Shell, JSON/JSONC, MCP server code, agents, commands, skill files
- Smart-routing internals auto-detect the active stack from CWD/target paths and library markers. Unsupported stacks (Go, React/Next.js, generic Node.js, React Native, Swift) trigger a disambiguation question
- 3 mandatory phases: implementation → testing/debugging → verification
sk-code-review
- Stack-agnostic code review baseline using `sk-code` surface evidence where applicable
- Baseline always runs first: security checklist, correctness checklist, SOLID checklist, threat model
- Security and correctness minimums are mandatory and NEVER relaxed by surface-specific evidence. P0/P1/P2 findings.
sk-git
- Git workflow orchestrator coordinating 3 sub-skills
- git-worktree: workspace isolation, branch creation, parallel development
- git-commit: conventional commit format, staged change analysis, scope detection
- git-finish: PR creation via `gh pr create`, branch cleanup, integration workflows
deep-research
- Autonomous research investigation system with iterative LEAF cycles
- Fresh context per iteration, externalized JSONL state, 3-signal convergence detection (Rolling Average + MAD Noise Floor + Coverage/Age)
- Semantic coverage graph with 7 relation types, question coverage tracking, sourceDiversity and evidenceDepth guards
- Progressive synthesis, negative knowledge preservation, quality guards (source diversity, focus alignment, weak-source checks)
- Fail-closed corruption handling, graph convergence fallback scoring, terminal stop metadata parsing
- Lifecycle modes: `new`, `resume`, `restart`. Dispatched by `/deep:start-research-loop` command
deep-review
- Autonomous code quality auditing system with iterative LEAF cycles
- P0/P1/P2 severity-weighted findings across 4 dimensions (Correctness, Security, Traceability, Maintainability)
- 3-signal convergence model, P0 override blocks stop, adversarial self-check (Hunter/Skeptic/Referee)
- Binary quality gates (evidence, scope, coverage), graph-aware legal-stop checks, semantic coverage graph
- 9-section review report with PASS/CONDITIONAL/FAIL verdict
- Fail-closed corruption, claim-adjudication `finalSeverity`, stale STOP veto auto-clearing
- Lifecycle modes: `new`, `resume`, `restart`. Dispatched by `/deep:start-review-loop` command
deep-loop-runtime
- Shared runtime infrastructure for deep-review + deep-research loop workflows (post-arc-118)
- Owns executor config, state safety, scoring, fallback routing, coverage-graph scripts, and `storage/deep-loop-graph.sqlite`
CROSS-AI CLI
These skills let you run cross-CLI agent teams from any starting CLI. Whichever assistant you're talking to (Claude Code, Codex, Copilot, Gemini, OpenCode, raw shell), it can dispatch the other AI CLIs as specialist sub-tools, each one a one-shot non-interactive call that streams structured output back to the caller. The conducting AI stays in charge. The dispatched CLI handles the part it's best at and returns. Use this to compose a Gemini web search + Codex implementation + Claude review pipeline from inside any one of them.
Self-invocation guard: every skill refuses to call itself. A Claude Code session never dispatches `cli-claude-code`, an OpenCode session never dispatches `cli-opencode`, etc. Cross-AI delegation only, no cycles.
cli-gemini
- Gemini CLI orchestrator. Use it for real-time web search via Google Search grounding (no other CLI skill has this) and for analyzing very large codebases (1M+ token context).
- Single model: `gemini-3.1-pro-preview`.
cli-codex
- OpenAI Codex CLI orchestrator. Use it for code generation, diff-aware review (`/review`), web browsing (`--search`) and screenshot analysis (`--image`). Supports session resume/fork, agent profiles and cost control via `--max-budget-usd`.
- Default model: `gpt-5.5` at medium reasoning, fast service tier. `gpt-5.3-codex` and other GPT-5.x variants available via override.
cli-claude-code
- Claude Code CLI orchestrator. Use it for extended thinking (chain-of-thought), surgical diff-based edits and JSON-schema-validated structured output. Ships with 9 built-in agents and session continuity.
- Three models: `claude-opus-4-6` (deep reasoning), `claude-sonnet-4-6` (default, balanced), `claude-haiku-4-5` (fast/cheap).
cli-opencode
- OpenCode CLI orchestrator. Use it when the dispatched task needs the project's full plugin / skill / MCP / Spec Kit Memory runtime: a one-shot `opencode run` boots every plugin in `opencode.json`, every skill under `.opencode/skills/`, every MCP server and the memory database. Also handles parallel detached sessions (`--share --port N` for ablation suites, worker farms) and cross-repo dispatch (`--dir <path>`).
- Three providers: `github-copilot` (default, with `gpt-5.4` default + `claude-sonnet-4.6` alternative), `opencode-go` (DeepSeek + GLM/Kimi/Qwen via gateway), `deepseek` (direct DeepSeek API).
cli-devin
- Devin CLI orchestrator. Use it to dispatch Cognition AI's autonomous-agent binary `devin` from any sibling CLI session, with the family's only local-to-cloud handoff (the live session can migrate to a Cognition cloud VM that keeps working asynchronously and returns a PR).
- Four-model preset: `swe-1.6` default for context gathering, tool use and simple-to-medium well-defined code tasks. `deepseek-v4` primary for complex tasks. `glm-5.1` and `kimi-k2.6` as complex-task fallbacks (agentic / large-context shape respectively).
MCP INTEGRATION
system-code-graph
- First-class code-graph subsystem at `.opencode/skills/system-code-graph/`
- Owns structural AST indexing, SQLite graph storage, readiness contracts, `detect_changes` and CocoIndex bridge helpers
- Current MCP server name: `mk_code_index`. Client namespace: `mcp__mk_code_index__*`
system-skill-advisor
- Standalone Gate 2 skill routing package at `.opencode/skills/system-skill-advisor/`
- Exposes advisor tools plus `skill_graph_*` structural routing tools through the `mk_skill_advisor` MCP server
- Keeps advisor and skill-graph storage out of the memory server
mcp-code-mode
- MCP orchestration engine providing access to 200+ external tools through a single TypeScript interface
- Reduces context overhead by 98.7% by loading external tool schemas on demand
- Progressive tool loading - zero upfront cost, tools load on first use. Type-safe with autocomplete.
mcp-coco-index
- Semantic code search via vector embeddings (`sbert/nomic-ai/CodeRankEmbed` 768d code-tuned default, MIT, MPS auto-detect on Apple Silicon, promoted 2026-05-19). Stage 2 cross-encoder rerank via `Qwen/Qwen3-Reranker-0.6B` (Apache-2.0 default since 2026-05-20 per ADR-027; jina-reranker-v3 kept as opt-in fallback). Alternative embedders + rerankers registered at `embedders/registered_embedders.py`; swap runbook in INSTALL_GUIDE §4
- Natural-language discovery of code patterns and implementations across 28+ languages
- Two access modes: CLI (`ccc`) for direct terminal use, MCP server for AI agent integration
mcp-chrome-devtools
- Chrome DevTools orchestrator with intelligent 2-mode routing
- CLI mode (`bdg`) prioritized for speed - runs in terminal, supports Unix pipes, composable in CI/CD
- MCP mode as fallback for multi-tool integration scenarios
OTHER
sk-prompt
- Prompt engineering specialist auto-selecting from 7 proven frameworks (RCAF, COSTAR, RACE, CIDI, TIDD-EC, CRISPE, CRAFT)
- DEPTH thinking methodology with 3-10 iteration rounds of progressive refinement
- CLEAR quality scoring: Clarity, Logic, Expression, Reliability (40+/50 pass threshold)
deep-ai-council
- Multi-seat planning council dispatching diverse AI reasoning seats for strategic decisions
- Cross-seat critique and convergence checks produce evidence-backed recommendations
- Packet-local artifact persistence via `ai-council/**` output directory
- Planning-only scope. Agent counterpart listed in the Agent Network section below
sk-prompt-small-model
- Sentinel skill for small-model optimization patterns. Discovery anchor only — routes operators to executor-owned pattern files instead of hosting logic
- Active dispatch matrix: SWE-1.6, DeepSeek-v4-pro, Kimi-k2.6, Qwen3.6, GLM-5.1 across `cli-devin` + `cli-opencode` (DeepSeek API direct + opencode-go pool). Optional stubs for Claude Haiku and Gemini Flash
- `references/pattern-index.md` maps each pattern (context budget, output verification, permissions matrix, quota fallback, model profiles, tool scoring) to its canonical executor-owned location
- Pool-aware quota fallback (different-pool target only; no same-pool retries). Frontier models (Opus, Sonnet, gpt-5.5) explicitly out of scope. Adopting Haiku/Gemini Flash is metadata-first via `sk-prompt/assets/model-profiles.json`
deep-agent-improvement
- Evaluator-first 5-dimension scoring: structural, ruleCoherence, integration, outputQuality, systemFitness — with integration scanner that discovers every surface an agent touches (canonical, mirrors, commands, YAML, skills)
- Dynamic profile generator derives the scoring rubric from each agent's own rules; no hardcoded profiles needed
- Proposal-first: candidates written to packet-local runtime; canonical target untouched until guarded promotion (scoring + benchmark + repeatability + operator approval gates, with rollback support)
- Deterministic scoring (regex/string/file-existence; no LLM-as-judge) and plateau detection (3+ identical scores triggers stop)
🤖 Agent Network
11 custom specialist agents. Defined in .opencode/agents/ (source of truth), mirrored for Claude Code (.claude/agents/), Codex CLI (.codex/agents/) and Gemini CLI (.gemini/agents/) runtime surfaces. OpenCode and Copilot CLI use runtime-specific MCP and startup integration rather than a dedicated agent mirror.
Orchestrate
- Senior task commander with full authority over decomposition, delegation and quality evaluation
- Merges sub-agent outputs into one unified response with conflict resolution
- Read-only permissions - delegates implementation to other agents
- Single-hop delegation only (depth 2 max) to prevent runaway chains
Code
- Surface-aware code implementation specialist (write-capable LEAF, `mode: subagent`, `task: deny`)
- Delegates code surface detection to `sk-code`. The agent body itself stays stack-agnostic and reads sk-code's emitted surface evidence at dispatch time
- 7 dispatch modes: full implementation / surgical fix / refactor only / test add / scaffold new file / rename-move / dependency bump
- 5-dimension acceptance rubric (100 pts total): Correctness 30, Scope-Adherence 20, Verification-Evidence 20, Stack-Pattern-Compliance 15, Integration 15
- Builder → Critic → Verifier adversarial self-check on every completion claim (challenges `DONE`, opposite axis from `@review`'s Hunter/Skeptic/Referee which challenges findings)
- Iron Law: NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE FROM THE ACTUAL STACK. LOW confidence strictly blocks `DONE`
- Fail-closed verification, failure returns to orchestrator, no internal retry. BLOCKED-count circuit breaker (3× BLOCKED → orchestrator offers `@debug`)
- Compact RETURN line + structured body with `escalation` classifier (NONE / UNKNOWN_STACK / SCOPE_CONFLICT / LOW_CONFIDENCE / LOGIC_SYNC / VERIFY_FAIL)
- Dispatched ONLY by `@orchestrate` via convention-floor caller-restriction (description prose + body §0 dispatch gate + orchestrate.md routing entry. Not harness-enforced)
Context
- Memory-first retrieval specialist - always checks memory before codebase
- Search order: `match_triggers` → `memory_context` → `memory_search` → grep/glob
- Returns structured Context Packages combining memory findings with codebase evidence
- Uses both CocoIndex semantic search and the 5-channel memory system. Read-only.
Review
- Code quality guardian with strict read-only permissions (cannot write or edit any file)
- Loads `sk-code-review` baseline first, then uses `sk-code` surface evidence for stack-specific standards (whatever surface `sk-code` detected)
- Security and correctness minimums are mandatory and never relaxed by surface-specific evidence
- Produces findings-first severity analysis with quality scoring and pattern validation
Debug
- Fresh-perspective debugger that receives structured context handoff (not conversation history)
- Avoids inherited bias from failed prior attempts - use after 3+ failed debugging tries
- Systematic 5-phase methodology: Observe → Analyze → Hypothesize → Validate → Fix
- Writes `debug-delegation.md` with root cause analysis and findings
Markdown
- Dedicated LEAF executor for the `/create:*` command family (agent, sk-skill, feature-catalog, testing-playbook, folder_readme, changelog) plus scoped spec-doc and markdown authoring
- Scope-gated by convention-level Phase 0 check. Refuses unscoped writes and nested delegation with canonical REFUSE wording
- Loads `sk-doc` skill on every invocation. Reads the per-command or document-appropriate template before writing
- Deterministic 3-state output contract: `STATUS=OK PATH=<file>` / `STATUS=FAIL ERROR=<reason>` / `STATUS=CANCELLED`
- DQI score >=75 minimum reported in completion claim. HVR (Human Voice Rules) compliance enforced
- Runtime lifecycle docs should stay HVR-concise: link to the scripts runbook and active spec packet instead of duplicating long process-cleanup instructions in every README.
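A caller consuming the 3-state output contract could parse it along these lines. The three line shapes are the ones listed above. `parseStatusLine` and the `MarkdownAgentOutcome` type are hypothetical, and the sketch assumes each status arrives on a single line.

```typescript
type MarkdownAgentOutcome =
  | { status: "OK"; path: string }
  | { status: "FAIL"; error: string }
  | { status: "CANCELLED" };

function parseStatusLine(line: string): MarkdownAgentOutcome | null {
  const text = line.trim();
  if (text === "STATUS=CANCELLED") return { status: "CANCELLED" };

  const okMatch = /^STATUS=OK PATH=(.+)$/.exec(text);
  const okPath = okMatch ? okMatch[1] : undefined;
  if (okPath !== undefined) return { status: "OK", path: okPath };

  const failMatch = /^STATUS=FAIL ERROR=(.+)$/.exec(text);
  const failReason = failMatch ? failMatch[1] : undefined;
  if (failReason !== undefined) return { status: "FAIL", error: failReason };

  return null; // anything else falls outside the three declared states
}
```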
Prompt-Improver
- Prompt-escalation specialist for high-stakes external CLI invocations and other sensitive AI prompt work
- Selects the best-fit framework from `sk-prompt`, applies DEPTH at the right energy level and validates the result with CLEAR
- Returns a structured prompt package with `FRAMEWORK`, `CLEAR_SCORE`, `RATIONALE`, `ENHANCED_PROMPT` and `ESCALATION_NOTES`
- Used by the CLI mirror-card pipeline and `/prompt` agent mode when complexity, compliance or stakeholder spread makes inline prompting too weak
AI Council
- Multi-strategy planning architect dispatching diverse AI vantage points and strategy lenses
- Seeks distinct reasoning strategies across multiple AIs (cli-codex, cli-gemini, cli-claude-code + native)
- Multi-round deliberation before recommending a plan. Planning-only (never modifies files)
- 5-dimension scoring rubric for strategy quality
Deep Research
- Autonomous research agent executing single LEAF (Loop, Externalize, Analyze, Finish) iterations
- State externalized via JSONL + strategy.md for pause/resume across sessions
- Loop orchestration managed by `/deep:start-research-loop` command, not this agent
- Has permission to write `research.md` and `scratch/` inside spec folders
- 3-signal convergence model: Rolling Average (0.45), MAD Noise Floor (0.30), Coverage/Age (0.25) with 0.60 threshold
- Semantic coverage graph: each iteration emits `graphEvents` with relation types (ANSWERS, SUPPORTS, CONTRADICTS, SUPERSEDES, DERIVED_FROM, COVERS, CITES)
- Graph convergence guards: sourceDiversity (>= 0.4) and evidenceDepth (>= 1.5) block premature STOP
- Question coverage tracking computes answerCoverage ratio from ANSWERS edges
- Quality guards: source diversity, focus alignment and weak-source checks must pass before STOP
- Progressive synthesis: `research.md` updated incrementally and finalized during synthesis
- Negative knowledge: ruled-out directions and dead ends preserved as first-class outputs
- Lifecycle modes: `new`, `resume`, `restart` (fork and completed-continue are deferred)
- Fail-closed corruption handling: throws structured error before writing derived files when JSONL is corrupt
- Graph convergence fallback: scoring uses a numeric fallback when `blendedScore` is absent
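The convergence numbers in this list can be combined into a small stop check. This sketch assumes the three signals are already normalized to the range 0 to 1, that they combine as a weighted sum, and that STOP is allowed once the sum reaches 0.60 with both graph guards satisfied. Those combination rules and the names `evaluateStop` and `IterationSignals` are assumptions. Only the weights, the threshold and the guard minimums come from the list.

```typescript
const SIGNAL_WEIGHTS = { rollingAverage: 0.45, madNoiseFloor: 0.3, coverageAge: 0.25 };
const STOP_THRESHOLD = 0.6;
const MIN_SOURCE_DIVERSITY = 0.4;
const MIN_EVIDENCE_DEPTH = 1.5;

interface IterationSignals {
  rollingAverage: number; // assumed normalized, higher means more converged
  madNoiseFloor: number;
  coverageAge: number;
  sourceDiversity: number;
  evidenceDepth: number;
}

interface StopDecision {
  score: number;
  stop: boolean;
  blockedBy: string[]; // guards that vetoed the stop
}

function evaluateStop(s: IterationSignals): StopDecision {
  const score =
    SIGNAL_WEIGHTS.rollingAverage * s.rollingAverage +
    SIGNAL_WEIGHTS.madNoiseFloor * s.madNoiseFloor +
    SIGNAL_WEIGHTS.coverageAge * s.coverageAge;
  const blockedBy: string[] = [];
  if (s.sourceDiversity < MIN_SOURCE_DIVERSITY) blockedBy.push("sourceDiversity");
  if (s.evidenceDepth < MIN_EVIDENCE_DEPTH) blockedBy.push("evidenceDepth");
  return { score, stop: score >= STOP_THRESHOLD && blockedBy.length === 0, blockedBy };
}
```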
Deep Review
- Autonomous code quality auditor using LEAF architecture for single review iterations
- Reviews code but NEVER modifies target files (read-only on code)
- Loop orchestration managed by `/deep:start-review-loop` command, not this agent
- Produces P0/P1/P2 severity-ranked findings with `file:line` evidence across 4 review dimensions (Correctness, Security, Traceability, Maintainability)
- Severity-weighted convergence: P0 contributes weight 10.0, P1 contributes 5.0, P2 contributes 1.0. Refinements contribute 0.5x those weights
- 3-signal convergence model: Rolling Average (0.45), MAD Noise Floor (0.30), Dimension Coverage (0.25)
- P0 override: any new P0 finding forces at least one more iteration regardless of convergence math
- Adversarial self-check on P0 findings: Hunter/Skeptic/Referee triad before admission
- Binary quality gates: evidence (file:line backed), scope (stays inside declared target), coverage (all dimensions and cross-reference protocols complete)
- Graph-aware legal-stop checks using structural graph signals from `graphEvents`
- Semantic coverage graph with review-specific node types (DIMENSION, FILE, FINDING, EVIDENCE, REMEDIATION) and edge types (COVERS, EVIDENCE_FOR, IN_DIMENSION, CONTRADICTS, RESOLVES, CONFIRMS)
- 9-section review report with PASS/CONDITIONAL/FAIL verdict (FAIL on P0, CONDITIONAL on P1, PASS otherwise)
- Claim-adjudication `finalSeverity` overrides original severity in the findings registry
- Fail-closed corruption handling: reducer refuses to write derived files when JSONL corruption is detected
- Lifecycle modes: `new`, `resume`, `restart` with typed JSONL lineage events
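The severity weights and the verdict rule from this list can be sketched as follows. The weights (10.0 / 5.0 / 1.0), the 0.5x refinement factor and the FAIL / CONDITIONAL / PASS mapping are taken from the bullets. The `Finding` shape and function names are hypothetical, and the sketch assumes every finding passed in is still open.

```typescript
type Severity = "P0" | "P1" | "P2";

interface Finding {
  severity: Severity;
  isRefinement: boolean; // refinements count at half weight
}

const SEVERITY_WEIGHT: Record<Severity, number> = { P0: 10.0, P1: 5.0, P2: 1.0 };
const REFINEMENT_FACTOR = 0.5;

function weightedFindingScore(findings: Finding[]): number {
  let total = 0;
  for (const f of findings) {
    const weight = SEVERITY_WEIGHT[f.severity];
    total += f.isRefinement ? weight * REFINEMENT_FACTOR : weight;
  }
  return total;
}

// FAIL on any P0, CONDITIONAL on any P1, PASS otherwise.
function reviewVerdict(findings: Finding[]): "PASS" | "CONDITIONAL" | "FAIL" {
  if (findings.some((f) => f.severity === "P0")) return "FAIL";
  if (findings.some((f) => f.severity === "P1")) return "CONDITIONAL";
  return "PASS";
}
```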
Agent Improver
- Proposal-only mutator for bounded agent improvement experiments
- Reads the target agent's charter, manifest and integration surface, then writes ONE candidate to a packet-local runtime area
- Never scores, promotes, benchmarks or edits canonical targets. The `/deep:start-agent-improvement-loop` command loop handles those.
- Loop orchestration: scan integration surfaces, generate dynamic profile, dispatch this agent, score candidate across 5 dimensions (structural, ruleCoherence, integration, outputQuality, systemFitness), reduce state, check stop conditions
⌨️ Commands
24 command entry points across 5 command groups plus root utilities. Each command is a Markdown entry point under .opencode/commands/**/*.md backed by a behavioral execution spec.
SPEC KIT
Plan --intake-only
- Standalone intake workflow that publishes `spec.md`, `description.json` and `graph-metadata.json`
- Used directly for new packet setup and paired with `/speckit:plan` or `/speckit:complete` when `folder_state` is `no-spec`, `partial-folder`, `repair-mode` or `placeholder-upgrade`
- Modes: `:auto`, `:confirm`
Complete
- End-to-end workflow: intake/delegate → research → plan → implement → verify → save memory
- Smart-detects missing or unhealthy packet state and reuses the shared intake contract from `/speckit:plan --intake-only`. Healthy folders continue without extra setup prompts
- Modes: `:auto` (fully autonomous), `:confirm` (pause at each step), `:with-research` (adds deep research)
- After 3 failed implementation attempts, surface diagnostics and let the user dispatch `@debug` via the Task tool
Plan
- Planning-only workflow that authors `spec.md`, `plan.md` and `tasks.md` without implementing
- Reuses the shared intake contract from `/speckit:plan --intake-only` when the packet is `no-spec`, `partial-folder`, `repair-mode` or `placeholder-upgrade`
- Dispatches up to 4 parallel context agents for codebase exploration during planning
- Use when you need stakeholder review before coding. Modes: `:auto`, `:confirm`
Implement
- Executes an existing plan - requires plan.md to already exist
- 9-step workflow covering task breakdown, implementation, testing and verification
- Modes: :auto, :confirm
Resume
- Continues a previous session by auto-loading memory from the spec folder
- Presents session summary, shows progress against tasks.md
- Works after crashes, compactions or new sessions
Spec-first command chains
/speckit:plan --intake-only
├─► /speckit:plan -> /speckit:implement
├─► /deep:start-research-loop -> /speckit:plan
└─► /speckit:complete
└─► reuses the shared intake contract from /speckit:plan --intake-only when folder_state still needs intake
/deep:start-research-loop only enters that chain after a real spec.md exists. It follows spec_check_protocol.md for advisory-lock handling, folder_state classification and bounded generated-fence sync.
MEMORY
Save
- Updates packet continuity and supporting generated context artifacts via generate-context.js
- AI composes structured JSON with session summary, key decisions and findings
- Indexes immediately for future retrieval via memory_save() or memory_index_scan()
Search
- Unified retrieval and analysis entry point with intent-aware routing
- Supports epistemic baselines, causal graph traversal, ablation studies and dashboards
- Routes by intent: add_feature, fix_bug, refactor, security_audit, understand, find_spec, find_decision
Learn
- /memory:learn constitutional memory manager for always-surface rules
- Constitutional memories carry a 3.0x boost and never decay
- Lifecycle operations: create, list, edit, remove, budget
Manage
- Database admin: stats (memory counts, index health), health checks, cleanup (orphaned vectors)
- Checkpoint management: create, list, restore, delete
- Bulk operations and ingestion (start/status/cancel)
CREATE
Skill
- Unified skill creation and update workflow
- Creates SKILL.md with 8-section structure, README.md, references and assets directories
- Registers in skill catalog. Modes: :auto, :confirm
Agent
- Scaffolds a new agent definition with proper frontmatter, behavioral rules and tool permissions
- Creates source-of-truth file in .opencode/agents/ and mirrors for Claude, Codex, Gemini runtimes
- Modes: :auto, :confirm
Readme
- Unified README and install guide creation using sk-doc quality standards
- Auto-detects folder type, loads appropriate template, validates via DQI scoring
- Structure 40%, Content 35%, Style 25%. Modes: :auto, :confirm
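As a rough illustration of how a 40/35/25 split could combine into a single score, here is a minimal sketch. It assumes the three percentages are weights applied to per-category scores on a 0-100 scale; the names `DqiParts` and `dqiScore` are hypothetical and are not taken from sk-doc.

```typescript
// Hypothetical sketch: type and function names are illustrative, not sk-doc's API.
type DqiParts = { structure: number; content: number; style: number };

// Weights as integer percentages (40/35/25), as listed for the Readme command.
const DQI_WEIGHTS: DqiParts = { structure: 40, content: 35, style: 25 };

// Combine per-category scores (assumed 0-100 each) into one weighted score.
function dqiScore(parts: DqiParts): number {
  const weighted =
    parts.structure * DQI_WEIGHTS.structure +
    parts.content * DQI_WEIGHTS.content +
    parts.style * DQI_WEIGHTS.style;
  return weighted / 100;
}

console.log(dqiScore({ structure: 80, content: 60, style: 40 }));
```

Integer weights keep the arithmetic exact; the real scorer may use different scales or rounding.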
Changelog
- Auto-detects recent work from spec folder artifacts or git history
- Resolves correct component folder, calculates next version number
- Generates formatted changelog file matching 370+ existing entries. Modes: :auto, :confirm
Feature Catalog
- Creates or updates feature catalog packages with category routing
- Generates both technical reference entries and simple-terms companion entries
- Validates against the 290-entry catalog structure across 22 categories
Testing Playbook
- Creates or updates manual testing playbook packages
- Generates scenario files with test steps, expected results and verification evidence fields
- Validates against established playbook format
The MCP server also ships explicit stress and matrix execution surfaces. Run npm run stress from mcp_server/ for the dedicated stress_test/ suite, which covers search-quality, memory, skill-advisor, code-graph, session and matrix subsystems. matrix_runners/ provides four per-CLI adapters plus a manifest and meta-runner for the F1-F14 feature matrix across cli-codex, cli-gemini, cli-claude-code and cli-opencode.
DEEP
AI Council
- Multi-seat planning and strategy workflow for complex decisions
- Produces packet-local ai-council/** artifacts, critique rounds and convergence evidence
- Planning-only: never modifies implementation files directly
- Modes: :auto, :confirm
Deep Research
- Autonomous research loop dispatching deep-research agents iteratively until convergence
- Anchors every run to a real spec.md under spec_check_protocol.md, with advisory lock handling, folder_state detection and bounded BEGIN/END GENERATED write-back
- Externalized JSONL state enables pause/resume across sessions
- Reducer parses terminal synthesis_complete events for authoritative stop metadata
- Graph convergence guards block premature STOP when sourceDiversity or evidenceDepth thresholds fail
- Lifecycle modes: new, resume, restart with lineage tracking across generations
- Modes: :auto, :confirm
Deep Review
- Autonomous code review loop dispatching deep-review agents iteratively until convergence
- Severity-weighted findings (P0/P1/P2) across 4 dimensions with release readiness verdicts (PASS/CONDITIONAL/FAIL)
- Claim-adjudication packets with finalSeverity override, stale STOP veto auto-clearing
- Binary quality gates (evidence, scope, coverage) checked after convergence math before allowing stop
- Adversarial self-check on P0 findings using Hunter/Skeptic/Referee triad
- Lifecycle modes: new, resume, restart with typed JSONL lineage events
- Modes: :auto, :confirm
Agent Improvement
- Evaluates and improves any agent across 5 integration-aware dimensions with deterministic scoring
- Runs a bounded loop: scan integration surfaces, generate dynamic profile, dispatch @deep-agent-improvement, score candidate, reduce state, check stop conditions
- Integration scanner discovers all surfaces an agent touches: canonical definition, runtime mirrors, command dispatch, YAML workflows, skill references
- Dynamic profiling: derives scoring rubric from any agent's own rules, no hardcoded profiles needed
- Proposal-first: candidates written to packet-local runtime areas, canonical target untouched until guarded promotion
- Guarded promotion requires passing scoring, benchmark status, repeatability evidence and operator approval. Rollback restores pre-promotion backup.
- Dimensional progress tracking detects plateau (3+ identical scores across all dimensions) and triggers stop
- All scoring is regex/string/file-existence based (no LLM-as-judge) for promotion gate reliability
- Emits legal_stop_evaluated and blocked_stop events to the JSONL ledger matching the deep-loop runtime-truth contract
- Session-boundary gate enforces fresh-session isolation before initialization
- Modes: :auto, :confirm. Supports any agent in .opencode/agents/ as target
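The plateau stop condition above can be sketched as follows. This is one reading of "3+ identical scores across all dimensions", namely that the last three iterations carry the same score in every one of the five dimensions; `hasPlateaued` and `DimensionScores` are hypothetical names, not the loop's actual reducer.

```typescript
// Dimension names come from the Agent Improver description; everything else is an assumption.
const DIMENSIONS = ["structural", "ruleCoherence", "integration", "outputQuality", "systemFitness"] as const;
type Dimension = (typeof DIMENSIONS)[number];
type DimensionScores = Record<Dimension, number>;

// True when the most recent `window` iterations scored identically in every dimension.
function hasPlateaued(history: DimensionScores[], window: number = 3): boolean {
  if (history.length < window) return false;
  const recent = history.slice(-window);
  const first = recent[0];
  if (first === undefined) return false;
  return DIMENSIONS.every((dim) => recent.every((scores) => scores[dim] === first[dim]));
}
```

Because the comparison is exact equality on numbers, it stays deterministic, in keeping with the no-LLM-as-judge scoring described above.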
DOCTOR
Three commands cover every spec-kit diagnostic surface. Run /doctor with no target to see the interactive menu. Upgrade users see "Update everything to match latest release" as option 1.
/doctor <target> (router)
- Single entry point for 7 subsystems: memory, causal-graph, code-graph, deep-loop, cocoindex, skill-advisor, skill-budget
- Argv-positional dispatch via .opencode/commands/doctor/_routes.yaml manifest (canonical per-target metadata: setup vars, allowed flags, mutation class, MCP tools, advisor trigger phrases)
- Each target loads its own self-contained YAML workflow under assets/doctor_<target>.yaml
- Interactive menu when no target supplied. Tier 2 per-target prompt when a required flag is missing
- Examples: /doctor memory --dry-run, /doctor causal-graph --confidence-threshold=0.8, /doctor code-graph --scope=stale. --target=<name> is preserved as a compatibility alias for flag-only invocation
/doctor:mcp install|debug
- MCP infrastructure repair (replaces the standalone /doctor:mcp_install and /doctor:mcp_debug from v3.4.0.0)
- install: Fresh install or reinstall of the native MCP servers from their install guides. Handles old-conflicting-with-new (clean reinstall with venv/node_modules removal)
- debug: Diagnoses the native MCP servers (Spec Kit Memory, System Skill Advisor, System Code Graph, CocoIndex Code, Code Mode, Sequential Thinking) with PASS/WARN/FAIL per check. Supports --fix for guided repair
/doctor:update
- Multi-subsystem orchestrator: dependency-safe rebuild across code-graph → context-index + vector-index → causal-edges → skill-graph → advisor → deep-loop → cocoindex → eval
- One lock (mcp_server/database/.doctor-update.flock), one pre-mutation snapshot set, one dependency DAG, one rollback policy, one state log (.doctor-update.last-run.json)
- Tier-aware mid-run prompts: SHORT steps auto-acknowledge. MEDIUM steps share one combined prompt (Q-MED). LONG-POLE memory_index_scan gets explicit ETA prompt (Q-LONG, 5-15 min)
- Additional gates: Q-PROBE (active MCP clients warning, NOT suppressed by --force), Q-LEGACY (per-file cleanup with --cleanup-legacy), Q-FAIL (step-failure recovery)
- Use after upgrading spec-kit, after large packet moves or when multiple subsystem doctors would otherwise need to run by hand. Pass --migrate to handle schema migration (e.g. v3.3.0.0 → v3.4.1.0). Wall-clock 8-25 min
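To make "dependency-safe rebuild" concrete, the sketch below derives a run order from a dependency map with a simple topological sort. The edges are an assumption read off the arrow chain above (each step depends on the step or pair of steps before it); the real DAG lives in the doctor workflow and may differ. `DOCTOR_UPDATE_DEPS` and `rebuildOrder` are hypothetical names.

```typescript
// Assumed edges: each step depends on whatever precedes it in the README's arrow chain.
const DOCTOR_UPDATE_DEPS: Record<string, string[]> = {
  "code-graph": [],
  "context-index": ["code-graph"],
  "vector-index": ["code-graph"],
  "causal-edges": ["context-index", "vector-index"],
  "skill-graph": ["causal-edges"],
  "advisor": ["skill-graph"],
  "deep-loop": ["advisor"],
  "cocoindex": ["deep-loop"],
  "eval": ["cocoindex"],
};

// Topological sort: repeatedly emit every step whose dependencies are all done.
function rebuildOrder(graph: Record<string, string[]>): string[] {
  const pending = new Map<string, Set<string>>();
  for (const step of Object.keys(graph)) pending.set(step, new Set(graph[step]));
  const order: string[] = [];
  while (pending.size > 0) {
    const ready = Array.from(pending.keys()).filter((step) => pending.get(step)!.size === 0);
    if (ready.length === 0) throw new Error("dependency cycle detected");
    for (const step of ready) {
      order.push(step);
      pending.delete(step);
    }
    // Mark the emitted steps as satisfied for everything still waiting.
    pending.forEach((deps) => ready.forEach((done) => deps.delete(done)));
  }
  return order;
}

console.log(rebuildOrder(DOCTOR_UPDATE_DEPS).join(" → "));
```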
The 10 underlying YAML workflows in .opencode/commands/doctor/assets/ are self-sufficient. Each declares its own role/purpose/action/operating_mode/invariants/upstream_assets/user_inputs/field_handling block plus phased execution. The route-validate.{sh,py} CI script enforces internal consistency on the route manifest.
UTILITY
Agent Router
- Routes requests to external AI systems (Gemini CLI, Codex CLI, Claude Code, Copilot CLI)
- The receiving AI operates under its own system prompt - full identity adoption
- Use for cross-AI delegation where the target AI needs to behave as itself
Prompt
- Refines prompts and prompt packages through /prompt using 7 proven frameworks (RCAF, COSTAR, RACE, CIDI, TIDD-EC, CRISPE, CRAFT)
- Applies DEPTH thinking methodology with CLEAR quality scoring
- Can return inline improvements or route to @prompt-improver for higher-stakes prompt packages
🔌 Code Mode MCP
Code Mode MCP gives the AI access to external tools (Figma, GitHub, Chrome DevTools, ClickUp, Webflow) through a single TypeScript execution interface. Instead of loading large external tool definitions into context, Code Mode loads them on demand through one interface (1.6k tokens) - a 98.7% reduction.
Native MCP Servers
Canonical native server set:
| Server | Tools | Purpose |
|---|---|---|
| mk-spec-memory | 39 | Cognitive memory, session recovery, causal/eval tools and graph loops |
| mk_skill_advisor | 9 | Gate 2 advisor routing plus skill-graph scan/query/status/validation |
| mk_code_index | 11 | Structural code graph, detect_changes and CocoIndex bridge helpers |
| code_mode | 7 | External tool orchestration via TypeScript execution |
| cocoindex_code | 2 | Semantic code search via vector embeddings |
| sequential_thinking | 1 | Structured multi-step reasoning for complex problems |
| Total | 69 | |
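Since the 69-tool total is repeated in several places in this README, a small consistency check can help keep the numbers from drifting. The sketch below simply sums the per-server counts from the table; the object and variable names are illustrative only.

```typescript
// Per-server tool counts copied from the table above.
const toolCounts: Record<string, number> = {
  "mk-spec-memory": 39,
  mk_skill_advisor: 9,
  mk_code_index: 11,
  code_mode: 7,
  cocoindex_code: 2,
  sequential_thinking: 1,
};

// Sum the counts so the documented total can be compared against them.
const totalTools = Object.keys(toolCounts).reduce((sum, name) => sum + toolCounts[name], 0);

console.log(totalTools); // 69
```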
Lifecycle guardrails: mk-spec-memory, mk_skill_advisor, and mk_code_index use the shared idle-timeout knob SPECKIT_LAUNCHER_IDLE_TIMEOUT_MIN. Orphan cleanup is documented in .opencode/scripts/README.md; the checked-in LaunchAgent is only a template until an operator copies and loads it.
Code Mode Tools (7)
- search_tools - Discover relevant tools by task description
- tool_info - Get complete tool parameters and TypeScript interface
- call_tool_chain - Execute TypeScript code with access to all registered tools
- list_tools - List all currently registered tool names
- register_manual - Register a new tool provider
- deregister_manual - Remove a tool provider
- get_required_keys_for_tool - Check required environment variables for a tool
External Integrations (via .utcp_config.json)
- chrome_devtools_1 (MCP/stdio) - Browser automation (instance 1). No env var needed.
- chrome_devtools_2 (MCP/stdio) - Browser automation (instance 2). No env var needed.
- clickup (MCP/stdio) - Task management, goals, docs. Requires CLICKUP_API_KEY.
- figma (MCP/stdio) - Design files, components, exports. Requires FIGMA_API_KEY.
- github (MCP/stdio) - Issues, pull requests, commits. Requires GITHUB_PERSONAL_ACCESS_TOKEN.
- webflow (MCP/remote) - Sites, CMS collections. Requires Webflow auth.
Performance
| Metric | Without Code Mode | With Code Mode |
|---|---|---|
| Context tokens | Large external tool schemas loaded upfront | 1.6k (on-demand) |
| Round trips | 15+ for chained operations | 1 (TypeScript chain) |
| Type safety | None | Full TypeScript |
| Context reduction | - | 98.7% |
To call a Code Mode tool: call_tool_chain({ typescript: "const result = await figma.figma_get_file({fileKey: 'abc123'}); return result;" })
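The one-line call above returns the whole result. A chain can also trim the payload before it returns, so that only a small summary reaches the model's context. The sketch below shows that pattern against a local stub: inside Code Mode the `figma` namespace is provided by the runtime, whereas here it is a stand-in, and the `name` and `pages` fields on the stub's result are assumptions rather than the real Figma response shape.

```typescript
// Stand-in for the namespace Code Mode injects; the result shape is an assumption.
const figma = {
  figma_get_file: async (args: { fileKey: string }) => ({
    name: `stub-${args.fileKey}`,
    pages: ["Home", "About"],
  }),
};

// Fetch a file and return only the fields the caller needs.
async function summarizeFigmaFile(fileKey: string): Promise<{ name: string; pageCount: number }> {
  const file = await figma.figma_get_file({ fileKey });
  return { name: file.name, pageCount: file.pages.length };
}

summarizeFigmaFile("abc123").then((summary) => console.log(summary));
```

In a real run, the body of `summarizeFigmaFile` is what would be passed as the typescript string to call_tool_chain.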
For more on the mcp-code-mode skill and TypeScript execution patterns, see the skill at .opencode/skills/mcp-code-mode/SKILL.md.
4. CONFIGURATION
<a id="customizing-for-your-stack"></a>
🎯 Customizing for Your Stack: Start with sk-code
This repo ships as a public template. Of the skills it ships with, only one carries stack-specific content. Start there:
| Skill / Surface | Out-of-the-box | Notes |
|---|---|---|
| sk-code | 🎨 Stack-specific (the customization point) | Surface-aware code-quality patterns. Replace the shipped Webflow + OpenCode + Motion.dev surfaces with your own (e.g., Next.js + Tailwind + Postgres or React Native + Reanimated or Go + sqlc, etc.). |
| sk-doc | ✅ Codebase-agnostic | Markdown quality + component creation. Works for any project. |
| sk-git | ✅ Codebase-agnostic | Worktree + commit + PR workflow. Works for any project. |
| sk-code-review | ✅ Codebase-agnostic baseline | Pulls surface evidence FROM sk-code. Customize sk-code and the review baseline auto-adapts. |
| system-spec-kit | ✅ Codebase-agnostic | Spec folder workflow + validator + memory. Works for any project. |
| mcp-coco-index | ✅ Codebase-agnostic | Semantic code search via embeddings. Works for any project. |
| mcp-code-mode | ✅ Codebase-agnostic | Multi-tool MCP orchestration. Works for any project. |
| deep-loop-runtime / deep-research / deep-review | ✅ Codebase-agnostic | Shared runtime plus iterative loop protocols. Work for any topic / target. |
| sk-prompt / deep-agent-improvement | ✅ Codebase-agnostic | Prompt + agent improvement frameworks. Work for any project. |
| cli-* (codex/copilot/gemini/claude-code/opencode) | ✅ Codebase-agnostic | External CLI orchestrators. Stack-independent. |
| mcp-chrome-devtools | ✅ Codebase-agnostic | Browser tooling. Stack-independent. |
Adding your own skills: the shipped set is intentionally minimal, and most teams will add their own (project-specific workflows, ops runbooks, domain-specific reviewers, etc.). That's expected and supported. Drop them into .opencode/skills/<your-skill>/ and the advisor will pick them up. The shipped skills above are kept agnostic so upstream updates apply cleanly to your fork.
What "adapting sk-code" looks like:
- Replace references/webflow/, references/opencode/, references/motion_dev/ with your stack's references (e.g., references/nextjs/, references/postgres/).
- Replace assets/webflow/, assets/opencode/, assets/motion_dev/ with your stack's assets (checklists, recipes, snippets).
- Update SKILL.md §2 Smart Routing, STACK_FOLDERS dict + the bash detection block, to match your stack's marker files and CWD signals.
- Update the RESOURCE_MAP intent → file paths to point at your renamed references/assets.
- Bump sk-code version + ship a changelog. Use the assets/opencode/checklists/skill_authoring.md checklist as your guide.
The other shipped skills will continue working unchanged: sk-doc will still validate your markdown, sk-git will still manage your branches, system-spec-kit will still spec your work and sk-code-review will surface YOUR sk-code evidence at review time.
Core Configuration Files
- CLAUDE.md - Gate definitions, behavior rules, coding anti-patterns. Used by Claude Code (primary runtime).
- AGENTS.md - Agent routing, capability reference, gate documentation. Used by all runtimes.
- opencode.json - MCP server bindings, model configuration and launcher notes. Used by OpenCode platform.
- .utcp_config.json - Code Mode external tool registrations. Used by mcp-code-mode skill.
- .claude/mcp.json - Claude Code MCP configuration. Claude Code only.
- .codex/config.toml - Codex CLI MCP configuration and profile definitions.
- .gemini/settings.json - Gemini CLI configuration. Gemini CLI only.
- .vscode/mcp.json - VS Code / Copilot MCP configuration wrapper.
Memory Engine Configuration
The memory server reads configuration from environment variables:
- VOYAGE_API_KEY (optional) - Voyage AI cloud embeddings (opt-in only, gated by egress guard)
- SPECKIT_EMBEDDER (optional) - Override the default embedder id (default: ollama-nomic-v1.5 since ADR-013/014 2026-05-19; was previously ollama-jina-v3). See embedder-pluggability.md for the registered list.
- SPECKIT_RERANK_LAYER (optional) - Retrieval-rescue layer toggle, default true per ADR-011. Set to false to disable.
- HF_EMBEDDINGS_DTYPE (optional) - hf-local fallback dtype (default: q8. Also: fp32, fp16, q4, int8, uint8, bnb4)
- OPENAI_API_KEY (optional) - OpenAI embeddings (alternative)
- MEMORY_DB_PATH (optional) - Override default database path
Default repo-local database path: .opencode/skills/system-spec-kit/mcp_server/database/context-index__ollama__nomic-embed-text-v1.5__768.sqlite (default since ADR-013/014 2026-05-19; previously __jina-embeddings-v3__1024__q4_k_m.sqlite). The filename encodes provider, model, dimension and dtype so multiple backends can coexist on disk without mixing vectors.
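The naming scheme can be sketched as a small builder. This is an illustration that reproduces the two filenames quoted in this README, assuming the segments are joined with a double underscore and the dtype segment is omitted when none applies; `EmbedderProfile` and `dbFileName` are hypothetical names, not the server's own code.

```typescript
// Hypothetical profile shape; field names are illustrative.
interface EmbedderProfile {
  provider: string;
  model: string;
  dimension: number;
  dtype?: string; // assumed optional: the nomic default filename carries no dtype segment
}

// Join the segments with "__" so each backend gets its own database file.
function dbFileName(profile: EmbedderProfile): string {
  const parts = ["context-index", profile.provider, profile.model, String(profile.dimension)];
  if (profile.dtype) parts.push(profile.dtype);
  return parts.join("__") + ".sqlite";
}

console.log(dbFileName({ provider: "ollama", model: "nomic-embed-text-v1.5", dimension: 768 }));
```

Keying the file on dimension as well as model is what prevents 768-dimension and 1024-dimension vectors from landing in the same index.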
[!TIP] If no API key is set, the memory engine auto-detects the local Ollama endpoint serving nomic-embed-text-v1.5 (current default per ADR-013/014), falls back to jina-embeddings-v3 if nomic isn't pulled, then to HuggingFace Local embeddings.
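The fallback order in the tip can be written down as a short decision function. This is a sketch of the ordering only, under the assumption that the caller already knows which models the local Ollama endpoint has pulled; `pickLocalEmbedder` is a hypothetical name and the actual detection logic is not shown here.

```typescript
// Embedder ids as they appear in this README.
type EmbedderChoice = "ollama-nomic-v1.5" | "ollama-jina-v3" | "hf-local";

// Prefer nomic, then jina, then HuggingFace Local, based on what Ollama has pulled.
function pickLocalEmbedder(pulledModels: string[]): EmbedderChoice {
  if (pulledModels.indexOf("nomic-embed-text-v1.5") !== -1) return "ollama-nomic-v1.5";
  if (pulledModels.indexOf("jina-embeddings-v3") !== -1) return "ollama-jina-v3";
  return "hf-local";
}

console.log(pickLocalEmbedder(["jina-embeddings-v3"]));
```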
Memory Feature Flags
Feature flags control search channels, scoring signals, save-time enforcement and evaluation behavior. The important retrieval/runtime flags are resolved at call time, so long-lived MCP processes do not depend on frozen import-time snapshots.
- Search Pipeline - 5-channel retrieval, fallback routing, reranking, graph-walk rollout, confidence and token-budget policies.
- Session/Cache - Working memory, cache invalidation on DB rebind, session deduplication, recovery helpers.
- Memory/Storage - Save quality gate, reconsolidation, governed scopes, causal graph maintenance, projection cleanup.
- Runtime Lifecycle - MCP idle self-exit through SPECKIT_LAUNCHER_IDLE_TIMEOUT_MIN; orphan sweeper rollout remains dry-run-first until explicitly installed.
- Embedding/API - Startup provider resolution, fail-fast dimension checks, structured fallback metadata for effective vs requested provider.
- Evaluation/Debug - Trace mode, eval logging, ablation/reporting guardrails, feedback evaluation and proposal diagnostics that observe candidates without reordering live results.
For the complete flag reference with per-flag defaults, see ENV_REFERENCE.md and the MCP Server runtime guardrail notes.
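The difference between call-time resolution and an import-time snapshot is easiest to see side by side. The sketch below uses a plain object as a stand-in for the process environment and SPECKIT_RERANK_LAYER as the example flag (default true, disabled by "false"); `rerankEnabled` and `frozenRerank` are hypothetical names.

```typescript
// Stand-in for the process environment, so the sketch runs anywhere.
type Env = Record<string, string | undefined>;
const env: Env = { SPECKIT_RERANK_LAYER: "true" };

// Import-time snapshot: evaluated once, never sees later changes.
const frozenRerank: boolean = env.SPECKIT_RERANK_LAYER !== "false";

// Call-time resolution: re-reads the flag on every call.
function rerankEnabled(source: Env = env): boolean {
  return source.SPECKIT_RERANK_LAYER !== "false";
}

env.SPECKIT_RERANK_LAYER = "false"; // flag flipped while the process keeps running
console.log(frozenRerank, rerankEnabled()); // true false
```

A long-lived MCP process using the first pattern would keep reranking until restarted; the second picks up the change on the next call.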
Database Schema
The runtime centers on a SQLite memory_index table with 56 columns plus companion FTS5/vector, lineage, checkpoint, working-memory and eval tables.
- Primary store - memory_index holds the searchable memory rows plus governance, quality, chunking and retrieval metadata.
- Search companions - FTS5 and vector tables support lexical and embedding retrieval alongside BM25 rebuild/index data.
- Graph/lifecycle - Causal edges, lineage projection, checkpoints, working memory and access tracking support decision tracing and session continuity.
- Evaluation - Separate eval tables persist ablation/reporting metrics, with guards for missing query IDs and synthetic token-usage markers.
- Paths - The checked-in configs default to the provider-keyed database path under .opencode/skills/system-spec-kit/mcp_server/database/. The filename encodes provider, model, dimension and dtype (current default since ADR-013/014: context-index__ollama__nomic-embed-text-v1.5__768.sqlite; jina-v3 fallback would produce context-index__ollama__jina-embeddings-v3__1024__q4_k_m.sqlite). If a runtime cannot write inside the repo, override MEMORY_DB_PATH (and, when relevant, SPEC_KIT_DB_DIR) to a writable location.
MCP Config Shape
Abbreviated shape. Runtime config files can temporarily differ while the mk_code_index rename is being rolled out across clients. The canonical code-graph identity is mk_code_index / mcp__mk_code_index__*.
{
"mcp": {
"mk-spec-memory": {
"type": "local"
},
"mk_skill_advisor": {
"type": "local"
},
"mk_code_index": {
"type": "local"
},
"code_mode": {
"type": "local"
},
"cocoindex_code": {
"type": "local"
},
"sequential_thinking": {
"type": "local"
}
}
}
Maintainer-Mode Code-Graph Flags (already disabled for end users)
All 5 runtime MCP configs (opencode.json, .claude/mcp.json, .codex/config.toml, .gemini/settings.json, .vscode/mcp.json) carry five opt-in maintainer flags, listed here together with one optional directory override:
SPECKIT_CODE_GRAPH_INDEX_SKILLS (covers .opencode/skills/**)
SPECKIT_CODE_GRAPH_INDEX_AGENTS (covers .opencode/agents/**)
SPECKIT_CODE_GRAPH_INDEX_COMMANDS (covers .opencode/commands/**)
SPECKIT_CODE_GRAPH_INDEX_SPECS (covers <active-spec-folder>/**)
SPECKIT_CODE_GRAPH_INDEX_PLUGINS (covers .opencode/plugins/**)
SPECKIT_CODE_GRAPH_DB_DIR (optional code-graph SQLite directory override)
End users see all 5 as "false" thanks to the git clean filter. That's the framework default and what you want: the code graph indexes your project code, not the framework backend.
Maintainers (us) have all 5 as "true" locally because we navigate .opencode/ to iterate on the framework. The smudge filter restores "true" on checkout/pull/clone after running ./scripts/setup-maintainer-filters.sh.
Per-call override: the same five flags exist as includeSkills / includeAgents / includeCommands / includeSpecs / includePlugins arguments on code_graph_scan. Per-call args always override env defaults, so you can flip behavior for one scan without editing config.
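The precedence rule can be sketched as a resolver in which an explicit per-call argument wins and the env flag is used only when the argument is absent. The argument and flag names come from this README; `resolveScanScope` and the type names are hypothetical, and treating an unset env flag as false is an assumption.

```typescript
type IncludeKey = "includeSkills" | "includeAgents" | "includeCommands" | "includeSpecs" | "includePlugins";
type ScanArgs = Partial<Record<IncludeKey, boolean>>;

// Each per-call argument and the env flag it overrides.
const ENV_FOR_ARG: Record<IncludeKey, string> = {
  includeSkills: "SPECKIT_CODE_GRAPH_INDEX_SKILLS",
  includeAgents: "SPECKIT_CODE_GRAPH_INDEX_AGENTS",
  includeCommands: "SPECKIT_CODE_GRAPH_INDEX_COMMANDS",
  includeSpecs: "SPECKIT_CODE_GRAPH_INDEX_SPECS",
  includePlugins: "SPECKIT_CODE_GRAPH_INDEX_PLUGINS",
};

// Per-call value wins when present; otherwise fall back to the env default.
function resolveScanScope(args: ScanArgs, env: Record<string, string | undefined>): Record<IncludeKey, boolean> {
  const resolved = {} as Record<IncludeKey, boolean>;
  for (const key of Object.keys(ENV_FOR_ARG) as IncludeKey[]) {
    const perCall = args[key];
    resolved[key] = perCall !== undefined ? perCall : env[ENV_FOR_ARG[key]] === "true";
  }
  return resolved;
}
```

Checking for `undefined` rather than truthiness matters here: an explicit `includeSkills: false` must still override an env value of "true".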
<a id="git-clean-filter--maintainer-mode-stays-local"></a>
Git clean filter: maintainer mode stays local
The repo ships a .gitattributes rule that runs an idempotent sed-based clean filter on the 5 config files: every "true" for these flags is rewritten to "false" when the file enters the git index. The smudge filter rewrites "false" → "true" on checkout/pull/clone for installed maintainers. Net effect:
- End users cloning the template → all 5 configs show "false" (framework default, correct out of box)
- Maintainers after running ./scripts/setup-maintainer-filters.sh → all 5 configs show "true" locally. Commits + pushes still ship "false" to the remote
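The clean and smudge rewrites are mirror images of each other. The repo implements them with sed; the TypeScript below is only an illustration of the same idea, and the regular expression assumes the flags appear as `"KEY": "true"` in the JSON configs and `KEY = "true"` in the TOML config. `rewriteFlags`, `clean` and `smudge` are hypothetical names.

```typescript
// Matches any of the five maintainer flags followed by a quoted boolean (JSON or TOML style).
const FLAG_PATTERN =
  /(SPECKIT_CODE_GRAPH_INDEX_(?:SKILLS|AGENTS|COMMANDS|SPECS|PLUGINS)"?\s*[:=]\s*)"(?:true|false)"/g;

// Rewrite every maintainer flag to the given value, leaving all other settings alone.
function rewriteFlags(text: string, value: "true" | "false"): string {
  return text.replace(FLAG_PATTERN, `$1"${value}"`);
}

const clean = (text: string): string => rewriteFlags(text, "false"); // working tree → index
const smudge = (text: string): string => rewriteFlags(text, "true"); // index → working tree

const local = '"SPECKIT_CODE_GRAPH_INDEX_SKILLS": "true", "OTHER_FLAG": "true"';
console.log(clean(local));
```

Running `clean` twice yields the same text as running it once, which is the idempotence property the filter relies on.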
To opt into maintainer mode on a fresh clone (only relevant if you're contributing upstream):
./scripts/setup-maintainer-filters.sh
git rm --cached opencode.json .claude/mcp.json .gemini/settings.json .vscode/mcp.json .codex/config.toml
git checkout -- opencode.json .claude/mcp.json .gemini/settings.json .vscode/mcp.json .codex/config.toml
After that, cat opencode.json shows "true". git show HEAD:opencode.json shows "false" (what the remote sees).
5. FAQ
Q: Do I need all 22 skills installed to use the framework?
A: No. Skills are loaded on demand by Gate 2. You only need the ones relevant to your work. The two core documentation skills - system-spec-kit and sk-doc - cover most documentation workflows. The MCP and cross-AI CLI skills require additional local tooling or API keys depending on the surface.
Q: Is this only for OpenCode or does it work with other runtimes?
A: It works with OpenCode, Codex CLI, Claude Code and Gemini CLI. The repo also includes Copilot CLI-oriented startup-surface integration. Agent definitions are mirrored in the checked-in Claude, Codex and Gemini runtime directories. OpenCode and Copilot CLI use runtime-specific MCP or startup integration rather than a dedicated agent mirror.
Q: What happens if I do not use a spec folder?
A: Gate 3 blocks file modifications until a spec folder answer is provided. You can skip it with option D, but skipped sessions are undocumented and will not be recoverable via memory search. For trivial changes under 5 characters in a single file, Gate 3 does not trigger.
Q: How does the memory system know what is relevant to my current task?
A: Packet continuity and any supporting generated context artifacts use structured frontmatter and anchored markdown so the memory engine can classify, index and retrieve them reliably. For recovery, start with /speckit:resume and the packet-local continuity ladder handover.md -> _memory.continuity -> canonical spec docs. After that, memory_match_triggers() runs a fast trigger/cognitive pass, while memory_context() and memory_search() handle deeper retrieval with intent routing, reranking and filtering.
Q: Can I use this framework without the cognitive memory features?
A: Yes. The Spec Kit documentation workflow (Gate 3, spec folders, templates) works independently of the memory MCP server. You lose cross-session memory retrieval, but structured documentation, agent routing and skill loading all still work.
Q: How do I add a new skill to the framework?
A: Use /create:skill to scaffold the skill structure. The command creates the SKILL.md, references and assets directories following the sk-doc template. Then register the skill in .opencode/skills/README.md.
Q: What does "local-first" mean for the memory system?
A: The memory database is a SQLite file on your local machine. No session data, code or context is sent to any external service unless you configure a cloud embedding provider (Voyage AI or OpenAI). HuggingFace Local embeddings run entirely on-device.
Q: How do I contribute a new agent definition?
A: Define the agent in .opencode/agents/ (the source of truth), then mirror the adapter into .claude/agents/, .codex/agents/ and .gemini/agents/. Use /create:agent to scaffold the file from the agent template.
Q: How many MCP tools are there and where are they defined?
A: 69 total across 6 native MCP servers, sourced from registered MCP-dispatched tools only. Breakdown: 39 mk-spec-memory tools from .opencode/skills/system-spec-kit/mcp_server/tool-schemas.ts, 9 mk_skill_advisor tools from .opencode/skills/system-skill-advisor/mcp_server/advisor-server.ts, 11 mk_code_index tools from .opencode/skills/system-code-graph/mcp_server/tool-schemas.ts, 7 code mode tools, 2 cocoindex_code tools and 1 sequential thinking tool. Canonical advisor/skill-graph docs use mk_skill_advisor / mcp__mk_skill_advisor__*. Canonical code-graph docs use mk_code_index / mcp__mk_code_index__*.
Q: What is the feature catalog?
A: The feature catalog is the current technical reference documenting the memory system's live capabilities. It lives at .opencode/skills/system-spec-kit/feature_catalog/feature_catalog.md. The code graph runtime adds package-local docs at .opencode/skills/system-code-graph/feature_catalog/.
6. RELATED DOCUMENTS
Internal Documentation:
- → AGENTS.md - Agent routing, gate definitions, behavior rules
- → Spec Kit README - Spec folder workflow, Level contract template set, validation rules
- → MCP Server README - Memory API reference and runtime support docs
- → Repo Scripts Runbook - Dry-run orphan MCP sweeper, Claude cleanup, and LaunchAgent template guidance
- → Orphan MCP Leak Prevention Packet - Canonical implementation summary and rollout state
- → System Code Graph Skill - First-class structural graph skill and MCP routing rules
- → Skill Advisor README - Standalone mk_skill_advisor server, nine advisor/skill-graph tools and routing docs
- → Install Guide - MCP server setup, embedding providers
- → Deployment Notes - Docker anti-patterns, Copilot notes and session-resume auth flag
- → Architecture - API boundary contract
- → sk-doc Skill - Documentation standards, DQI scoring
- → Skills Index - Skills library and invocation patterns
- → Feature Catalog - Current technical reference
- → Manual Testing Playbook - Operator validation scenarios, including runtime lifecycle checks
- → Code Graph Runtime Catalog - Package-local code graph runtime inventory
- → Code Graph Manual Playbook - Operator scenarios for code graph validation
- → Latest System Spec-Kit Release Notes - Most recent shipped release notes
External Resources:
- → OpenCode - The underlying AI coding platform
- → Voyage AI - Cloud embedding provider (opt-in)
- → HuggingFace - Free local embedding alternative
Documentation version: 4.13 | Last updated: 2026-05-24 | Framework: 11 agents, 22 skills, 24 commands, 69 MCP tools (39 mk-spec-memory + 9 mk_skill_advisor + 11 mk_code_index + 7 code mode + 2 CocoIndex + 1 sequential thinking. Deferred / internal-only handlers do NOT count).