name: build-extension-builder
description: "Build Claude cognitive extensions with quality methodology. Composes with plugin-dev and agent-sdk-dev. Use when: creating skills/hooks/agents/commands/MCP/plugins, need quality validation, building for the 1337 marketplace."

Extension Builder

Build cognitive extensions that enable effective collaboration, where both human and Claude grow through the partnership.

Requires: plugin-dev@claude-plugins-official and agent-sdk-dev@claude-plugins-official for authoritative schemas and templates. This skill adds quality methodology on top.

Why This Matters

Extensions become part of how users think and work. Whether an extension helps or harms comes down to how it is built.

Good extensions:

Show reasoning (user learns WHY, not just WHAT)
Provide control (user shapes direction)
Fill gaps (what Claude doesn't already know)
Compound value (each enhancement makes the next easier)

Bad extensions:

Hide reasoning (black box)
Replace thinking (user just consumes output)
Repeat basics (bloat without insight)
Create dependency (user less capable without it)

Design Principles

Build these into every extension.

Transparency (β = 0.415 effect)

Make reasoning visible so users can verify and learn.

Pattern	Implementation
Show the claim	What you're recommending
Show the why	Reasoning behind it
Show alternatives	What you considered and rejected
Show the source	Where this comes from
Show uncertainty	Confidence level (1-10)

Example in a skill:

### Error Handling

Use `thiserror` for library errors, `anyhow` for applications.

**Why:** thiserror derives std::error::Error with zero runtime cost.
anyhow provides context chaining but hides the error type.

**Source:** Rust API Guidelines, tokio/reqwest usage patterns.
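
The five patterns in the table above can also be carried as structured data, so that no field gets dropped silently. This is a minimal sketch: the `Recommendation` record and the `render` helper are hypothetical names for illustration, not part of any official schema.

```python
from dataclasses import dataclass, field


@dataclass
class Recommendation:
    """Hypothetical record holding the five transparency fields."""

    claim: str
    why: str
    source: str
    confidence: int  # 1-10, matching the scale in the pattern table
    alternatives: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject out-of-range confidence at construction time.
        if not 1 <= self.confidence <= 10:
            raise ValueError("confidence must be between 1 and 10")


def render(rec: Recommendation) -> str:
    """Format a recommendation so the reasoning travels with the claim."""
    rejected = "; ".join(rec.alternatives) or "none recorded"
    return "\n".join(
        [
            rec.claim,
            f"Why: {rec.why}",
            f"Alternatives considered: {rejected}",
            f"Source: {rec.source}",
            f"Confidence: {rec.confidence}/10",
        ]
    )
```

Making `alternatives` default to an empty list keeps the record easy to build, while `render` still prints the field so its absence is visible.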

Control (β = 0.507 effect, strongest)

Give users agency over direction.

Pattern	Implementation
Decision frameworks	Teach HOW to decide, not WHAT to do
Tradeoff tables	Options with tradeoffs, user chooses
Approval gates	Stop before irreversible actions
Checkpoints	Verifiable steps in complex workflows

Example decision framework:

### Which Error Type?

| Context | Use | Why |
|---------|-----|-----|
| Library (public API) | thiserror | Callers need to match on error types |
| Application (internal) | anyhow | Context matters more than type |
| Both (lib + binary) | thiserror + anyhow | Export typed errors, use anyhow internally |
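
The approval-gate pattern from the table above can be sketched in a few lines. The `run_with_gate` helper and its `approve` callback are assumptions for illustration; how approval is actually collected depends on the extension type.

```python
from typing import Callable


def run_with_gate(
    description: str,
    action: Callable[[], str],
    approve: Callable[[str], bool],
    irreversible: bool = True,
) -> str:
    """Run `action`, asking for approval first when it cannot be undone."""
    if irreversible and not approve(description):
        return f"skipped: {description} (not approved)"
    return action()
```

In an interactive script, `approve` could wrap a yes/no prompt. Treating actions as irreversible by default means a caller has to opt out of the gate rather than opt in.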

Pit of Success

Make the right thing the only obvious path.

Structure your extension so correct behavior is natural:

Default to safe options
Make dangerous operations require extra steps
Use constraints, not documentation
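
One way to read "constraints, not documentation" is to encode the safe path in the function signature itself. A minimal sketch, assuming a hypothetical `clean_directory` helper: the default is a dry run, and deletion requires an explicit keyword argument.

```python
from pathlib import Path


def clean_directory(root: Path, *, delete: bool = False) -> list[Path]:
    """Return the files under `root`; remove them only when delete=True.

    The keyword-only flag means a stray positional argument cannot trigger
    deletion, and the default call is a harmless dry run.
    """
    targets = sorted(p for p in root.rglob("*") if p.is_file())
    if delete:
        for path in targets:
            path.unlink()
    return targets
```

Calling `clean_directory(root, True)` raises a `TypeError`, so the dangerous operation cannot be reached by accident.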

Mistake-Proofing (Poka-Yoke)

Catch errors where they originate.

Validate assumptions early
Surface uncertainty at decision points
Include "watch out for" sections

Non-Conformist by Design

Extensions that offer templates converge. Extensions that teach process diverge.

Selective (Converges)	Generative (Diverges)
"Pick style A, B, or C"	"What approach fits your context?"
Templates to apply	Framework to discover
Menu of options	Dialogue to articulate
Everyone gets similar output	Each user develops unique voice

The homogenization trap: When AI tools offer categorical choices, everyone picks from the same menu. Output converges toward sameness.

The generative alternative: Help users discover and crystallize their own approach. The skill teaches the process, not the product.

Wrong	Right
Skill prescribes THE answer	Skill helps user find THEIR answer
Template library	Discovery framework
"Use this pattern"	"Here's how to find the right pattern"

Crystallization pattern:

skill helps user discover → user articulates their approach →
approach becomes local skill → collaboration uses that vocabulary

The published skill is the fishing rod. Each user catches their own fish.

Observability

Make extension behavior visible and controllable by default.

OTel Instrumentation

Instrument extensions so behavior is measurable and debuggable.

Extension Type	OTel	Key Spans
Agents	Required	`agent_run`, `llm_call`, `tool_call`
MCP Servers	Required	`mcp_server`, `mcp_call`
SDK Apps	Required	`session`, `turn`, `tool_call`
Skills	Recommended	`skill_check`, `skill_match`, `skill_load`
Hooks	Recommended	`hook_trigger`, `hook_handler`
Commands	Recommended	`command`, `command_execute`

Minimum attributes to capture:

success (bool), duration_ms (int), error (string if failed)
For LLM calls: input_tokens, output_tokens, model
For tool calls: tool_name, tool_args (truncated)
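
The minimum attribute set can be assembled before it is attached to a span. This is a stdlib-only sketch: `tool_call_attributes` and the 200-character truncation limit are assumptions, not part of the OTel API. The resulting dict is the kind of mapping you would pass to a span's `set_attributes` in the OpenTelemetry Python API.

```python
import json
import time
from typing import Any, Callable

MAX_ARGS_CHARS = 200  # assumed truncation limit; tune for your payloads


def tool_call_attributes(
    tool_name: str, tool_args: dict[str, Any], call: Callable[[], Any]
) -> dict[str, Any]:
    """Run `call` and return the minimum attributes for a tool_call span."""
    start = time.perf_counter()
    attrs: dict[str, Any] = {
        "tool_name": tool_name,
        "tool_args": json.dumps(tool_args, default=str)[:MAX_ARGS_CHARS],
    }
    try:
        call()
        attrs["success"] = True
    except Exception as exc:
        # The failure is recorded here; a real wrapper would likely re-raise.
        attrs["success"] = False
        attrs["error"] = str(exc)
    attrs["duration_ms"] = int((time.perf_counter() - start) * 1000)
    return attrs
```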

Local-first tracing:

# Phoenix (local, no cloud required)
import phoenix as px
from opentelemetry import trace

px.launch_app()  # serves the UI at localhost:6006

# Until a TracerProvider with an exporter is registered, this tracer is a no-op.
tracer = trace.get_tracer("my-extension")

See observability.md for complete instrumentation patterns.

Hook Behavior

Hooks fall into two categories with different design patterns:

Hook Type	Purpose	Pattern
Validation	Review actions before/after	Suggest, don't block
Action-triggering	Detect patterns, cause response	Directive, cause action

Validation hooks (PreToolUse, most PostToolUse):

Suggest alternatives, let user proceed with original choice.

# Good: Shows alternative, lets user proceed
{"decision": "allow", "message": "Consider using rg instead of grep (faster). Proceeding with grep."}

# Bad: Removes choice without escape
{"decision": "block", "message": "Use rg instead."}

Action-triggering hooks (pattern detection):

When detecting conditions that should trigger a response (debugging loops, user frustration, security concerns), use directive language that causes action.

# Good: Directive that causes action
{"decision": "allow", "message": "🐺 DEBUGGING LOOP DETECTED (3 consecutive failures). You MUST now: 1) Tell the user what's happening. 2) Spawn the appropriate agent to handle this systematically."}

# Bad: Mere suggestion that gets ignored
{"decision": "allow", "message": "Consider using Mr. Wolf for this problem."}

Why the distinction matters: Validation hooks preserve user agency over individual actions. Action-triggering hooks respond to emergent patterns where the whole point is to interrupt the current approach, and a suggestion alone doesn't accomplish that.
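
The debugging-loop case can be sketched as a small failure counter. `LoopDetector` and its default threshold of 3 are illustrative assumptions drawn from the example message above. The `decision`/`message` shape mirrors this document's examples rather than a confirmed hook schema.

```python
from typing import Optional


class LoopDetector:
    """Count consecutive failures and emit a directive at the threshold."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures = 0

    def record(self, succeeded: bool) -> Optional[dict[str, str]]:
        # Any success resets the streak; only an unbroken run counts.
        self.failures = 0 if succeeded else self.failures + 1
        if self.failures < self.threshold:
            return None
        return {
            "decision": "allow",
            "message": (
                f"DEBUGGING LOOP DETECTED ({self.failures} consecutive "
                "failures). You MUST now: 1) Tell the user what's happening. "
                "2) Spawn the appropriate agent to handle this systematically."
            ),
        }
```

Returning `None` below the threshold keeps the hook silent until the pattern actually emerges, which is what separates it from a validation hook that comments on every action.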

Opt-out mechanism: Every hook-based extension must:

Document how to disable
Respect environment variables (e.g., SKIP_HOOKS=1)
Never hard-block without an escape hatch

Reasoning traces: When hooks modify behavior, show:

What triggered the hook
What the hook recommends (or requires)
Why (brief reasoning)
For validation hooks: how to proceed with original if desired
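
Opt-out and reasoning traces can live in the same handler. A minimal sketch, assuming a hypothetical `suggest_rg` validation hook: it honors `SKIP_HOOKS=1` and never blocks. Its message names the trigger, the recommendation, the reason, and how to proceed. The output shape follows the examples above; check it against the schema in `plugin-dev:hook-development` before relying on it.

```python
import os
from typing import Mapping, Optional


def suggest_rg(
    command: str, env: Mapping[str, str] = os.environ
) -> Optional[dict[str, str]]:
    """Suggest rg when a command starts with grep; stay silent otherwise."""
    if env.get("SKIP_HOOKS") == "1":
        return None  # opt-out: the hook does nothing at all
    if not command.strip().startswith("grep "):
        return None
    return {
        "decision": "allow",  # suggest, don't block
        "message": (
            "Triggered by: grep command. "
            "Recommendation: consider rg. "
            "Why: rg is typically faster. "
            "To proceed: no action needed, grep runs as written."
        ),
    }
```

Passing `env` as a parameter makes the opt-out testable without mutating the real environment.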

Five Extension Types

type	purpose	what it extends
skill	knowledge + decision frameworks	what Claude knows
hook	event-triggered actions	session behavior
agent	specialized subagent	reasoning delegation
command	workflow shortcuts	repeatable procedures
mcp	external system integration	reach beyond Claude

Building a Skill

Skills are the most common extension. Follow Anthropic's patterns.

Structure

skill-name/
├── SKILL.md           (required - < 500 lines)
├── references/        (detailed docs, load as needed)
├── scripts/           (executable code)
└── assets/            (templates, files for output)

SKILL.md Anatomy

Frontmatter (required):

---
name: skill-name
description: "What it does. Use when: specific triggers."
---

The description is the trigger. Claude reads this to decide when to load. Be specific about "Use when:".
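
Since activation hinges on the description, it can be worth checking the frontmatter mechanically. This is a rough sketch with a hypothetical `check_frontmatter` helper; it handles only simple single-line `key: value` pairs and is not a YAML parser.

```python
def check_frontmatter(text: str) -> list[str]:
    """Return a list of problems found in a SKILL.md frontmatter block."""
    lines = [line.rstrip() for line in text.splitlines()]
    if not lines or lines[0] != "---":
        return ["missing opening --- delimiter"]
    try:
        end = lines.index("---", 1)
    except ValueError:
        return ["missing closing --- delimiter"]

    # Collect simple "key: value" pairs; split only on the first colon.
    fields: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')

    problems = []
    if not fields.get("name"):
        problems.append("name is required")
    description = fields.get("description", "")
    if not description:
        problems.append("description is required")
    elif "Use when:" not in description:
        problems.append('description should contain "Use when:"')
    return problems
```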

Intent-Driven Activation

Trigger on user intent, not tool names.

Wrong	Right
"Use when: Midjourney prompting"	"Use when: creating artwork, images, visual assets"
"Use when: using pytest"	"Use when: writing tests, test-driven development"
"Use when: running kubectl"	"Use when: deploying to Kubernetes, managing clusters"

Why this matters:

Users think in goals ("I need an image"), not tools ("I need Midjourney")
Intent-driven activation catches synonyms and related tasks
Tool-specific activation misses obvious use cases

The pattern:

[Domain/activity] + [user goals/outcomes]
NOT
[Tool name] + [tool-specific actions]
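
A crude lint can flag tool-specific triggers in a description. This is a heuristic sketch only: `find_tool_triggers` and the `KNOWN_TOOLS` set are hypothetical, and the tool list would need to be maintained by hand.

```python
import re

# Hypothetical, hand-maintained list, seeded from the "Wrong" column above.
KNOWN_TOOLS = {"midjourney", "pytest", "kubectl"}


def find_tool_triggers(description: str) -> list[str]:
    """Return known tool names that appear after "Use when:"."""
    _, sep, triggers = description.partition("Use when:")
    if not sep:
        return []  # no trigger clause to inspect
    words = re.findall(r"[a-z0-9_-]+", triggers.lower())
    return sorted(set(words) & KNOWN_TOOLS)
```

A hit is not automatically wrong, since a tool name may sit alongside intent phrases, so the result is better treated as a prompt for review than as a failure.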

Good examples:

"Visual content creation with AI. Use when: creating artwork, images, illustrations, animations, videos, aesthetic direction."
"Engineering excellence for builders. Use when: writing code, making technical decisions, refactoring, reviewing."
"Rust production patterns. Use when: building Rust CLI, backend, frontend, or native apps."

Body (required):

Brief intro (1-2 sentences)
Why this approach (practical motivation, not academic)
Core content (decision tables, workflows, gotchas)
References section (what to load when)

What Goes in SKILL.md vs References

SKILL.md	references/
High-level workflow	Detailed patterns
Decision frameworks	Full examples
"Load X when Y" navigation	Academic/industry citations
Practical motivation	Research foundations
< 500 lines	No limit

Key insight: SKILL.md is pragmatic and motivating. References are where depth lives.

The Filter

Claude already knows this? → YES → Cut it
Non-obvious insight? → NO → Cut it

include	cut
Production gotchas	Basic syntax
Decision frameworks	Textbook examples
Corrects assumptions	Generic explanations
What Claude gets wrong	Complete tutorials

Progressive Disclosure

Skills share context with everything else. Treat tokens as a public good.

Metadata (~100 words) - Always loaded, triggers activation
SKILL.md body (< 500 lines) - Loaded when skill activates
References (unlimited) - Loaded when Claude needs them

Reference each file clearly:

## References

| need | load |
|------|------|
| Python patterns | [python.md](references/python.md) |
| Error handling | [errors.md](references/errors.md) |
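
The line budget and the reference links can both be checked before shipping. A minimal sketch, assuming a hypothetical `check_skill_dir` helper and Markdown-style links into `references/` as in the example above.

```python
import re
from pathlib import Path

MAX_SKILL_LINES = 500  # limit stated for SKILL.md in this document
LINK = re.compile(r"\]\((references/[^)\s]+)\)")


def check_skill_dir(skill_dir: Path) -> list[str]:
    """Return problems: an oversized SKILL.md or links to missing files."""
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.is_file():
        return ["SKILL.md is missing"]
    text = skill_md.read_text(encoding="utf-8")

    problems = []
    line_count = len(text.splitlines())
    if line_count >= MAX_SKILL_LINES:
        problems.append(
            f"SKILL.md has {line_count} lines (limit is < {MAX_SKILL_LINES})"
        )
    # Every linked reference must exist relative to the skill directory.
    for target in LINK.findall(text):
        if not (skill_dir / target).is_file():
            problems.append(f"broken reference link: {target}")
    return problems
```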

Building Other Extension Types

For comprehensive templates and schema documentation, use the official Claude Code plugins:

building...	use skill
plugin structure	`/plugin-dev:plugin-structure`
skill	`/plugin-dev:skill-development`
hook	`/plugin-dev:hook-development`
agent	`/plugin-dev:agent-development`
command	`/plugin-dev:command-development`
mcp server	`/plugin-dev:mcp-integration`
sdk app	`/agent-sdk-dev:new-sdk-app`

These official skills are the authoritative source and are kept up to date with Claude Code.

Our references add 1337-specific patterns and quality methodology (see References section below).

Validation Checklist

Before shipping:

Content Quality

Fills gaps (what Claude doesn't know)
Decisions, not tutorials
Each claim has source (in references)
Tested in real session

Transparency Built-In

Reasoning visible for recommendations
Sources cited or source types named
Uncertainty acknowledged where relevant
Alternatives considered and shown

Control Built-In

Decision frameworks, not mandates
Tradeoffs presented for significant choices
User can shape direction
Approval gates for irreversible actions (if applicable)

Observability Built-In

OTel spans defined (agents/MCP/SDK: required; skills/hooks/commands: recommended)
Key attributes captured (success, duration_ms, error)
Traces route to local collector (Phoenix or OTLP)
Validation hooks suggest, don't block (user retains choice)
Action-triggering hooks use directive language (cause action, not suggestions)
Opt-out mechanism documented (for hooks)
No silent enforcement

Activation

Description has "Use when:"
Triggers on intent (user goals), not tool names
Triggers on the right prompts
Doesn't over-activate

Quality

Expert finds this useful
User MORE capable after using
Passes the pit of success test

Non-Conformist

Teaches process, not product
Users develop their own approach
No categorical templates that converge

Publishing

For 1337 marketplace:

Create plugin in plugins/<name>-1337/
Add to .claude-plugin/marketplace.json
Add display metadata to .claude-plugin/metadata.json

See marketplace-schema.md for schema details.

Quality Assurance

After building an extension, validate it through the eval→optimize cycle.

Quick Evaluation

"Evaluate plugins/my-extension-1337"

The evaluator agent checks all 6 quality gates and returns a verdict:

1337: ≥15/18, no gate below 2, no critical issues → ready to ship
NEEDS WORK: ≥10/18, fixable issues → run optimizer
NOT READY: <10/18 or fundamental problems → rethink approach
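
The thresholds above translate into a short decision function. This is a sketch under stated assumptions: six gates each scored 0 to 3 (inferred from the 18-point total, not stated in this document), with `critical_issues` and `fundamental_problems` supplied by the evaluator as booleans.

```python
def verdict(
    gate_scores: list[int],
    critical_issues: bool = False,
    fundamental_problems: bool = False,
) -> str:
    """Map six gate scores to 1337, NEEDS WORK, or NOT READY."""
    if len(gate_scores) != 6:
        raise ValueError("expected scores for exactly 6 gates")
    total = sum(gate_scores)
    if total < 10 or fundamental_problems:
        return "NOT READY"
    if total >= 15 and min(gate_scores) >= 2 and not critical_issues:
        return "1337"
    return "NEEDS WORK"
```

In this sketch a total of 16 with one gate at 1 falls through to NEEDS WORK. The text above doesn't spell that case out, so treat it as an assumption.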

Optimization

If evaluator returns NEEDS WORK:

"Optimize plugins/my-extension-1337 based on the evaluation"

The optimizer agent:

Fixes issues in priority order (critical → major → minor)
Applies minimal changes (surgical, not sweeping)
Escalates domain decisions to you
Reports what was fixed and what needs human input

Full Quality Loop

For hands-off tuning:

"Run quality loop on plugins/my-extension-1337 until it passes"

This runs eval→optimize→re-eval cycles (max 3 iterations) until the extension reaches 1337 status or escalates issues that need human decisions.
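
The loop's control flow can be sketched with hypothetical `evaluate` and `optimize` callables standing in for the evaluator and optimizer agents. The cap of 3 comes from the text above; counting each evaluation as one iteration is an assumption.

```python
from typing import Callable

MAX_ITERATIONS = 3  # cap stated above


def quality_loop(
    evaluate: Callable[[], str], optimize: Callable[[], None]
) -> tuple[str, int]:
    """Alternate evaluate and optimize until a final verdict or the cap.

    Returns the last verdict and the number of evaluations performed.
    """
    for iteration in range(1, MAX_ITERATIONS + 1):
        result = evaluate()
        if result != "NEEDS WORK":
            return result, iteration  # 1337, or NOT READY for a human rethink
        if iteration < MAX_ITERATIONS:
            optimize()  # no point optimizing after the final evaluation
    return "NEEDS WORK", MAX_ITERATIONS  # cap reached: escalate to a human
```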

When to Run

Situation	Action
Just built new extension	Run evaluator
Evaluator says NEEDS WORK	Run optimizer
After optimizer fixes	Re-run evaluator
Want full automated tuning	Run quality loop
Auditing existing plugins	Run evaluator on each

See plugin-tuning-runbook.md for a detailed step-by-step execution guide.

References

Official Claude Code Skills (Authoritative)

Use these official skills for comprehensive, up-to-date documentation:

need	skill	what you get
Plugin structure	`plugin-dev:plugin-structure`	Directory layout, manifest, auto-discovery
Skill development	`plugin-dev:skill-development`	SKILL.md format, frontmatter, progressive disclosure
Hook development	`plugin-dev:hook-development`	Events, matchers, scripts, JSON schemas
Agent development	`plugin-dev:agent-development`	Agent frontmatter, triggers, examples
Command development	`plugin-dev:command-development`	Slash commands, arguments, dynamic context
MCP integration	`plugin-dev:mcp-integration`	Server types, config, tool design
SDK apps	`agent-sdk-dev:new-sdk-app`	Claude Agent SDK patterns

1337-Specific References

Our references add quality methodology and marketplace specifics:

need	load
OTel instrumentation	observability.md
Evidence workflow	evidence-templates.md
Skill evaluation metrics	evals.md
Plugin manifest gotchas	plugin-schema.md
1337 marketplace	marketplace-schema.md

Methodology Depth

Research foundations live in core-1337. Load that skill for:

Software craftsmanship principles
Evidence hierarchy
Scientific method
Collaborative intelligence theory
