Agent Skill
2/7/2026

build-extension-builder

Build Claude cognitive extensions with quality methodology. Composes with plugin-dev and agent-sdk-dev. Use when: creating skills/hooks/agents/commands/MCP/plugins, need quality validation, building for the 1337 marketplace.

yzavyas
npx skills add yzavyas/claude-1337

SKILL.md


---
name: build-extension-builder
description: "Build Claude cognitive extensions with quality methodology. Composes with plugin-dev and agent-sdk-dev. Use when: creating skills/hooks/agents/commands/MCP/plugins, need quality validation, building for the 1337 marketplace."
---

Extension Builder

Build cognitive extensions that enable effective collaboration, where both human and Claude grow through the partnership.

Requires: plugin-dev@claude-plugins-official and agent-sdk-dev@claude-plugins-official for authoritative schemas and templates. This skill adds quality methodology on top.

Why This Matters

Extensions become part of how users think and work. Whether an extension helps or harms comes down to how it is built.

Good extensions:

  • Show reasoning (user learns WHY, not just WHAT)
  • Provide control (user shapes direction)
  • Fill gaps (what Claude doesn't already know)
  • Compound value (each enhancement makes the next easier)

Bad extensions:

  • Hide reasoning (black box)
  • Replace thinking (user just consumes output)
  • Repeat basics (bloat without insight)
  • Create dependency (user less capable without it)

Design Principles

Build these into every extension.

Transparency (β = 0.415 effect)

Make reasoning visible so users can verify and learn.

| Pattern | Implementation |
|---------|----------------|
| Show the claim | What you're recommending |
| Show the why | Reasoning behind it |
| Show alternatives | What you considered and rejected |
| Show the source | Where this comes from |
| Show uncertainty | Confidence level (1-10) |

Example in a skill:

### Error Handling

Use `thiserror` for library errors, `anyhow` for applications.

**Why:** thiserror derives std::error::Error with zero runtime cost.
anyhow provides context chaining but hides the error type.

**Source:** Rust API Guidelines, tokio/reqwest usage patterns.

Control (β = 0.507 effect, strongest)

Give users agency over direction.

| Pattern | Implementation |
|---------|----------------|
| Decision frameworks | Teach HOW to decide, not WHAT to do |
| Tradeoff tables | Options with tradeoffs, user chooses |
| Approval gates | Stop before irreversible actions |
| Checkpoints | Verifiable steps in complex workflows |

Example decision framework:

### Which Error Type?

| Context | Use | Why |
|---------|-----|-----|
| Library (public API) | thiserror | Callers need to match on error types |
| Application (internal) | anyhow | Context matters more than type |
| Both (lib + binary) | thiserror + anyhow | Export typed errors, use anyhow internally |

Pit of Success

Make the right thing the only obvious path.

Structure your extension so correct behavior is natural:

  • Default to safe options
  • Make dangerous operations require extra steps
  • Use constraints, not documentation
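
The three points above can be sketched in Python. This is a minimal illustration, assuming a hypothetical `delete_paths` helper that is not part of any plugin API: the default is a dry run, the destructive path needs an extra explicit argument, and the rule lives in the function signature instead of in prose.

```python
from pathlib import Path
from typing import Iterable, List


def delete_paths(paths: Iterable[Path], *, dry_run: bool = True, confirm: str = "") -> List[str]:
    """Hypothetical helper: deletes files only when the caller opts in twice."""
    targets = [Path(p) for p in paths]
    if dry_run:
        # Safe default: report what would happen, change nothing.
        return [f"would delete {p}" for p in targets]
    if confirm != "DELETE":
        # Constraint, not documentation: the dangerous path fails closed.
        raise PermissionError('pass confirm="DELETE" to delete for real')
    report = []
    for p in targets:
        p.unlink(missing_ok=True)  # requires Python 3.8+
        report.append(f"deleted {p}")
    return report
```

Calling `delete_paths(files)` with no extra arguments can never destroy anything, so the easiest call is also the safe one.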

Mistake-Proofing (Poka-Yoke)

Catch errors where they originate.

  • Validate assumptions early
  • Surface uncertainty at decision points
  • Include "watch out for" sections

Non-Conformist by Design

Extensions that offer templates converge. Extensions that teach process diverge.

| Selective (Converges) | Generative (Diverges) |
|-----------------------|-----------------------|
| "Pick style A, B, or C" | "What approach fits your context?" |
| Templates to apply | Framework to discover |
| Menu of options | Dialogue to articulate |
| Everyone gets similar output | Each user develops unique voice |

The homogenization trap: When AI tools offer categorical choices, everyone picks from the same menu. Output converges toward sameness.

The generative alternative: Help users discover and crystallize their own approach. The skill teaches the process, not the product.

| Wrong | Right |
|-------|-------|
| Skill prescribes THE answer | Skill helps user find THEIR answer |
| Template library | Discovery framework |
| "Use this pattern" | "Here's how to find the right pattern" |

Crystallization pattern:

skill helps user discover → user articulates their approach →
approach becomes local skill → collaboration uses that vocabulary

The published skill is the fishing rod. Each user catches their own fish.

Observability

Make extension behavior visible and controllable by default.

OTel Instrumentation

Instrument extensions so behavior is measurable and debuggable.

| Extension Type | OTel | Key Spans |
|----------------|------|-----------|
| Agents | Required | agent_run, llm_call, tool_call |
| MCP Servers | Required | mcp_server, mcp_call |
| SDK Apps | Required | session, turn, tool_call |
| Skills | Recommended | skill_check, skill_match, skill_load |
| Hooks | Recommended | hook_trigger, hook_handler |
| Commands | Recommended | command, command_execute |

Minimum attributes to capture:

  • success (bool), duration_ms (int), error (string if failed)
  • For LLM calls: input_tokens, output_tokens, model
  • For tool calls: tool_name, tool_args (truncated)
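
As a rough sketch of the attribute list above, the function below builds the minimum attribute set for a `tool_call` span using only the standard library. The function name and the 200-character truncation limit are assumptions made for illustration; the source only says that `tool_args` should be truncated.

```python
import json
from typing import Any, Dict, Optional

MAX_ARG_CHARS = 200  # assumed truncation limit, not specified in the source


def tool_call_attributes(tool_name: str, tool_args: Dict[str, Any],
                         duration_ms: float, error: Optional[str] = None) -> Dict[str, Any]:
    """Build the minimum attribute set for a tool_call span."""
    args_text = json.dumps(tool_args, sort_keys=True, default=str)
    if len(args_text) > MAX_ARG_CHARS:
        args_text = args_text[:MAX_ARG_CHARS] + "...[truncated]"
    attrs: Dict[str, Any] = {
        "success": error is None,
        "duration_ms": int(duration_ms),
        "tool_name": tool_name,
        "tool_args": args_text,
    }
    if error is not None:
        attrs["error"] = error  # only present when the call failed
    return attrs
```

The resulting dictionary holds only strings, ints, and bools, so it should be usable directly as OpenTelemetry span attributes.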

Local-first tracing:

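# Phoenix (local, no cloud required)
import phoenix as px
px.launch_app()  # serves the Phoenix UI at localhost:6006

from opentelemetry import trace
# Spans are only exported once a TracerProvider with an exporter is configured;
# without one, get_tracer returns a no-op tracer.
tracer = trace.get_tracer("my-extension")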

See observability.md for complete instrumentation patterns.

Hook Behavior

Hooks fall into two categories with different design patterns:

| Hook Type | Purpose | Pattern |
|-----------|---------|---------|
| Validation | Review actions before/after | Suggest, don't block |
| Action-triggering | Detect patterns, cause response | Directive, cause action |

Validation hooks (PreToolUse, most PostToolUse):

Suggest alternatives, let user proceed with original choice.

# Good: Shows alternative, lets user proceed
{"decision": "allow", "message": "Consider using rg instead of grep (faster). Proceeding with grep."}

# Bad: Removes choice without escape
{"decision": "block", "message": "Use rg instead."}

Action-triggering hooks (pattern detection):

When detecting conditions that should trigger a response (debugging loops, user frustration, security concerns), use directive language that causes action.

# Good: Directive that causes action
{"decision": "allow", "message": "🐺 DEBUGGING LOOP DETECTED (3 consecutive failures). You MUST now: 1) Tell the user what's happening. 2) Spawn the appropriate agent to handle this systematically."}

# Bad: Mere suggestion that gets ignored
{"decision": "allow", "message": "Consider using Mr. Wolf for this problem."}

Why the distinction matters: Validation hooks preserve user agency over individual actions. Action-triggering hooks respond to emergent patterns where the whole point is to interrupt the current approach — suggesting doesn't accomplish that.
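
A minimal sketch of an action-triggering hook, using the debugging-loop case from the example above. The function names are hypothetical, and the `{"decision", "message"}` shape simply mirrors the examples in this document; check plugin-dev:hook-development for the authoritative hook schema.

```python
from typing import Dict, List

FAILURE_THRESHOLD = 3  # matches the "3 consecutive failures" in the example above


def consecutive_failures(outcomes: List[bool]) -> int:
    """Count failures at the end of the history (True means success)."""
    count = 0
    for ok in reversed(outcomes):
        if ok:
            break
        count += 1
    return count


def debugging_loop_hook(outcomes: List[bool]) -> Dict[str, str]:
    """Stay quiet below the threshold; emit a directive once the pattern appears."""
    streak = consecutive_failures(outcomes)
    if streak < FAILURE_THRESHOLD:
        return {"decision": "allow", "message": ""}
    return {
        "decision": "allow",
        "message": (
            f"DEBUGGING LOOP DETECTED ({streak} consecutive failures). "
            "You MUST now: 1) Tell the user what's happening. "
            "2) Spawn the appropriate agent to handle this systematically."
        ),
    }
```

The hook says nothing until the pattern is real, then speaks in imperatives, which is the opposite of a validation hook that comments on every action but never insists.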

Opt-out mechanism: Every hook-based extension must:

  • Document how to disable
  • Respect environment variables (e.g., SKIP_HOOKS=1)
  • Never hard-block without escape hatch
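
One way the opt-out could look, as a sketch only. `SKIP_HOOKS` comes from the list above; the accepted values (`1`, `true`, `yes`) and the function names are assumptions for illustration.

```python
import os
from typing import Dict, Mapping, Optional


def hooks_disabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when the user has opted out via SKIP_HOOKS (e.g. SKIP_HOOKS=1)."""
    env = os.environ if env is None else env
    return env.get("SKIP_HOOKS", "").strip().lower() in {"1", "true", "yes"}


def run_hook(suggestion: str, env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    if hooks_disabled(env):
        # Opt-out: stay silent and never interfere.
        return {"decision": "allow", "message": ""}
    # Even when active, the hook suggests and allows: no hard block.
    return {"decision": "allow", "message": suggestion}
```

Passing `env` explicitly keeps the check testable without touching the real process environment.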

Reasoning traces: When hooks modify behavior, show:

  • What triggered the hook
  • What the hook recommends (or requires)
  • Why (brief reasoning)
  • For validation hooks: how to proceed with original if desired
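
The four trace elements above could be carried in a small structure so that no hook message omits one of them. `HookTrace` and its field names are hypothetical, and the message layout is only one possible rendering.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class HookTrace:
    trigger: str          # what triggered the hook
    recommendation: str   # what the hook recommends (or requires)
    why: str              # brief reasoning
    proceed: str = ""     # validation hooks only: how to keep the original choice

    def to_response(self) -> Dict[str, str]:
        parts = [
            f"Trigger: {self.trigger}",
            f"Recommendation: {self.recommendation}",
            f"Why: {self.why}",
        ]
        if self.proceed:
            parts.append(f"To proceed as-is: {self.proceed}")
        return {"decision": "allow", "message": " | ".join(parts)}
```

For example, `HookTrace("grep invoked", "use rg", "faster on large trees", "no action needed; grep will run").to_response()` yields a single message that names the trigger, the suggestion, the reason, and the escape route.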

Five Extension Types

| type | purpose | what it extends |
|------|---------|-----------------|
| skill | knowledge + decision frameworks | what Claude knows |
| hook | event-triggered actions | session behavior |
| agent | specialized subagent | reasoning delegation |
| command | workflow shortcuts | repeatable procedures |
| mcp | external system integration | reach beyond Claude |

Building a Skill

Skills are the most common extension. Follow Anthropic's patterns.

Structure

skill-name/
├── SKILL.md           (required - < 500 lines)
├── references/        (detailed docs, load as needed)
├── scripts/           (executable code)
└── assets/            (templates, files for output)

SKILL.md Anatomy

Frontmatter (required):

---
name: skill-name
description: "What it does. Use when: specific triggers."
---

The description is the trigger. Claude reads this to decide when to load. Be specific about "Use when:".
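
A small pre-flight check for the frontmatter described above, as a sketch. It uses a deliberately naive `key: value` reader, not a real YAML parser, and the function names are assumptions; the authoritative format is documented in plugin-dev:skill-development.

```python
from typing import Dict, List


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Read simple `key: value` pairs between the opening and closing `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return {}  # no closing delimiter: treat as missing frontmatter


def frontmatter_problems(text: str) -> List[str]:
    """List what is missing from a SKILL.md frontmatter block."""
    fields = parse_frontmatter(text)
    problems = []
    if not fields.get("name"):
        problems.append("missing name")
    description = fields.get("description", "")
    if not description:
        problems.append("missing description")
    elif "Use when:" not in description:
        problems.append('description lacks "Use when:"')
    return problems
```

An empty list means the two required fields are present and the description carries a "Use when:" clause; it says nothing about whether the triggers are well chosen.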

Intent-Driven Activation

Trigger on user intent, not tool names.

| Wrong | Right |
|-------|-------|
| "Use when: Midjourney prompting" | "Use when: creating artwork, images, visual assets" |
| "Use when: using pytest" | "Use when: writing tests, test-driven development" |
| "Use when: running kubectl" | "Use when: deploying to Kubernetes, managing clusters" |

Why this matters:

  • Users think in goals ("I need an image"), not tools ("I need Midjourney")
  • Intent-driven activation catches synonyms and related tasks
  • Tool-specific activation misses obvious use cases

The pattern:

[Domain/activity] + [user goals/outcomes]
NOT
[Tool name] + [tool-specific actions]
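
The pattern can be roughly linted by looking for tool names inside the "Use when:" clause. This is a heuristic sketch: the function name is hypothetical, and the caller has to supply the list of tool names, since nothing in the source defines one.

```python
import re
from typing import Iterable, List


def tool_names_in_triggers(description: str, tool_names: Iterable[str]) -> List[str]:
    """Return tool names that appear in the "Use when:" part of a description."""
    _, sep, triggers = description.partition("Use when:")
    if not sep:
        return []  # no trigger clause to inspect
    found = []
    for tool in tool_names:
        # Whole-word, case-insensitive match on the trigger text only.
        if re.search(rf"\b{re.escape(tool)}\b", triggers, flags=re.IGNORECASE):
            found.append(tool)
    return found
```

A hit is a prompt to reconsider the wording, not proof of a mistake: a description may reasonably mention a tool before the "Use when:" clause while still triggering on intent.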

Good examples:

  • "Visual content creation with AI. Use when: creating artwork, images, illustrations, animations, videos, aesthetic direction."
  • "Engineering excellence for builders. Use when: writing code, making technical decisions, refactoring, reviewing."
  • "Rust production patterns. Use when: building Rust CLI, backend, frontend, or native apps."

Body (required):

  1. Brief intro (1-2 sentences)
  2. Why this approach (practical motivation, not academic)
  3. Core content (decision tables, workflows, gotchas)
  4. References section (what to load when)

What Goes in SKILL.md vs References

| SKILL.md | references/ |
|----------|-------------|
| High-level workflow | Detailed patterns |
| Decision frameworks | Full examples |
| "Load X when Y" navigation | Academic/industry citations |
| Practical motivation | Research foundations |
| < 500 lines | No limit |

Key insight: SKILL.md is pragmatic and motivating. References are where depth lives.

The Filter

Claude already knows this? → YES → Cut it
Non-obvious insight? → NO → Cut it

| include | cut |
|---------|-----|
| Production gotchas | Basic syntax |
| Decision frameworks | Textbook examples |
| Corrects assumptions | Generic explanations |
| What Claude gets wrong | Complete tutorials |

Progressive Disclosure

Skills share context with everything else. Treat tokens as a public good.

  1. Metadata (~100 words) - Always loaded, triggers activation
  2. SKILL.md body (< 500 lines) - Loaded when skill activates
  3. References (unlimited) - Loaded when Claude needs them
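
The first two budgets above can be checked mechanically. In this sketch, "~100 words" is treated as a soft ceiling and "< 500 lines" as a hard one; both readings, and the function name, are assumptions.

```python
from typing import List

MAX_BODY_LINES = 500      # "< 500 lines" from the list above
MAX_METADATA_WORDS = 100  # "~100 words", treated here as a soft ceiling


def budget_warnings(description: str, body: str) -> List[str]:
    """Warn when the always-loaded metadata or the SKILL.md body is over budget."""
    warnings = []
    words = len(description.split())
    if words > MAX_METADATA_WORDS:
        warnings.append(f"metadata is {words} words (target ~{MAX_METADATA_WORDS})")
    lines = len(body.splitlines())
    if lines >= MAX_BODY_LINES:
        warnings.append(f"SKILL.md body is {lines} lines (limit < {MAX_BODY_LINES})")
    return warnings
```

Content that trips either warning is a candidate for moving into references/, where there is no limit.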

Reference each file clearly:

## References

| need | load |
|------|------|
| Python patterns | [python.md](references/python.md) |
| Error handling | [errors.md](references/errors.md) |
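
A navigation table like this is only useful if every link resolves. The sketch below, with a hypothetical function name, scans SKILL.md for markdown links into references/ and reports the ones that point at no file.

```python
import re
from pathlib import Path
from typing import List

# Matches markdown links whose target starts with references/
LINK = re.compile(r"\[[^\]]*\]\((references/[^)\s]+)\)")


def missing_references(skill_dir: Path) -> List[str]:
    """List references/ links in SKILL.md that do not resolve to a file."""
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    targets = sorted(set(LINK.findall(text)))
    return [t for t in targets if not (skill_dir / t).is_file()]
```

It checks existence only, so a reference file that exists but is never linked from SKILL.md goes unnoticed.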

Building Other Extension Types

For comprehensive templates and schema documentation, use the official Claude Code plugins:

| building... | use skill |
|-------------|-----------|
| plugin structure | /plugin-dev:plugin-structure |
| skill | /plugin-dev:skill-development |
| hook | /plugin-dev:hook-development |
| agent | /plugin-dev:agent-development |
| command | /plugin-dev:command-development |
| mcp server | /plugin-dev:mcp-integration |
| sdk app | /agent-sdk-dev:new-sdk-app |

These official skills are authoritative and always up-to-date with Claude Code.

Our references add 1337-specific patterns and quality methodology (see References section below).


Validation Checklist

Before shipping:

Content Quality

  • Fills gaps (what Claude doesn't know)
  • Decisions, not tutorials
  • Each claim has source (in references)
  • Tested in real session

Transparency Built-In

  • Reasoning visible for recommendations
  • Sources cited or source types named
  • Uncertainty acknowledged where relevant
  • Alternatives considered and shown

Control Built-In

  • Decision frameworks, not mandates
  • Tradeoffs presented for significant choices
  • User can shape direction
  • Approval gates for irreversible actions (if applicable)

Observability Built-In

  • OTel spans defined (agents/MCP/SDK: required; skills/hooks/commands: recommended)
  • Key attributes captured (success, duration_ms, error)
  • Traces route to local collector (Phoenix or OTLP)
  • Validation hooks suggest, don't block (user retains choice)
  • Action-triggering hooks use directive language (cause action, not suggestions)
  • Opt-out mechanism documented (for hooks)
  • No silent enforcement

Activation

  • Description has "Use when:"
  • Triggers on intent (user goals), not tool names
  • Triggers on right prompts
  • Doesn't over-activate

Quality

  • Expert finds this useful
  • User MORE capable after using
  • Passes the pit of success test

Non-Conformist

  • Teaches process, not product
  • Users develop their own approach
  • No categorical templates that converge

Publishing

For 1337 marketplace:

  1. Create plugin in plugins/<name>-1337/
  2. Add to .claude-plugin/marketplace.json
  3. Add display metadata to .claude-plugin/metadata.json

See marketplace-schema.md for schema details.


Quality Assurance

After building an extension, validate it through the eval→optimize cycle.

Quick Evaluation

"Evaluate plugins/my-extension-1337"

The evaluator agent checks all 6 quality gates and returns a verdict:

  • 1337: ≥15/18, no gate below 2, no critical issues → ready to ship
  • NEEDS WORK: ≥10/18, fixable issues → run optimizer
  • NOT READY: <10/18 or fundamental problems → rethink approach
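
The thresholds above can be written out as a small function. This is a sketch under stated assumptions: each of the 6 gates is scored 0-3 (inferred from the 18-point total), and a score of 15 or more that fails the "no gate below 2" or "no critical issues" condition falls back to NEEDS WORK. The "fundamental problems" judgement is left to the evaluator and is not modelled here.

```python
from typing import Sequence


def verdict(gate_scores: Sequence[int], critical_issues: int = 0) -> str:
    """Map six gate scores (assumed 0-3 each) to the evaluator's verdict."""
    if len(gate_scores) != 6 or any(not 0 <= s <= 3 for s in gate_scores):
        raise ValueError("expected six gate scores in the range 0-3")
    total = sum(gate_scores)
    if total >= 15 and min(gate_scores) >= 2 and critical_issues == 0:
        return "1337"
    if total >= 10:
        return "NEEDS WORK"
    return "NOT READY"
```

For instance, scores of `[3, 3, 3, 3, 3, 1]` total 16 but still return NEEDS WORK, because one gate sits below 2.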

Optimization

If evaluator returns NEEDS WORK:

"Optimize plugins/my-extension-1337 based on the evaluation"

The optimizer agent:

  • Fixes issues in priority order (critical → major → minor)
  • Applies minimal changes (surgical, not sweeping)
  • Escalates domain decisions to you
  • Reports what was fixed and what needs human input

Full Quality Loop

For hands-off tuning:

"Run quality loop on plugins/my-extension-1337 until it passes"

This runs eval→optimize→re-eval cycles (max 3 iterations) until the extension reaches 1337 status or escalates issues that need human decisions.
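
The control flow of that loop might look like the sketch below. `evaluate` and `optimize` are hypothetical callables standing in for the evaluator and optimizer agents, and "max 3 iterations" is read here as at most three optimizer runs.

```python
from typing import Callable, Tuple

MAX_ITERATIONS = 3  # "max 3 iterations" from the text above


def quality_loop(evaluate: Callable[[], str], optimize: Callable[[], None]) -> Tuple[str, int]:
    """Run eval -> optimize -> re-eval; return (final verdict, optimizer runs)."""
    result = evaluate()
    runs = 0
    while result == "NEEDS WORK" and runs < MAX_ITERATIONS:
        optimize()
        runs += 1
        result = evaluate()
    return result, runs
```

A final verdict other than 1337 means the loop stopped without passing, which is the point where remaining issues go to a human.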

When to Run

| Situation | Action |
|-----------|--------|
| Just built new extension | Run evaluator |
| Evaluator says NEEDS WORK | Run optimizer |
| After optimizer fixes | Re-run evaluator |
| Want full automated tuning | Run quality loop |
| Auditing existing plugins | Run evaluator on each |

See plugin-tuning-runbook.md for detailed step-by-step execution guide.


References

Official Claude Code Skills (Authoritative)

Use these official skills for comprehensive, up-to-date documentation:

| need | skill | what you get |
|------|-------|--------------|
| Plugin structure | plugin-dev:plugin-structure | Directory layout, manifest, auto-discovery |
| Skill development | plugin-dev:skill-development | SKILL.md format, frontmatter, progressive disclosure |
| Hook development | plugin-dev:hook-development | Events, matchers, scripts, JSON schemas |
| Agent development | plugin-dev:agent-development | Agent frontmatter, triggers, examples |
| Command development | plugin-dev:command-development | Slash commands, arguments, dynamic context |
| MCP integration | plugin-dev:mcp-integration | Server types, config, tool design |
| SDK apps | agent-sdk-dev:new-sdk-app | Claude Agent SDK patterns |

1337-Specific References

Our references add quality methodology and marketplace specifics:

| need | load |
|------|------|
| OTel instrumentation | observability.md |
| Evidence workflow | evidence-templates.md |
| Skill evaluation metrics | evals.md |
| Plugin manifest gotchas | plugin-schema.md |
| 1337 marketplace | marketplace-schema.md |

Methodology Depth

Research foundations live in core-1337. Load that skill for:

  • Software craftsmanship principles
  • Evidence hierarchy
  • Scientific method
  • Collaborative intelligence theory

Skills Info

Original Name: build-extension-builder
Author: yzavyas