Agent Skill
2/7/2026

eval-patterns

This skill provides common evaluation patterns and integration guidance. Use when: - Integrating eval-framework with other plugins - Designing evaluation workflows - Choosing between content vs behavior evaluation - Setting up project-local rubrics

therealchrisrock
npx skills add therealchrisrock/basho

SKILL.md

---
name: eval-patterns
description: |
  This skill provides common evaluation patterns and integration guidance. Use when:
  - Integrating eval-framework with other plugins
  - Designing evaluation workflows
  - Choosing between content vs behavior evaluation
  - Setting up project-local rubrics
version: 1.0.0
---

Evaluation Patterns & Integration

Common patterns for using the eval-framework effectively in different contexts.

Evaluation Types

Content Evaluation

Evaluates static content: copy, documentation, code files.

Use for:

  • Marketing copy review
  • Documentation quality
  • Code style/patterns
  • Configuration validation

Invocation:

/eval-run brand-voice app/routes/sell-on-vouchline.tsx

Behavior Evaluation

Evaluates actions and outputs: what Claude did, not just what exists.

Use for:

  • Code review after implementation
  • Commit message quality
  • Test coverage verification
  • API response validation

Invocation:

Judge agent triggered: "Review what I just implemented against the code-security rubric"

Combined Evaluation

Evaluates both content and behavior together.

Use for:

  • Full code review (style + security + behavior)
  • Documentation with examples (accuracy + completeness)
  • Feature implementation review

Project-Local Setup

Directory Structure

your-project/
├── .claude/
│   └── evals/
│       ├── brand-voice.yaml      # Project rubrics
│       ├── code-security.yaml
│       └── api-design.yaml

Quick Setup

  1. Create directory: mkdir -p .claude/evals
  2. Create rubric: /eval-create brand-voice --from docs/brand/voice.md
  3. Run evaluation: /eval-run brand-voice

Rubric Discovery

The judge agent automatically discovers rubrics in:

  1. .claude/evals/*.yaml (project-local)
  2. .claude/evals/*.yml (alternate extension)
  3. Explicit paths passed to commands
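
The lookup order above can be sketched in a few lines of Python. This is an illustration only, not the judge agent's actual code: the function name `discover_rubrics` and both tie-breaking rules (`.yaml` beats `.yml`, explicit paths override local rubrics of the same name) are assumptions that the source does not confirm.

```python
from pathlib import Path


def discover_rubrics(project_root, explicit_paths=()):
    """Return a dict mapping rubric name -> Path, following the lookup order above."""
    evals_dir = Path(project_root) / ".claude" / "evals"
    found = {}
    if evals_dir.is_dir():
        # Steps 1 and 2: project-local rubrics, *.yaml first, then *.yml.
        for pattern in ("*.yaml", "*.yml"):
            for path in sorted(evals_dir.glob(pattern)):
                # setdefault keeps the first match, so foo.yaml wins over foo.yml
                # (assumed tie-break).
                found.setdefault(path.stem, path)
    # Step 3: explicit paths are added last and replace a local rubric with the
    # same name (assumed; the source does not say how conflicts are resolved).
    for raw in explicit_paths:
        path = Path(raw)
        found[path.stem] = path
    return found
```

A helper like this is also a quick way to debug a "rubric not found" error: if the name you pass to /eval-run is not among the returned keys, the file is missing or has a different stem.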

Integration Patterns

Pattern 1: Post-Implementation Review

After completing significant work, invoke the judge for a quality check:

User: "I just finished the authentication module"
Claude: [Uses judge agent to evaluate against code-security rubric]

The judge agent's when_to_use description lets Claude trigger it proactively after code review requests.

Pattern 2: Command-Based Validation

Explicit validation during development:

/eval-run brand-voice app/routes/sell-on-vouchline.tsx

This returns structured feedback before you commit.

Pattern 3: Plugin Integration

Other plugins can invoke the judge programmatically:

## In your plugin's agent/command:

Invoke the eval-framework judge agent with:
- Rubric: [name or path]
- Content: [what to evaluate]
- Context: [additional context]

The judge will return structured evaluation results.

Pattern 4: Pre-Commit Workflow

A manual pre-commit check (not an automated hook):

User: "Check my changes before I commit"
Claude: [Runs relevant rubrics against staged files]
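
To show what "staged files" means here, the sketch below lists them with `git diff --cached --name-only`, a standard git invocation. The helper names `parse_staged` and `staged_files` are hypothetical, and how the judge itself collects staged files is not stated in the source.

```python
import subprocess


def parse_staged(output):
    """Turn `git diff --cached --name-only` output into a list of paths."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def staged_files(repo_dir="."):
    """List files staged for commit (added, copied, modified, or renamed)."""
    result = subprocess.run(
        # --diff-filter=ACMR skips deleted files, which cannot be evaluated.
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACMR"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_staged(result.stdout)
```

Each returned path can then be passed to /eval-run together with whichever rubric fits that file.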

Choosing Rubrics

By Content Type

| Content        | Recommended Rubric        |
|----------------|---------------------------|
| Marketing copy | brand-voice               |
| API code       | code-security, api-design |
| Documentation  | docs-quality              |
| Test files     | test-coverage             |
| Config files   | config-validation         |
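
If you want to apply this mapping mechanically, one option is a first-match lookup from path patterns to rubric names. In the sketch below the rubric names come from the recommendations above, but every glob pattern and the `rubrics_for` helper are hypothetical; adjust them to your own project layout.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns, checked in order. Note that fnmatch's "*" also
# matches "/" characters, so "docs/*" covers nested directories too.
RUBRICS_BY_PATTERN = [
    ("*.test.*", ["test-coverage"]),
    ("app/api/*", ["code-security", "api-design"]),
    ("docs/*", ["docs-quality"]),
    ("content/marketing/*", ["brand-voice"]),
    ("*.config.*", ["config-validation"]),
]


def rubrics_for(path):
    """Return the rubrics of the first pattern that matches, or an empty list."""
    for pattern, rubrics in RUBRICS_BY_PATTERN:
        if fnmatchcase(path, pattern):
            return rubrics
    return []
```

Order matters in a first-match table: the test-file pattern is listed first so that a test living under app/api/ is checked for coverage rather than for API design.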

By Quality Gate

| Gate         | Threshold | Required Criteria |
|--------------|-----------|-------------------|
| Draft review | 60%       | None              |
| PR review    | 75%       | Core criteria     |
| Production   | 85%       | All security      |
Rubric Composition

Layered Rubrics

Create focused rubrics that can be run together:

# code-style.yaml - formatting, naming
# code-security.yaml - vulnerabilities
# code-perf.yaml - performance patterns

Run several in sequence: /eval-run code-style && /eval-run code-security

Domain-Specific Rubrics

Create rubrics for specific features:

# auth-flow.yaml - authentication patterns
# payment-handling.yaml - financial code
# user-input.yaml - input validation

Best Practices

Start Simple: Begin with 2-3 criteria, add more as needed.

Iterate Rubrics: Version your rubrics and refine based on false positives/negatives.

Context Matters: Include file patterns in scope to auto-filter relevant files.

Required vs Optional: Use required_criteria for must-pass items and let the others contribute to the score.

Actionable Feedback: Every check message should say how to fix the issue.

Troubleshooting

Rubric not found: Check that .claude/evals/ exists and that the rubric name matches the file name.

False positives: Refine regex patterns or use custom checks for nuance.
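
As an illustration of refining a pattern, consider a hypothetical check that flags calls to `eval(`. The check itself, the sample source, and the `flagged_lines` helper are invented for this example and do not come from any shipped rubric. The naive pattern also fires on identifiers that merely end in "eval" and on method calls; a negative lookbehind removes both.

```python
import re

NAIVE = re.compile(r"eval\(")
# Refined: the match must not be preceded by a word character or a dot, so
# names such as retrieval( and method calls such as model.eval() are skipped.
REFINED = re.compile(r"(?<![\w.])eval\(")


def flagged_lines(pattern, source):
    """Return the 1-based numbers of the lines that the pattern matches."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if pattern.search(line)
    ]


SAMPLE = """\
result = eval(user_input)
docs = retrieval(query)
model.eval()
"""
```

On this sample the naive pattern flags all three lines, while the refined one flags only line 1. When no regex can separate the real cases from the false positives, that is the point to switch to a custom check.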

Score too low: Review the thresholds; they might be too strict for your context.

Slow evaluation: Replace custom checks (LLM-evaluated) with pattern checks wherever a pattern is sufficient.

Reference Files

See references/ for additional patterns:

  • integration-examples.md - Real-world integration examples