name: eval-patterns
description: |
  This skill provides common evaluation patterns and integration guidance. Use when:
  - Integrating eval-framework with other plugins
  - Designing evaluation workflows
  - Choosing between content vs behavior evaluation
  - Setting up project-local rubrics
version: 1.0.0

Evaluation Patterns & Integration

Common patterns for using the eval-framework effectively in different contexts.

Evaluation Types

Content Evaluation

Evaluates static content: copy, documentation, code files.

Use for:

Marketing copy review
Documentation quality
Code style/patterns
Configuration validation

Invocation:

/eval-run brand-voice app/routes/sell-on-vouchline.tsx

Behavior Evaluation

Evaluates actions and outputs: what Claude did, not just what exists.

Use for:

Code review after implementation
Commit message quality
Test coverage verification
API response validation

Invocation:

Judge agent triggered: "Review what I just implemented against the code-security rubric"

Combined Evaluation

Evaluates both content and behavior together.

Use for:

Full code review (style + security + behavior)
Documentation with examples (accuracy + completeness)
Feature implementation review

Project-Local Setup

Directory Structure

your-project/
├── .claude/
│   └── evals/
│       ├── brand-voice.yaml      # Project rubrics
│       ├── code-security.yaml
│       └── api-design.yaml

Quick Setup

Create directory: mkdir -p .claude/evals
Create rubric: /eval-create brand-voice --from docs/brand/voice.md
Run evaluation: /eval-run brand-voice

Rubric Discovery

The judge agent automatically discovers rubrics in:

.claude/evals/*.yaml (project-local)
.claude/evals/*.yml (alternate extension)
Explicit paths passed to commands
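The lookup list above could be sketched as follows. This is a minimal illustration, not the judge agent's actual implementation: the function name `discover_rubrics`, the rule that a rubric's name is its file stem, and the choice to let explicit paths override project-local files are all assumptions.

```python
from pathlib import Path


def discover_rubrics(project_root, explicit_paths=()):
    """Map rubric names to files, mirroring the discovery list above.

    Assumption: a rubric's name is its file name without the extension.
    """
    evals_dir = Path(project_root) / ".claude" / "evals"
    found = {}
    # Project-local rubrics: .yaml first, then the alternate .yml extension.
    for pattern in ("*.yaml", "*.yml"):
        for path in sorted(evals_dir.glob(pattern)):
            # Keep the first file seen for a given name.
            found.setdefault(path.stem, path)
    # Assumption: an explicit path overrides a project-local rubric of the same name.
    for raw in explicit_paths:
        path = Path(raw)
        found[path.stem] = path
    return found
```

Calling `discover_rubrics(".")` from the project root returns a dictionary such as `{"brand-voice": Path(".claude/evals/brand-voice.yaml"), ...}`, which is also a quick way to debug a "rubric not found" error.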

Integration Patterns

Pattern 1: Post-Implementation Review

After completing significant work, invoke the judge for a quality check:

User: "I just finished the authentication module"
Claude: [Uses judge agent to evaluate against code-security rubric]

The judge agent's when_to_use description lets Claude trigger it proactively after code review requests.

Pattern 2: Command-Based Validation

Explicit validation during development:

/eval-run brand-voice app/routes/sell-on-vouchline.tsx

This returns structured feedback before you commit.

Pattern 3: Plugin Integration

Other plugins can invoke the judge programmatically:

## In your plugin's agent/command:

Invoke the eval-framework judge agent with:
- Rubric: [name or path]
- Content: [what to evaluate]
- Context: [additional context]

The judge will return structured evaluation results.

Pattern 4: Pre-Commit Workflow

Manual pre-commit check (not an automated hook):

User: "Check my changes before I commit"
Claude: [Runs relevant rubrics against staged files]
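The source does not say how the staged files are collected. One way to gather them, sketched here with standard `git diff` flags, is shown below; the helper names `parse_name_only` and `staged_files` are hypothetical and not part of eval-framework.

```python
import subprocess


def parse_name_only(output):
    """Split `git diff --name-only` output into a list of non-empty paths."""
    return [line for line in output.splitlines() if line.strip()]


def staged_files(repo_dir="."):
    """Return paths staged for commit (added, copied, or modified only)."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. outside a repository
    )
    return parse_name_only(result.stdout)
```

Each returned path can then be passed to `/eval-run` with whichever rubric fits that file.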

Choosing Rubrics

By Content Type

Content	Recommended Rubric
Marketing copy	brand-voice
API code	code-security, api-design
Documentation	docs-quality
Test files	test-coverage
Config files	config-validation
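The table above can be turned into a simple lookup when choosing rubrics for a file. In this sketch the rubric names come from the table, but the glob patterns and the `rubrics_for` helper are hypothetical; adjust the patterns to your own project layout.

```python
from fnmatch import fnmatchcase

# Hypothetical glob patterns; the rubric names come from the table above.
# Order matters: the first matching pattern wins.
RUBRICS_BY_PATTERN = [
    ("*.test.ts", ["test-coverage"]),
    ("app/routes/*.tsx", ["brand-voice"]),
    ("app/api/*", ["code-security", "api-design"]),
    ("docs/*.md", ["docs-quality"]),
    ("*.json", ["config-validation"]),
]


def rubrics_for(path):
    """Return the rubrics for the first pattern matching path, or an empty list."""
    for pattern, rubrics in RUBRICS_BY_PATTERN:
        # fnmatchcase's "*" also matches "/" so patterns cover nested paths.
        if fnmatchcase(path, pattern):
            return rubrics
    return []
```

Listing test files first keeps a path such as `app/api/users.test.ts` from being picked up by the broader API pattern.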

By Quality Gate

Gate	Threshold	Required Criteria
Draft review	60%	None
PR review	75%	Core criteria
Production	85%	All security
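A gate combines a score threshold with a set of criteria that must pass regardless of the score. The sketch below shows that logic under stated assumptions: the thresholds are taken from the table, while the `Gate` class, the `passes_gate` helper, the criterion names, and the choice to treat the threshold as inclusive are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Gate:
    name: str
    threshold: float  # minimum overall score, in percent
    required: set = field(default_factory=set)  # criteria that must pass


def passes_gate(gate, score, passed_criteria):
    """True when score meets the threshold and every required criterion passed."""
    missing = gate.required - set(passed_criteria)
    # Assumption: a score exactly equal to the threshold passes.
    return score >= gate.threshold and not missing


# Thresholds come from the table above; the criterion names are hypothetical.
DRAFT = Gate("Draft review", 60)
PR = Gate("PR review", 75, {"input-validation"})
PRODUCTION = Gate("Production", 85, {"input-validation", "no-hardcoded-secrets"})
```

With these definitions, a score of 95 still fails the production gate if any required criterion is missing, which is the point of separating required criteria from the overall score.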

Rubric Composition

Layered Rubrics

Create focused rubrics that can be run together:

# code-style.yaml - formatting, naming
# code-security.yaml - vulnerabilities
# code-perf.yaml - performance patterns

Run several in sequence: /eval-run code-style, then /eval-run code-security

Domain-Specific Rubrics

Create rubrics for specific features:

# auth-flow.yaml - authentication patterns
# payment-handling.yaml - financial code
# user-input.yaml - input validation

Best Practices

Start Simple: Begin with 2-3 criteria, add more as needed.

Iterate Rubrics: Version your rubrics and refine based on false positives/negatives.

Context Matters: Include file patterns in scope to auto-filter relevant files.

Required vs Optional: Use required_criteria for must-pass items and let the others contribute to the score.

Actionable Feedback: Every check message should say how to fix the issue.

Troubleshooting

Rubric not found: Check that .claude/evals/ exists and that the rubric name matches the file name.

False positives: Refine regex patterns or use custom checks for nuance.

Score too low: Review thresholds - they might be too strict for your context.

Slow evaluation: Replace custom (LLM-evaluated) checks with pattern checks wherever a pattern is sufficient.

Reference Files

See references/ for additional patterns:

integration-examples.md - Real-world integration examples
