Agent Skill
2/7/2026

skill-eval

Evaluate all workspace-hub skills for structural validity, content quality, cross-reference integrity, and registry consistency. Runs 18 checks across critical, warning, and info severity levels with actionable fix suggestions.

vamseeachanta
npx skills add vamseeachanta/workspace-hub

SKILL.md


---
name: skill-eval
description: Evaluate all workspace-hub skills for structural validity, content quality, cross-reference integrity, and registry consistency. Runs 18 checks across critical, warning, and info severity levels with actionable fix suggestions.
version: 1.0.0
category: development
last_updated: 2026-01-29
capabilities:
  - structural_validation
  - content_quality_analysis
  - cross_reference_integrity
  - report_generation
tools:
  - Bash
  - Read
  - Glob
  - Grep
related_skills:
  - skill-creator
  - compliance-check
  - repository-health-analyzer
  - verification-loop
requires: []
see_also: []
---

Skill Eval

Quick Start

# Full evaluation of all skills
uv run .claude/skills/development/skill-eval/scripts/eval-skills.py

# JSON output
uv run .claude/skills/development/skill-eval/scripts/eval-skills.py --format json

# Single skill
uv run .claude/skills/development/skill-eval/scripts/eval-skills.py --skill testing-tdd-london

# Only critical issues
uv run .claude/skills/development/skill-eval/scripts/eval-skills.py --severity critical

When to Use

  • Auditing skill quality after bulk creation or migration
  • Pre-release validation of the skills library
  • CI/CD quality gate for skill changes
  • Identifying broken cross-references after renaming skills
  • Checking compliance with the v2 SKILL.md template format

Core Concepts

Evaluation Dimensions

The evaluator runs 18 checks organized into three severity levels:

Critical (blocks skill usage):

  • YAML frontmatter exists and parses
  • Required fields present: name, description
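
To make the critical checks concrete, here is a minimal sketch of how they might be implemented. It is not the code in eval-skills.py: the function names and the `missing_field:` issue code are hypothetical, and the line-based key parsing is a simplified stand-in for a real YAML parser. Only `frontmatter_missing` and the required field names come from this document.

```python
REQUIRED_FIELDS = ("name", "description")


def split_frontmatter(text):
    """Return the lines between the leading '---' delimiters, or None if absent."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return lines[1:i]
    return None  # opening delimiter without a closing one


def critical_issues(text):
    """List critical issue codes for the text of one SKILL.md file."""
    block = split_frontmatter(text)
    if block is None:
        return ["frontmatter_missing"]
    # Simplified stand-in for YAML parsing: collect top-level "key: value" pairs.
    fields = {}
    for line in block:
        if line and not line[0].isspace() and ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    # "missing_field:<name>" is a hypothetical code used only in this sketch.
    return [f"missing_field:{name}" for name in REQUIRED_FIELDS if not fields.get(name)]
```

A file in the legacy heading format has no leading `---` line, so it yields `frontmatter_missing` before any field checks run.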

Warning (degrades quality):

  • Version follows semver, category matches directory
  • Required content sections present (Quick Start, When to Use, etc.)
  • Code blocks in key sections, no TODO/FIXME markers
  • Cross-references in related_skills resolve to real skills
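
Some of the warning-level checks could look roughly like the sketch below. It assumes the frontmatter has already been parsed into a dict. The function name and the `version_not_semver` and `category_mismatch` codes are hypothetical; `related_skill_unresolved` is the code listed under Error Handling. The version pattern covers plain MAJOR.MINOR.PATCH only and ignores pre-release and build suffixes.

```python
import re

# Simplified semver pattern: MAJOR.MINOR.PATCH with no leading zeros.
SEMVER = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def warning_issues(meta, directory_category, known_skills):
    """Return warning codes for one skill's parsed frontmatter (a dict)."""
    issues = []
    if not SEMVER.fullmatch(str(meta.get("version", ""))):
        issues.append("version_not_semver")  # hypothetical code
    if meta.get("category") != directory_category:
        issues.append("category_mismatch")  # hypothetical code
    for ref in meta.get("related_skills", []):
        if ref not in known_skills:
            issues.append(f"related_skill_unresolved:{ref}")
    return issues
```

Passing `known_skills` as a set of every skill name in the library keeps each cross-reference lookup constant-time, which matters when the library holds a few hundred skills.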

Info (improvement opportunities):

  • Uses v2 template format, has optional sections (Metrics, etc.)
  • No duplicate skill names across the library
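
The duplicate-name check can be sketched as a single grouping pass. This is an illustration, not the evaluator's actual code; the function name and the example paths are hypothetical.

```python
from collections import defaultdict


def duplicate_names(skills):
    """Map each skill name declared more than once to the paths declaring it.

    `skills` is an iterable of (path, name) pairs.
    """
    by_name = defaultdict(list)
    for path, name in skills:
        by_name[name].append(path)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}


# Hypothetical library layout, for illustration only.
library = [
    ("development/skill-eval/SKILL.md", "skill-eval"),
    ("testing/skill-eval/SKILL.md", "skill-eval"),
    ("development/skill-creator/SKILL.md", "skill-creator"),
]
duplicates = duplicate_names(library)
```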

Report Output

Reports include:

  • Summary with pass/fail counts and percentages
  • Issues grouped by severity with counts
  • Per-category breakdown
  • Top 10 most common issues
  • Per-skill details with actionable fix suggestions
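
As a rough illustration of how the summary figures and the top-10 list could be derived, the sketch below aggregates per-skill results. The input shape (skill name mapped to a list of `(severity, issue_code)` tuples) and the output keys are assumptions made for this example, not the script's real data model.

```python
from collections import Counter


def summarize(results):
    """Aggregate per-skill issues into summary counts.

    `results` maps skill name -> list of (severity, issue_code) tuples.
    """
    total = len(results)
    critical = sum(
        1 for issues in results.values()
        if any(severity == "critical" for severity, _ in issues)
    )
    passed = total - critical  # "passed" means no critical issue
    common = Counter(code for issues in results.values() for _, code in issues)
    return {
        "total": total,
        "passed": passed,
        "passed_pct": round(100 * passed / total, 1) if total else 0.0,
        "critical": critical,
        "top_issues": common.most_common(10),
    }
```

Here a skill counts as passed when it has no critical issue, so skills with warnings only are a subset of the passed skills rather than a separate bucket.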

Usage Examples

Full Evaluation

uv run .claude/skills/development/skill-eval/scripts/eval-skills.py

Output:

================================================================
  SKILL EVALUATION REPORT
  Generated: 2026-01-29T14:30:00+00:00
================================================================

SUMMARY
----------------------------------------
  Total skills evaluated:  230
  Passed (no critical):    187  (81.3%)
  Warnings only:           102  (44.3%)
  Critical failures:        43  (18.7%)

JSON for CI/CD

uv run .claude/skills/development/skill-eval/scripts/eval-skills.py \
  --format json --severity critical \
  --output reports/skill-eval.json

Filter by Category

uv run .claude/skills/development/skill-eval/scripts/eval-skills.py --category development

Summary Only

uv run .claude/skills/development/skill-eval/scripts/eval-skills.py --summary-only

Best Practices

  • Run after creating new skills with /skill-creator to validate structure
  • Use --format json in CI pipelines for machine-readable output
  • Address critical issues first (missing frontmatter, invalid YAML)
  • Use --severity warning to focus on actionable improvements
  • Use --category to narrow an audit to a specific skill area
  • If skill files or the index markdown change, regenerate the derived skill summary artifacts: uv run --no-project python scripts/skills/generate-skill-summary.py

Error Handling

| Exit Code | Meaning |
|-----------|---------|
| 0 | All skills pass (no critical issues) |
| 1 | Critical failures found |
| 2 | Script error (missing directory, invalid arguments) |

| Common Issue | Cause | Fix |
|--------------|-------|-----|
| frontmatter_missing | Skill uses legacy heading format | Add --- delimited YAML frontmatter |
| yaml_invalid | Syntax error in frontmatter | Fix YAML syntax (check colons, indentation) |
| related_skill_unresolved | Referenced skill name doesn't exist | Correct the name or remove the reference |
| section_missing | Missing required H2 section | Add the section heading and content |
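
A CI wrapper might act on the exit codes along these lines. This is a hedged sketch: `describe_exit` and `run_gate` are hypothetical helper names, and only the three exit codes and their meanings are taken from this document.

```python
import subprocess

# Exit codes documented for the evaluator.
EXIT_MEANINGS = {
    0: "all skills pass (no critical issues)",
    1: "critical failures found",
    2: "script error (missing directory, invalid arguments)",
}


def describe_exit(code):
    """Translate an exit code into its documented meaning."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_gate(command):
    """Run the evaluator command and return (exit_code, meaning)."""
    completed = subprocess.run(command, check=False)
    return completed.returncode, describe_exit(completed.returncode)
```

For example, `run_gate(["uv", "run", ".claude/skills/development/skill-eval/scripts/eval-skills.py", "--severity", "critical"])` returns the code together with its meaning. A pipeline would typically fail the build on 1 and treat 2 as a configuration problem rather than a skill-quality problem.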

Metrics & Success Criteria

| Metric | Target |
|--------|--------|
| Skills passing all critical checks | 100% |
| Skills with complete v2 sections | >80% |
| Resolved cross-references | 100% |
| No TODO/FIXME in skills | 100% |

Version History

  • 1.0.0 (2026-01-29): Initial release with 18 checks, human + JSON output, category/skill filtering