Agent Skill
2/7/2026

ontos-skill-evaluator

Meta-skill by Ontos AI for evaluating the quality of Claude Skills. Use it when you need to assess the quality of a SKILL.md file, validate its structure, detect common issues, or generate an evaluation report with actionable recommendations.

npx skills add Ontos-AI/anything-skills

SKILL.md

name: ontos-skill-evaluator
description: "Meta-skill by Ontos AI for evaluating Claude Skills quality. Use when you need to assess a SKILL.md file quality, validate its structure, detect common issues, or generate an evaluation report with actionable recommendations."
license: MIT
metadata:
  author: ontos-ai
  version: "1.0.0"

Ontos Skill Evaluator

A meta-skill by Ontos AI that evaluates other Claude Skills through systematic quality assessment.

Installation

npx skills add ontos-ai/skills-evaluator

Quick Start

Node.js (Recommended for skills.sh users)

node scripts/quick_eval.js <path-to-skill>
node scripts/quick_eval.js <path-to-skill> --format html

Python (For local development)

python scripts/quick_eval.py <path-to-skill>

Example:

node scripts/quick_eval.js ../output/skills/ai-agent-trend-analysis --format html

Evaluation Dimensions

1. Structure (20%)

| Check | Description |
| --- | --- |
| Valid YAML frontmatter | Parseable, no duplicates |
| Required fields | name and description present |
| No illegal fields | Only name, description, optional license |
| Directory structure | SKILL.md at root, proper subdirs |
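
The frontmatter checks in the table could be approximated as follows. This is a minimal sketch, not the evaluator's actual code: it assumes the frontmatter sits between two `---` lines and uses simple top-level `key: value` pairs, and every issue code it returns is hypothetical.

```python
import re

ALLOWED_FIELDS = {"name", "description", "license"}   # from the table above
REQUIRED_FIELDS = {"name", "description"}             # from the table above


def check_frontmatter(skill_md: str) -> list[str]:
    """Return hypothetical issue codes for the frontmatter of a SKILL.md string."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["MISSING_FRONTMATTER"]
    # Find the closing delimiter (assumed to be a line containing only ---).
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return ["UNTERMINATED_FRONTMATTER"]

    # Collect top-level keys only; indented (nested) lines are ignored.
    keys = []
    for line in lines[1:end]:
        match = re.match(r"^([A-Za-z_][\w-]*):", line)
        if match:
            keys.append(match.group(1))

    issues = []
    if len(keys) != len(set(keys)):
        issues.append("DUPLICATE_KEY")
    for field in sorted(REQUIRED_FIELDS - set(keys)):
        issues.append(f"MISSING_{field.upper()}")
    for field in sorted(set(keys) - ALLOWED_FIELDS):
        issues.append(f"ILLEGAL_FIELD_{field.upper()}")
    return issues
```

A real implementation would likely use a YAML parser rather than a regular expression; the line-based approach here only keeps the sketch dependency-free.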

2. Trigger Quality (15%)

| Check | Description |
| --- | --- |
| Description triggers | Clear usage contexts in description |
| Trigger phrases | Explicit trigger examples in body |
| Diversity | Multiple trigger variations |

3. Actionability (25%)

| Check | Description |
| --- | --- |
| Concrete steps | Numbered or bulleted procedures |
| Tool references | Mentions scripts, APIs, or MCP tools |
| No vague language | Avoids "as needed", "if necessary" without context |
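
As an illustration of the vague-language check, the sketch below flags the two phrases named in the table and reports their line numbers. It is an assumption about how such a check might work: it does not try to judge whether surrounding context makes the phrase acceptable, and the function name is hypothetical.

```python
import re

# The two phrases named in the table; a real check may use a longer list.
VAGUE_PHRASES = ("as needed", "if necessary")


def find_vague_lines(text: str) -> list[tuple[int, str]]:
    """Return (line_number, phrase) pairs for each vague phrase found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for phrase in VAGUE_PHRASES:
            # Whole-phrase, case-insensitive match.
            if re.search(rf"\b{re.escape(phrase)}\b", line, re.IGNORECASE):
                hits.append((lineno, phrase))
    return hits
```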

4. Tool Integration (20%)

| Check | Description |
| --- | --- |
| Script references | Links to scripts/ files |
| Reference links | Links to references/ docs |
| Asset usage | Proper paths to assets/ |
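
One way to verify these references is to extract every `scripts/`, `references/`, and `assets/` path mentioned in the skill text and confirm that it exists under the skill directory. The sketch below assumes paths are written relative to the skill root; the pattern and function name are hypothetical.

```python
import re
from pathlib import Path

# Matches relative paths under the three subdirectories named in the table.
REF_PATTERN = re.compile(r"\b(?:scripts|references|assets)/[\w./-]+")


def missing_references(skill_dir: str, skill_text: str) -> list[str]:
    """Return referenced paths that do not exist under skill_dir."""
    root = Path(skill_dir)
    # Strip a trailing period so a path at the end of a sentence still resolves.
    refs = sorted({ref.rstrip(".") for ref in REF_PATTERN.findall(skill_text)})
    return [ref for ref in refs if not (root / ref).exists()]
```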

5. Example Quality (20%)

| Check | Description |
| --- | --- |
| Non-placeholder | Uses realistic data, not [PLACEHOLDER] |
| Relevance | Examples match skill purpose |
| Output format | Clear expected output shown |
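
The non-placeholder check might be sketched as a search for bracketed, all-caps tokens such as `[PLACEHOLDER]`. Generalizing from that single example to any bracketed uppercase token is an assumption, and the function name is hypothetical.

```python
import re

# Bracketed tokens of three or more uppercase letters, digits, underscores, or spaces.
PLACEHOLDER_PATTERN = re.compile(r"\[[A-Z][A-Z0-9_ ]{2,}\]")


def find_placeholders(text: str) -> list[str]:
    """Return placeholder-style tokens in order of appearance."""
    return PLACEHOLDER_PATTERN.findall(text)
```

Markdown links such as `[docs](url)` are not matched because their link text is lowercase.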

Output

Evaluation generates a JSON report:

{
  "skill_id": "ai-agent-trend-analysis",
  "evaluated_at": "2026-01-28T21:00:00Z",
  "tier": "quick",
  "scores": {
    "overall": 0.72,
    "structure": 0.60,
    "triggers": 0.80,
    "actionability": 0.75,
    "tool_refs": 0.70,
    "examples": 0.75
  },
  "issues": [
    {"severity": "error", "code": "DUPLICATE_FRONTMATTER", "message": "..."},
    {"severity": "warning", "code": "VAGUE_INSTRUCTION", "line": 45, "message": "..."}
  ],
  "recommendations": ["Fix duplicate frontmatter", "Add concrete examples"],
  "badge": "silver"
}
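
The document does not state how `overall` is computed, but the five dimension weights sum to 100%, and a weighted sum of the sample scores reproduces the reported 0.72. The sketch below works under that assumption; the mapping of JSON keys to dimensions (for example `tool_refs` to Tool Integration) is also inferred rather than documented.

```python
# Dimension weights from the section headings, keyed by the report's score names.
WEIGHTS = {
    "structure": 0.20,
    "triggers": 0.15,
    "actionability": 0.25,
    "tool_refs": 0.20,
    "examples": 0.20,
}


def overall_score(scores: dict[str, float]) -> float:
    """Weighted sum of the per-dimension scores, rounded to two decimals."""
    return round(sum(WEIGHTS[name] * scores[name] for name in WEIGHTS), 2)


sample = {
    "structure": 0.60,
    "triggers": 0.80,
    "actionability": 0.75,
    "tool_refs": 0.70,
    "examples": 0.75,
}
print(overall_score(sample))  # 0.72
```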

Badge Levels

| Badge | Score Range | Meaning |
| --- | --- | --- |
| 🥇 Gold | ≥ 0.85 | Production ready |
| 🥈 Silver | 0.70-0.84 | Good with minor issues |
| 🥉 Bronze | 0.50-0.69 | Needs improvement |
| ❌ Fail | < 0.50 | Critical issues |
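
The badge table can be read as a simple threshold lookup. The sketch below treats each range as half-open, so a score such as 0.845 (which falls between the listed ranges) maps to silver; that choice is an assumption. Only the "silver" string appears in the sample report, so the other return values are guesses at the naming.

```python
def badge_for(score: float) -> str:
    """Map an overall score in [0, 1] to a badge name."""
    if score >= 0.85:
        return "gold"
    if score >= 0.70:
        return "silver"
    if score >= 0.50:
        return "bronze"
    return "fail"


print(badge_for(0.72))  # silver
```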

Advanced Usage

Evaluate All Skills in Directory

python scripts/quick_eval.py ../output/skills --batch

Output as Markdown Report

python scripts/quick_eval.py <path> --format md

Verbose Mode (Show All Checks)

python scripts/quick_eval.py <path> --verbose

Integration with Skill Generation

When used after skill-creator, this skill validates quality before distribution:

User Request → skill-creator → [New SKILL.md] → skill-evaluator → [Quality Report]
                                                          ↓
                                               Fix issues if score < 0.70
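
A pipeline step that consumes the quality report could apply the 0.70 threshold from the diagram as a gate. The sketch below assumes the report follows the JSON shape shown under Output; the function name is hypothetical.

```python
import json

FIX_THRESHOLD = 0.70  # threshold shown in the diagram


def review_report(report_json: str) -> tuple[bool, list[str]]:
    """Return (needs_fixes, error_codes) for a JSON quality report."""
    report = json.loads(report_json)
    needs_fixes = report["scores"]["overall"] < FIX_THRESHOLD
    # Collect the codes of error-severity issues so they can be fixed first.
    error_codes = [
        issue["code"]
        for issue in report.get("issues", [])
        if issue.get("severity") == "error"
    ]
    return needs_fixes, error_codes
```

With the sample report above (overall 0.72), this returns `False` for the gate while still surfacing the `DUPLICATE_FRONTMATTER` error.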

Future: Tier 2 Deep Benchmark (Coming Soon)

Phase 2 will add optional deep testing:

  • Semantic search for matching benchmark tasks
  • Integration with OSWorld, SWE-Bench, AgentBench
  • LLM-as-a-Judge evaluation

Invoke with the --deep flag once it is available.

Skills Info

Original Name: ontos-skill-evaluator
Author: ontos