name: skill-test description: | Interactive skill testing and ground truth generation. Use when testing a skill's code generation capabilities by executing generated code and saving verified working examples. Invoke with: /skill-test <skill-name>

/skill-test - Interactive Skill Testing

Test skills by executing generated code and building verified ground truth datasets.

/skill-test <skill-name>

Read the target skill's SKILL.md and key references:

.claude/skills/<skill-name>/SKILL.md
.claude/skills/<skill-name>/references/  (if exists)

Ask the user for a test prompt. Example prompts for mlflow-evaluation:

Respond using the loaded skill's knowledge. Include complete, runnable Python code.

Extract Python code blocks from the response. For each block:

Write to temp file:

cat > /tmp/skill_test_block_N.py << 'EOF'
<code>
EOF

Execute with timeout:

timeout 30 python /tmp/skill_test_block_N.py

Report result:
- Success: "Block N: Executed successfully"
- Failure: "Block N: Failed - <error>"

If ALL code blocks execute successfully, ask:

"All code executed successfully. Save as ground truth? [Y/n]"

If yes, append to benchmarks/skills/<skill-name>/ground_truth.yaml. See ground-truth-format.md for the YAML schema.

If any code block fails, show the error and ask:

"Code execution failed. [R]etry with fix, [S]kip, or [A]bort?"

Name	skill-test
Description	Interactive skill testing and ground truth generation. Use when testing a skill's code generation capabilities by executing generated code and saving verified working examples. Invoke with: /skill-test <skill-name>