---
name: skill-evals-optimize
description: "Optimize OpenCode skill-loading eval failures. Use when triaging failed eval cases, applying limited fixes, and re-running targeted evals."
allowed-tools: Bash(./evals/skill-loading/opencode_skill_eval_runner.sh) Bash(./scripts/) Bash(python:) Read Glob Grep
context: fork
---

Skill Evals Optimize

Triage failed eval cases using the steering guide, apply limited fixes, and retest with a strict iteration cap.

Locate latest results and list failed cases
```
bash scripts/list-fails.sh
```

For each failed case
- Read the case entry in opencode_skill_loading_eval_dataset.jsonl.
- Consult the steering guide for the appropriate fix strategy.
- Propose the smallest targeted change (skill description, prompt, permissions, or tests).
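
The lookup in the first bullet can be sketched as below. This is a minimal sketch rather than the project's own tooling: it assumes the dataset holds one JSON object per line and that the case ID lives under an `id` key, neither of which the source confirms. `find_case` and its `id_key` parameter are hypothetical names.

```python
import json
from pathlib import Path
from typing import Optional


def find_case(dataset_path: str, case_id: str, id_key: str = "id") -> Optional[dict]:
    """Return the first JSONL record whose id_key equals case_id, or None.

    Assumptions: one JSON object per line; the key name "id" is a guess
    and may differ in the real dataset.
    """
    with Path(dataset_path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            if isinstance(record, dict) and record.get(id_key) == case_id:
                return record
    return None
```

If the dataset uses a different key for the case ID, pass it as `id_key` instead of editing the function.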

Retest only the failed cases
```
bash scripts/retest-fails.sh --parallel 3
```

After two fix+retest cycles, stop optimizing.
Acknowledge remaining failures as legitimate model limitations or out-of-scope behaviors.
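
The two-cycle cap can be expressed as a small loop. The sketch below is illustrative only: `optimize_with_cap`, `list_fails`, and `fix_and_retest` are hypothetical names, and the two callables stand in for whatever actually lists failures and applies a fix followed by a retest.

```python
from typing import Callable, List


def optimize_with_cap(
    list_fails: Callable[[], List[str]],
    fix_and_retest: Callable[[List[str]], None],
    max_cycles: int = 2,
) -> List[str]:
    """Run at most max_cycles fix+retest cycles.

    Returns the case IDs that are still failing when the loop stops,
    so they can be recorded as accepted limitations.
    """
    failing = list_fails()
    for _ in range(max_cycles):
        if not failing:
            break  # everything passes, no further cycles needed
        fix_and_retest(failing)
        failing = list_fails()
    return failing
```

Whatever the function returns after the second cycle is the set of failures to acknowledge rather than keep tuning.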

- bash scripts/list-fails.sh lists FAIL case IDs from the latest run.
- bash scripts/retest-fails.sh re-runs only failing cases.
  - Use --filter-id to scope (e.g., --filter-id "gh_|c7_").
  - Use --dry-run to print the command without executing.
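
When the retest command is assembled from a program rather than typed by hand, the filter pattern needs quoting because `|` is a shell pipe character. The sketch below uses only the flags named in this document (`--parallel`, `--filter-id`, `--dry-run`). That the flags can be combined in one call is an assumption, and `build_retest_command` and `format_command` are hypothetical helpers.

```python
import shlex
from typing import List, Optional


def build_retest_command(
    filter_id: Optional[str] = None,
    parallel: Optional[int] = None,
    dry_run: bool = False,
) -> List[str]:
    """Build the argv list for scripts/retest-fails.sh.

    Assumption: the script accepts these flags together; the source
    only documents them individually.
    """
    cmd = ["bash", "scripts/retest-fails.sh"]
    if parallel is not None:
        cmd += ["--parallel", str(parallel)]
    if filter_id:
        cmd += ["--filter-id", filter_id]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def format_command(cmd: List[str]) -> str:
    """Render argv as a shell-safe string, quoting arguments such as 'gh_|c7_'."""
    return shlex.join(cmd)
```

Passing the list form to `subprocess.run` avoids shell parsing entirely; `format_command` is only for showing or logging the command.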
