Agent Skill
2/7/2026
skill-evals-optimize
Optimize OpenCode skill-loading eval failures. Use when triaging failed eval cases, applying limited fixes, and re-running targeted evals.
chandima
npx skills add chandima/opencode-config
SKILL.md
name: skill-evals-optimize
description: "Optimize OpenCode skill-loading eval failures. Use when triaging failed eval cases, applying limited fixes, and re-running targeted evals."
allowed-tools: Bash(./evals/skill-loading/opencode_skill_eval_runner.sh) Bash(./scripts/) Bash(python:) Read Glob Grep
context: fork
Skill Evals Optimize
Triage failed eval cases using the steering guide, apply limited fixes, and retest with a strict iteration cap.
Inputs
- Results root: evals/skill-loading/.tmp/opencode-eval-results
- Max optimization iterations: 2
- Steering guide: evals/skill-loading/docs/skill-optimization-steering.md
Workflow
- Locate latest results and list failed cases
  bash scripts/list-fails.sh
- For each failed case
  - Read the case entry in opencode_skill_loading_eval_dataset.jsonl.
  - Consult the steering guide for the appropriate fix strategy.
  - Propose the smallest targeted change (skill description, prompt, permissions, or tests).
- Retest only the failed cases
  bash scripts/retest-fails.sh --parallel 3
- Enforce the iteration cap (2 max)
  - After two fix+retest cycles, stop optimizing.
  - Acknowledge remaining failures as legitimate model limitations or out-of-scope behaviors.
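The "read the case entry" step can be sketched in Python as a lookup over JSONL lines. This is only an illustration: the `id` and `prompt` field names and the sample records are hypothetical, since the real schema of opencode_skill_loading_eval_dataset.jsonl is not described here.

```python
import json
from typing import Iterable, Optional


def find_case(lines: Iterable[str], case_id: str, id_field: str = "id") -> Optional[dict]:
    """Return the first JSONL record whose id_field equals case_id, else None."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between records
        record = json.loads(raw)
        if record.get(id_field) == case_id:
            return record
    return None


# Hypothetical records; the real dataset fields may differ.
sample = [
    '{"id": "gh_001", "prompt": "open a pull request"}',
    "",
    '{"id": "c7_002", "prompt": "look up library docs"}',
]

case = find_case(sample, "c7_002")
print(case["prompt"])
```

Because `find_case` accepts any iterable of strings, an open file handle on the dataset can be passed in place of `sample`.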
Helper Scripts
- bash scripts/list-fails.sh lists FAIL case IDs from the latest run.
- bash scripts/retest-fails.sh re-runs only failing cases.
  - Use --filter-id to scope (e.g., --filter-id "gh_|c7_").
  - Use --dry-run to print the command without executing.
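The example value "gh_|c7_" looks like a regular-expression alternation. Assuming the filter is applied as a regex search over case IDs (an assumption; the script's actual matching rule is not shown here), its effect can be previewed like this, with made-up case IDs:

```python
import re
from typing import List


def filter_ids(case_ids: List[str], pattern: str) -> List[str]:
    """Keep the case IDs that the pattern matches anywhere (regex search)."""
    rx = re.compile(pattern)
    return [cid for cid in case_ids if rx.search(cid)]


# Hypothetical case IDs for illustration only.
ids = ["gh_001", "c7_002", "docs_003", "gh_104"]
print(filter_ids(ids, "gh_|c7_"))  # ['gh_001', 'c7_002', 'gh_104']
```

Previewing the match set this way is a cheap complement to --dry-run when scoping a retest.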
Rules
- Do not broaden permissions or weaken tests to force a pass.
- Prefer minimal, reversible changes.
- If a failure persists after two iterations, label it as legitimate and move on.
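The two-iteration rule amounts to a bounded loop. The sketch below is one way to express it; `apply_fix` and `retest` are hypothetical stand-ins for the manual fix step and for scripts/retest-fails.sh, not part of the skill itself.

```python
from typing import Callable, Set, Tuple

MAX_ITERATIONS = 2  # the cap stated in this skill


def optimize(
    failed: Set[str],
    apply_fix: Callable[[str], None],
    retest: Callable[[Set[str]], Set[str]],
    max_iterations: int = MAX_ITERATIONS,
) -> Tuple[int, Set[str]]:
    """Run at most max_iterations fix+retest cycles.

    Returns the number of cycles used and the cases still failing,
    which are then labeled as legitimate failures.
    """
    cycles = 0
    while failed and cycles < max_iterations:
        for case_id in sorted(failed):
            apply_fix(case_id)
        failed = retest(failed)  # retest only the cases that were failing
        cycles += 1
    return cycles, failed


# Stubs: pretend only gh_001 can be fixed.
fixable = {"gh_001"}


def apply_fix(case_id: str) -> None:
    pass  # a real fix would edit a skill description, prompt, or test


def retest(case_ids: Set[str]) -> Set[str]:
    return {cid for cid in case_ids if cid not in fixable}


cycles, legitimate = optimize({"gh_001", "c7_002"}, apply_fix, retest)
print(cycles, sorted(legitimate))  # 2 ['c7_002']
```

The loop exits early when nothing is failing, so the cap only matters for cases that resist both attempts.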
Output
- Summarize PASS/FAIL counts.
- List failed case IDs and which ones were accepted as legitimate failures.
- Reference the steering guide for any remaining follow-up.
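A possible shape for that summary, sketched in Python. The result mapping, the case IDs, and the exact wording of the report are assumptions for illustration; only the steering guide path comes from the Inputs section.

```python
from typing import Dict, Set

STEERING_GUIDE = "evals/skill-loading/docs/skill-optimization-steering.md"


def summarize(results: Dict[str, str], accepted: Set[str]) -> str:
    """Build a short report from case_id -> 'PASS'/'FAIL' results.

    `accepted` holds the failed IDs that were accepted as legitimate.
    """
    passed = sum(1 for status in results.values() if status == "PASS")
    failed_ids = sorted(cid for cid, status in results.items() if status == "FAIL")
    lines = [f"PASS: {passed}  FAIL: {len(failed_ids)}"]
    for cid in failed_ids:
        note = " (accepted as legitimate)" if cid in accepted else ""
        lines.append(f"- {cid}{note}")
    if failed_ids:
        lines.append(f"Follow-up: see {STEERING_GUIDE}")
    return "\n".join(lines)


# Hypothetical results for illustration only.
report = summarize(
    {"gh_001": "PASS", "c7_002": "FAIL", "docs_003": "FAIL"},
    accepted={"c7_002"},
)
print(report)
```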