Agent Skill
2/7/2026 · skill-enhancer
Use when you need to audit, fix, or improve an existing agent skill to meet Gold Standard compliance.
matrixfounder
npx skills add MatrixFounder/Agentic-development
SKILL.md
name: skill-enhancer
description: Use when you need to audit, fix, or improve an existing agent skill to meet Gold Standard compliance.
tier: 2
version: 1.2
Skill Enhancer
Purpose: This meta-skill analyzes other skills for compliance with TDD, CSO, and Script-First standards, guiding the agent through upgrades.
1. Red Flags (Anti-Rationalization)
STOP and READ THIS if you are thinking:
- "I'll just add the sections blindly" -> WRONG. You must understand why the skill fails before fixing it.
- "The description is close enough" -> WRONG. It must start with "Use when".
- "Examples are optional" -> WRONG. "Rich Skills" mandate examples.
- "It's just a small 20-line example" -> WRONG. Inline blocks > 12 lines are prohibited. Extract them.
- "I'll instruct the agent to parse the file line-by-line in text" -> WRONG. Use "Script-First".
2. Capabilities
- Audit: Detect gaps (missing Red Flags, inline blocks > 12 lines, poor CSO, weak language) using `analyze_gaps.py`.
- Execution Policy Audit: Detect missing `Execution Mode`, `Script Contract`, `Safety Boundaries`, and `Validation Evidence` sections.
- Security Remediation: Fix vulnerabilities flagged by `skill-validator` (e.g., `curl | bash`, secrets, weak permissions).
- Plan: Propose specific content improvements using `references/refactoring_patterns.md`.
- Execute: Apply refactoring patterns to upgrade the skill.
2.5. Execution Mode
- Mode: `hybrid`
- Rationale: gap triage and refactoring decisions are prompt-driven, while gap detection is script-driven.
2.6. Script Contract
- Primary Command: `python3 scripts/analyze_gaps.py <target-skill-path> [--json]`
- Inputs: target skill path + optional output mode.
- Outputs: structured gap list and pass/fail status.
- Failure Semantics: non-zero exit when gaps exist (for deterministic gate behavior).
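The contract above could be consumed by a small gate wrapper. The following is a minimal sketch, not the skill's own tooling: `analyzer_command`, `run_gap_gate`, and `GateResult` are hypothetical names. It relies only on the documented behavior (non-zero exit when gaps exist) and leaves the analyzer's output uninterpreted, since its exact format is not specified here.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class GateResult:
    passed: bool    # True only when the command exited with status 0
    exit_code: int
    output: str     # raw stdout, deliberately left unparsed


def analyzer_command(skill_path: str, as_json: bool = False) -> list[str]:
    """Build the documented command line as an argument list."""
    # sys.executable stands in for `python3` so the current interpreter is reused.
    command = [sys.executable, "scripts/analyze_gaps.py", skill_path]
    if as_json:
        command.append("--json")
    return command


def run_gap_gate(command: list[str]) -> GateResult:
    """Run a command and map its exit status to pass/fail."""
    proc = subprocess.run(command, capture_output=True, text=True)
    return GateResult(proc.returncode == 0, proc.returncode, proc.stdout)
```

A caller would pass `analyzer_command("../target-skill", as_json=True)` to `run_gap_gate` and stop the workflow whenever `passed` is false.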
2.7. Safety Boundaries
- Scope: apply edits only to explicitly selected target skill.
- Default Exclusions: do not refactor unrelated skills or global docs by default.
- Destructive Actions: full-file overwrite is prohibited unless explicitly requested and reviewed.
2.8. Validation Evidence
- Primary Evidence: before/after `analyze_gaps.py` output.
- Secondary Evidence: targeted diffs proving that each reported gap was addressed.
- Quality Gate: no unresolved critical structure gaps after refactor.
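One way to turn before/after analyzer runs into evidence is to compare the two gap sets. This is a sketch under assumptions: the gap identifiers are invented strings, `compare_gaps` and `gate_passes` are hypothetical helpers, and the gate here is stricter than the one stated above because it treats every remaining or newly introduced gap as blocking rather than only critical structure gaps.

```python
def compare_gaps(before: set[str], after: set[str]) -> dict[str, list[str]]:
    """Classify gap identifiers collected from two analyzer runs."""
    return {
        "resolved": sorted(before - after),     # fixed by the refactor
        "remaining": sorted(before & after),    # still reported
        "introduced": sorted(after - before),   # regressions caused by the edit
    }


def gate_passes(report: dict[str, list[str]]) -> bool:
    """Pass only when nothing is left over and nothing new appeared."""
    return not report["remaining"] and not report["introduced"]
```

For example, comparing `{"missing-red-flags", "inline-block-over-12-lines"}` with `{"inline-block-over-12-lines"}` reports one resolved gap and one remaining gap, so the gate does not pass yet.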
3. Instructions
Phase 1: Audit
- Run Analyzer: `python3 scripts/analyze_gaps.py <target-skill-path>`.
- Manual Checks:
  - Graduated Language Review: Check instruction language against the graduated approach:
    - Safety-critical steps (data loss, destructive ops): Must use `MUST`/`ALWAYS` + an explanation of why. If the explanation is missing, add it.
    - Behavioral steps (formatting, style): Apply explain-why + imperative style. If a bare `MUST` appears without rationale, add the rationale; if the wording is a weak "should"/"could", strengthen it to imperative + reason.
    - Do NOT blindly replace every "should" with "MUST". Evaluate first whether the instruction is safety-critical or behavioral.
  - Script-First Gap: Identify complex logic steps (> 5 lines of text) that MUST be converted to a script in `scripts/`.
- Review Gaps: Read the analyzer output and your manual findings.
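The graduated language review is a judgment call, but a rough pre-filter can point a reviewer at the lines worth reading. The sketch below is a heuristic, not part of `analyze_gaps.py`: the function name, the word lists, and the "rationale" markers are all assumptions chosen for illustration.

```python
import re

WEAK = re.compile(r"\b(should|could)\b", re.IGNORECASE)
STRONG = re.compile(r"\b(MUST|ALWAYS)\b")  # case-sensitive on purpose
REASON = re.compile(r"\b(because|so that|to avoid|to prevent)\b", re.IGNORECASE)


def review_language(lines: list[str]) -> list[tuple[int, str]]:
    """Return (line_number, finding) pairs for lines that need a human look."""
    findings = []
    for number, line in enumerate(lines, start=1):
        if STRONG.search(line) and not REASON.search(line):
            findings.append((number, "strong directive without rationale"))
        elif WEAK.search(line):
            findings.append((number, "weak wording: classify as safety-critical or behavioral"))
    return findings
```

The output only flags candidates. Deciding whether a flagged line is safety-critical or behavioral stays with the reviewer, as the checklist above requires.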
Phase 1.5: Execution-Policy Audit
- Verify `Execution Mode` section exists and is explicit (`prompt-first`, `script-first`, or `hybrid`).
- If skill uses `scripts/`, verify `Script Contract` section defines command, inputs, outputs, and exit behavior.
- Verify `Safety Boundaries` section defines scope limits and non-default destructive behavior.
- Verify `Validation Evidence` section defines objective verification outputs.
- Mark missing pieces as migration gaps (warning-first for legacy skills).
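The presence checks above can be sketched as a simple heading scan. This is an illustrative helper, not the analyzer's implementation: `missing_policy_sections` is a hypothetical name, and it assumes each section is introduced by a line that ends with the section title (for example `## 2.5. Execution Mode`). It checks presence only, not whether the section content is adequate.

```python
REQUIRED_SECTIONS = (
    "Execution Mode",
    "Script Contract",
    "Safety Boundaries",
    "Validation Evidence",
)


def missing_policy_sections(skill_text: str, uses_scripts: bool = True) -> list[str]:
    """List required policy sections that have no matching heading line."""
    headings = [line.strip().lower() for line in skill_text.splitlines()]
    missing = []
    for name in REQUIRED_SECTIONS:
        if name == "Script Contract" and not uses_scripts:
            continue  # only required when the skill ships scripts/
        if not any(heading.endswith(name.lower()) for heading in headings):
            missing.append(name)
    return missing
```

For a legacy skill, the returned names would be recorded as warnings (migration gaps) rather than hard failures.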
Phase 1.7: Behavioral Analysis (If Usage Logs Available)
If transcripts or logs from real skill usage exist, analyze them for patterns:
- Repeated Code: Did the agent write the same helper script across multiple runs? → Extract to `scripts/`.
- Repeated Questions: Did the agent ask the same questions or re-discover the same context? → Add to `references/`.
- Excessive Token Usage: Did the agent spend tokens reading large inline blocks? → Plan extraction to external files.
- Unused Sections: Did the agent skip reading certain sections entirely? → Consider trimming or consolidating.
If no usage logs are available, skip this phase — it will become relevant after the skill is deployed.
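The "Repeated Code" check could be approximated by counting how many transcripts contain the same snippet. This is a rough sketch under a strong assumption: transcripts are plain text in which the agent's code appears inside fenced blocks. The real log format is not specified here, and `repeated_snippets` is a hypothetical helper.

```python
import re
from collections import Counter

FENCE_MARK = "`" * 3  # built programmatically to avoid a literal fence in this example
FENCED_BLOCK = re.compile(
    re.escape(FENCE_MARK) + r"[^\n]*\n(.*?)" + re.escape(FENCE_MARK),
    re.DOTALL,
)


def repeated_snippets(transcripts: list[str], min_runs: int = 2) -> list[str]:
    """Return snippets that appear in at least `min_runs` different transcripts."""
    runs_per_snippet = Counter()
    for transcript in transcripts:
        # Collapse whitespace so re-indented copies of the same code still match.
        snippets = {" ".join(body.split()) for body in FENCED_BLOCK.findall(transcript)}
        runs_per_snippet.update(snippets)  # a set counts each snippet once per run
    return sorted(s for s, runs in runs_per_snippet.items() if runs >= min_runs)
```

Snippets returned here are candidates for extraction to `scripts/`. Near-duplicates with renamed variables would not be caught by this exact-match approach.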
Phase 2: Plan
- Read Target Skill: Read the content of the target skill.
- Draft Improvements:
  - Token Efficiency: Identify blocks > 12 lines and plan extraction to `examples/`, `assets/`, or `references/`.
  - Script-First: Identify logic blocks > 5 lines and plan extraction to `scripts/`.
  - Execution Policy: Add missing policy sections and scope constraints.
  - Graduated Language: Replace weak words using the graduated approach: `MUST + why` for safety, `explain-why + imperative` for behavioral.
  - Red Flags: Identify 2-3 likely agent excuses for this specific task.
  - CSO & Pushiness: Rewrite description to "Use when [TRIGGER]...". Check if description is "pushy" enough to prevent under-triggering: add edge-case triggers and phrases like "even if the user doesn't explicitly ask for…".
  - Generalization: Check if instructions are overfitted to specific examples. A skill must work across many prompts, not just the test cases it was developed with.
- Confirm: Ensure improvements align with the "Skills as Code" philosophy.
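The Token Efficiency item can be illustrated with a line-based scan for oversized blocks. This is a sketch, not the logic of `analyze_gaps.py`: it assumes "inline block" means a fenced code block, and `oversized_blocks` is a hypothetical name.

```python
FENCE_MARK = "`" * 3  # built programmatically to avoid a literal fence in this example


def oversized_blocks(markdown: str, max_lines: int = 12) -> list[tuple[int, int]]:
    """Return (opening_fence_line, body_length) for blocks longer than max_lines."""
    oversized = []
    start = None  # line number of the opening fence, or None when outside a block
    for number, line in enumerate(markdown.splitlines(), start=1):
        if not line.lstrip().startswith(FENCE_MARK):
            continue
        if start is None:
            start = number  # opening fence
        else:
            length = number - start - 1  # lines strictly between the two fences
            if length > max_lines:
                oversized.append((start, length))
            start = None  # closing fence
    return oversized
```

Each reported block is a candidate for extraction to `examples/`, `assets/`, or `references/`, with a short pointer left in its place.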
Phase 3: Execute
- Update File: Edit the target `SKILL.md` to insert the new sections.
  - CRITICAL: Use `replace_file_content` or `multi_replace_file_content`.
  - DO NOT use `write_to_file` to overwrite existing content (Data Loss Risk).
  - Tip: Use `references/refactoring_patterns.md` for the style guide.
- Verify: Re-run `analyze_gaps.py`. Expect output "No Gaps Found".
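The preference for targeted replacement over full overwrite can be illustrated with a guard that refuses ambiguous edits. This sketch does not reproduce the `replace_file_content` or `write_to_file` tools named above; `replace_once` is a hypothetical helper that shows why a targeted edit is safer. If the target text is missing or duplicated, it raises an error instead of silently changing the wrong content.

```python
def replace_once(text: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old`; refuse missing or ambiguous targets."""
    count = text.count(old)
    if count != 1:
        raise ValueError(f"expected exactly 1 match for the target, found {count}")
    return text.replace(old, new, 1)
```

Everything outside the matched span is returned unchanged, which is the property a full-file overwrite cannot guarantee.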
Phase 3.5: Security Repair (If triggered by Validator)
- Analyze Report: Read the `skill-validator` JSON output.
- Consult Guide: Use `references/security_refactoring.md` to find safe alternatives for flagged patterns.
- Apply Fixes:
  - Shell Injection: Replace direct execution with argument arrays.
  - Downloads: Replace `curl | bash` with download -> inspect -> execute.
  - Secrets: Move hardcoded keys to environment variables.
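Two of the fixes above, argument arrays and environment-based secrets, can be sketched with the Python standard library. The helper names are hypothetical and this is not taken from `references/security_refactoring.md`. The download -> inspect -> execute pattern is not shown, because the inspection step depends on the artifact.

```python
import os
import subprocess
import sys


def run_with_args(program: str, *args: str) -> str:
    """Run a program from an argument array; no shell parses the arguments."""
    proc = subprocess.run([program, *args], capture_output=True, text=True, check=True)
    return proc.stdout


def load_secret(name: str) -> str:
    """Read a secret from the environment instead of hardcoding it."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"environment variable {name} is not set")
    return value


def demo_metacharacters_are_inert() -> str:
    # The ';' below reaches the child process as part of one literal argument.
    echo_first_arg = "import sys; print(sys.argv[1])"
    return run_with_args(sys.executable, "-c", echo_first_arg, "a; echo injected")
```

Because the list form bypasses the shell, the string `a; echo injected` is printed back verbatim rather than being split into two commands.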
Phase 4: Final VDD Check
- Read Checklist: Open `references/vdd_checklist.md`.
- Self-Correction: Verify your work against the 5 criteria (Data Safety, Anti-Laziness, etc.).
- Refine: If any check fails (e.g., found "TODO", found unmotivated "should"), fix it immediately.
- Test Coverage: Verify the skill has at least 2-3 test prompts, either in `evals/evals.json` or documented inline. If none exist, create them based on the skill's intended use cases.
4. Best Practices
| DO THIS | DO NOT DO THIS |
|---|---|
| Specific Red Flags: "Don't skip tests" | Generic Red Flags: "Don't be lazy" |
| Trigger-Based Desc: "Use when debugging race conditions" | Summary Desc: "Guide for debugging" |
| Strong Verbs: "MUST", "EXECUTE", "VERIFY" | Weak Verbs: "should", "consider", "try" |
Rationalization Table
| Agent Excuse | Reality / Counter-Argument |
|---|---|
| "The skill is too simple for Red Flags" | Simple skills are skipped most often. Explicit rules prevent this. |
| "I don't have time to write examples" | Examples save time by preventing hallucinations later. |
| "It's easier to write logic in text" | Text logic is unreliable. Scripts are deterministic. |
5. Examples (Few-Shot)
[!TIP] See `examples/usage_example.md` for a complete Before & After walkthrough of upgrading a legacy skill.
Input:
python3 scripts/analyze_gaps.py ../target-skill
Output:
⚠️ Gaps Detected...
Recommendation: Run 'Execute Improvement Plan'...
6. Resources
- `scripts/analyze_gaps.py`: The gap detection tool.
- `references/writing_skills_best_practices_anthropic.md`: The authoritative "Gold Standard" guide used to verify compliance.
- `references/testing-skills-with-subagents.md`: Methodology for verifying fixes using TDD (Red-Green-Refactor).
- `../skill-creator/agents/grader.md`: Prompt for evaluating skill execution results against expectations.
- `../skill-creator/agents/comparator.md`: Prompt for blind A/B comparison of two skill outputs.
- `../skill-creator/agents/analyzer.md`: Prompt for post-hoc analysis that identifies why one skill version outperforms another.