skill-creator
Create, revise, and quality-gate Codex skills (SKILL.md + resources + evals + packaging) when asked to build or improve a skill.
name: skill-creator
description: "Create, revise, benchmark, and quality-gate Codex skills (SKILL.md plus scripts, references, evals, and packaging). Use this skill whenever the user asks to build, audit, improve, compare, or package a Codex skill or skill-graph contract; do not use it for unrelated feature coding."
Skill Creator
This skill designs, improves, validates, and packages high-quality Codex skills.
Version: 1.9.1 · Last updated: 2026-03-06
Table of Contents
- Working agreement
- Scope and triggers
- Modes
- Needed inputs
- Discovery interview
- Deliverables
- Response format
- Operating principles
- Skill creation process
- Script-backed security rules
- What to avoid
- Constraints
- Validation
- Examples
- Reference map
- Decision feedback protocol
- Empowering execution style
Working agreement
- Follow the repo AGENTS.md; treat it as a map, not a megadoc.
- Keep the artifact boundary explicit:
  - local Codex CLI -> ./artifacts/
  - hosted shell -> /mnt/data/
- For long runs, write short status notes so compaction is safe.
- For advisory or intake-only turns, do bounded preflight first; do not broad-scan the repo tree unless editing or repo-wide analysis is actually needed.
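The artifact boundary above can be encoded in a small helper so every script writes to the same place. This is a minimal sketch: only the two directories come from this document, while the `hosted` flag and both function names are hypothetical.

```python
from pathlib import Path


def artifact_root(hosted: bool) -> Path:
    """Return the artifact directory for the current runtime.

    `hosted` is a hypothetical flag; how a run detects a hosted shell is
    not specified in this document and is left to the caller.
    """
    # hosted shell -> /mnt/data/, local Codex CLI -> ./artifacts/
    return Path("/mnt/data") if hosted else Path("artifacts")


def artifact_path(name: str, hosted: bool = False) -> Path:
    """Resolve an artifact name inside the boundary, rejecting escapes."""
    candidate = Path(name)
    if candidate.is_absolute() or ".." in candidate.parts:
        raise ValueError(f"artifact name must stay inside the boundary: {name}")
    return artifact_root(hosted) / candidate
```

Rejecting absolute names and parent-directory segments keeps a typo from writing outside the boundary.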
Scope and triggers
Use this skill to:
- create a new skill;
- improve an existing skill’s routing, workflow, or reliability;
- audit a skill against repo standards and validators;
- compare two skill variants;
- package a validated skill for reuse;
- improve skill-graph contracts tied to recursive workflow operations.
Do not use this skill for unrelated application features, generic debugging, or ordinary documentation tasks that do not create or improve a skill.
Modes
Choose the smallest mode that fits:
- create: scaffold and author a new skill;
- improve: tighten routing, workflow, safety, or portability;
- eval: run validators and summarize findings;
- benchmark-lite: compare variants with shared evals;
- graph: refine skill-graph docs, contracts, or runtime metadata;
- package: produce a distributable .skill archive.
Default to create or improve.
Needed inputs
- skill goal;
- 3-10 example prompts:
  - 2-5 happy path,
  - 1-3 edge cases,
  - 1-3 should-not-trigger negatives;
- target environment: codex, claude, or portable;
- any needed tools, APIs, schemas, templates, or style rules;
- compatibility posture;
- if graph work is in scope, the profile/scope and intended graph outputs.
If key details are missing, ask only the minimum questions needed to proceed safely.
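The prompt counts listed above are easy to check mechanically before asking follow-up questions. The sketch below assumes a hypothetical `ExamplePrompts` container; only the per-category ranges are taken from this section.

```python
from dataclasses import dataclass, field

# (minimum, maximum) per category, as listed under "Needed inputs".
PROMPT_RANGES = {"happy": (2, 5), "edge": (1, 3), "negative": (1, 3)}


@dataclass
class ExamplePrompts:
    """Hypothetical container for the example prompts gathered at intake."""
    happy: list[str] = field(default_factory=list)
    edge: list[str] = field(default_factory=list)
    negative: list[str] = field(default_factory=list)


def missing_prompt_inputs(prompts: ExamplePrompts) -> list[str]:
    """Return one message per category whose count is outside its range."""
    problems = []
    for category, (low, high) in PROMPT_RANGES.items():
        count = len(getattr(prompts, category))
        if not low <= count <= high:
            problems.append(f"{category}: have {count}, need {low}-{high}")
    return problems
```

An empty result means the example prompts are sufficient; otherwise each message names exactly one gap, which maps directly to one minimal follow-up question.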
Discovery interview
When create or improve work is underspecified, run a round-based discovery interview before building.
- Use Codex request_user_input first when a round fits into 1-3 short prompts.
- Ask one round at a time and wait for the answer before moving on.
- Skip rounds already answered by the thread; do not make the user restate known context.
- Stop when you are about 95 percent confident you can build the skill safely.
- Keep each round intuitive:
  - start with one plain-language question,
  - prefer concrete options over jargon,
  - explain why the round matters in one short sentence,
  - avoid dumping the whole interview plan at once.
- Prefer the reusable request_user_input mini-templates and payload examples in references/discovery-interview.md unless the thread already suggests a better phrasing.
- Before building, summarize the skill back to the user and ask for confirmation.
- Use references/discovery-interview.md for the six-round default flow.
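The round-based rules above (one round at a time, skip what is already known, stop at about 95 percent confidence) can be sketched as a loop. Everything named here is an assumption: the round labels are placeholders for the flow defined in references/discovery-interview.md, `ask` stands in for whatever prompts the user, and `confidence` for however readiness is estimated.

```python
from typing import Callable, Optional

# Placeholder labels only; the real six rounds live in
# references/discovery-interview.md.
ROUNDS = [f"round-{n}" for n in range(1, 7)]

CONFIDENCE_TARGET = 0.95  # "about 95 percent confident"


def run_discovery(
    known: dict[str, str],
    ask: Callable[[str], Optional[str]],
    confidence: Callable[[dict[str, str]], float],
) -> dict[str, str]:
    """Ask one round at a time, skipping rounds the thread already answered."""
    answers = dict(known)
    for round_name in ROUNDS:
        if confidence(answers) >= CONFIDENCE_TARGET:
            break  # confident enough to summarize and ask for confirmation
        if round_name in answers:
            continue  # do not make the user restate known context
        reply = ask(round_name)
        if reply:
            answers[round_name] = reply
    return answers
```

The confidence check runs before each round, so the loop never asks a question it no longer needs.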
Deliverables
Produce what the request actually needs, usually from this set:
- a skill folder with SKILL.md;
- scripts/, references/, assets/, or workflows/ when they materially help;
- agents/openai.yaml when OpenAI or Codex UI or tool metadata is needed;
- references/contract.yaml and references/evals.yaml for non-trivial skills;
- references/plan.md for multi-step builds;
- validator output, analyzer output, and an OpenClaw-style readiness summary;
- a packaged .skill file when packaging is requested.
For created or revised skills, keep decision-feedback instrumentation in scope:
- include the decision-feedback-protocol:v2 block in the skill when post-run feedback is relevant;
- make feedback runtime-owned: if capture is enabled, emit a non-blocking post_run_feedback event after result delivery via Codex request_user_input;
- keep feedback recordable with scripts/record_skill_feedback.py.
Response format
Start with these headings and no text before them:
Scope and triggers
- confirm when the skill applies.
Required inputs
- list missing inputs or state what is already known.
Deliverables
- list the files, checks, or artifacts you will produce.
Failure mode
- if the request is out of scope, say why and suggest the closest next step.
On the first response, stay compact: no deep implementation, no file scaffolding, and no validator dump unless the user asked for it or the inputs are already complete.
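A first response can be checked against the four required headings before it is sent. This is a minimal sketch with a hypothetical function name; it accepts headings with or without leading `#` markers, which is an assumption about how the response is rendered.

```python
REQUIRED_HEADINGS = [
    "Scope and triggers",
    "Required inputs",
    "Deliverables",
    "Failure mode",
]


def check_first_response(text: str) -> list[str]:
    """Return problems with a first response; an empty list means it conforms."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return ["response is empty"]
    # Strip optional markdown heading markers so "## Deliverables" also matches.
    normalized = [line.lstrip("#").strip() for line in lines]
    problems = []
    if normalized[0] != REQUIRED_HEADINGS[0]:
        problems.append("text appears before the first heading")
    position = 0
    for heading in REQUIRED_HEADINGS:
        try:
            position = normalized.index(heading, position) + 1
        except ValueError:
            problems.append(f"missing or out-of-order heading: {heading}")
    return problems
```

Searching from the previous match onward is what enforces heading order, not just presence.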
Operating principles
Humans steer. Agents execute.
Translate vague intent into a repeatable workflow. When results are weak, fix the scaffolding, constraints, or feedback loop instead of pushing harder on the same draft.
Keep SKILL.md as a map
Keep route-critical guidance in SKILL.md and move depth into:
- references/ for the system of record,
- scripts/ for deterministic helpers,
- assets/ for templates and fixtures.
Descriptions are routing logic
The frontmatter description is the decision boundary. Make it concrete about:
- when to use the skill;
- when not to use it;
- outputs and success criteria.
Use references/description-optimization.md when routing quality is the main issue.
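Part of this boundary can be linted automatically. The sketch below is a heuristic, not one of the repo validators: the function name and the keyword patterns are assumptions, and it cannot judge whether outputs and success criteria are stated well.

```python
import re


def lint_description(description: str) -> list[str]:
    """Flag descriptions that lack an explicit use / do-not-use boundary."""
    problems = []
    if "\n" in description.strip():
        problems.append("description should be a single line")
    # Assumed phrasing: "use when ...", "use this skill whenever ...".
    if not re.search(r"\buse (this skill )?(when|whenever)\b", description, re.I):
        problems.append("no use-when clause found")
    # Assumed phrasing: "do not use ..." or "don't use ...".
    if not re.search(r"\b(do not|don't) use\b", description, re.I):
        problems.append("no do-not-use clause found")
    return problems
```

Run against this skill's own description, the lint finds both clauses; a vague one-liner such as "Helps with skills." fails both checks.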
Calibrate language to the user
Use terms like eval, assertion, benchmark, and JSON only as freely as the user’s fluency supports. Briefly define specialized terms when needed.
Think in tradeoffs
Ask three questions before you lock the design:
- Why does this skill exist?
- What evidence would change the workflow?
- What should stay out of scope?
Vary examples, adapt the structure, and choose context-specific guidance. Avoid repetitive, generic, cookie-cutter instructions; converge only after evidence review.
Skill creation process
Skip steps only with a clear reason.
0) Confirm target and boundary
- confirm where the skill lives;
- confirm the artifact boundary;
- mine the current thread for demonstrated tools, steps, constraints, output shapes, and corrections before asking the user to restate them.
1) Lock down triggers early
- collect happy, edge, and negative prompts;
- encode use-when, don’t-use-when, outputs, and success criteria in the description;
- keep compatibility boundaries explicit;
- for non-trivial skills, write references/evals.yaml early;
- for trigger-tuning work, use references/description-optimization.md.
2) Choose the structure
- single-file for one intent and one workflow;
- router-style for multiple intents, heavy domain knowledge, or multiple output contracts.
Router layout:
skill-name/
  SKILL.md
  workflows/
  references/
  scripts/
  assets/
  agents/openai.yaml
3) Scaffold the folder
Use:
python scripts/init_skill.py <skill-name> --target codex --run-type instruction --path <output-dir>
Delete unused starter files immediately.
4) Author SKILL.md
- keep frontmatter minimal: name plus description;
- make description a single line with trigger boundary and success criteria;
- keep the workflow lean and reliable;
- link to references instead of pasting deep docs;
- keep templates and examples inside the skill bundle, not in prompts.
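The frontmatter rules above (only name and description, description on one line) can be checked with a few lines of Python. This sketch assumes frontmatter delimited by `---` lines and flat `key: value` pairs; it is deliberately small, and a real validator would likely use a YAML library instead. Both function names are hypothetical.

```python
ALLOWED_KEYS = {"name", "description"}


def read_frontmatter(skill_md: str) -> dict[str, str]:
    """Parse flat `key: value` frontmatter between two `---` lines."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a --- frontmatter line")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if not sep:
            # A continuation line means a value spilled past one line.
            raise ValueError(f"not a key: value line: {line!r}")
        fields[key.strip()] = value.strip().strip('"')
    raise ValueError("frontmatter is never closed with ---")


def check_frontmatter(fields: dict[str, str]) -> list[str]:
    """Report extra keys and missing or empty required keys."""
    problems = [f"unexpected key: {k}" for k in sorted(set(fields) - ALLOWED_KEYS)]
    problems += [f"missing or empty: {k}" for k in sorted(ALLOWED_KEYS) if not fields.get(k)]
    return problems
```

Because the parser rejects any line without a colon, a description that wraps onto a second line fails early instead of being silently truncated.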
5) Add resources only when they earn their keep
- use references/ for deep docs, schemas, contracts, and evals;
- use scripts/ for repeatable helpers;
- use assets/ for templates and fixtures.
If repeated eval runs recreate the same helper script, bundle it once in scripts/ and reuse it.
6) Validate
Run the core validators, fix the first failing gate, then rerun.
6.5) Compare variants only when needed
- run shared evals for baseline and candidate;
- compare evidence and failure modes, not style alone;
- when quality is subjective, prefer blind comparison before declaring a winner.
7) Package only after quality gates pass
python scripts/package_skill.py <path/to/skill-folder> dist/
Script-backed security rules
When the skill includes executable code:
- default to offline behavior;
- gate network use behind explicit flags and an allowed domains or host allowlist policy;
- keep destructive actions behind dry-run or explicit confirmation;
- never echo secrets or raw environment values.
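The four rules above map naturally onto command-line flags. The sketch below is illustrative only: the flag names (`--allow-network`, `--allowed-host`, `--confirm`) and all function names are hypothetical, not flags of any script in this skill.

```python
import argparse
from urllib.parse import urlparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI whose defaults are offline and dry-run."""
    parser = argparse.ArgumentParser(description="example skill helper")
    parser.add_argument("--allow-network", action="store_true",
                        help="opt in to network access (offline by default)")
    parser.add_argument("--allowed-host", action="append", default=[],
                        help="host permitted when --allow-network is set")
    parser.add_argument("--confirm", action="store_true",
                        help="apply destructive actions instead of a dry run")
    return parser


def network_permitted(url: str, args: argparse.Namespace) -> bool:
    """Allow a request only with the explicit flag and an allowlisted host."""
    host = urlparse(url).hostname
    return bool(args.allow_network and host and host in args.allowed_host)


def delete_plan(paths: list[str], args: argparse.Namespace) -> list[str]:
    """Describe deletions; nothing is actually removed in this sketch."""
    verb = "deleting" if args.confirm else "would delete"
    return [f"{verb}: {path}" for path in paths]


def redact(value: str) -> str:
    """Never echo a secret; report only whether a value is present."""
    return "<redacted>" if value else "<unset>"
```

With no flags at all, every network check returns False and every destructive action is reported as a dry run, so the safe behavior is the default rather than an option.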
What to avoid
- NEVER pad SKILL.md with theory that belongs in references/.
- DO NOT broaden the description until it overlaps with adjacent skills.
- DON'T add scripts, network calls, or compatibility branches unless the workflow truly needs them.
- Avoid marketing copy; treat the description as routing logic.
- Avoid absolute paths and hidden environment assumptions when a relative path or explicit prerequisite will do.
Constraints
- Redact secrets, credentials, tokens, and PII by default.
- Keep frontmatter valid and minimal.
- Do not invent external facts; add a verification step when uncertain.
- Default to canonical implementations for unreleased or greenfield work.
- For schema-bound outputs, include schema_version in the contract or artifact example.
- Use docs/skill-graphs/question-lifecycle.md for question timing and docs/skill-graphs/knowledge-graph-operating-model.md for graph-mode runtime boundaries instead of expanding them inline here.
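For the schema_version constraint, a writer and a reader that agree on one top-level field are usually enough. This is a sketch under stated assumptions: the envelope layout, the version string "1", and both function names are hypothetical, not the repo's contract format.

```python
import json


def make_artifact(payload: dict, schema_version: str = "1") -> str:
    """Serialize an artifact with schema_version at the top level."""
    # schema_version is written last so a payload key cannot override it.
    return json.dumps({**payload, "schema_version": schema_version}, sort_keys=True)


def require_schema_version(raw: str) -> dict:
    """Load an artifact and fail fast when schema_version is absent."""
    data = json.loads(raw)
    if not isinstance(data, dict) or not data.get("schema_version"):
        raise ValueError("artifact is missing schema_version")
    return data
```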
Validation
Fail fast: stop at the first failed gate, fix it, and rerun before continuing.
Core validators:
~/.venvs/pyyaml/bin/python scripts/quick_validate.py <path/to/skill-folder>
~/.venvs/pyyaml/bin/python scripts/skill_gate.py <path/to/skill-folder>
~/.venvs/pyyaml/bin/python scripts/analyze_skill.py <path/to/skill-folder>
~/.venvs/pyyaml/bin/python scripts/openclaw_skill_guard.py <path/to/skill-folder> --mode both
For new skills:
~/.venvs/pyyaml/bin/python scripts/run_skill_evals.py <path/to/skill-folder>
Optional deep checks:
~/.venvs/pyyaml/bin/python scripts/run_skill_evals.py <path/to/skill-folder> --dual-run --capture-jsonl
python3 scripts/record_skill_feedback.py --skill-path <path/to/skill-folder>/SKILL.md --decision accepted --outcome good --confidence high --notes "validation sample" --workspace <workspace>
python3 scripts/skill_subject_scoreboard.py --workspace <workspace> --format table
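The fail-fast rule can be wrapped in a small runner that stops at the first failing gate. This is a sketch, not part of the skill's scripts: `run_gates` and `core_gates` are hypothetical names, and defaulting to the current interpreter is an assumption (this document invokes the validators with ~/.venvs/pyyaml/bin/python).

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_gates(gates: Sequence[Sequence[str]]) -> Optional[int]:
    """Run each gate in order and stop at the first non-zero exit code.

    Returns the index of the failing gate, or None when every gate passed.
    """
    for index, command in enumerate(gates):
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"gate {index} failed: {' '.join(command)}", file=sys.stderr)
            return index
    return None


def core_gates(skill_dir: str, python: str = sys.executable) -> list[list[str]]:
    """Build the four core validator commands for a skill folder."""
    return [
        [python, "scripts/quick_validate.py", skill_dir],
        [python, "scripts/skill_gate.py", skill_dir],
        [python, "scripts/analyze_skill.py", skill_dir],
        [python, "scripts/openclaw_skill_guard.py", skill_dir, "--mode", "both"],
    ]
```

Calling `run_gates(core_gates("path/to/skill-folder"))` from the directory that contains scripts/ returns the index of the first gate to fix, which keeps each rerun focused on a single failure.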
Examples
- “Create a new skill called foo-bar under utilities/ with evals and an output contract.”
- “Audit this skill’s trigger quality and tighten the description.”
- “Compare two variants of this skill and keep the better one.”
Reference map
Use these files when needed:
- references/discovery-interview.md for round-by-round clarification;
- references/description-optimization.md for trigger tuning;
- references/iteration-and-testing.md for eval-driven refinement;
- references/quality-tools.md for validator and eval interpretation;
- references/skill-structure.md and references/progressive-disclosure-patterns.md for layout choices;
- references/security-hardening.md for offline defaults and destructive-action gates;
- docs/skill-graphs/question-lifecycle.md for question timing and post-run feedback behavior;
- docs/skill-graphs/knowledge-graph-operating-model.md for graph-mode boundaries;
- references/examples.md, references/anti-patterns.md, and references/governance-and-style.md for calibrated examples and deeper guidance.
Decision feedback protocol
<!-- decision-feedback-protocol:v2 -->
- Question timing is runtime-owned. Do not make the skill itself decide when feedback is asked.
- If post-run feedback capture is enabled, emit a non-blocking post_run_feedback event via Codex request_user_input after result delivery.
- Capture decision, outcome, and confidence.
- Persist feedback with python3 scripts/record_skill_feedback.py.
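The three captured fields can be pictured as one small record. The sketch below is not the format used by scripts/record_skill_feedback.py, which this document does not show; the record layout, the extra `notes` and `recorded_at` fields, and the function names are all assumptions.

```python
import json
from datetime import datetime, timezone

REQUIRED_FIELDS = ("decision", "outcome", "confidence")


def build_feedback_event(decision: str, outcome: str, confidence: str,
                         notes: str = "") -> dict:
    """Build a post_run_feedback record with the three required fields."""
    event = {
        "event": "post_run_feedback",
        "decision": decision,
        "outcome": outcome,
        "confidence": confidence,
        "notes": notes,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    missing = [name for name in REQUIRED_FIELDS if not event[name]]
    if missing:
        raise ValueError(f"missing feedback fields: {', '.join(missing)}")
    return event


def to_jsonl(event: dict) -> str:
    """Serialize one event as a single JSON line."""
    return json.dumps(event, sort_keys=True)
```

Building the record is separate from deciding when to ask for it, which keeps question timing with the runtime as the protocol requires.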
Empowering execution style
Be capable, creative, and willing to explore better options when evidence supports them. Enable the user to make safe choices by explaining tradeoffs clearly, then keep the implementation disciplined and auditable.