af-skill-fix-by-benchmarks
Run benchmarks to identify root causes of failures in skills and propose fixes with argumentation. Use when a skill is not performing as expected, failing tests, or when you need to verify improvements using the benchmarking system.
flowai
An Assisted Engineering framework: the developer remains the architect and reviewer, while AI handles implementation under supervision.
The developer sets the task, approves the plan, and controls every diff.
The flowai project spans four sibling GitHub repositories:
- this repo (korchasa/flowai): the framework (skills, commands, agents, packs).
- korchasa/flowai-cli: the distribution CLI (flowai command). Bundles a SHA-256-pinned framework release tarball at publish time. Published to JSR as @korchasa/flowai.
- korchasa/flowai-workflow: universal DAG-based engine for orchestrating AI agents (YAML workflows, execution, validation, loops, resume). Published to JSR as @korchasa/flowai-workflow. Separate product, shares the flowai design philosophy.
- korchasa/flowai-experiments: parameterized empirical studies of AI agent platforms (e.g. max CLAUDE.md/AGENTS.md token budget at which an agent still follows an embedded rule). Informs framework design decisions.
The Assisted Engineering Paradigm
Assisted Engineering is a development model where the human retains full authority over architecture, design decisions, and final acceptance. AI acts as an executor — it writes code, runs checks, and proposes changes, but every meaningful action requires explicit developer approval.
Division of responsibility:
- Developer (Architect + Reviewer)
  - Defines goals and constraints
  - Reviews and approves plans before implementation
  - Inspects every diff before commit
  - Makes architectural decisions
  - Accepts or rejects results
- AI Agent (Executor)
  - Analyzes the codebase and gathers context
  - Proposes implementation plans
  - Writes code following the approved plan
  - Runs tests and verification
  - Prepares atomic commits for review
Control flow:
Developer: sets task
→ AI: proposes plan
→ Developer: reviews plan, approves or adjusts
→ AI: implements step by step
→ Developer: reviews each diff
→ AI: commits approved changes
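The control flow above can be read as a small state machine in which the AI never advances past a review gate on its own. The sketch below is only an illustration of that reading: the state and action names are hypothetical and do not come from the flowai codebase.

```typescript
// Hypothetical model of the supervised loop; all names are illustrative.
type State =
  | "task_set"
  | "plan_proposed"
  | "plan_approved"
  | "diff_ready"
  | "diff_approved"
  | "committed";

type Action =
  | "ai_proposes_plan"
  | "dev_approves_plan"
  | "dev_adjusts_plan"
  | "ai_implements_step"
  | "dev_approves_diff"
  | "ai_commits";

// Allowed transitions. Every AI action is preceded by a developer gate.
const transitions: Record<State, Partial<Record<Action, State>>> = {
  task_set: { ai_proposes_plan: "plan_proposed" },
  plan_proposed: {
    dev_approves_plan: "plan_approved",
    dev_adjusts_plan: "task_set", // assumption: the AI then proposes a revised plan
  },
  plan_approved: { ai_implements_step: "diff_ready" },
  diff_ready: { dev_approves_diff: "diff_approved" },
  diff_approved: {
    ai_implements_step: "diff_ready", // next step of the approved plan
    ai_commits: "committed",
  },
  committed: {},
};

// Applies one action, throwing if the action skips a review gate.
function step(state: State, action: Action): State {
  const next = transitions[state][action];
  if (next === undefined) {
    throw new Error(`"${action}" is not allowed in state "${state}"`);
  }
  return next;
}
```

In this model, calling step("plan_proposed", "ai_commits") throws, which mirrors the rule that nothing is committed before the developer has approved both the plan and the diff.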
Installation
Requires Deno v2.x.
deno install -g -A jsr:@korchasa/flowai
# Recommended: install primitives once for all projects (user-level)
flowai sync --global
# Or install per-project (legacy, useful for team-wide or repo-tracked skills)
flowai
Global vs per-project
flowai / flowai sync select the install scope via three mutually exclusive flags:
- --global / -g: force global install into IDE user-level dirs (~/.claude/, ~/.cursor/, ~/.config/opencode/, ~/.codex/, ~/.agents/skills/). Config at ~/.flowai.yaml. One sync updates every project at once.
- --local / -l: force project-local install into <cwd>/.{ide}/. Config at <cwd>/.flowai.yaml. Use when you want team-wide skills tracked in the repo, or per-project overrides.
- --auto: default. Auto-resolves scope by probing config files:
  - <cwd>/.flowai.yaml exists → project scope.
  - Otherwise ~/.flowai.yaml exists → global scope (CLI prints Using global config at ~/.flowai.yaml).
  - Neither exists → CLI asks which scope to set up (defaults to global in -y).
Opting a project into local install: create a <cwd>/.flowai.yaml (or run flowai --local to generate one). The mere presence of that file is the opt-in marker — subsequent runs without flags will use it.
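The resolution order for the three flags can be summarized as a pure function. This is a minimal sketch of the rules as stated in this section, not the CLI's actual code: the type names, the function name, and the "ask" return value are assumptions made for illustration.

```typescript
// Hypothetical sketch of the scope rules described above.
type Scope = "global" | "project";
type ScopeFlag = "global" | "local" | "auto";

interface ScopeInputs {
  flag: ScopeFlag; // --global, --local, or --auto (the default)
  projectConfigExists: boolean; // <cwd>/.flowai.yaml is present
  globalConfigExists: boolean; // ~/.flowai.yaml is present
  nonInteractive: boolean; // -y was passed
}

// Returns the resolved scope, or "ask" when the CLI would prompt the user.
function resolveScope(i: ScopeInputs): Scope | "ask" {
  if (i.flag === "global") return "global";
  if (i.flag === "local") return "project";
  // --auto: the project config wins over the global one.
  if (i.projectConfigExists) return "project";
  if (i.globalConfigExists) return "global";
  // Neither config exists: prompt, or default to global under -y.
  return i.nonInteractive ? "global" : "ask";
}
```

Because the project config is probed first, creating <cwd>/.flowai.yaml is enough to switch a project to local scope even when ~/.flowai.yaml also exists.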
Framework primitives MAY declare scope: project-only or scope: global-only in their SKILL.md frontmatter; the filter runs automatically. /update has no scope field because it is plugin/user-level installable and writes only current-project artifacts.
flowai migrate <from> <to> requires an explicit --global or --local flag — it never auto-resolves, since cross-IDE migrations have different semantics in each scope.
Claude Code + Codex plugin marketplace
In addition to the flowai CLI, Claude Code and Codex users can install any pack as a native plugin from the korchasa/flowai-plugins marketplace. All six marketplace packs (core, deno, devtools, engineering, memex, typescript) are published as separate plugins on every framework release:
# Inside a Claude Code session:
/plugin marketplace add korchasa/flowai-plugins
/plugin install flowai@flowai-plugins
# Optional, pick whichever stacks you use:
/plugin install flowai-deno@flowai-plugins
/plugin install flowai-typescript@flowai-plugins
/plugin install flowai-engineering@flowai-plugins
/plugin install flowai-devtools@flowai-plugins
/plugin install flowai-memex@flowai-plugins
/reload-plugins
# From a shell with Codex CLI installed:
codex plugin marketplace add korchasa/flowai-plugins
# Adding the marketplace registers all flowai-* plugins in `~/.codex/config.toml`
# with `enabled = true` automatically. Start a new Codex thread to load them; no
# interactive `/plugins` step is required. Edit individual `[plugins."<name>@flowai-plugins"]`
# tables in `~/.codex/config.toml` if you want to disable specific packs.
Skills are invoked under the plugin namespace: core uses /flowai:, while optional packs use /flowai-<pack>:, e.g. /flowai:commit, /flowai:plan, /flowai:update, /flowai-engineering:deep-research, /flowai-memex:save, /flowai-devtools:engineer-skill. Source primitive names are short kebab-case names; the plugin namespace carries the flowai brand. Cross-skill references inside skill bodies are rewritten to the namespaced form during build, and pack-level assets (e.g. AGENTS.template.md) ship inside each consuming skill — /flowai:update and /flowai:init work out of the box without a separate flowai sync step. Hooks declared by devtools and memex are translated to Claude Code's hooks.json format automatically.
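The namespacing rule above (core maps to /flowai:, every other pack to /flowai-<pack>:) is simple enough to express directly. The helper below is a hypothetical illustration of that mapping, not the build step's real rewrite logic.

```typescript
// Hypothetical helper: maps a pack and a short skill name to the
// plugin-namespaced invocation described above.
function namespacedSkill(pack: string, skill: string): string {
  // The core pack carries the bare "flowai" namespace; other packs are suffixed.
  const namespace = pack === "core" ? "flowai" : `flowai-${pack}`;
  return `/${namespace}:${skill}`;
}

console.log(namespacedSkill("core", "commit"));
console.log(namespacedSkill("engineering", "deep-research"));
```

Under this rule the two calls print /flowai:commit and /flowai-engineering:deep-research, matching the examples in the paragraph above.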
Codex receives the same generated skills/ payload through .agents/plugins/marketplace.json and per-pack .codex-plugin/plugin.json. Codex hook execution is feature-gated; enable [features].plugin_hooks = true in Codex before relying on plugin hooks.
deno task check always rebuilds and validates the local plugin marketplace before running the rest of the project checks. By default it does NOT touch your installed plugins. To dogfood your local framework edits in Claude Code / Codex run deno task sync-plugins-local: it rebuilds ./dist/claude-plugins, re-points the flowai-plugins marketplace at that absolute path in each available CLI, and installs / updates every emitted pack at user scope. Re-pointing replaces the downstream-tracking source — return to korchasa/flowai-plugins by removing the local marketplace and adding the GitHub source again. Missing claude or codex CLIs are reported as warnings and skipped, not fatal. Set AUTO_INSTALL_PLUGINS=true in env or .env to opt deno task check into running the sync automatically after every successful build/validate prerequisite.
Local marketplace smoke:
# One-shot dogfood install of the local build into Claude Code + Codex at
# user scope (re-points the `flowai-plugins` marketplace at ./dist/claude-plugins):
deno task sync-plugins-local
# Or run the steps individually for inspection:
deno task build-plugins
# Claude Code, one-session smoke without installation:
claude --plugin-dir ./dist/claude-plugins/plugins/flowai
# Claude Code, persistent local user install:
claude plugin validate ./dist/claude-plugins
claude plugin marketplace add ./dist/claude-plugins
claude plugin install flowai@flowai-plugins --scope user
# Codex, local marketplace registration:
codex plugin marketplace add ./dist/claude-plugins
After codex plugin marketplace add, Codex 0.130.0+ auto-registers every plugin from the marketplace as [plugins."<name>@flowai-plugins"] enabled = true in ~/.codex/config.toml and a fresh Codex thread loads them. codex plugin exposes marketplace management only (marketplace add|upgrade|remove); there is no codex plugin install — refresh happens via codex plugin marketplace upgrade flowai-plugins. Disable individual packs by setting enabled = false (or removing the table) in ~/.codex/config.toml.
CLI and plugin install are mutually exclusive: if you install via the plugin marketplace, do NOT also run flowai sync for the same IDE in the same project. The CLI detects an installed flowai plugin and aborts to avoid dual installs. Pick one channel.
Security: plugins execute arbitrary code at your user privilege. Only install marketplaces and plugins from sources you trust. The korchasa/flowai-plugins repository is a CI-generated mirror of this framework's packs and contains no human-authored content beyond README.md and LICENSE. See FR-DIST.MARKETPLACE for the build / distribution contract.
Quick Start Prompt
Copy and paste the following prompt into your AI IDE (Claude Code, Cursor, OpenCode, OpenAI Codex) to install and initialize flowai in your project:
Install the flowai framework in this project:
- Check if Deno v2.x is installed (deno --version). If not, ask the user which OS they are on and install Deno using the official method for their platform (macOS: brew install deno or curl -fsSL https://deno.land/install.sh | sh, Windows: irm https://deno.land/install.ps1 | iex, Linux: curl -fsSL https://deno.land/install.sh | sh).
- Run deno install -g -A jsr:@korchasa/flowai to install the CLI (skip if already installed).
- Run flowai in the project root to sync skills and agents into the IDE config directory.
- Run /init to analyze the codebase and generate AGENTS.md files, documentation scaffolding, and development commands.
Updating
Run /update (or plugin namespaced /flowai:update) in your AI IDE. It reconciles the current project with the installed framework templates:
- Reads framework templates from project-local assets, plugin-local assets, or user-level assets
- Compares them with project-owned artifacts (AGENTS.md, CLAUDE.md, scaffolded docs/config)
- Proposes per-file migrations with diffs and confirmation
- Leaves installed skills, agents, plugin caches, and user-level dirs untouched
To update the CLI binary or sync project-local primitives, use the standalone flowai CLI. To adapt project-local installed primitives, run /adapt.
How It Works
flowai is a set of Commands, Skills, and Agents — markdown instruction files that AI coding assistants (Cursor, Claude Code, OpenCode, OpenAI Codex, etc.) load into context to follow structured workflows.
- Commands (framework/<pack>/commands/<name>/SKILL.md): user-invoked workflows (e.g. /commit). The agent does not auto-discover them.
- Skills (framework/<pack>/skills/<name>/SKILL.md): agent-invocable capabilities. The agent picks them up automatically when relevant.
- Agents (framework/<pack>/agents/<name>/SUBAGENT.md): role definitions with specialized capabilities.
- Documentation (documents/): persistent project memory across sessions.
Both commands and skills install into .{ide}/skills/. The only IDE-visible difference is a disable-model-invocation: true flag on commands, added automatically by the CLI writer based on the source directory.
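As an illustration of how the source directory alone can decide the flag, the sketch below classifies a primitive by its path and derives the extra frontmatter. The function names and the regular expression are assumptions; only the path layout and the disable-model-invocation key come from this README.

```typescript
type PrimitiveKind = "command" | "skill";

// Classifies a source file by the layout documented above:
// framework/<pack>/commands/<name>/SKILL.md or framework/<pack>/skills/<name>/SKILL.md.
function classifyPrimitive(sourcePath: string): PrimitiveKind | null {
  const m = sourcePath.match(
    /^framework\/[^/]+\/(commands|skills)\/[^/]+\/SKILL\.md$/,
  );
  if (m === null) return null;
  return m[1] === "commands" ? "command" : "skill";
}

// Hypothetical: the frontmatter keys a writer would add on install.
function extraFrontmatter(sourcePath: string): Record<string, boolean> {
  return classifyPrimitive(sourcePath) === "command"
    ? { "disable-model-invocation": true }
    : {};
}
```

Keeping the flag out of the source files means a primitive can be moved between commands/ and skills/ without editing its frontmatter.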
AI models lose context between sessions. flowai compensates by storing all decisions, requirements, and architecture in structured docs that the agent reads at the start of every session.
Product vs. Development Tooling
This repository contains two distinct layers. Do not confuse them:
- framework/: the product itself. Skills and agents organized into packs that users install into their projects via flowai. This is what flowai distributes.
- .claude/skills/, .claude/agents/: internal development tooling. Skills and agents used to develop flowai itself (acceptance test runner, cursor-agent integration, code generation helpers). These are NOT distributed to users. Tracked in git directly.
Packs
The framework is organized into packs — modular groups of skills, agents, hooks, and scripts. Each pack has a pack.yaml with metadata. Users select which packs to install via .flowai.yaml.
core
Base commands for development workflows (commit, plan, review, init, etc.).
Commands:
- init: project initialization (AGENTS.md, docs scaffolding, dev commands)
- commit: streamlined atomic commits (targeted doc sync, inline grouping, auto-invoked reflect)
- review-and-commit: streamlined review + commit (reuses diff across phases)
- push: safe git push (no --force, explicit upstream confirmation, post-push @{u} == HEAD verification)
- ship: terminal full-cycle composite (plan → implement → review → commit → push, 4 explicit gates)
- update: reconcile project AGENTS.md/CLAUDE.md/scaffolded artifacts with framework templates
- adapt: adapt project-local skills/agents/hooks/assets to project specifics (standalone)
Skills:
- implement: TDD implement skill (RED → GREEN → REFACTOR → CHECK over a written plan)
- plan: task planning (GODS format, gitignored task file)
- /plan-exp-permanent-tasks (command): experimental committed-tasks variant; writes a persistent task at documents/tasks/<YYYY>/<MM>/<slug>.md with new-shape frontmatter (date, status: to do | in progress | done, implements, tags, related_tasks); status auto-derives from DoD by commit skills
- epic: structured feature specification for multi-session features
- review: QA + code review of current changes
- reflect: self-analysis of the current session
- reflect-by-history: cross-session analysis of past IDE transcripts
- investigate: deep bug investigation via hypothesis-driven experiments
- maintenance: project health audit (16-category scan + interactive resolution)
- setup-ai-ide-devcontainer: AI IDE devcontainer setup
- configure-deno-commands: configure Deno tasks
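For the permanent-tasks variant listed above, the path template and frontmatter keys can be sketched as types. The field names and status values come from the list; the value types (strings and string arrays) and the use of UTC for the date parts are assumptions.

```typescript
// Status values as listed for the new-shape frontmatter.
type TaskStatus = "to do" | "in progress" | "done";

// Assumed value types; only the key names are documented.
interface TaskFrontmatter {
  date: string;
  status: TaskStatus;
  implements: string[];
  tags: string[];
  related_tasks: string[];
}

// Builds documents/tasks/<YYYY>/<MM>/<slug>.md from a date and a slug.
function taskPath(date: Date, slug: string): string {
  const yyyy = String(date.getUTCFullYear());
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0");
  return `documents/tasks/${yyyy}/${mm}/${slug}.md`;
}

// Hypothetical example values, for illustration only.
const example: TaskFrontmatter = {
  date: "2025-03-09",
  status: "to do",
  implements: [],
  tags: ["example"],
  related_tasks: [],
};
```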
Agents:
- console-expert: complex console tasks and command execution
- diff-specialist: git diff analysis and atomic commit preparation
- skill-adapter: adapts a single skill to project specifics after upstream update
- agent-adapter: adapts a single agent to project specifics after upstream update
engineering
Procedural engineering knowledge (research, diagrams, writing, testing, etc.).
Skills:
- deep-research: multi-source web research with sub-agents
- draw-mermaid-diagrams: Mermaid diagrams
- fix-tests: fix failing tests
- write-prd: Product Requirements Documents
- write-dep: Development Enhancement Proposals
- write-gods-tasks: GODS-format tasks
- write-in-informational-style: informational writing style
- manage-github-tickets: GitHub issue management
- browser-automation: browser automation
- analyze-context: token usage analysis
- engineer-prompts-for-instant: prompts for fast models
- engineer-prompts-for-reasoning: prompts for reasoning models
- interactive-teaching-materials: interactive HTML teaching materials
Agents:
- deep-research-worker: research worker for deep research sub-tasks
ide-bridge
Cross-IDE delegation: run a task in another AI IDE's CLI from the current session.
Skills:
- ai-ide-runner: one-shot relay / fan-out comparison across Claude Code / OpenCode / Cursor / Codex CLIs; child's stdout relayed verbatim
- delegate-to-ide: delegate a task to another IDE via an isolated-context subagent so the child's transcript stays out of the parent's context
Agents:
- worker: single-shot cross-IDE CLI worker; spawned by delegate-to-ide
devtools
Skill and agent authoring tools.
Skills:
- engineer-skill: create/modify a skill
- engineer-command: create/modify a command
- engineer-rule: create/modify a rule
- engineer-hook: create/modify a hook
- engineer-subagent: create/modify a subagent
- write-agent-benchmarks: agent acceptance tests
deno
Deno-specific skills.
Skills:
- cli: Deno CLI operations
- deploy: Deno Deploy management
typescript
TypeScript-specific setup skills.
Skills:
- setup-agent-code-style-deno: Deno/TS code style
- setup-agent-code-style-strict: strict TypeScript
CLI Commands
The flowai CLI provides commands beyond interactive skill sync:
flowai sync
Sync framework skills/agents into project-local IDE config dirs. Primary command for installation and updates.
Supports installing from a git branch or local path via .flowai.yaml:
# Install from a branch (uses official repo by default)
source:
ref: feat/new-skill
# Install from a fork
source:
git: https://github.com/someone/flowai-fork.git
ref: main
# Install from local directory
source:
path: /path/to/flowai/framework
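The three source variants above can be modelled as one optional config object. The sketch below is a guess at how such a key could be interpreted, not the CLI's real resolver: the precedence of path over git, the fallback to the bundled framework when source is absent, and the official repo URL (taken from the clone command in Development Setup) are all assumptions.

```typescript
// Field names mirror the YAML examples; everything else is hypothetical.
interface SourceConfig {
  git?: string;
  ref?: string;
  path?: string;
}

type ResolvedSource =
  | { kind: "bundled" }
  | { kind: "local"; path: string }
  | { kind: "git"; url: string; ref: string | undefined };

// Assumption: the "official repo" is the URL used in this README's clone command.
const OFFICIAL_REPO = "https://github.com/korchasa/flowai.git";

function resolveSource(source?: SourceConfig): ResolvedSource {
  // No source key: fall back to the framework bundled with the CLI.
  if (source === undefined) return { kind: "bundled" };
  // Assumed precedence: a local path wins over git settings.
  if (source.path) return { kind: "local", path: source.path };
  if (source.git || source.ref) {
    return { kind: "git", url: source.git ?? OFFICIAL_REPO, ref: source.ref };
  }
  return { kind: "bundled" };
}
```

With these rules, a config that sets only ref: feat/new-skill resolves to the official repository at that branch, which matches the first YAML example.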
flowai sync only notifies when a newer CLI is published (Update available: X → Y. Run flowai update to install.). It never installs; the sole install entry point is flowai update. Suppress the check with --skip-update-check; preview a run without writes via -n/--dry-run.
flowai update
Self-update the CLI binary. Checks JSR for a newer version; installs via deno install -g -A -f jsr:@korchasa/flowai@<version>. In -y (non-interactive) mode prints the update command instead of running it. Fail-open on network errors.
flowai update # interactive prompt
flowai update -y # print command only
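The decision flowai update makes can be sketched as a function from the current version, the latest published version, and the -y flag to an action. This is a simplified illustration: the type and function names are hypothetical, and the version comparison handles only plain dotted numeric versions (no pre-release tags).

```typescript
type UpdatePlan =
  | { kind: "up-to-date" }
  | { kind: "prompt"; command: string } // interactive: ask, then run
  | { kind: "print"; command: string } // -y: print the command only
  | { kind: "skipped"; reason: string }; // fail-open on lookup errors

// Simplified comparison of dotted numeric versions such as "1.10.0".
function isNewer(latest: string, current: string): boolean {
  const a = latest.split(".").map(Number);
  const b = current.split(".").map(Number);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x > y;
  }
  return false;
}

// latest is null when the registry lookup failed (for example, no network).
function planUpdate(
  current: string,
  latest: string | null,
  nonInteractive: boolean,
): UpdatePlan {
  if (latest === null) return { kind: "skipped", reason: "version lookup failed" };
  if (!isNewer(latest, current)) return { kind: "up-to-date" };
  const command = `deno install -g -A -f jsr:@korchasa/flowai@${latest}`;
  return nonInteractive ? { kind: "print", command } : { kind: "prompt", command };
}
```

Returning "skipped" instead of throwing models the fail-open behaviour: a failed version lookup never blocks the rest of the run.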
flowai migrate <from> <to>
One-way migration of installed primitives (skills, agents, commands) from one IDE config dir to another — e.g. flowai migrate claude cursor. Use when switching primary IDE. --dry-run previews without writing; -y overwrites conflicts non-interactively.
flowai loop <prompt>
Run Claude Code non-interactively with real-time stream-json output. Base primitive for automation (CI, cron, scripts).
# Simple prompt
flowai loop "read deno.json and tell me the version"
# Invoke a skill via prompt
flowai loop "/analyze-context"
# With agent and auto-approve
flowai loop --yolo --agent console-expert "list all TODO comments"
# Repeated execution with pause
flowai loop --yolo --interval 5m --max-iterations 10 "/maintenance"
Options: --agent, --model, --cwd, --yolo, --timeout, --interval, --max-iterations. Run flowai loop --help for details.
Developer Workflow
1. Project Setup
Initialize the project structure and documentation:
- Run init to analyze the codebase and generate AGENTS.md, SRS, SDS
- Configure development commands for your stack
2. Task Cycle
Every task follows the same supervised loop:
- Task: describe what needs to be done
- Plan (plan): AI proposes a plan in GODS format. You review, adjust, approve
- Execute: AI implements the approved plan. You watch the diffs
- Verify: deno task check (or your project's equivalent) must pass. No exceptions
- Review & Commit (review-and-commit): AI reviews changes, then prepares atomic commits. You review before push
3. Maintenance
- maintenance: project health audit
- investigate: root cause analysis for complex bugs
Key Principles
- Developer controls, AI executes: no autonomous commits, no unsupervised architectural changes
- Explicit workflows: every task type has a defined skill with clear steps
- Persistent memory: documentation in documents/ bridges the gap between sessions
- Single verification gate: deno task check is the source of truth for project health
- IDE-agnostic: skills work across Cursor, Claude Code, OpenCode, OpenAI Codex, and other AI-assisted editors
Project Structure
framework/           # THE PRODUCT: distributed to users via the flowai CLI
  core/              # Core workflow commands and agents
  engineering/       # Procedural engineering knowledge
  devtools/          # Skill/agent authoring tools
  deno/              # Deno-specific skills
  typescript/        # TypeScript-specific setup skills
documents/           # Project documentation (SRS, SDS, tasks)
scripts/             # Deno task scripts + acceptance test infrastructure
acceptance-tests/    # Acceptance test runs, config, lock, per-scenario result cache (scenarios in framework/<pack>/{commands,skills}/*/acceptance-tests/)
deno.json            # Imports, tasks, lint/fmt config
AGENTS.md            # Project vision, rules, agent instructions
.claude/             # INTERNAL: dev tooling + framework resources
  skills/            # Dev-only skills (tracked) + framework skills (via flowai)
  agents/            # Dev-only agents (tracked) + framework agents (via flowai)
Distribution flow
The CLI is no longer in this repo (see korchasa/flowai-cli). End-users still install the same JSR package (@korchasa/flowai); only the source-of-truth for CLI code moved.
korchasa/flowai (this repo)                  korchasa/flowai-cli
─────────────────────────────                ─────────────────────────────
feat/fix/refactor on main                    framework.lock (pinned version)
        │                                            │
        ▼                                            │
release job:                                         │
  • bump deno.json version                           │
  • upload framework.tar.gz +                        │
    framework.tar.gz.sha256 as                       │
    assets of framework-v<X> ─────────────┐          │
                                          │ GitHub   │
                                          │ release  │
                                          ▼          ▼
                                  scripts/bundle-framework.ts
                                  (downloads tarball,
                                   verifies SHA-256, untars,
                                   bundles into src/bundled.json)
                                              │
                                              ▼
                                  tag v<Y> on flowai-cli
                                              │
                                              ▼
                                  JSR @korchasa/flowai
                                              │
                                              ▼
                                  deno install -g -A jsr:@korchasa/flowai
Documentation as Memory
Documentation is not optional — it is the only mechanism that preserves context between AI sessions.
- AGENTS.md: project vision, constraints, mandatory rules
- requirements.md (SRS): functional and non-functional requirements
- design.md (SDS): architecture, components, data models
- tasks/: task plans per session (GODS: Goal, Overview, Done, Solution)
The agent reads these at session start. If the docs are outdated, the agent works with wrong assumptions. Keep them accurate.
Development Setup
For contributors working on the framework (skills, commands, agents, packs):
Prerequisites: Deno, Git
git clone https://github.com/korchasa/flowai.git
cd flowai
deno task check
Dev-only skills and agents live in .claude/skills/ and .claude/agents/ (tracked in git). Framework skills/agents are installed by flowai from bundled source.
Composite SKILL.md files are gitignored build artefacts. Source of truth is framework/composites.yaml (manifest) + framework/atoms/*.md (parametrized step bodies) + framework/composites/*.md (wrappers). Every consumer (deno task check, deno task acceptance-tests, deno task build-plugins, CI tarball build) regenerates SKILL.md from source via --write before reading — so the rendered output is always current and there is no tracked rendered copy that can drift. The 8 generated paths are listed in .gitignore; the generator's checkGitignoreParity fails the build if that list goes out of sync with --list-targets. Generator inputs are excluded from framework.tar.gz via tar --exclude in .github/workflows/ci.yml, and re-verified by scripts/check-pack-refs.ts --leakage. See framework/AGENTS.md § Composite Skill Authoring for the canon rules.
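The parity rule between generated targets and .gitignore can be illustrated with a one-directional check: every generated path must appear as an ignore entry. This is a hypothetical sketch, not the generator's checkGitignoreParity; the function name, the exact-match comparison, and the example paths in the usage note are assumptions, and a real check would also need to detect stale entries.

```typescript
// Returns the generated targets that are NOT listed in the .gitignore text.
// Assumption: entries are matched literally, with or without a leading slash.
function missingFromGitignore(targets: string[], gitignore: string): string[] {
  const ignored = new Set(
    gitignore
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line !== "" && !line.startsWith("#")),
  );
  return targets.filter((t) => !ignored.has(t) && !ignored.has("/" + t));
}

// Fails loudly when any generated file could be committed by accident.
function assertParity(targets: string[], gitignore: string): void {
  const missing = missingFromGitignore(targets, gitignore);
  if (missing.length > 0) {
    throw new Error(`Not gitignored: ${missing.join(", ")}`);
  }
}
```

For example, with hypothetical targets framework/a/SKILL.md and framework/b/SKILL.md and a .gitignore that lists only the first, assertParity throws and names the second path.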
For contributors working on the CLI itself (sync engine, IDE adapters, bundle pipeline) — go to korchasa/flowai-cli. That repo has its own deno task check, its own test suite, and publishes @korchasa/flowai to JSR on tag v*. It pins a framework revision via framework.lock; bump it with deno task bump-framework <version> after a new framework-v* release lands here.
License
MIT