Agent Skill
2/7/2026

auto

Autonomous task execution with testing and security. Works through all tasks without stopping.

djnsty23
npx skills add djnsty23/claude-auto-dev

SKILL.md


name: auto
description: Autonomous task execution with testing and security. Works through all tasks without stopping.
triggers:
  - auto
allowed-tools: Bash, Read, Write, Edit, Grep, Glob, Task, TaskCreate, TaskUpdate, TaskList, Agent, SendMessage
model: opus
user-invocable: true

Auto Mode

Fully autonomous development. Works through all tasks without stopping until complete.

Current State

!git status --short
!node -e "try{const p=require('./prd.json');const sp=p.sprints?p.sprints[p.sprints.length-1]:p;const s=Object.values(sp.stories||p.stories||{});const name=sp.id||sp.name||p.sprint||'unknown';const done=s.filter(x=>x.passes===true).length;const pend=s.filter(x=>x.passes===null||x.passes===false).length;console.log('Sprint:',name,'| Done:',done,'| Pending:',pend,'| Total:',s.length)}catch(e){console.log('No prd.json')}"

Entry Flow

auto
  |-- Activate: write .claude/auto-active
  |-- Check prd.json exists?
  |   |-- No -> Bootstrap from context
  |   +-- Yes -> Check pending tasks
  |               |-- None pending -> IDLE Detection
  |               +-- Has pending -> Execute tasks
  |
  +-- Execute until done or interrupted
  +-- Deactivate: delete .claude/auto-active

Auto-Active Flag (Continuous Execution)

On start, create the flag file using the Write tool (not Bash echo — avoids sensitive file permission prompt):

Write tool → .claude/auto-active
Content: {"started":"<current ISO timestamp>","sprint":"<current sprint>"}

This flag tells the Stop hook to block Claude from stopping. Claude keeps working as long as this flag exists.

On exit (user says "done", or nothing left), do not rm the flag — Bash ops on .claude/ trigger a sensitive-file permission prompt even under bypass. Instead, simply stop working. The Stop hook owns the flag lifecycle:

  • Stale flags (>2h old) are auto-cleaned
  • Sprint-complete → hook runs IDLE detection once, then approves stop and removes the flag on the next attempt
  • No prd.json → hook approves stop and removes the flag immediately

If the user explicitly says "deactivate auto" mid-sprint, use the Write tool to create .claude/auto-exit (empty file). The Stop hook treats that as an unconditional exit signal and cleans up both files.

Autonomous Behavior

Do not ask "Should I continue?" or show summaries and wait.

Instead:

  • Make autonomous decisions
  • Keep working until truly done
  • The Stop hook prevents Claude from ending — trust it

Persist to prd.json

When findings, scan results, or ad-hoc issues are identified during execution, write them to prd.json as stories before fixing them. prd.json is the source of truth that survives session restarts and /compact.

Lightweight Mode

If the user gives a direct instruction (e.g., "fix this button", "update that copy") rather than saying "auto":

  • Skip prd.json and sprint creation entirely
  • Just fix, verify, done
  • Use prd.json only when there are 5+ tasks to track

Bootstrap (No prd.json)

When prd.json does not exist:

  1. Read CLAUDE.md, README.md, package.json for context
  2. Generate 5-10 starter tasks based on project
  3. Create prd.json with stories
  4. Continue immediately — do not stop for approval

Pre-flight (Smart)

Before first task, run these checks. Use simple commands that won't trigger security filters:

# 1. Git status
git status --short

# 2. Dependencies fresh?
# Compare timestamps — if package.json is newer than node_modules, run npm install
ls -lt package.json node_modules/.package-lock.json 2>/dev/null | head -1

If package.json is newer or node_modules is missing, run npm install.

# 3. Detect test runner — read package.json with Read tool, check for vitest/jest/playwright in devDependencies
# Use the detected runner for all test steps in this session

# 4. Build check
npm run build 2>&1 | tail -5

# 5. Branch check
git branch --show-current

If on main/master, create a feature branch before making changes.

# 6. Worktree cleanup
git worktree prune 2>/dev/null

Skip individual checks if they take >10 seconds. Use Read tool to inspect package.json instead of node -e one-liners.

Task Execution

Find Next Task

// prd.json has two shapes:
// Flat:   { stories: { "S1-001": {...} }, sprint: "sprint-1" }
// Nested: { sprints: [{ id: "sprint-1", stories: { "S1-001": {...} } }] }
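const prd = require('./prd.json'); // load the task file from the project root
// Use the most recent sprint when the nested shape is present
const sp = prd.sprints ? prd.sprints[prd.sprints.length - 1] : prd;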
const stories = sp.stories || prd.stories || {};
const storyEntries = Object.entries(stories);
const executable = storyEntries.filter(([id, s]) =>
  s.passes !== true &&
  (s.blockedBy || []).every(dep => stories[dep]?.passes === true)
);

Size-Gate Before Executing

Before starting a task, assess its scope:

  • Small (1-3 files, clear fix) → execute directly

  • Medium (3-5 files, clear approach) → execute with extra caution

  • Large (5+ files, new feature, multiple integrations) → write a 3-sentence inline plan before coding:

    1. What changes
    2. What systems are affected
    3. What to verify after

    Then execute. Do not stop to ask — the inline plan is sufficient for auto mode.

Execute Each Task

  1. Progress output: [3/8] Starting: S6-003 — Add loading states
  2. Read the task description
  3. Context Loading — read 2-3 similar files to match existing patterns
  4. Apply Generation Constraints (see below) — before writing code
  5. Implement the solution
  6. Self-Critique — re-read your diff before running checks (see below)
  7. npm run typecheck — fix if fails
  8. npm run build — fix if fails
  9. Self-Verification (see below)
  10. Visual verification — if the task touched UI, run agent-browser or Playwright screenshots. Do not skip this.
  11. Test generation — if the task created an API route, auth logic, or data mutation, write at least one test (see below)
  12. Progress output: [3/8] ✓ S6-003 | Next: S6-004
  13. Update prd.json: passes: true
  14. Start next task immediately

Generation Constraints (apply before writing code)

Before writing code, load references/generation-constraints.md — it covers TypeScript strictness patterns, security/data-safety rules (fetch error handling, SSRF guards, env var enforcement), accessibility checklist (labels, focus rings, touch targets), design anti-slop rules, and the test-generation matrix for API/auth/data mutations. Each rule explains the failure mode it prevents, so you can judge when to bend it.

After writing code but before typecheck, re-read your diff against the 8-point self-critique checklist in the same reference file.

Acceptance Criteria & Verify Tags

When creating stories in prd.json, each story can carry a verify: [] array and an acceptance: [] array. Load references/verify-tags.md for the tag definitions (visual, a11y, design, security, auth, test, api) and an example story.

If no verify field exists, auto infers from the task type (UI → visual+a11y+design, API → api+security, etc.).

Before marking a task done, verify each acceptance criterion. "Does it compile?" is not acceptance — "does it behave correctly?" is.

Context Loading (before writing any code)

  1. Read 2-3 existing files most similar to what you're building
  2. Identify patterns: naming conventions, import style, error handling, state management
  3. Match patterns — do not introduce new patterns when existing ones cover the use case
  4. For UI tasks: Read globals.css (or tailwind config) + layout.tsx to understand the project's design system — fonts, colors, component patterns. Use the project's ACTUAL tokens, not stock defaults. Check if fonts are loaded via next/font or just declared in CSS.
  5. For library-specific code (third-party APIs, SDKs, framework features): If Context7 tools are available (mcp__plugin_context7_context7__*), query them for version-pinned docs before writing code that touches the library. This prevents using deprecated APIs or patterns from old training data — even for well-known libraries like Next.js, React, and Supabase.
  6. Check for Doppler: If doppler.yaml exists in the repo, secrets live in Doppler — prepend doppler run -- when running npm/pnpm/bun run dev|build|start|test commands. If doppler CLI is missing, install first (see doppler skill). If not logged in, stop and guide user to doppler login.

Design context is not optional. Auto-generated UI without reading the design system produces stock shadcn that fails the AI slop checklist. Spend 30 seconds reading the design tokens before writing any component.

Verification

| Task Type | Verification |
| --- | --- |
| UX/UI (public pages) | agent-browser screenshots (desktop + mobile) + console errors |
| UX/UI (admin/internal) | typecheck + build only |
| Feature (UI) | Build passes + visual check if public UI changed + complete primary user flow once |
| Edge Function / API | Deploy + curl with real params + verify 200 + response shape matches expected |
| API Integration | Real request with real credentials + verify response contains expected data |
| Bug fix | Reproduce, verify fixed, no new errors |
| Refactor | Typecheck + build + existing tests pass + no behavior change |
| Auth / billing / RLS | Write or verify a test for the security-critical path |

All task types also require the Hardening Check (step 4c). This catches logic bugs that typecheck and build miss.

Integration test is mandatory for API/Edge Function tasks. Typecheck alone does not catch wrong API keys, wrong function signatures, or wrong database tables. Make one real request before marking done.

Ignore Preview-plugin visual reminders on non-UI edits. Claude Code's Preview plugin and similar tools may suggest "verify in browser" on any file change. Skip the suggestion when the edit was purely server-only: API routes, middleware, next.config.*, *.test.*, migration files, types-only files, server actions without JSX. Only run visual verification when the edit touches a React/Vue/Svelte component, page, layout, or CSS that ships to the browser.

Risk-shaped testing. When adding tests, prioritize paths that handle money, access control, or user data over easy-to-test pure functions.
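A test of the kind the Auth / billing / RLS row and the risk-shaped testing rule call for, as an added example that is not part of the skill: it pins down the fail-open auth and env var fallback patterns listed in the Hardening Check. CRON_SECRET, the function names and the token values are hypothetical.

```javascript
// Added example, not the skill's code. CRON_SECRET, the handler names and the
// token values are hypothetical; the sample inputs are constructed.

// BEFORE (fail-open): with CRON_SECRET unset the condition is false and the
// request is let through without any check.
function authorizeFailOpen(headers, env) {
  const secret = env.CRON_SECRET;
  if (secret && headers.authorization !== `Bearer ${secret}`) {
    return { status: 401 };
  }
  return { status: 200 };
}

// Compare without returning at the first mismatching character. In a Node
// handler crypto.timingSafeEqual on equal-length buffers is the usual choice.
function sameString(a, b) {
  let diff = a.length ^ b.length;
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i); // NaN past the end coerces to 0
  }
  return diff === 0;
}

// AFTER (fail-closed): a missing or empty secret rejects every request.
function authorizeFailClosed(headers, env) {
  const secret = env.CRON_SECRET;
  if (!secret) return { status: 401 };
  const presented = headers.authorization || '';
  return { status: sameString(presented, `Bearer ${secret}`) ? 200 : 401 };
}

// Env var enforcement: a fallback is honoured only outside production.
function requireEnv(name, env, devFallback) {
  if (env[name]) return env[name];
  if (env.NODE_ENV !== 'production' && devFallback !== undefined) return devFallback;
  throw new Error(`Missing required env var: ${name}`);
}

// The cases a test for this path should pin down, run against either handler.
function runAuthCases(authorize) {
  const env = { CRON_SECRET: 'constructed-secret' };
  return {
    validToken: authorize({ authorization: 'Bearer constructed-secret' }, env).status,
    wrongToken: authorize({ authorization: 'Bearer guess' }, env).status,
    missingHeader: authorize({}, env).status,
    secretUnset: authorize({ authorization: 'Bearer guess' }, {}).status,
    secretEmpty: authorize({}, { CRON_SECRET: '' }).status,
  };
}
```

The `secretUnset` and `secretEmpty` cases are the ones typecheck and build never exercise: the fail-open handler answers 200 for both.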

For UI/API tasks, detect or start a dev server first:

# Check if already running
for port in 3000 3001 5173 8080; do curl -s http://localhost:$port > /dev/null 2>&1 && break; done
# If none found, start one in background
Bash({ command: "npm run dev", run_in_background: true })
# Wait for startup, then verify

Use agent-browser (preferred — token efficient) or Playwright (more capabilities):

agent-browser (preferred):

agent-browser open http://localhost:3000/[page]
agent-browser snapshot -i          # Desktop screenshot + DOM
agent-browser viewport 375 812     # Switch to mobile
agent-browser snapshot -i          # Mobile screenshot
agent-browser errors               # Console errors

Playwright fallback (if agent-browser unavailable):

npx playwright screenshot http://localhost:3000/[page] .claude/screenshots/page-desktop.png
npx playwright screenshot --viewport-size=375,812 http://localhost:3000/[page] .claude/screenshots/page-mobile.png

If neither is available, fall back to WebFetch or curl for basic page load verification.

Analyze screenshots for: broken layout, missing content, visual regressions, design quality, dark mode correctness. Fix console errors or visual issues before marking task complete.

Self-Verification (after each task)

Before marking any task as complete:

1. Type Safety

npm run typecheck 2>/dev/null || npx tsc --noEmit 2>/dev/null

2. Tests

npm test -- --passWithNoTests --watchAll=false 2>/dev/null

3. Resource Validation

If the task added external resources (images, fonts, API URLs), validate them:

# Check image/asset URLs are reachable
# -E enables extended regex so (png|jpg|...) is an alternation, not literal characters
grep -rnE 'https://.*\.(png|jpg|svg|webp|woff2)' src/ --include="*.tsx" --include="*.ts" | while read -r line; do
  url=$(echo "$line" | grep -oP 'https://[^\s"'\'']+'); curl -s -o /dev/null -w "%{http_code} $url\n" "$url"
done

Fix broken URLs before committing — they cause blank images and layout shifts in production.

4. Self-Review

Run git diff and check: no console.log/debugger, no hardcoded colors, all UI states handled, no any types, no commented-out code.

4b. Sweeping Change Verification

If the task involved a bulk find-and-replace (e.g., renaming, migrating values, swapping imports), grep for the OLD pattern to confirm it's fully eliminated. Partial migrations cause subtle bugs (e.g., USD→EUR migration that missed one pricing page).

4c. Hardening Check (per-task audit-lite)

Review the diff for these patterns in the files you just changed. Fix before marking done:

| Pattern | What to Check | Fix |
| --- | --- | --- |
| Fail-open auth | if (secret && ...) skips auth when env var is unset | Fail-closed: return 401 if env var missing |
| Unsafe casts | as unknown as, as any, double assertions | Create a validator (Zod or manual), parse instead of cast |
| Fire-and-forget fetch | fetch() without try/catch or .ok check | Wrap in try/catch, check res.ok, revert optimistic state on failure |
| Missing form labels | <input placeholder="..."> without <label> or aria-label | Add <label> or aria-label to every input |
| Missing autocomplete | Login/signup inputs without autoComplete | Add autoComplete="email", autoComplete="current-password", etc. |
| User-supplied URLs | Server-side fetch(userUrl) without validation | Validate URL, resolve DNS, block private IP ranges |
| Env var fallbacks | process.env.X \|\| 'localhost' or \|\| '' | Throw if missing in production, only fallback in dev |
| RLS policy logic | New table or RLS change | Verify policy restricts to auth.uid() for user data |
| Missing focus styles | Raw <button> without focus-visible:ring-* | Add focus-visible:ring-2 focus-visible:ring-ring |
| Stock UI | Fonts declared but not loaded, text-only nav, generic empty states | Load fonts via next/font, add icons, add visual personality |
| Dark mode | Colors that don't use theme tokens, cards same color as background | Use semantic tokens, add elevation distinction |
| Chart colors | hsl(var(--x)) when var already contains hsl(...) | Use raw HSL values or remove outer hsl() wrapper |

Only check patterns relevant to the files you changed — this is a 30-second scan of your own diff, not a full audit.
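The "User-supplied URLs" fix (validate URL, resolve DNS, block private IP ranges) written out as an added example; this is not code from the skill or its reference files. The function names are hypothetical, the caller is assumed to pass in the DNS answers for the hostname, and the range list covers the core non-public ranges only.

```javascript
// Added example, not the skill's code. Callers pass the DNS answers for the
// hostname, e.g. from require('node:dns').promises.lookup(host, { all: true }).

// Dotted-quad IPv4 -> unsigned 32-bit int, or null when the string is not IPv4.
function ipv4ToInt(ip) {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(ip);
  if (!m) return null;
  const o = m.slice(1).map(Number);
  if (o.some((n) => n > 255)) return null;
  return ((o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3]) >>> 0;
}

// Non-public IPv4 ranges as [network, prefix length].
const BLOCKED_V4 = [
  ['0.0.0.0', 8], ['10.0.0.0', 8], ['100.64.0.0', 10], ['127.0.0.0', 8],
  ['169.254.0.0', 16], ['172.16.0.0', 12], ['192.0.0.0', 24],
  ['192.168.0.0', 16], ['198.18.0.0', 15], ['224.0.0.0', 3],
];

function isBlockedV4(n) {
  return BLOCKED_V4.some(([net, bits]) => {
    const mask = (0xffffffff << (32 - bits)) >>> 0;
    return ((n & mask) >>> 0) === ((ipv4ToInt(net) & mask) >>> 0);
  });
}

// Accepts the canonical forms produced by the URL parser and dns.lookup.
// Anything that cannot be classified is treated as blocked (fail closed).
function isBlockedAddress(ip) {
  const a = String(ip).toLowerCase();
  if (!a.includes(':')) {
    const n = ipv4ToInt(a);
    return n === null || isBlockedV4(n);
  }
  if (a === '::' || a === '::1') return true;
  // IPv4-mapped IPv6, dotted (::ffff:10.0.0.1) or hex (::ffff:a00:1).
  const m = /^::ffff:(?:(\d+\.\d+\.\d+\.\d+)|([0-9a-f]{1,4}):([0-9a-f]{1,4}))$/.exec(a);
  if (m) {
    const n = m[1] ? ipv4ToInt(m[1])
      : ((parseInt(m[2], 16) << 16) | parseInt(m[3], 16)) >>> 0;
    return n === null || isBlockedV4(n);
  }
  // Unique-local fc00::/7, link-local fe80::/10, multicast ff00::/8.
  return /^(f[cd][0-9a-f]{2}|fe[89ab][0-9a-f]|ff[0-9a-f]{2}):/.test(a);
}

// Returns { ok, reason } for a user-supplied URL and its resolved addresses.
function checkFetchTarget(rawUrl, addresses) {
  const deny = (reason) => ({ ok: false, reason });
  let url;
  try { url = new URL(rawUrl); } catch { return deny('unparseable URL'); }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return deny('scheme not allowed');
  if (url.username || url.password) return deny('credentials in URL');
  const host = url.hostname.replace(/^\[|\]$/g, '');
  // The URL parser has already normalised forms such as http://2130706433/.
  const literal = host.includes(':') || ipv4ToInt(host) !== null;
  const targets = literal ? [host] : (addresses || []);
  if (targets.length === 0) return deny('no addresses resolved');
  const bad = targets.find(isBlockedAddress);
  if (bad !== undefined) return deny(`blocked address ${bad}`);
  return { ok: true, reason: 'all addresses public', connectTo: targets[0] };
}
```

checkFetchTarget only decides. The request itself has to connect to the address that was checked (`connectTo`) and repeat the check on every redirect, otherwise a later DNS answer can differ from the one that was validated.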

5. Design Token Compliance (UI tasks only)

If the task changed .tsx or .css files, verify the output uses the project's actual tokens:

# Check for stock shadcn / hardcoded colors in changed files
git diff --name-only | xargs grep -n "text-white\|bg-black\|text-gray-\|bg-gray-\|#[0-9a-fA-F]\{6\}" 2>/dev/null | grep -v "gradient\|from-\|to-\|via-" | head -10
# Check fonts are loaded, not just declared
grep -rn "fontFamily\|font-family" src/ --include="*.css" --include="*.tsx" | grep -v "next/font\|@font-face\|tailwind" | head -5

If stock colors or unloaded fonts found in YOUR changes, fix before proceeding.

6. UI/API Change? Visual Verification

Run agent-browser or Playwright screenshots from the Verification section above. Not optional for UI tasks.

7. Mark Complete

Only after all checks pass. UI files (.tsx, .css, layout, page) without visual verification → go back to step 6.

Smart Retry

On failure:

  1. Auto-fix first — Most failures are trivial (missing import, type mismatch, wrong path). Read the error, fix it inline, re-run the check. This does not count as a retry.
  2. Retry 1: Different approach
  3. Retry 2: Simplest possible implementation
  4. Still fails: set passes: false, continue to next task
  5. If failure is due to missing external setup (API keys, services, infrastructure): set passes: "needs-setup" with blockedReason explaining what's needed. This distinguishes "can't do yet" from "tried and failed."

Do not retry a third time. Do not spend more than 10 minutes on retries for a single task.

Error Pattern Recognition

Track error types across tasks. When the same error pattern appears 3+ times:

  1. Save it to auto-memory as a known pattern with its fix recipe
  2. On future occurrences, apply the fix immediately without the auto-fix→retry cycle

Common patterns to recognize:

| Error Pattern | Instant Fix |
| --- | --- |
| exactOptionalPropertyTypes error | Add \| undefined to optional prop types: foo?: string \| undefined |
| Cannot find module './X' | Check file exists, fix path or create file |
| Type 'X' is not assignable to type 'Y' | Check the type definition, add union or cast |
| Property 'X' does not exist on type 'Y' | Add to interface or use optional chaining |
| RLS policy violation | Check auth.uid() in policy, verify user is authenticated |
| CORS error | Check API route headers or middleware config |
| as unknown as cast | Create a validator function, parse instead of assert |
| Unhandled fetch in component | Wrap in try/catch, check res.ok, add error feedback |
| <input> without label | Add <label htmlFor> or aria-label prop |
| Env var \|\| '' fallback | Throw if missing, fallback only with NODE_ENV check |
| Middleware blocks new route | Add to PUBLIC_PREFIXES or route matcher |
| Font declared but not loaded | Add next/font import in layout.tsx |
| hsl(var(--x)) double-wrap | Remove outer hsl() when CSS var already contains it |
| Stock shadcn tokens | Read project's globals.css, use actual brand colors |

Commit Cadence

  • Commit every 3 completed tasks
  • Or after major milestones
  • Feature branch for team projects; main is fine for solo (see commit skill)
  • Use conventional commits: feat|fix|refactor

Save Project Knowledge (Continuous Learning)

After solving hard problems (debugging, retries, unexpected errors), save reusable lessons to auto-memory:

| What to Save | Example |
| --- | --- |
| Environment quirks | "This project uses Vite on port 5173, not CRA on 3000" |
| Error fix recipes | "RLS 'permission denied' → check auth.uid() in policy, not custom function" |
| Architecture patterns | "API routes follow /api/v1/[resource]/route.ts pattern" |
| Build gotchas | "Must run npm run generate before build (Prisma client)" |
| Test setup | "Tests need TEST_DB_URL env var, seed with npm run seed:test" |
| Deploy requirements | "Vercel needs ANALYZE=true for bundle analysis" |

Also save after these events:

  • Same error 3+ times across tasks → save as known pattern with fix recipe
  • Unexpected project structure → save the actual structure for next session
  • Workarounds discovered → save so next session doesn't rediscover them

This builds per-project context that compounds across sessions.

Token Management

With 1M context, compaction is almost never needed. Do NOT suggest /compact unless you are certain context usage exceeds 70%. A full sprint (10+ tasks) typically uses only 15-20% of 1M context.

Be concise but don't sacrifice clarity for brevity.

Auto-Deploy (After Commit)

After committing completed tasks, check if changed files need deployment:

# Check what changed since last deploy/commit
git diff --name-only HEAD~1

| Changed Files | Deploy Action |
| --- | --- |
| supabase/functions/*/index.ts | Deploy changed edge functions (read deploy command from project CLAUDE.md) |
| supabase/migrations/*.sql | Run supabase db push or apply migration |
| src/** (Vercel/Next.js) | Push to trigger Vercel auto-deploy |

For edge functions, read project-specific deploy config from CLAUDE.md (e.g., path to supabase binary, project ref, flags like --no-verify-jwt). If no config found, skip auto-deploy and note it in completion summary.

After deploy, verify the deployment succeeded (check endpoint responds with 200).

Completion

When all stories have passes === true:

All [N] tasks complete.

Summary:
- [X] features implemented
- [X] bugs fixed
- [X] improvements made

Run `progress` to see full results.

IDLE Detection (Smart Next Action)

If no tasks to work on:

  1. Are all stories passes: true?
    • No: find blocked tasks and resolve blockers
    • Yes: continue to step 2
  2. Auto-transition sprint (see below)
  3. Output completion summary
  4. Assess context to decide next action

Auto Sprint Transition

When all pending tasks are done, auto handles the sprint lifecycle — but verifies the work first and surfaces a summary before bumping.

1. BUILD GATE — run the real deploy-target build before anything else:
   npm run build  (or pnpm/yarn/bun equivalent)
   If it fails, do NOT archive or bump. Create a prd.json story for each
   error and continue working. Sprint can only close on a clean build.

2. Log summary to .claude/sprint-history.md:
   "Sprint [N]: [done]/[total] tasks | [date] | [one-line summary of work]"

3. Archive completed stories:
   - Copy current prd.json to .claude/archives/prd-archive-sprint-[N].json
   - Remove stories with passes: true from prd.json
   - Keep stories with passes: null, false, or "deferred"

4. Decide whether to bump — show a one-line honesty summary first:

   Sprint [N] closed: [done]/[total] tasks.
   Average realness: [avg]%  (see realness field in core schema)
   Build: passed in [T]s
   Carried forward: [M] deferred, [K] new findings
   Bumping to Sprint [N+1]. Say "stop" to pause.

   Then proceed — no confirmation needed, but the user has a clean
   window to interrupt. This beats silent bumps AND beats blocking
   prompts that break autonomous execution.

5. If no new work exists, skip the bump and go to "Ask User" below.

Honest close criteria: A sprint doesn't close just because every story has passes: true. It closes when (a) build passes on deploy target, (b) the summary accurately reflects what shipped, and (c) realness scores are filled in honestly.

Decision Matrix

| Signal | Action |
| --- | --- |
| Deferred tasks from previous sprint | Carry forward, start working |
| Audit/brainstorm created new stories | Bump sprint, continue |
| Dev server running + UI changes made | Run visual scan, fix issues found |
| TODOs/FIXMEs in changed files | Create stories, fix them |
| Build warnings | Fix directly (no story needed) |
| Clean codebase, no work | Ask user (see below) |

Auto-Continue (Obvious Work)

When new work exists after sprint transition, continue immediately:

Sprint [N] complete ([done]/[total] tasks).
Archived completed stories. [M] tasks carried forward.
Continuing as Sprint [N+1].

Limit: 2 auto-continued sprints per session. After that, ask the user.

Ask User (No Obvious Work or Limit Reached)

If the sprint had 5+ tasks, suggest simplify first:

Sprint [N] complete ([done]/[total] tasks).

Recommended: run `simplify` to catch duplicate code from this sprint.

What's next?
1. simplify - Review for duplicate code and over-abstraction
2. audit - Deep quality scan (finds bugs + violations)
3. brainstorm - Feature ideas + dead code scan
4. Done for now

Keep .claude/auto-active flag while asking. Only delete it if user picks "Done for now".

Quick Reference

| Situation | Action |
| --- | --- |
| No prd.json | Bootstrap from context |
| All done + issues found | Brainstorm (auto-creates stories) |
| All done + clean code | Ask user for next action |
| All done + already auto-sprinted | Ask user (limit reached) |
| Build broken | Fix first |
| Task fails | Retry 2x, then skip |
| UX task | Browser verify |
| Blocked task | Skip, work on unblocked |
| < 5 tasks, no sprint | Work directly |

Skills Info
Original Name: auto
Author: djnsty23