name: code-review
version: 2.3.0
description: '[Code Quality] Use when evaluating review feedback, requesting targeted code-quality review, or verifying completion claims.'
execution-mode: subagent
context-budget: critical

[BLOCKING] Execute skill steps in declared order. NEVER skip, reorder, or merge steps without explicit user approval.
[BLOCKING] Before each step or sub-skill call, update task tracking: set in_progress when the step starts and completed when it ends.
[BLOCKING] Every completed/skipped step MUST include brief evidence or an explicit skip reason.
[BLOCKING] If Task tools are unavailable, create and maintain an equivalent step-by-step plan tracker with the same status transitions.

Quick Summary

Goal: Ensure technical correctness by receiving feedback with verification (not performative agreement), requesting targeted systematic reviews via the code-reviewer subagent, and enforcing verification gates before completion claims.

Routing boundary: If the user asks to review current changes, uncommitted work, staged/unstaged diffs, or a branch-to-branch diff, use review-changes instead.

MANDATORY MUST ATTENTION Before reviewing, search for project-specific reference docs:

Coding standards (search: code-review-rules, coding-standards, style-guide, contributing)
Architecture (search: patterns-reference, architecture, adr)
Test conventions (search: integration-test-reference, test-guide, test-conventions)
Design system (search: design-system, design-tokens, component-library)

Read found docs before reviewing. None found → rely on tech stack knowledge from file extensions/directory structure.

Workflow:

Create Review Report — Init plans/reports/code-review-{date}-{slug}.md
Phase 0: Blast Radius — Run graph analysis first if .code-graph/graph.db exists
Phase 0.3: Risk Detection — Detect dependency, migration, bus/event, API, security, config, and infra risks
Phase 0.5: Plan Compliance — Verify changed files and tests against active plan when present
Phase 0.7: Surface Detection — Classify files by language + directory semantics + change nature → route sub-agents
Phase 1: File-by-File — Review each file, update report with correctness, convention, DRY, intent, test, and docs checks
Phase 2: Holistic — Re-read accumulated report, assess overall approach, architecture, duplication, and cross-boundary behavior
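Phase 3: Final Result — Update report with overall assessment, critical issues, recommendations, docs staleness, and test gaps
Phase 4: Why-Review Gate (when findings exist): validate own findings and severities via /why-review before handoff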
Round 2: Fresh Sub-Agent — After a fix cycle, run a fresh code-reviewer for cross-cutting concerns, convention drift, and edge cases

Key Rules:

Report-Driven: Build report incrementally; re-read for big picture
Detect First: Run graph blast radius when available, then classify change types and file surfaces before any review
No Performative Agreement: Technical evaluation only ("You're right!" banned)
Verification Gates: Evidence required before completion claims
Review Current Diffs Elsewhere: Current changes, staged/unstaged diffs, and branch diffs belong to review-changes
Round Discipline: A clean Round 1 ENDS the review. Spawn a fresh sub-agent for Round 2 ONLY after a fix cycle.

Code Review

Three practices: receiving feedback with technical rigor, requesting systematic reviews via code-reviewer subagent, enforcing verification gates before completion claims.

When .code-graph/graph.db exists, run python .claude/scripts/code_graph query tests_for <function> --json on changed functions to flag coverage gaps.

Review Mindset (NON-NEGOTIABLE)

Skeptical. Every claim needs traced file:line proof. Confidence >80% to act.

NEVER accept code correctness at face value — trace call paths
NEVER include finding without file:line evidence (grep results, read confirmations)
ALWAYS question: "Does this actually work?" → trace it. "Is this all?" → grep cross-service
ALWAYS verify side effects: check consumers + dependents before approving

Core Principles (ENFORCE ALL)

Principle	Rule
YAGNI	Flag code solving hypothetical problems (unused params, speculative interfaces)
KISS	Flag unnecessary complexity. "Is there a simpler way?"
DRY	Grep for similar/duplicate code. 3+ similar patterns → flag for extraction
Clean Code	Readable > clever. Names reveal intent. Functions do ONE thing. Nesting <=3. Methods <30 lines
Convention	MUST ATTENTION grep 3+ existing examples before flagging violations. Codebase convention wins over textbook
No Bugs	Trace logic paths. Verify edge cases (null, empty, boundary). Check error handling
Proof Required	Every claim backed by `file:line` evidence. Speculation is forbidden
Doc Staleness	Cross-ref changed files against related docs. Flag stale/missing updates

Technical correctness over social comfort. Verify before implementing. Evidence before claims.

Graph-Enhanced Review (RECOMMENDED if graph.db exists)

python .claude/scripts/code_graph graph-blast-radius --json — prioritize files by impact (most dependents first)
python .claude/scripts/code_graph query tests_for <function_name> --json — flag untested changed functions
python .claude/scripts/code_graph trace <file> --direction downstream --json — downstream impact (events, bus, cross-service)
python .claude/scripts/code_graph trace <file> --direction both --json — full flow context for controllers/commands/handlers
Wide blast radius (>20 impacted nodes) = high-risk. Flag in report.

Review Approach (Report-Driven Two-Phase — CRITICAL)

MANDATORY FIRST: Create Todo Tasks

Task	Status
`[Review] Create report file`	in_progress
`[Review Phase 0] Run graph blast-radius if available`	pending
`[Review Phase 0.3] Detect high-risk change types`	pending
`[Review Phase 0.5] Plan compliance check (skip if no active plan)`	pending
`[Review Phase 0.7] Detect categories + route sub-agents`	pending
`[Review Phase 1] File-by-file review + update report`	pending
`[Review Phase 2] Holistic assessment`	pending
`[Review Phase 3] Final findings, docs triage, and test sync findings`	pending
`[Review Phase 4] Why-review self-validation (skip if zero findings)`	pending
`[Review Round 2] Fresh sub-agent re-review after fix cycle`	pending
`[Review Final] Consolidate all rounds`	pending

Step 0: Create Report File

Create plans/reports/code-review-{date}-{slug}.md with Scope and Files to Review sections.
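A minimal sketch of this step follows. It assumes {date} is rendered as YYMMDD-HHmm, borrowing the naming used by the report-persistence rule, and `create_review_report` is a hypothetical helper name. The guard matters: re-running the step must never overwrite a report that already holds findings.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

REPORT_ROOT = Path("plans/reports")


def create_review_report(slug: str, root: Path = REPORT_ROOT,
                         now: Optional[datetime] = None) -> Path:
    """Create the report skeleton once; never overwrite a report that has findings."""
    now = now or datetime.now()
    # Assumption: {date} rendered as YYMMDD-HHmm.
    path = root / f"code-review-{now:%y%m%d-%H%M}-{slug}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        path.write_text(
            f"# Code Review: {slug}\n\n"
            "## Scope\n\n"
            "## Files to Review\n",
            encoding="utf-8",
        )
    return path
```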

Phase 0: Graph Blast Radius (FIRST WHEN AVAILABLE)

If .code-graph/graph.db exists, run graph impact analysis before reviewing:

python .claude/scripts/code_graph graph-blast-radius --json or the project equivalent
Record impacted files count, untested changed functions, and risk level in the report
Prioritize high-impact files during Phase 1

If graph data is unavailable, record "Graph not available — skipping blast radius" and continue.
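Phase 0 can be sketched as a guarded call to the documented command. The JSON shape that `graph-blast-radius` emits is not described here, so this sketch returns the raw output and leaves extracting the impacted-node count to the reader. `risk_level` only encodes the ">20 impacted nodes = high-risk" rule, and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Optional

GRAPH_DB = Path(".code-graph/graph.db")
HIGH_RISK_IMPACTED_NODES = 20  # ">20 impacted nodes = high-risk"
SKIP_NOTE = "Graph not available — skipping blast radius"


def run_blast_radius(graph_db: Path = GRAPH_DB) -> Optional[str]:
    """Return the tool's raw JSON output, or None when no graph database exists."""
    if not graph_db.exists():
        return None  # caller records SKIP_NOTE in the report and continues
    result = subprocess.run(
        ["python", ".claude/scripts/code_graph", "graph-blast-radius", "--json"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def risk_level(impacted_nodes: int) -> str:
    return "high" if impacted_nodes > HIGH_RISK_IMPACTED_NODES else "normal"
```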

Phase 0.3: Detect High-Risk Change Types

Before file review, inspect the target diff or explicit file set for:

Dependency upgrades — semver, breaking changes, advisories, peer compatibility
Migrations or schema changes — rollback, lock/volume impact, zero-downtime deployment, idempotent backfill
Bus events/messages — consumer existence, idempotency, retries, poison/dead-letter handling
API contract changes — backward compatibility, caller alignment, auth, required response fields
Security changes — enforcement coverage, privilege escalation, negative tests, duplicated permission strings
Config/env changes — all environments covered, no secrets, fail-fast behavior, setup docs
Infra changes — dev/prod parity, pinned versions, CI/CD permissions, reproducible builds

Create focused review tasks for every true signal and complete them before dimensional review.
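A first-pass signal detector can be sketched with path heuristics. The regexes in `RISK_PATTERNS` are hypothetical and tuned to common layouts. They only surface candidates: each hit still needs its own focused review task, and some signals (for example, a breaking API change inside an ordinary service file) only show up in the diff content, not the path.

```python
import re
from typing import Dict, List

# Hypothetical path heuristics; adapt to the project's directory layout.
RISK_PATTERNS: Dict[str, str] = {
    "dependency": r"(^|/)(package(-lock)?\.json|requirements[^/]*\.txt|pyproject\.toml|go\.mod|[^/]*\.csproj)$",
    "migration": r"(^|/)migrations?/|(^|/)schema",
    "bus-event": r"(event|consumer|message|bus)",
    "api-contract": r"(controller|route|\.proto$|openapi)",
    "security": r"(auth|permission|token|crypto)",
    "config": r"(\.env|appsettings|config)",
    "infra": r"(dockerfile|\.github/workflows/|helm|k8s|terraform)",
}


def detect_risks(changed_files: List[str]) -> Dict[str, List[str]]:
    """Map each risk type to the changed files that trigger it (a file may hit several)."""
    hits: Dict[str, List[str]] = {}
    for path in changed_files:
        lowered = path.lower()
        for risk, pattern in RISK_PATTERNS.items():
            if re.search(pattern, lowered):
                hits.setdefault(risk, []).append(path)
    return hits
```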

Phase 0.5: Plan Compliance Check (CONDITIONAL)

If active plan context exists, verify scope, test evidence, and success criteria against the plan before file review; otherwise record the skip reason.

Phase 0.7: Detect Review Categories

Before any review — classify the changeset and route sub-agents:

Signal in changed files	Route to
Auth/permission/token/encryption files	`security-auditor`
Query files, caching, batch processing	`performance-optimizer`
Source code (logic, handlers, services)	`code-reviewer`
Docs, plans, specs, markdown	`general-purpose`
Mixed changeset with security/perf files	Spawn specialized sub-agent first, then `code-reviewer`
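The routing table can be approximated as below. The keyword lists and document suffixes are hypothetical stand-ins for the table's signals. The returned order follows the last row: specialists first, then `code-reviewer` for source files and `general-purpose` for docs.

```python
from typing import List

# Hypothetical keyword heuristics for the signals in the routing table.
SECURITY_HINTS = ("auth", "permission", "token", "encrypt", "crypto")
PERFORMANCE_HINTS = ("query", "cache", "batch")
DOC_SUFFIXES = (".md", ".mdx", ".rst", ".txt")


def route_subagents(changed_files: List[str]) -> List[str]:
    """Return sub-agents in spawn order: specialists first, then general reviewers."""
    paths = [p.lower() for p in changed_files]
    agents: List[str] = []
    if any(hint in p for p in paths for hint in SECURITY_HINTS):
        agents.append("security-auditor")
    if any(hint in p for p in paths for hint in PERFORMANCE_HINTS):
        agents.append("performance-optimizer")
    if any(not p.endswith(DOC_SUFFIXES) for p in paths):
        agents.append("code-reviewer")
    if any(p.endswith(DOC_SUFFIXES) for p in paths):
        agents.append("general-purpose")
    return agents
```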

Phase 0.7 (continued): Derive Review Categories

Group changed files by: file language (extension), directory semantics (path), change nature (new entity, schema, config, UI, test).

For each category: name it, create sub-task, derive concerns using SYNC:category-review-thinking (first principles — NOT a fixed checklist).

Category list = Phase 1 work breakdown. Each category → own section in report.
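The first two grouping axes (language and directory) can be derived from paths alone, as in this sketch. The third axis, change nature (new entity, schema, config, UI, test), needs diff metadata and is not modeled here. Using the top-level directory as the "directory semantics" is a deliberately coarse assumption.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List, Tuple


def group_by_surface(changed_files: List[str]) -> Dict[Tuple[str, str], List[str]]:
    """Group files by (language from extension, top-level directory)."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for path in changed_files:
        p = PurePosixPath(path)
        language = p.suffix.lstrip(".") or "none"      # "ts", "cs", or "none" for Dockerfile
        area = p.parts[0] if len(p.parts) > 1 else "."  # coarsest directory level
        groups[(language, area)].append(path)
    return dict(groups)
```

Each key then becomes one named category with its own report section and sub-task.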

Phase 1: File-by-File Review (Build Report)

For EACH file, immediately update report:

File path, Change Summary, Purpose, Issues Found
Convention check: Grep 3+ similar patterns — does new code follow existing convention?
Correctness check: Trace logic — null, empty, boundary, error cases handled?
DRY check: Grep for similar/duplicate code — does this logic exist elsewhere?
Intention check: Does the change serve the stated purpose? Flag unrelated modifications
Test check: Changed behavior has corresponding test/spec coverage or a documented gap
Documentation check: Related docs/specs/READMEs still match the changed behavior
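One way to keep per-file entries uniform is to render them from a fixed check list, as sketched here. `file_entry` and the markdown layout are hypothetical. An unfilled check is printed as NOT DONE, so a skipped check stays visible in the report instead of disappearing.

```python
from typing import Dict, List

CHECKS = ("Convention", "Correctness", "DRY", "Intention", "Test", "Documentation")


def file_entry(path: str, summary: str, purpose: str,
               issues: List[str], checks: Dict[str, str]) -> str:
    """Render one Phase 1 report section for a single file."""
    lines = [f"### {path}", "", f"- Change Summary: {summary}", f"- Purpose: {purpose}",
             "- Issues Found:" + ("" if issues else " none")]
    lines += [f"  - {issue}" for issue in issues]
    # A missing check result is printed as NOT DONE rather than silently omitted.
    lines += [f"- {name} check: {checks.get(name, 'NOT DONE')}" for name in CHECKS]
    return "\n".join(lines) + "\n"
```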

Phase 2: Holistic Review (Re-read Report)

After all files reviewed, re-read accumulated report:

Technical Solution: Overall approach coherent as unified plan?
Responsibility: Logic in LOWEST layer? Business logic not in controllers?
Data ownership: Constants/config in model/entity, not controller/component?
Duplication: Grep to verify — duplicated logic across changes?
Architecture: Clean Architecture? Service boundaries respected?
Plan Compliance: If active plan → check ## Plan Context: impl matches requirements, TCs have code evidence (not "TBD"), no requirement unaddressed
Design Patterns: Pattern opportunities (switch→Strategy)? Anti-patterns (God Object, Copy-Paste, Circular Dep)? DRY via base classes?
Cross-Boundary Behavior: Callers/callees aligned? API/event contracts consistent? New wiring reachable?
Test Sync: Business logic changes have corresponding tests or explicit user-facing gap
Translation Sync: Multilingual UI text changes have translation updates or explicit risk acceptance

MUST ATTENTION CHECK — Clean Code: YAGNI (unused params, speculative interfaces)? KISS (simpler exists)? Methods >30 lines or nesting >3?

MUST ATTENTION CHECK — Correctness: Null/empty/boundary handled? Error paths caught? Async race conditions? Trace happy + error paths.

Documentation Staleness Check:

For each changed file — grep file name/module across docs/ and AI tooling dirs. Changed behavior → flag stale doc (specific section + what changed). Do NOT auto-fix — flag only.

Common staleness patterns: count/limit changed → docs embedding that number | API/contract changed → API usage docs | hook/skill added/removed → catalogs/README | schema changed → entity reference docs.
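The grep step can be sketched in Python as a search for the changed file's stem across markdown under the given roots. The function name and the `*.md` filter are assumptions. The output is line-numbered hits to flag; per the rule above, nothing is edited.

```python
from pathlib import Path
from typing import Dict, List


def find_doc_mentions(changed_file: str, doc_roots: List[Path]) -> Dict[str, List[int]]:
    """Return {doc path: [line numbers]} for docs that mention the changed file's stem."""
    stem = Path(changed_file).stem  # e.g. "OrderService" for src/Orders/OrderService.cs
    hits: Dict[str, List[int]] = {}
    for root in doc_roots:
        if not root.is_dir():
            continue  # missing roots (e.g. no AI tooling dir) are simply skipped
        for doc in sorted(root.rglob("*.md")):
            text = doc.read_text(encoding="utf-8", errors="ignore")
            lines = [n for n, line in enumerate(text.splitlines(), 1) if stem in line]
            if lines:
                hits[str(doc)] = lines
    return hits
```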

Phase 3: Final Review Result

Update report: Overall Assessment, Critical Issues, High Priority, Architecture Recommendations, Documentation Staleness, Positive Observations.

If documentation staleness is detected, recommend docs-update and list exact stale sections; do not silently pass stale docs.

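Round 2+: Fresh Sub-Agent Re-Review (MANDATORY after a fix cycle)

A clean Round 1 with no fix cycle ends the review; record that as the skip reason. Otherwise, after Phase 3 (Round 1) and the fix cycle, spawn a fresh code-reviewer sub-agent for Round 2 using the canonical template from SYNC:review-protocol-injection: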

Copy Agent call shape from SYNC:review-protocol-injection verbatim
Embed full verbatim body of all 10 SYNC blocks: SYNC:evidence-based-reasoning, SYNC:bug-detection, SYNC:design-patterns-quality, SYNC:complexity-prevention, SYNC:logic-and-intention-review, SYNC:test-spec-verification, SYNC:fix-layer-accountability, SYNC:rationalization-prevention, SYNC:graph-assisted-investigation, SYNC:understand-code-first
Task: "Review the assigned code-review scope. Focus: cross-cutting concerns, interaction bugs, convention drift, missing pieces, subtle edge cases, logic errors, test spec gaps."
Target Files: "use the explicit files, plan scope, or reviewer-provided target range"
Report: plans/reports/code-review-round{N}-{date}.md

After sub-agent returns:

Read report from plans/reports/code-review-round{N}-{date}.md
Integrate findings as ## Round {N} Findings (Fresh Sub-Agent) — DO NOT filter or override
If FAIL: fix issues → spawn NEW Round N+1 fresh sub-agent (never reuse)
Max 3 fresh rounds — escalate via AskUserQuestion if still failing after 3 rounds
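The round limit can be sketched as a loop over injected callbacks. `spawn_reviewer`, `apply_fixes`, and `escalate` are hypothetical stand-ins for agent-runtime actions (spawning a new sub-agent, running the fix cycle, calling AskUserQuestion). Any verdict other than PASS, including PARTIAL, counts as failing.

```python
from typing import Callable

MAX_FRESH_ROUNDS = 3
FIRST_FRESH_ROUND = 2


def run_fresh_rounds(spawn_reviewer: Callable[[int], str],
                     apply_fixes: Callable[[int], None],
                     escalate: Callable[[int], None]) -> str:
    last_round = FIRST_FRESH_ROUND + MAX_FRESH_ROUNDS - 1  # rounds 2, 3, 4
    for n in range(FIRST_FRESH_ROUND, last_round + 1):
        verdict = spawn_reviewer(n)  # must be a NEW sub-agent on every call
        if verdict == "PASS":
            return f"PASS at round {n}"
        if n < last_round:
            apply_fixes(n)  # fix, then the next iteration spawns Round n+1
    escalate(last_round)  # still failing after 3 fresh rounds
    return "ESCALATED"
```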

Clean Code Rules (MUST ATTENTION CHECK)

#	Rule	Details
1	No Magic Values	All literals → named constants
2	Type Annotations	Explicit parameter and return types on all functions
3	Single Responsibility	One concern per method/class. Event handlers/consumers: one handler = one concern. NEVER bundle concerns in one handler; the platform swallows exceptions silently, so a failure in one bundled concern can go unnoticed
4	DRY	No duplication; extract shared logic
5	Naming	Specific (`employeeRecords` not `data`), Verb+Noun methods, is/has/can/should booleans, no abbreviations
6	Performance	No O(n²) (use dictionary). Project in query (not load-all). ALWAYS paginate. Batch-by-IDs (not N+1)
7	Entity Indexes	Collections: index management methods. EF Core: composite indexes. Expression fields match index order. Text search → text indexes
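Rule 6 can be illustrated with plain Python data standing in for query results. The row shapes and function names are hypothetical, and `fetch_many` stands in for whatever batch query the project's data layer offers. The point is the shape of the fix: build a dictionary once instead of a nested scan, and load related records in one batch by IDs instead of one query per row.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]


def match_quadratic(orders: List[Row], customers: List[Row]) -> List[Tuple[object, object]]:
    # O(n*m): rescans every customer for every order. Flag this in review.
    return [(o["id"], c["name"]) for o in orders for c in customers
            if c["id"] == o["customer_id"]]


def match_indexed(orders: List[Row], customers: List[Row]) -> List[Tuple[object, object]]:
    # O(n+m): build a dictionary once, then do constant-time lookups.
    by_id = {c["id"]: c for c in customers}
    return [(o["id"], by_id[o["customer_id"]]["name"]) for o in orders
            if o["customer_id"] in by_id]


def load_customers_batched(ids: Iterable[int],
                           fetch_many: Callable[[List[int]], List[Row]]) -> Dict[object, Row]:
    # One call with all distinct IDs instead of one call per order (the N+1 pattern).
    unique_ids = sorted(set(ids))
    return {c["id"]: c for c in fetch_many(unique_ids)}
```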

Data Lifecycle Rules (MUST ATTENTION CHECK)

Decision test: "Delete the DB and start fresh — does this data still need to exist?" Yes → Seeder/fixture. No → Migration.

Type	Contains	NEVER contains
Seeder / Fixture	Default records, system config, reference data (idempotent — safe to run every startup)	Schema changes
Migration	Schema changes, column adds/removes, data transforms, index changes	Default records, permission seeds, system config

Apply project's language/framework conventions. Principle universal — implementation project-specific.
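The split can be sketched with Python's built-in `sqlite3` module. The `roles` table and `DEFAULT_ROLES` values are hypothetical. The migration only touches schema. The seeder uses SQLite's `INSERT OR IGNORE` against a primary key, so running it on every startup never duplicates rows. Other databases need their own upsert or existence check.

```python
import sqlite3

DEFAULT_ROLES = ("admin", "member")  # reference data: must exist even on a fresh DB


def migrate(conn: sqlite3.Connection) -> None:
    # Migration: schema only, no default records.
    conn.execute("CREATE TABLE IF NOT EXISTS roles (name TEXT PRIMARY KEY)")


def seed(conn: sqlite3.Connection) -> None:
    # Seeder: idempotent; the primary key plus OR IGNORE makes reruns no-ops.
    conn.executemany("INSERT OR IGNORE INTO roles (name) VALUES (?)",
                     [(role,) for role in DEFAULT_ROLES])
    conn.commit()
```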

Legacy Pattern Compliance

When reviewing files with legacy and modern patterns:

Detect legacy signals — search project-config.json, package.json, or equivalent for "legacy", version flags, feature annotations
Read what "legacy" means — grep 3+ legacy files to understand pattern constraints vs. modern files
Derive compliance rules — what lifecycle/memory management differences exist between legacy/modern for this tech stack?
Apply tech stack knowledge to flag anti-patterns

NEVER assume any specific framework's lifecycle. Derive from codebase evidence.
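Step 1 (detecting legacy signals) can be sketched as a recursive walk over parsed JSON config that reports every key or string value mentioning "legacy". This is a hypothetical helper and only finds candidates. Steps 2-4 still require reading the flagged files, because what "legacy" means is project-specific.

```python
from typing import Any, List


def find_legacy_signals(node: Any, path: str = "$") -> List[str]:
    """Return JSON-path-like locations whose key or string value mentions 'legacy'."""
    hits: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if "legacy" in str(key).lower():
                hits.append(child)
            hits += find_legacy_signals(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits += find_legacy_signals(item, f"{path}[{index}]")
    elif isinstance(node, str) and "legacy" in node.lower():
        hits.append(path)
    return hits
```

Typical use would be `find_legacy_signals(json.loads(Path("package.json").read_text()))`.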

When to Use This Skill

Practice	Triggers	MUST ATTENTION READ
Receiving Feedback	Review comments received, feedback unclear/questionable, conflicts with existing decisions	`references/code-review-reception.md`
Requesting Review	After each subagent task, major feature done, targeted review scope, after complex bug fix	`references/requesting-code-review.md`
Verification Gates	Before any completion claim, commit, push, or PR. ANY success/satisfaction statement	`references/verification-before-completion.md`

Quick Decision Tree

SITUATION?
│
├─ Received feedback
│  ├─ Unclear items? → STOP, ask for clarification first
│  ├─ From human partner? → Understand, then implement
│  └─ From external reviewer? → Verify technically before implementing
│
├─ Completed work
│  ├─ Major feature/task? → Request code-reviewer subagent review
│  └─ Before merge? → Request code-reviewer subagent review
│
└─ About to claim status
   ├─ Have fresh verification? → State claim WITH evidence
   └─ No fresh verification? → RUN verification command first

Receiving Feedback Protocol

Pattern: READ → UNDERSTAND → VERIFY → EVALUATE → RESPOND → IMPLEMENT

NEVER use performative agreement ("You're right!", "Great point!", "Thanks for...")
NEVER implement before verification
MUST ATTENTION restate requirement, ask questions, or push back with technical reasoning
MUST ATTENTION ask for clarification on ALL unclear items BEFORE starting
MUST ATTENTION grep for usage before implementing suggested "proper" features (YAGNI check)

Source handling: Human partner → implement after understanding. External reviewer → verify technically, push back if wrong.

Full protocol: references/code-review-reception.md

Requesting Review Protocol

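Get git SHAs: BASE_SHA=$(git rev-parse HEAD~1) and HEAD_SHA=$(git rev-parse HEAD). HEAD~1 covers only the last commit; for a multi-commit range, set BASE_SHA to the commit before the range, e.g. $(git merge-base HEAD <target-branch>).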
Dispatch code-reviewer subagent with: WHAT_WAS_IMPLEMENTED, PLAN_OR_REQUIREMENTS, BASE_SHA, HEAD_SHA, DESCRIPTION
Act on feedback: Critical → fix immediately. Important → fix before proceeding. Minor → note for later.

Full protocol: references/requesting-code-review.md
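The dispatch step can be sketched as below. `git rev-parse` is real; `review_request` is a hypothetical helper that assembles the five documented fields. Which fields count as mandatory is an assumption of this sketch. It refuses to dispatch without a commit range or a summary, since a reviewer cannot scope the work without them.

```python
import subprocess
from typing import Dict


def rev_parse(ref: str) -> str:
    """Resolve a git ref (e.g. HEAD, HEAD~1) to a full SHA."""
    return subprocess.run(["git", "rev-parse", ref], capture_output=True,
                          text=True, check=True).stdout.strip()


def review_request(what: str, plan: str, description: str,
                   base_sha: str, head_sha: str) -> Dict[str, str]:
    """Assemble the dispatch fields for the code-reviewer subagent."""
    required = {"WHAT_WAS_IMPLEMENTED": what, "BASE_SHA": base_sha, "HEAD_SHA": head_sha}
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError(f"missing review fields: {missing}")
    return {
        "WHAT_WAS_IMPLEMENTED": what,
        "PLAN_OR_REQUIREMENTS": plan,
        "BASE_SHA": base_sha,
        "HEAD_SHA": head_sha,
        "DESCRIPTION": description,
    }
```

Inside a repository, the SHAs would come from `rev_parse("HEAD~1")` and `rev_parse("HEAD")`.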

Verification Gates Protocol

Iron Law: NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE

Gate: IDENTIFY command → RUN it → READ output → VERIFY it confirms claim → THEN claim. Skip any step = lying.

Claim	Required Evidence
Tests pass	Test output shows 0 failures
Build succeeds	Build command exit 0
Bug fixed	Original symptom test passes
Requirements met	Line-by-line checklist verified

Red Flags — STOP: "should"/"probably"/"seems to", satisfaction before verification, committing without verification, trusting agent reports.

Full protocol: references/verification-before-completion.md
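The gate maps naturally onto a small wrapper, sketched here under the assumption that the verification command signals success through its exit code. `verify_claim` is a hypothetical name. It runs the command fresh, reads the output, and returns a claim only with evidence attached. For "Tests pass", the output itself should still be read for a 0-failures line, since some runners exit 0 on skipped suites.

```python
import subprocess
from typing import List, Optional


def verify_claim(claim: str, command: List[str]) -> Optional[str]:
    """Run the verification command fresh; return an evidence-backed claim or None."""
    result = subprocess.run(command, capture_output=True, text=True)  # RUN
    output = (result.stdout + result.stderr).strip()                  # READ
    if result.returncode != 0:                                        # VERIFY
        return None  # no claim allowed; report the failure instead
    last_line = output.splitlines()[-1] if output else "(no output)"
    return f"{claim}: `{' '.join(command)}` exited 0; last output line: {last_line}"
```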

Related skills: code-simplifier, debug-investigate, refactoring

Systematic Review Protocol (10+ changed files)

For large changesets: categorize files by concern → fire parallel code-reviewer sub-agents per category → synchronize findings → holistic assessment. See review-changes/SKILL.md § "Systematic Review Protocol" for full 4-step protocol.

Workflow Recommendation

MANDATORY MUST ATTENTION — NO EXCEPTIONS: If NOT already in a workflow, use AskUserQuestion to ask user:

Activate quality-audit workflow (Recommended) — code-review → plan → code → review-changes → test

Execute /code-review directly — run standalone

Architecture Boundary Check

For each changed file, verify no forbidden layer imports:

Read rules from docs/project-config.json → architectureRules.layerBoundaries
Determine layer — match file path against each rule's paths glob patterns
Scan imports — grep for using (C#) or import (TS) statements
Check violations — import path contains forbidden layer name → violation
Exclude framework — skip files matching architectureRules.excludePatterns
BLOCK on violation — "BLOCKED: {layer} layer file {filePath} imports from {forbiddenLayer} ({importStatement})"

If architectureRules absent in project-config.json → skip silently.
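The check can be sketched as follows. Only `layerBoundaries`, `paths`, and `excludePatterns` come from the text above; the per-rule `layer` and `forbiddenLayers` keys are assumed names for the rule's layer and its forbidden imports, so match them to the real project-config.json schema. The substring test mirrors "import path contains forbidden layer name", which can over-match on similar names.

```python
import re
from fnmatch import fnmatchcase
from typing import Any, Dict, List

# `using X.Y;` (C#) and `import ...` (TS) lines; `using (var x = ...)` blocks are skipped.
IMPORT_RE = re.compile(r"^\s*(using\s+[\w.]+\s*;|import\s.*)$", re.MULTILINE)


def boundary_violations(file_path: str, source: str, rules: Dict[str, Any]) -> List[str]:
    """Return BLOCKED messages for imports that reach into a forbidden layer."""
    if any(fnmatchcase(file_path, p) for p in rules.get("excludePatterns", [])):
        return []
    messages: List[str] = []
    for rule in rules.get("layerBoundaries", []):
        if not any(fnmatchcase(file_path, p) for p in rule["paths"]):
            continue  # file is not in this rule's layer
        for statement in IMPORT_RE.findall(source):
            for forbidden in rule["forbiddenLayers"]:  # assumed key name
                if forbidden in statement:
                    messages.append(
                        f"BLOCKED: {rule['layer']} layer file {file_path} "
                        f"imports from {forbidden} ({statement.strip()})"
                    )
    return messages
```

With a real config, `rules` would be `json.load(...)["architectureRules"]` from docs/project-config.json.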

Phase 4: Why-Review Self-Validation Gate (MANDATORY when findings exist)

Purpose: Adversarial validation of own findings BEFORE handoff. Catches over-flagged Highs, false positives, and severity inflation at the source rather than letting them propagate downstream.

Trigger: Any finding produced (Critical, High, Medium, OR Low). Skip ONLY when the report's verdict is unconditional PASS with literally zero findings.

Protocol:

Read own finalized report from plans/reports/{skill}-{date}-{slug}.md
Invoke /why-review skill with arg: validate findings in plans/reports/{skill}-{date}-{slug}.md — verify each finding has file:line proof, steel-man each rejected interpretation, and stress-test severity classifications
Read why-review output from plans/reports/why-review-{date}.md
If why-review demotes/removes any finding: UPDATE own finalized report with revised severities, remove false positives, and add a ## Why-Review Validation Notes section citing what changed and why
If why-review confirms all findings: Append ## Why-Review Validation line to own report stating "All N findings re-validated against actual code; no severity changes."

Skip conditions (record explicit reason if skipping):

Verdict is unconditional PASS with zero findings → log "Skipped — no findings to validate"
Why-review skill itself is the active context (avoid recursion)

Why this exists: AI sub-agent reports inherit confirmation bias; the orchestrator absorbs severity claims as ground truth. The 2026-05-09 review incident produced 5 Highs; adversarial validation demoted 3 of them. This gate codifies that validation as standard practice.

Next Steps

MANDATORY MUST ATTENTION — NO EXCEPTIONS after completing, use AskUserQuestion:

"/fix (Recommended)" — review found issues needing fixes
"/watzup" — review clean, wrap up session
"Skip, continue manually" — user decides

AI Agent Integrity Gate (NON-NEGOTIABLE)

Completion ≠ Correctness. Before reporting ANY work done:

Grep every removed name. Extraction/rename/delete → grep confirms 0 dangling refs across ALL file types.
Ask WHY before changing. Existing values intentional until proven otherwise.
Verify ALL outputs. One build passing ≠ all builds passing.
Evaluate pattern fit. Copying nearby code? Verify preconditions match — scope, lifetime, base class, constraints.
New artifact = wired artifact. Created something? Prove it's registered, imported, reachable by all consumers.

[IMPORTANT] Use TaskCreate to break ALL work into small tasks BEFORE starting — including tasks for each file read. This prevents context loss from long files. For simple tasks, AI MUST ATTENTION ask user whether to skip.

Critical Purpose: Ensure quality — no flaws, bugs, missing updates, stale content. Verify code AND documentation.

External Memory: Complex work → write findings incrementally to plans/reports/ — prevents context loss, serves as deliverable.

Evidence Gate: MANDATORY MUST ATTENTION — every claim, finding, recommendation requires file:line proof + confidence % (>80% act, <80% verify first).

OOP & DRY: MANDATORY MUST ATTENTION — flag patterns extractable to base class/generic/helper. Same-suffix/lifecycle/responsibility classes MUST ATTENTION share common base. Apply idiomatic abstraction (base class, mixin, trait, protocol) for project's language. Verify linting/analyzer configured.

Graph-Assisted Investigation — MANDATORY when .code-graph/graph.db exists.

HARD-GATE: MUST ATTENTION run at least ONE graph command on key files before concluding any investigation.

Pattern: Grep finds files → trace --direction both reveals full system flow → Grep verifies details

CLI: python .claude/scripts/code_graph {command} --json. Use --node-mode file first (10-30x less noise), then --node-mode function for detail.

Task	Minimum Graph Action
Investigation/Scout	`trace --direction both` on 2-3 entry files
Fix/Debug	`callers_of` on buggy function + `tests_for`
Feature/Enhancement	`connections` on files to be modified
Code Review	`tests_for` on changed functions
Blast Radius	`trace --direction downstream`

Category Review Thinking — For each category of changed files, think from first principles. Do NOT use a fixed checklist — derive concerns based on the category's domain.

Step 1: Understand the category's role

What is this category responsible for? What are its invariants? Who are its consumers (callers, dependents, downstream systems)?

Step 2: Read project conventions for this category

Grep 3+ existing similar files in this category. What patterns do they follow? What base classes/interfaces/abstractions do they use?

Step 3: Derive concerns from first principles

Given the category's role and invariants, what could go wrong? Start from universal concerns, then expand with category-specific knowledge:

Correctness: Does the change do what it claims? Are contracts maintained?

Contracts: Does the change preserve consumer-facing behavior?

Security: What trust assumptions does this category make? Are they still valid?

Performance: Does the change introduce O(n²), unbounded queries, or unnecessary I/O?

Maintainability: Does the change follow existing patterns? Does it introduce hidden coupling?

Tests: Is the changed behavior observable and testable?

Documentation: Does the change invalidate any existing docs or specs?

These are starting points — your domain knowledge of the tech stack should expand this list. Do NOT limit yourself to what's listed above.

Step 4: Create sub-tasks and execute with file:line evidence

Convert derived concerns into concrete review tasks. Each task must produce file:line evidence. No findings without proof.

Examples of categories (illustrative — NOT exhaustive):

Logic/domain files (business rules, handlers, services)

Data/schema files (migrations, models, ORM definitions)

API/contract files (controllers, routes, serializers, proto definitions)

Configuration/environment files (env vars, feature flags, secrets)

Infrastructure files (Dockerfiles, CI pipelines, manifests)

UI/style files (components, templates, stylesheets)

Test files (unit, integration, e2e)

Documentation files (markdown, specs, ADRs)

Security artifacts (auth middleware, permission definitions, crypto)

Tooling/build files (build configs, linting rules, dependency manifests)

Sub-Agent Return Contract — When this skill spawns a sub-agent, the sub-agent MUST return ONLY this structure. Main agent reads only this summary — NEVER requests full sub-agent output inline.
## Sub-Agent Result: [skill-name]

Status: ✅ PASS | ⚠️ PARTIAL | ❌ FAIL
Confidence: [0-100]%

### Findings (Critical/High only — max 10 bullets)

- [severity] [file:line] [finding]

### Actions Taken

- [file changed] [what changed]

### Blockers (if any)

- [blocker description]

Full report: plans/reports/[skill-name]-[date]-[slug].md
Main agent reads Full report file ONLY when: (a) resolving a specific blocker, or (b) building a fix plan. Sub-agent writes full report incrementally (per SYNC:incremental-persistence) — not held in memory.
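The main agent can enforce this contract mechanically before trusting a summary, as in this sketch. The regexes are hypothetical and tolerate the status emoji. A summary missing the status, confidence, or report path is rejected rather than guessed at.

```python
import re
from typing import Dict, Union

STATUS_RE = re.compile(r"^Status:[^A-Z\n]*(PASS|PARTIAL|FAIL)\b", re.MULTILINE)
CONFIDENCE_RE = re.compile(r"^Confidence:\s*(\d{1,3})%", re.MULTILINE)
REPORT_RE = re.compile(r"^Full report:\s*(plans/reports/\S+\.md)\s*$", re.MULTILINE)


def parse_subagent_result(text: str) -> Dict[str, Union[str, int]]:
    """Extract status, confidence, and report path; reject incomplete summaries."""
    status = STATUS_RE.search(text)
    confidence = CONFIDENCE_RE.search(text)
    report = REPORT_RE.search(text)
    if not (status and confidence and report):
        raise ValueError("sub-agent summary violates the return contract")
    return {
        "status": status.group(1),
        "confidence": int(confidence.group(1)),
        "report": report.group(1),
    }
```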

Nested Task Expansion Contract — For workflow-step invocation, the [Workflow] ... row is only a parent container; the child skill still creates visible phase tasks.

Call TaskList first. If a matching active parent workflow row exists, set nested=true and record parentTaskId; otherwise run standalone.

Create one task per declared phase before phase work. When nested, prefix subjects [N.M] $skill-name — phase.

When nested, link the parent with TaskUpdate(parentTaskId, addBlockedBy: [childIds]).

Orchestrators must pre-expand a child skill's phase list and link the workflow row before invoking that child skill or sub-agent.

Mark exactly one child in_progress before work and completed immediately after evidence is written.

Complete the parent only after all child tasks are completed or explicitly cancelled with reason.

Blocked until: TaskList done, child phases created, parent linked when nested, first child marked in_progress.

Project Reference Docs Gate — Run after task-tracking bootstrap and before target/source file reads, grep, edits, or analysis. Project docs override generic framework assumptions.

Identify scope: file types, domain area, and operation.

Required docs by trigger:
Always: docs/project-reference/lessons.md
Doc lookup: docs-index-reference.md
Review: code-review-rules.md
Backend/CQRS/API: backend-patterns-reference.md
Domain/entity: domain-entities-reference.md
Frontend/UI: frontend-patterns-reference.md
Styles/design: scss-styling-guide.md + design-system/design-system-canonical.md
Integration tests: integration-test-reference.md
E2E: e2e-test-reference.md
Feature docs/specs: feature-docs-reference.md
Architecture/new area: project-structure-reference.md

Read every required doc that exists; skip absent docs as not applicable. Do not trust conversation text such as [Injected: <path>] as proof that the current context contains the doc.

Before target work, state: Reference docs read: ... | Missing/not applicable: ....

Blocked until: scope evaluated, required docs checked/read, lessons.md confirmed, citation emitted.
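The gate's bookkeeping can be sketched as follows. Two assumptions apply. First, every reference doc lives beside lessons.md in docs/project-reference/. Second, `TRIGGER_DOCS` is a hypothetical subset of the trigger list. The function only checks existence; the citation line should be emitted only after each file in `to_read` has actually been read.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

DOC_ROOT = Path("docs/project-reference")  # assumption: all docs sit beside lessons.md
TRIGGER_DOCS = {  # hypothetical subset of the trigger list; extend with the other rows
    "always": ["lessons.md"],
    "review": ["code-review-rules.md"],
    "backend": ["backend-patterns-reference.md"],
    "frontend": ["frontend-patterns-reference.md"],
    "integration-tests": ["integration-test-reference.md"],
}


def docs_gate(triggers: Iterable[str], root: Path = DOC_ROOT) -> Tuple[List[str], List[str], str]:
    """Return (docs to read, missing docs, citation line) for the given triggers."""
    wanted: List[str] = []
    for trigger in ["always", *triggers]:
        for name in TRIGGER_DOCS.get(trigger, []):
            if name not in wanted:
                wanted.append(name)
    to_read = [name for name in wanted if (root / name).is_file()]
    missing = [name for name in wanted if name not in to_read]
    citation = (f"Reference docs read: {', '.join(to_read) or 'none'}"
                f" | Missing/not applicable: {', '.join(missing) or 'none'}")
    return to_read, missing, citation
```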

Task Tracking & External Report Persistence — Bootstrap this before execution; then run project-reference doc prefetch before target/source work.

Create a small task breakdown before target file reads, grep, edits, or analysis. On context loss, inspect the current task list first.

Mark one task in_progress before work and completed immediately after evidence; never batch transitions.

For plan/review work, create plans/reports/{skill}-{YYMMDD}-{HHmm}-{slug}.md before first finding.

Append findings after each file/section/decision and synthesize from the report file at the end.

Final output cites Full report: plans/reports/{filename}.

Blocked until: task breakdown exists, report path declared for plan/review work, first finding persisted before the next finding.
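Incremental persistence amounts to appending each finding to the report file the moment it exists, as in this sketch with hypothetical helper names. Append mode means an earlier finding is never rewritten. The final answer then cites the file instead of restating its contents.

```python
from pathlib import Path


def persist_finding(report: Path, finding: str) -> None:
    """Append one finding to disk immediately instead of holding it in context."""
    report.parent.mkdir(parents=True, exist_ok=True)
    with report.open("a", encoding="utf-8") as fh:  # append: earlier findings stay intact
        fh.write(f"- {finding}\n")


def final_citation(report: Path) -> str:
    return f"Full report: plans/reports/{report.name}"
```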

Critical Thinking Mindset — Apply critical thinking and sequential thinking. Every claim needs traced proof; confidence >80% to act. Anti-hallucination: never present a guess as fact. Cite sources for every claim, admit uncertainty freely, self-check output for errors, cross-reference independently, and stay skeptical of your own confidence: certainty without evidence is the root of all hallucination.

Sequential Thinking Protocol — Structured multi-step reasoning for complex/ambiguous work. Use when planning, reviewing, debugging, or refining ideas where one-shot reasoning is unsafe.

Trigger when: complex problem decomposition · adaptive plans needing revision · analysis with course correction · unclear/emerging scope · multi-step solutions · hypothesis-driven debugging · cross-cutting trade-off evaluation.

Format (explicit mode — visible thought trail):

Thought N/M: [aspect] — one aspect per thought, state assumptions/uncertainty

Thought N/M [REVISION of Thought K]: ... — when prior reasoning invalidated; state Original / Why revised / Impact

Thought N/M [BRANCH A from Thought K]: ... — explore alternative; converge with decision rationale

Thought N/M [HYPOTHESIS]: ... then [VERIFICATION]: ... — test before acting

Thought N/N [FINAL] — only when verified, all critical aspects addressed, confidence >80%

Mandatory closers: Confidence % stated · Assumptions listed · Open questions surfaced · Next action concrete.

Stop conditions: confidence <80% on any critical decision → escalate via AskUserQuestion · ≥3 revisions on same thought → re-frame the problem · branch count >3 → split into sub-task.

Implicit mode: apply methodology internally without visible markers when adding markers would clutter the response (routine work where reasoning aids accuracy).

Deep-dive: see /sequential-thinking skill (.claude/skills/sequential-thinking/SKILL.md) for worked examples (api-design, debug, architecture), advanced techniques (spiral refinement, hypothesis testing, convergence), and meta-strategies (uncertainty handling, revision cascades).

Evidence-Based Reasoning — Speculation is FORBIDDEN. Every claim needs proof.

Cite file:line, grep results, or framework docs for EVERY claim

Declare confidence: >80% act freely, 60-80% verify first, <60% DO NOT recommend

Cross-service validation required for architectural changes

"I don't have enough evidence" is valid and expected output

BLOCKED until:
- [ ] Evidence file path (file:line)
- [ ] Grep search performed
- [ ] 3+ similar patterns found
- [ ] Confidence level stated

Forbidden without proof: "obviously", "I think", "should be", "probably", "this is because".

If incomplete → output: "Insufficient evidence. Verified: [...]. Not verified: [...]."
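The confidence bands and the evidence requirement can be combined into one check, sketched here. The `Finding` shape and the `path:line` regex are assumptions. A finding without at least one file:line citation gets the "Insufficient evidence" output regardless of its stated confidence. Exactly 80% falls in the verify-first band, matching ">80% act".

```python
import re
from dataclasses import dataclass, field
from typing import List

FILE_LINE_RE = re.compile(r"^\S+:\d+$")  # e.g. src/orders/service.py:42
ACT_ABOVE = 80
VERIFY_FROM = 60


@dataclass
class Finding:
    claim: str
    evidence: List[str] = field(default_factory=list)
    confidence: int = 0  # percent, 0-100


def gate(finding: Finding) -> str:
    if not any(FILE_LINE_RE.match(e) for e in finding.evidence):
        return f"Insufficient evidence. Verified: []. Not verified: [{finding.claim}]."
    if finding.confidence > ACT_ABOVE:
        return "act"
    if finding.confidence >= VERIFY_FROM:
        return "verify first"
    return "do not recommend"
```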

Design Patterns Quality — Priority checks for every code change:

DRY via OOP: Identify classes/modules with the same purpose, naming pattern, or lifecycle. Apply your knowledge of the project's language/framework to determine the idiomatic abstraction (base class, mixin, trait, protocol, decorator). 3+ similar patterns → extract to shared abstraction.

Right Responsibility: Logic in LOWEST layer (Entity > Domain Service > Application Service > Controller). Never business logic in controllers.

SOLID: Single responsibility (one reason to change). Open-closed (extend, don't modify). Liskov (subtypes substitutable). Interface segregation (small interfaces). Dependency inversion (depend on abstractions).

After extraction/move/rename: Grep ENTIRE scope for dangling references. Zero tolerance.

YAGNI gate: NEVER recommend patterns unless 3+ occurrences exist. Don't extract for hypothetical future use.

Anti-patterns to flag: God Object, Copy-Paste inheritance, Circular Dependency, Leaky Abstraction.

Serial Attention for Design Quality — DO NOT scan all quality concerns simultaneously. Split attention misses violations that focused passes catch.

Identify applicable dimensions — Based on the code's language, domain, and patterns, determine which quality dimensions apply: DRY, SOLID principles (SRP/OCP/LSP/ISP/DIP), OOP idioms, cohesion/coupling, GRASP, Law of Demeter, CQRS invariants, etc. Your list is NOT fixed — derive from what the code actually does.

One focused pass per dimension — Dedicate single-focus attention to EACH dimension in sequence. Do NOT mix concerns across passes.

Threshold: 3+ similar patterns = MANDATORY extraction. This is not an optional suggestion; flag it as a mandatory structural fix requiring action.

2+ violations of same kind = structural finding — Report as "pattern problem" needing architectural resolution, not a list of individual instances.
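Below is a sketch of how the two thresholds above might be applied once a single-dimension pass has tallied its matches. `DimensionTally`, `DimensionFinding`, and `classifyDimension` are hypothetical names. Each tally is assumed to come from exactly one focused pass.

```typescript
interface DimensionTally {
  dimension: string;       // e.g. "DRY", "SRP", "Law of Demeter"
  similarPatterns: number; // structurally similar blocks found in this pass
  violations: number;      // violations of this dimension found in this pass
}

type DimensionFinding =
  | { kind: "mandatory-extraction"; dimension: string; count: number }
  | { kind: "pattern-problem"; dimension: string; count: number }
  | { kind: "individual"; dimension: string; count: number }
  | { kind: "none"; dimension: string };

// One tally per focused pass; passes are never merged.
function classifyDimension(t: DimensionTally): DimensionFinding[] {
  const out: DimensionFinding[] = [];
  if (t.similarPatterns >= 3) {
    // 3+ similar patterns: extraction is mandatory, not a suggestion.
    out.push({ kind: "mandatory-extraction", dimension: t.dimension, count: t.similarPatterns });
  }
  if (t.violations >= 2) {
    // 2+ violations of the same kind: one structural finding, not a list.
    out.push({ kind: "pattern-problem", dimension: t.dimension, count: t.violations });
  } else if (t.violations === 1) {
    out.push({ kind: "individual", dimension: t.dimension, count: 1 });
  }
  return out.length > 0 ? out : [{ kind: "none", dimension: t.dimension }];
}
```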

Complexity Prevention (Ousterhout) — MANDATORY. Measure code by cost of change: one business change should map to one code change. Flag ALL of the following in review:

Change amplification — small business change forces edits in >3 places → structural flaw. Count edit sites for a plausible future change (add variant, add field, add authorization). >3 = reject.

Cognitive load — reader must hold too much context to safely modify. Flag deep inheritance, long parameter lists, boolean traps, implicit ordering dependencies.

Cross-cutting duplication at entry points — logging, error handling, validation, auth, transactions reimplemented per controller/handler/route. Lift to middleware / interceptor / filter / decorator / aspect.

Leaked implementation technology — repos returning IQueryable/QuerySet/Criteria/raw cursors/ORM entities to callers. Return finished results + intent-revealing methods (GetActiveVipUsers() not Query()).

Type-switch scattering — switch/if-chains on enum/discriminator in >1 place. New variant = new file, not N edits. One factory/registry switch at the boundary OK; scattered switches = reject.

Anemic models — domain objects with only getters/setters, logic floats in services. Move invariants/behavior onto the object (order.Checkout(), not order.Status = ...).

Primitive obsession — raw string/int/decimal for account numbers, emails, money, percentages, date ranges, with re-validation at every entry. Wrap in value objects / records / structs that validate once at construction.

Inline cross-cutting concerns — authorization/tenant isolation/audit/sanitization hand-written at top of every handler. Flag intent with declarative markers (@RequirePermission("Order.Delete")), enforce once centrally.

Shallow modules — tiny class, big interface (many public methods, many flags, many ctor params) wrapping little logic. A module is deep when a small interface hides a lot of implementation. If the interface costs about as much to learn as the implementation it hides → inline the module.

Missing base class for repeated component/handler lifecycle — 3+ forms/CRUD handlers/list views reimplementing loading/dirty/submit/pagination → extract to base class / hook / composable / mixin / trait.

Premature vs delayed abstraction — rule-of-three. First occurrence: write it. Second: notice duplication. Third: extract. Don't build generic frameworks before real variation; don't copy-paste for the 4th time.

Embedded utility logic not extracted to helpers — inline paging loops (while (hasMore) { skip += take; ... }), ad-hoc datetime math, string parsing/formatting, collection partitioning, retry/backoff loops, URL/query-string building. If the algorithm is non-trivial AND stack-generic (not business-specific), extract to util/helper/extensions and let consumers call one line. Inline duplicates → duplicated bug surface.

Logic in wrong (higher) layer — downshift to callee — business/derivation logic written in the caller when the callee owns the data. Typical cases: Controller code that belongs in an App Service. App Service code that belongs in a Domain Service or Entity. Component code that belongs in a ViewModel/Store/Service. Caller reaching into callee's data shape to compute something → move the computation behind an intent-revealing method on the callee. Lowest responsible layer wins (Entity > Domain Service > App Service > Controller · Model/VM > Store > Component). Higher-layer placement = duplicated logic as soon as a sibling caller needs the same thing.

Owner owns the rule — extract on first write — if a caller inlines logic that derives, normalizes, validates, or computes from another type's data, MOVE it to the owning type. Single use is sufficient — the trigger is wrong responsibility, not duplication. Sibling callers always arrive; inline copies drift silently with no compile error and no name to grep. Common offenders: Backend — inlined rules in application-layer handlers / commands / queries / services / controllers that belong on the domain entity / value object / domain service. Frontend — inlined derivations / formatting / validation in components that belong on the model / store / view-model / API service. Fix: name the rule once as a method (static or instance) on the owning type; callers invoke by name. Future variant → SECOND named method on the owner, never an inline near-duplicate. Right responsibility first; reuse is the consequence.
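The type-switch scattering and owner-owns-the-rule items above are easiest to see side by side. This is a minimal TypeScript sketch. `CustomerKind`, `discountPolicies`, and `Invoice` are hypothetical names, and the discount rates are made-up illustration values.

```typescript
// Type-switch scattering fix: one registry at the boundary instead of a
// switch on `kind` in every caller. A new variant = one new entry
// (in a real codebase, typically one new file that registers itself).
type CustomerKind = "regular" | "vip" | "employee";

interface DiscountPolicy {
  rate(subtotal: number): number; // fraction of the subtotal to discount
}

const discountPolicies: Record<CustomerKind, DiscountPolicy> = {
  regular: { rate: () => 0 },
  vip: { rate: (subtotal) => (subtotal >= 1000 ? 0.1 : 0.05) },
  employee: { rate: () => 0.2 },
};

// Owner owns the rule: the derivation lives on the type that owns the data,
// so callers invoke it by name instead of re-computing it inline.
class Invoice {
  readonly customerKind: CustomerKind;
  readonly subtotal: number;

  constructor(customerKind: CustomerKind, subtotal: number) {
    this.customerKind = customerKind;
    this.subtotal = subtotal;
  }

  // Named rule on the owning type. A future variant of the rule becomes a
  // second named method here, never an inline near-duplicate in a caller.
  total(): number {
    const rate = discountPolicies[this.customerKind].rate(this.subtotal);
    return Math.round(this.subtotal * (1 - rate) * 100) / 100; // round to cents
  }
}
```

Because the registry is typed as `Record<CustomerKind, DiscountPolicy>`, adding a new `CustomerKind` without a policy fails to compile. That compile error replaces the silent drift of scattered switches.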

Extraction target — where the named rule lives:

Shape of the rule	Goes to
Pure function over an entity's own data	static method on the entity
Behavior that mutates / guards entity state	instance method on the entity
Always-true invariant on a primitive value	value object constructor
Needs DI (repo / settings / clock)	helper class registered in DI
Domain-agnostic algorithm reused across types	util / extension method
Pure shape / projection conversion	DTO mapping
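Below is a sketch that maps three rows of the extraction-target table to TypeScript. `Percentage`, `Order`, and `OrderLine` are hypothetical, and their validation rules are illustrative only.

```typescript
// Row "Always-true invariant on a primitive value" -> value object constructor.
class Percentage {
  readonly value: number;
  constructor(value: number) {
    if (!Number.isFinite(value) || value < 0 || value > 100) {
      throw new RangeError(`Percentage out of range: ${value}`);
    }
    this.value = value; // validated once; consumers never re-check
  }
}

interface OrderLine {
  unitPrice: number;
  quantity: number;
}

type OrderStatus = "draft" | "placed";

class Order {
  private status: OrderStatus = "draft";
  private readonly lines: OrderLine[] = [];

  // Row "Behavior that mutates / guards entity state" -> instance method.
  addLine(unitPrice: number, quantity: number): void {
    if (this.status !== "draft") throw new Error("Cannot modify a placed order");
    if (quantity <= 0) throw new RangeError("Quantity must be positive");
    this.lines.push({ unitPrice, quantity });
  }

  // Callers call order.checkout(); they never assign the status directly.
  checkout(): void {
    if (this.lines.length === 0) throw new Error("Cannot check out an empty order");
    this.status = "placed";
  }

  get isPlaced(): boolean {
    return this.status === "placed";
  }

  subtotal(): number {
    return Order.sumLines(this.lines);
  }

  // Uses the value object, so no range check is repeated here.
  discountedSubtotal(discount: Percentage): number {
    return this.subtotal() * (1 - discount.value / 100);
  }

  // Row "Pure function over an entity's own data" -> static method.
  static sumLines(lines: readonly OrderLine[]): number {
    return lines.reduce((sum, l) => sum + l.unitPrice * l.quantity, 0);
  }
}
```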

Pre-commit edit-site test (reject if answer is "many"):

Change Scenario	Should touch
Add new variant (customer type, payment method)	1 new file
Change HTTP error response format	1 middleware/filter
Add timestamp field to every persisted entity	1 base entity/interceptor
Add authorization to a new endpoint	1 declarative marker
Swap database/ORM	Data layer only
Change business calculation rule	1 method on owning entity
Add loading indicator pattern to forms	1 base component/hook
Add validation rule to a domain primitive	1 value-object ctor
Change paging/retry/datetime algorithm	1 helper/util function
Change a derivation of entity data	1 method on the entity

Operating heuristics:

Write the call site first.

Count edit sites for plausible future change.

Prefer removing code over adding it.

Surface assumptions at boundaries, hide details inside.

Pre-reuse scan — before writing a non-trivial block, grep for similar algorithms (while.*skip, DateTime.*Add, split/join chains, paging loops, retry loops). Match existing helper → call it. None exists but pattern is stack-generic → extract to util before second caller appears.

Layer placement test — ask "if a sibling caller needed this tomorrow, would they re-derive it?" If yes, the logic is in the wrong layer. Move it down.

Open-case-for-future-reuse — if reviewer spots a block that is likely to appear in another feature (domain-agnostic algorithm, shared lifecycle, recurring derivation), do NOT rationalize with pure YAGNI. Either extract now (if cheap) or create a tracked TODO with the exact extraction target so the second caller does not duplicate silently. Silent duplication is the default failure mode.

When in doubt, ask: "What would need to change if the requirement shifts?"

The measure of good code is the cost of change. Not the shortest, not the cleverest, not the most abstracted: the cheapest to modify safely after reading only a small, local portion.
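The pre-reuse scan looks for inline loops such as `while (hasMore) { skip += take; ... }`. Once one turns up, the stack-generic part can be lifted into a helper that consumers call in one line. This is a minimal sketch. It assumes a hypothetical synchronous `fetchPage(skip, take)` callback that returns at most `take` items. A real data source would usually be async.

```typescript
// Generic skip/take paging loop, extracted once instead of re-inlined per caller.
// Assumption: a page shorter than `take` means there is no more data.
function fetchAllPages<T>(
  fetchPage: (skip: number, take: number) => T[],
  take: number,
): T[] {
  if (!Number.isInteger(take) || take <= 0) {
    throw new RangeError("take must be a positive integer");
  }
  const all: T[] = [];
  let skip = 0;
  for (;;) {
    const page = fetchPage(skip, take);
    for (const item of page) all.push(item);
    if (page.length < take) return all; // last (possibly empty) page
    skip += take;
  }
}

// Consumer: one line instead of a hand-written loop.
const source = Array.from({ length: 7 }, (_, i) => i);
const items = fetchAllPages((skip, take) => source.slice(skip, skip + take), 3);
// items: [0, 1, 2, 3, 4, 5, 6]
```

When the total is an exact multiple of `take`, the loop makes one extra call that returns an empty page. That is the usual cost of not knowing the total count up front.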

Fix-Triggered Re-Review Loop — Re-review is triggered by a FIX CYCLE, not by a round number. The cycle is: review → if issues → fix → re-review, repeating until a round finds no issues. A clean review ENDS the loop, and no further rounds are required.

Round 1: Main-session review. Read target files, build understanding, note issues. Output findings + verdict (PASS / FAIL).

Decision after Round 1:

No issues found (PASS, zero findings) → review ENDS. Do NOT spawn a fresh sub-agent for confirmation.

Issues found (FAIL, or any non-zero findings) → fix the issues, then spawn a fresh sub-agent for Round 2 re-review.

Fresh sub-agent re-review (after every fix cycle): Spawn a NEW Agent tool call — never reuse a prior agent. Sub-agent re-reads ALL files from scratch with ZERO memory of prior rounds. See SYNC:fresh-context-review for the spawn mechanism and SYNC:review-protocol-injection for the canonical Agent prompt template. Each fresh round must catch:

Cross-cutting concerns missed in the prior round

Interaction bugs between changed files

Convention drift (new code vs existing patterns)

Missing pieces that should exist but don't

Subtle edge cases the prior round rationalized away

Regressions introduced by the fixes themselves

Loop termination: After each fresh round, repeat the same decision: clean → END; issues → fix → next fresh round. Continue until a round finds zero issues, or 3 fresh-subagent rounds max, then escalate to user via AskUserQuestion.

Rules:

A clean Round 1 ENDS the review — no mandatory Round 2

NEVER skip the fresh sub-agent re-review after a fix cycle (every fix invalidates the prior verdict)

NEVER reuse a sub-agent across rounds — every iteration spawns a NEW Agent call

Main agent READS sub-agent reports but MUST NOT filter, reinterpret, or override findings

Max 3 fresh-subagent rounds per review — if still FAIL, escalate via AskUserQuestion (do NOT silently loop)

Track round count in conversation context (session-scoped)

Final verdict must incorporate ALL rounds executed

Report must include ## Round N Findings (Fresh Sub-Agent) for every round N≥2 that was executed.
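The decision rule above reduces to a small state function. This is a sketch with a hypothetical `RoundResult` record. It reads "3 fresh-subagent rounds max" as rounds 2 through 4, after the main-session Round 1; that reading is an assumption. The caller still does the actual fixing and spawns a NEW Agent call for each fresh round.

```typescript
interface RoundResult {
  round: number;      // 1 = main-session review; 2+ = fresh sub-agent rounds
  issueCount: number; // all findings, any severity
}

type NextStep =
  | { action: "end"; verdict: "PASS" }
  | { action: "fix-then-fresh-round"; nextRound: number }
  | { action: "escalate-to-user" };

const MAX_FRESH_ROUNDS = 3;

function nextStep(result: RoundResult): NextStep {
  // A clean round ENDS the loop; no confirmation sub-agent is spawned.
  if (result.issueCount === 0) return { action: "end", verdict: "PASS" };
  const freshRoundsUsed = Math.max(0, result.round - 1);
  // Budget exhausted: escalate via AskUserQuestion, never loop silently.
  if (freshRoundsUsed >= MAX_FRESH_ROUNDS) return { action: "escalate-to-user" };
  // Every fix invalidates the verdict: fix, then spawn a NEW sub-agent.
  return { action: "fix-then-fresh-round", nextRound: result.round + 1 };
}
```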

Fresh Sub-Agent Review — Eliminate orchestrator confirmation bias via isolated sub-agents.

Why: The main agent knows what it (or /cook) just fixed and rationalizes findings accordingly. A fresh sub-agent has ZERO memory, re-reads from scratch, and catches what the main agent dismissed. Sub-agent bias is mitigated by (1) fresh context, (2) verbatim protocol injection, (3) main agent not filtering the report.

When: ONLY after a fix cycle. A review round that finds zero issues ENDS the loop — do NOT spawn a confirmation sub-agent. A review round that finds issues triggers: fix → fresh sub-agent re-review.

How:

Spawn a NEW Agent tool call — use code-reviewer subagent_type for code reviews, general-purpose for plan/doc/artifact reviews

Inject ALL required review protocols VERBATIM into the prompt — see SYNC:review-protocol-injection for the full list and template. Never reference protocols by file path; AI compliance drops behind file-read indirection (see SYNC:shared-protocol-duplication-policy)

Sub-agent re-reads ALL target files from scratch via its own tool calls — never pass file contents inline in the prompt

Sub-agent writes structured report to plans/reports/{review-type}-round{N}-{date}.md

Main agent reads the report, integrates findings into its own report, DOES NOT override or filter

Rules:

SKIP fresh sub-agent when the prior round found zero issues (no fixes = nothing new to verify)

NEVER skip fresh sub-agent after a fix cycle — every fix invalidates the prior verdict

NEVER reuse a sub-agent across rounds — every fresh round spawns a NEW Agent call

Max 3 fresh-subagent rounds per review — escalate via AskUserQuestion if still failing; do NOT silently loop or fall back to any prior protocol

Track iteration count in conversation context (session-scoped, no persistent files)
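A small helper keeps report names consistent across rounds. This is a sketch: `roundReportPath` is hypothetical, and the slug rule and UTC `YYYY-MM-DD` date format are assumptions, since the text above only says `{review-type}` and `{date}`.

```typescript
// Builds plans/reports/{review-type}-round{N}-{date}.md.
// Assumptions: review type is slugified; {date} is the UTC date as YYYY-MM-DD.
function roundReportPath(reviewType: string, round: number, date: Date): string {
  if (!Number.isInteger(round) || round < 1) {
    throw new RangeError(`round must be a positive integer, got ${round}`);
  }
  const slug = reviewType
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse non-alphanumeric runs to "-"
    .replace(/^-|-$/g, "");      // drop a leading/trailing "-"
  if (slug === "") throw new Error("reviewType must contain letters or digits");
  const day = date.toISOString().slice(0, 10); // UTC date portion
  return `plans/reports/${slug}-round${round}-${day}.md`;
}
```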

Review Protocol Injection — Every fresh sub-agent review prompt MUST embed each protocol block VERBATIM. The template below has ALL protocol bodies already expanded inline. Copy the template wholesale into the Agent call's prompt field at runtime, replacing only the {placeholders} in Task / Round / Reference Docs / Target Files / Output sections with context-specific values. Do NOT touch the embedded protocol sections.

Why inline expansion: Placeholder markers would force file-read indirection at runtime. AI compliance drops significantly behind indirection (see SYNC:shared-protocol-duplication-policy). Therefore the template carries every protocol body pre-embedded.

Subagent Type Selection

Choose sub-agent type based on the category of changes being reviewed:

Category	Sub-agent type	Rationale
Source code (logic, handlers, services)	`code-reviewer`	Purpose-built for code quality analysis
Security-sensitive files (auth, crypto, permissions)	`security-auditor`	Threat modeling, attack surface analysis
Performance-critical files (queries, caching, batch)	`performance-optimizer`	Bottleneck identification, baseline analysis
Plans, docs, specs, markdown	`general-purpose`	Plan/artifact review

For large changesets with mixed concerns, spawn multiple sub-agents (one per concern type) in parallel.
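Below is a rough sketch of routing changed files to the categories in the table and grouping them so that each concern type gets one parallel sub-agent. The path regexes are hypothetical heuristics, not rules from this skill. A real classifier should also weigh directory semantics and change nature (Phase 0.7).

```typescript
type SubagentType =
  | "code-reviewer"
  | "security-auditor"
  | "performance-optimizer"
  | "general-purpose";

// Hypothetical path heuristics; order matters (first match wins).
const ROUTES: ReadonlyArray<[RegExp, SubagentType]> = [
  [/\.(md|mdx|txt|rst)$/i, "general-purpose"],
  [/(^|\/)(auth|security|crypto|permissions?)(\/|\.|$)/i, "security-auditor"],
  [/(^|\/)(queries|cache|caching|batch)(\/|\.|$)/i, "performance-optimizer"],
];

function subagentFor(path: string): SubagentType {
  for (const [pattern, type] of ROUTES) {
    if (pattern.test(path)) return type;
  }
  return "code-reviewer"; // default for source code
}

// Mixed changesets: one group per concern type, each spawned as its own Agent call.
function groupBySubagent(paths: string[]): Map<SubagentType, string[]> {
  const groups = new Map<SubagentType, string[]>();
  for (const p of paths) {
    const type = subagentFor(p);
    const bucket = groups.get(type) ?? [];
    bucket.push(p);
    groups.set(type, bucket);
  }
  return groups;
}
```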

Canonical Agent Call Template (Copy Verbatim)

Agent({
description: "Fresh Round {N} review",
subagent_type: "code-reviewer",
prompt: `
## Task
{review-specific task — e.g., "Review assigned files for code quality" | "Review plan files under {plan-dir}" | "Review integration tests in {path}"}

## Round
Round {N}. You have ZERO memory of prior rounds. Re-read all target files from scratch via your own tool calls. Do NOT trust anything from the main agent beyond this prompt.

## Protocols (follow VERBATIM — these are non-negotiable)

### Evidence-Based Reasoning
Speculation is FORBIDDEN. Every claim needs proof.
1. Cite file:line, grep results, or framework docs for EVERY claim
2. Declare confidence: >80% act freely, 60-80% verify first, <60% DO NOT recommend
3. Cross-service validation required for architectural changes
4. "I don't have enough evidence" is valid and expected output
BLOCKED until: Evidence file path (file:line) provided; Grep search performed; 3+ similar patterns found; Confidence level stated.
Forbidden without proof: "obviously", "I think", "should be", "probably", "this is because".
If incomplete → output: "Insufficient evidence. Verified: [...]. Not verified: [...]."

### Bug Detection
MUST check categories 1-4 for EVERY review. Never skip.
1. Null Safety: Can params/returns be null? Are they guarded? Optional chaining gaps? .find() returns checked?
2. Boundary Conditions: Off-by-one (< vs <=)? Empty collections handled? Zero/negative values? Max limits?
3. Error Handling: Try-catch scope correct? Silent swallowed exceptions? Error types specific? Cleanup in finally?
4. Resource Management: Connections/streams closed? Subscriptions unsubscribed on destroy? Timers cleared? Memory bounded?
5. Concurrency (if async): Missing await? Race conditions on shared state? Stale closures? Retry storms?
6. Language-Idiomatic Traps: Apply your knowledge of idiomatic pitfalls for the languages/runtimes present in the changed files. Do NOT enumerate a fixed list — derive from the actual tech stack.
Classify: CRITICAL (crash/corrupt) → FAIL | HIGH (incorrect behavior) → FAIL | MEDIUM (edge case) → WARN | LOW (defensive) → INFO.

### Design Patterns Quality
Priority checks for every code change:
1. DRY via OOP: Identify same-purpose classes (same naming pattern, same lifecycle, same data shape). 3+ similar patterns → extract to shared abstraction. Apply your knowledge of the project's language/framework to determine the idiomatic base class / mixin / trait / protocol pattern.
2. Right Responsibility: Logic in LOWEST layer (Entity > Domain Service > Application Service > Controller). Never business logic in controllers.
3. SOLID: Single responsibility (one reason to change). Open-closed (extend, don't modify). Liskov (subtypes substitutable). Interface segregation (small interfaces). Dependency inversion (depend on abstractions).
4. After extraction/move/rename: Grep ENTIRE scope for dangling references. Zero tolerance.
5. YAGNI gate: NEVER recommend patterns unless 3+ occurrences exist. Don't extract for hypothetical future use.
Anti-patterns to flag: God Object, Copy-Paste inheritance, Circular Dependency, Leaky Abstraction.

### Logic & Intention Review
Verify WHAT code does matches WHY it was changed.
1. Change Intention Check: Every changed file MUST serve the stated purpose. Flag unrelated changes as scope creep.
2. Happy Path Trace: Walk through one complete success scenario through changed code.
3. Error Path Trace: Walk through one failure/edge case scenario through changed code.
4. Acceptance Mapping: If plan context available, map every acceptance criterion to a code change.
NEVER mark review PASS without completing both traces (happy + error path).

### Test Coverage Verification
Map changed code to test coverage.
1. Identify the project's test spec format and location — search for test files, spec docs, or test catalogs near the changed files.
2. Every changed code path MUST map to a corresponding test (or flag as "needs test").
3. New functions/endpoints/handlers → flag for test creation.
4. Verify test references point to actual code (file:line, not stale).
5. If no tests exist → log gap and recommend creating tests.
NEVER skip test mapping. Untested code paths are the #1 source of production bugs.

### Fix-Layer Accountability
NEVER fix at the crash site. Trace the full flow, fix at the owning layer. The crash site is a SYMPTOM, not the cause.
MANDATORY before ANY fix:
1. Trace full data flow — Map the complete path from data origin to crash site across ALL layers (storage → backend → API → frontend → UI). Identify where bad state ENTERS, not where it CRASHES.
2. Identify the invariant owner — Which layer's contract guarantees this value is valid? Fix at the LOWEST layer that owns the invariant, not the highest layer that consumes it.
3. One fix, maximum protection — If fix requires touching 3+ files with defensive checks, you are at the wrong layer — go lower.
4. Verify no bypass paths — Confirm all data flows through the fix point. Check for direct construction skipping factories, clone/spread without re-validation, raw data not wrapped in domain models, mutations outside the model layer.
BLOCKED until: Full data flow traced (origin → crash); Invariant owner identified with file:line evidence; All access sites audited (grep count); Fix layer justified (lowest layer that protects most consumers).
Anti-patterns (REJECT): "Fix it where it crashes" (crash site ≠ cause site, trace upstream); "Add defensive checks at every consumer" (scattered defense = wrong layer); "Fixing at both layers is safer" (pick ONE authoritative layer).

### Rationalization Prevention
AI skips steps via these evasions. Recognize and reject:
- "Too simple for a plan" → Simple + wrong assumptions = wasted time. Plan anyway.
- "I'll test after" → RED before GREEN. Write/verify test first.
- "Already searched" → Show grep evidence with file:line. No proof = no search.
- "Just do it" → Still need TaskCreate. Skip depth, never skip tracking.
- "Just a small fix" → Small fix in wrong location cascades. Verify file:line first.
- "Code is self-explanatory" → Future readers need evidence trail. Document anyway.
- "Combine steps to save time" → Combined steps dilute focus. Each step has distinct purpose.

### Graph-Assisted Investigation
MANDATORY when .code-graph/graph.db exists.
HARD-GATE: MUST run at least ONE graph command on key files before concluding any investigation.
Pattern: Grep finds files → trace --direction both reveals full system flow → Grep verifies details.
- Investigation/Scout: trace --direction both on 2-3 entry files
- Fix/Debug: callers_of on buggy function + tests_for
- Feature/Enhancement: connections on files to be modified
- Code Review: tests_for on changed functions
- Blast Radius: trace --direction downstream
CLI: python .claude/scripts/code_graph {command} --json. Use --node-mode file first (10-30x less noise), then --node-mode function for detail.

### Understand Code First
HARD-GATE: Do NOT write, plan, or fix until you READ existing code.
1. Search 3+ similar patterns (grep/glob) — cite file:line evidence.
2. Read existing files in target area — understand structure, base classes, conventions.
3. Run python .claude/scripts/code_graph trace <file> --direction both --json when .code-graph/graph.db exists.
4. Map dependencies via connections or callers_of — know what depends on your target.
5. Write investigation to .ai/workspace/analysis/ for non-trivial tasks (3+ files).
6. Re-read analysis file before implementing — never work from memory alone.
7. NEVER invent new patterns when existing ones work — match exactly or document deviation.
BLOCKED until: Read target files; Grep 3+ patterns; Graph trace (if graph.db exists); Assumptions verified with evidence.

## Reference Docs (READ before reviewing)
Search the repository for:
- Project coding standards or review rules docs (search: "code-review-rules", "coding-standards", "style-guide", "contributing")
- Architecture documentation relevant to the changed files (search: "patterns-reference", "architecture", "adr")
- If none found, rely on your knowledge of the project's tech stack inferred from file extensions and directory structure.

## Target Files
{explicit file list OR selected review scope OR "read all files under {plan-dir}"}

## Output
Write a structured report to plans/reports/{review-type}-round{N}-{date}.md with sections:
- Status: PASS | FAIL
- Issue Count: {number}
- Critical Issues (with file:line evidence)
- High Priority Issues (with file:line evidence)
- Medium / Low Issues
- Cross-cutting findings

Return the report path and status to the main agent.
Every finding MUST have file:line evidence. Speculation is forbidden.
`
})

Rules

DO copy the template wholesale — including every embedded protocol section
DO replace only the {placeholders} in Task / Round / Reference Docs / Target Files / Output sections with context-specific content
DO choose code-reviewer subagent_type for code reviews and general-purpose for plan / doc / artifact reviews
DO NOT paraphrase, summarize, or skip any protocol section
DO NOT pass file contents inline — the sub-agent reads via its own tool calls so it has a fresh context
DO NOT reference protocols by file path or tag name — the bodies are already embedded above
DO NOT introduce placeholder markers for the protocols — they must stay literally expanded
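Before sending the prompt, a cheap self-check can catch a dropped protocol section or an unreplaced caller-side placeholder. This is a sketch. `REQUIRED_PROTOCOLS` mirrors the `###` headings in the template above, and `checkFilledPrompt` is a hypothetical helper, not part of the Agent tool. The placeholder list is an assumption; braces inside protocol bodies, such as `{command}`, are literal text and are deliberately not flagged. The check confirms that each section is present, not that its body is verbatim.

```typescript
// Mirrors the "###" protocol headings in the canonical template.
const REQUIRED_PROTOCOLS = [
  "### Evidence-Based Reasoning",
  "### Bug Detection",
  "### Design Patterns Quality",
  "### Logic & Intention Review",
  "### Test Coverage Verification",
  "### Fix-Layer Accountability",
  "### Rationalization Prevention",
  "### Graph-Assisted Investigation",
  "### Understand Code First",
] as const;

// Assumption: these caller-side placeholders must be replaced before sending.
const CALLER_PLACEHOLDERS = ["{N}", "{review-type}", "{date}", "{plan-dir}"] as const;

interface PromptCheck {
  missingProtocols: string[];
  unreplacedPlaceholders: string[];
}

function checkFilledPrompt(prompt: string): PromptCheck {
  return {
    missingProtocols: REQUIRED_PROTOCOLS.filter((h) => !prompt.includes(h)),
    unreplacedPlaceholders: CALLER_PLACEHOLDERS.filter((p) => prompt.includes(p)),
  };
}
```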

Rationalization Prevention — AI skips steps via these evasions. Recognize and reject:

Evasion	Rebuttal
"Too simple for a plan"	Simple + wrong assumptions = wasted time. Plan anyway.
"I'll test after"	RED before GREEN. Write/verify test first.
"Already searched"	Show grep evidence with `file:line`. No proof = no search.
"Just do it"	Still need TaskCreate. Skip depth, never skip tracking.
"Just a small fix"	Small fix in wrong location cascades. Verify file:line first.
"Code is self-explanatory"	Future readers need evidence trail. Document anyway.
"Combine steps to save time"	Combined steps dilute focus. Each step has distinct purpose.

Logic & Intention Review — Verify WHAT code does matches WHY it was changed.

Change Intention Check: Every changed file MUST ATTENTION serve the stated purpose. Flag unrelated changes as scope creep.

Happy Path Trace: Walk through one complete success scenario through changed code

Error Path Trace: Walk through one failure/edge case scenario through changed code

Acceptance Mapping: If plan context available, map every acceptance criterion to a code change

NEVER mark review PASS without completing both traces (happy + error path).

Bug Detection — MUST ATTENTION check categories 1-4 for EVERY review. Never skip.

Null Safety: Can params/returns be null? Are they guarded? Optional chaining gaps? .find() returns checked?

Boundary Conditions: Off-by-one (< vs <=)? Empty collections handled? Zero/negative values? Max limits?

Error Handling: Try-catch scope correct? Silent swallowed exceptions? Error types specific? Cleanup in finally?

Resource Management: Connections/streams closed? Subscriptions unsubscribed on destroy? Timers cleared? Memory bounded?

Concurrency (if async): Missing await? Race conditions on shared state? Stale closures? Retry storms?

Language-Idiomatic Traps: Apply your knowledge of idiomatic pitfalls for the languages/runtimes present in the changed files. Do NOT enumerate a fixed list — derive from the actual tech stack.

Classify: CRITICAL (crash/corrupt) → FAIL | HIGH (incorrect behavior) → FAIL | MEDIUM (edge case) → WARN | LOW (defensive) → INFO
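The classification line maps directly to a verdict rule. This is a sketch with a hypothetical `BugFinding` shape. It assumes WARN and INFO do not fail the round by themselves. Per the Fix-Triggered Re-Review Loop, any finding still triggers a fix cycle. Skipping categories 1-4 makes the review incomplete rather than passing.

```typescript
type BugSeverity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";
type BugCategory =
  | "null-safety" | "boundary" | "error-handling"
  | "resource" | "concurrency" | "language-trap";

interface BugFinding {
  category: BugCategory;
  severity: BugSeverity;
  location: string; // file:line
}

const OUTCOME: Record<BugSeverity, "FAIL" | "WARN" | "INFO"> = {
  CRITICAL: "FAIL", // crash / corrupt
  HIGH: "FAIL",     // incorrect behavior
  MEDIUM: "WARN",   // edge case
  LOW: "INFO",      // defensive
};

// Categories 1-4 are checked on EVERY review.
const MANDATORY: readonly BugCategory[] = ["null-safety", "boundary", "error-handling", "resource"];

function bugVerdict(findings: BugFinding[], categoriesChecked: BugCategory[]) {
  const skipped = MANDATORY.filter((c) => !categoriesChecked.includes(c));
  if (skipped.length > 0) {
    return { verdict: "INCOMPLETE" as const, skipped, needsFixCycle: false };
  }
  const failed = findings.some((f) => OUTCOME[f.severity] === "FAIL");
  return {
    verdict: failed ? ("FAIL" as const) : ("PASS" as const),
    skipped,
    needsFixCycle: findings.length > 0, // any finding: fix, then fresh re-review
  };
}
```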

Test Coverage Verification — Map changed code to test coverage.

Find the project's test format — search for test files, spec docs, or test catalogs near the changed files. Note the naming convention and location pattern.

Map changed behavior to tests — every changed code path MUST ATTENTION map to a test (or flag as "needs test").

New functions/endpoints/handlers → flag for test creation.

Verify test references point to actual code (file:line, not stale).

Coverage by concern type: Security-sensitive changes → auth/permission tests exist? Data-mutating changes → state assertion tests exist?

If no tests exist → log gap and recommend creating tests.

NEVER skip test mapping. Untested code paths are the #1 source of production bugs.
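Below is a sketch of the mapping step. It assumes changed symbols and test references have already been collected, for example via grep or the graph's `tests_for`. `ChangedSymbol`, `TestRef`, and `coverageGaps` are hypothetical. Stale detection covers the "test references point to actual code" check.

```typescript
interface ChangedSymbol {
  name: string;   // e.g. "Order.checkout"
  file: string;
  isNew: boolean; // new function/endpoint/handler
}

interface TestRef {
  testFile: string;
  target: string; // symbol the test claims to cover
}

interface CoverageGap {
  symbol: string;
  reason: "needs test" | "new: create test";
}

function coverageGaps(changed: ChangedSymbol[], tests: TestRef[], knownSymbols: Set<string>) {
  const covered = new Set(tests.map((t) => t.target));
  const gaps: CoverageGap[] = [];
  for (const c of changed) {
    if (covered.has(c.name)) continue;
    gaps.push({ symbol: c.name, reason: c.isNew ? "new: create test" : "needs test" });
  }
  // Stale references: tests pointing at symbols that no longer exist.
  const stale = tests.filter((t) => !knownSymbols.has(t.target)).map((t) => t.testFile);
  return { gaps, stale };
}
```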

Fix-Layer Accountability — NEVER fix at the crash site. Trace the full flow, fix at the owning layer.

AI default behavior: see error at Place A → fix Place A. This is WRONG. The crash site is a SYMPTOM, not the cause.

MANDATORY before ANY fix:

Trace full data flow — Map the complete path from data origin to crash site across ALL layers (storage → backend → API → frontend → UI). Identify where the bad state ENTERS, not where it CRASHES.

Identify the invariant owner — Which layer's contract guarantees this value is valid? That layer is responsible. Fix at the LOWEST layer that owns the invariant — not the highest layer that consumes it.

One fix, maximum protection — Ask: "If I fix here, does it protect ALL downstream consumers with ONE change?" If fix requires touching 3+ files with defensive checks, you are at the wrong layer — go lower.

Verify no bypass paths — Confirm all data flows through the fix point. Check for: direct construction skipping factories, clone/spread without re-validation, raw data not wrapped in domain models, mutations outside the model layer.

BLOCKED until:
- [ ] Full data flow traced (origin → crash)
- [ ] Invariant owner identified with file:line evidence
- [ ] All access sites audited (grep count)
- [ ] Fix layer justified (lowest layer that protects most consumers)

Anti-patterns (REJECT these):

"Fix it where it crashes" — Crash site ≠ cause site. Trace upstream.

"Add defensive checks at every consumer" — Scattered defense = wrong layer. One authoritative fix > many scattered guards.

"Fixing at both layers is safer" — Pick ONE authoritative layer. Redundant checks across layers send mixed signals about who owns the invariant.
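Below is a minimal sketch of "one fix, maximum protection". Instead of adding `?.` or `if (!email)` guards at each consumer, one type owns the invariant, and its only construction path validates. The private constructor closes the "direct construction skipping factories" bypass. `EmailAddress`, `Customer`, `toCustomer`, and the deliberately simple format check are hypothetical.

```typescript
// The invariant owner: every EmailAddress in the system has passed parse().
class EmailAddress {
  readonly value: string;

  // Private: no code outside this class can skip validation.
  private constructor(value: string) {
    this.value = value;
  }

  // Single fix point. The regex is a simple placeholder, not a full RFC 5322 check.
  static parse(raw: string | null | undefined): EmailAddress {
    const normalized = (raw ?? "").trim().toLowerCase();
    if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(normalized)) {
      throw new Error(`Invalid email: ${JSON.stringify(raw)}`);
    }
    return new EmailAddress(normalized);
  }
}

interface Customer {
  name: string;
  email: EmailAddress; // not `string | null`: consumers need no guards
}

// Boundary: raw data is wrapped in the domain type exactly once.
function toCustomer(raw: { name: string; email?: string | null }): Customer {
  return { name: raw.name, email: EmailAddress.parse(raw.email) };
}

// Consumer: no `?.` and no re-validation; the type guarantees the invariant.
function mailDomain(c: Customer): string {
  const v = c.email.value;
  return v.slice(v.indexOf("@") + 1);
}
```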

AI Mistake Prevention — Failure modes to avoid on every task:

Check downstream references before deleting. Deleting components causes documentation and code staleness cascades. Map all referencing files before removal.

Verify AI-generated content against actual code. AI hallucinates APIs, class names, and method signatures. Always grep to confirm existence before documenting or referencing.

Trace the full dependency chain after edits. Changing a definition can miss downstream variables and consumers derived from it. Always trace the full chain.

Trace ALL code paths when verifying correctness. Confirming code exists is not confirming it executes. Always trace early exits, error branches, and conditional skips, not just the happy path.

When debugging, ask "whose responsibility?" before fixing. Trace whether the bug is in the caller (wrong data) or the callee (wrong handling). Fix at the responsible layer; never patch the symptom site.

Assume existing values are intentional; ask WHY before changing. Before changing any constant, limit, flag, or pattern: read comments, check git blame, and examine surrounding code.

Verify ALL affected outputs, not just the first. Changes touching multiple stacks require verifying EVERY output. One green check is not all green checks.

Holistic-first debugging: resist the nearest-attention trap. When investigating any failure, list EVERY precondition first (config, env vars, DB names, endpoints, DI registrations, data preconditions), then verify each against evidence before forming any code-layer hypothesis.

Surgical changes: apply the diff test. Bug fix: every changed line must trace directly to the bug. Don't restyle or improve adjacent code. Enhancement task: implement improvements AND announce them explicitly.

Surface ambiguity before coding; don't pick silently. If the request has multiple interpretations, present each with an effort estimate and ask. Never silently assume the all-records, file-based, or more complex path.

Business terminology in Application/Domain layers. Comments and naming in Application/Domain must stay business-oriented and technology-agnostic; avoid implementation terms (say "background job", not "Hangfire background job").

MANDATORY MUST ATTENTION cite file:line evidence for every claim. Confidence >80% to act, <60% = do NOT recommend.

MANDATORY MUST ATTENTION check DRY via OOP (same-suffix → base class), right responsibility (lowest layer), SOLID. Grep for dangling refs after changes.

MANDATORY MUST ATTENTION apply complexity prevention — one business change = one code change. Flag change amplification (>3 edit sites for future change), scattered type-switches, anemic models, primitive obsession, leaked technology through abstractions, shallow modules, un-extracted utility logic (paging/datetime/string/retry → helpers), and logic in the wrong higher layer (downshift to callee/entity/VM). Don't rationalize silent duplication with pure YAGNI.

MANDATORY MUST ATTENTION execute the review loop: review → if issues → fix → fresh sub-agent re-review. A round that finds zero issues ENDS the review.

MANDATORY MUST ATTENTION follow ALL steps regardless of perceived simplicity. "Too simple to plan" is evasion, not reason.

MANDATORY MUST ATTENTION run at least ONE graph command on key files when graph.db exists. Pattern: grep → graph trace → grep verify.

MANDATORY MUST ATTENTION verify every changed file serves stated purpose. Trace happy + error paths. Flag scope creep.

MANDATORY MUST ATTENTION check null safety, boundary conditions, error handling, resource management for every review.

MANDATORY MUST ATTENTION map every changed function/endpoint to a test. Search for project's test spec format near changed files. Flag coverage gaps, recommend test creation.
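A cheap first pass at the function-to-test mapping: flag every changed function whose name appears in no test file. This is a name-matching heuristic assumed for illustration, not real coverage; a hit still has to be traced to confirm that the test exercises the changed paths, including error branches.

```python
import re

def coverage_gaps(changed_functions: list[str], test_sources: dict[str, str]) -> list[str]:
    """Return changed function names that no test source mentions as a whole word."""
    gaps = []
    for fn in changed_functions:
        pattern = re.compile(rf"\b{re.escape(fn)}\b")
        if not any(pattern.search(src) for src in test_sources.values()):
            gaps.append(fn)
    return gaps
```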

MANDATORY MUST ATTENTION for multilingual frontend/UI text changes, verify translation updates are present (or explicitly accepted by user as risk) before PASS.
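One way to check translation completeness, assuming each locale file has already been loaded as a flat key-to-string dict (real projects may nest keys or use another format). A key counts as missing when it is absent or blank in a locale.

```python
def missing_translations(locales: dict[str, dict[str, str]],
                         changed_keys: set[str]) -> dict[str, list[str]]:
    """Return {locale: sorted missing keys} for changed UI text keys absent or empty in a locale."""
    report: dict[str, list[str]] = {}
    for locale, strings in locales.items():
        missing = sorted(k for k in changed_keys if not strings.get(k, "").strip())
        if missing:
            report[locale] = missing
    return report
```

An empty result is the PASS condition; anything else must be fixed or explicitly accepted by the user as a risk.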

IMPORTANT MUST ATTENTION trace the full data flow and fix at the owning layer, not the crash site. Audit all access sites before adding a ?. null-conditional.

MUST ATTENTION apply critical thinking — every claim needs traced proof, confidence >80% to act. Anti-hallucination: never present guess as fact.

MUST ATTENTION apply sequential-thinking — multi-step Thought N/M, REVISION/BRANCH/HYPOTHESIS markers, confidence % closer; see /sequential-thinking skill.

MUST ATTENTION apply AI mistake prevention — holistic-first debugging, fix at responsible layer, surface ambiguity before coding, re-read files after compaction.

MUST ATTENTION Phase 0.7: derive review categories from file language + directory semantics + change nature. Create sub-tasks per category. Derive concerns from first principles — do not use a fixed checklist.
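A sketch of the grouping step only: each changed file is keyed by language (from its extension), directory (here simplified to the innermost folder, an assumption), and change nature. Each distinct key becomes one review sub-task; the concerns for that sub-task are still derived from first principles rather than from a table.

```python
from collections import defaultdict
from pathlib import PurePosixPath

def derive_categories(changes: list[tuple[str, str]]) -> dict[tuple[str, str, str], list[str]]:
    """Group (path, change_nature) pairs by (language, innermost directory, change nature)."""
    groups: dict[tuple[str, str, str], list[str]] = defaultdict(list)
    for path, nature in changes:
        p = PurePosixPath(path)
        language = p.suffix.lstrip(".").lower() or "no-extension"
        directory = p.parent.name or "root"  # files at repo root have no parent name
        groups[(language, directory, nature)].append(path)
    return dict(groups)
```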

MANDATORY Bootstrap task tracking before target work; transition one task at a time.
MANDATORY Persist plan/review findings to plans/reports/ incrementally and synthesize from disk.
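Incremental persistence means each finding is appended to the report file as soon as it is found, and the final synthesis reads the file back. A minimal sketch following the `plans/reports/code-review-{date}-{slug}.md` naming; the ISO date format and the "- location: finding" line shape are assumptions.

```python
from datetime import date
from pathlib import Path
from typing import Optional

def report_path(slug: str, root: str = ".", on: Optional[date] = None) -> Path:
    """Create (if missing) plans/reports/code-review-{date}-{slug}.md and return its path."""
    day = (on or date.today()).isoformat()  # ISO date format is an assumption
    path = Path(root) / "plans" / "reports" / f"code-review-{day}-{slug}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        path.write_text(f"# Code review: {slug} ({day})\n\n", encoding="utf-8")
    return path

def append_finding(path: Path, location: str, finding: str) -> None:
    """Write each finding immediately so context compaction cannot lose it."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(f"- {location}: {finding}\n")

def read_findings(path: Path) -> list[str]:
    """Synthesize the final report from what is on disk, not from memory."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line[2:] for line in lines if line.startswith("- ")]
```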

MANDATORY After task-tracking bootstrap and before target/source work, read required project-reference docs and cite Reference docs read: ....
MANDATORY Always include lessons.md; project conventions override generic defaults.

MANDATORY Parent workflow rows do not replace child phase tracking; expand phases and link the parent when nested.
MANDATORY Orchestrators pre-expand child skill phases before invocation; use [N.M] $skill-name — phase prefixes and one-in_progress discipline.
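The pre-expansion rule can be modeled with a small in-memory task list. This is a hypothetical stand-in, not the TaskCreate/TaskUpdate API: `Task` and `blocked_by` only mirror the idea of creating child rows and linking the parent via addBlockedBy. The sketch also reads `$skill-name` as the skill name with a literal `$` prefix, which is an assumption.

```python
from dataclasses import dataclass, field

SEP = " \u2014 "  # the dash separator used in the "[N.M] $skill-name ... phase" prefix

@dataclass
class Task:
    id: str
    subject: str
    status: str = "pending"
    blocked_by: list[str] = field(default_factory=list)

def expand_child_phases(parent: Task, step: int, skill: str, phases: list[str]) -> list[Task]:
    """Create one child task per phase and block the parent row on all of them."""
    children = [
        Task(id=f"{parent.id}.{m}", subject=f"[{step}.{m}] ${skill}{SEP}{phase}")
        for m, phase in enumerate(phases, start=1)
    ]
    # The parent workflow row is a container: it cannot complete before its children.
    parent.blocked_by.extend(c.id for c in children)
    return children
```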

Prompt-Enhance Closing Anchors

IMPORTANT MUST ATTENTION follow the declared step order for this skill; NEVER skip, reorder, or merge steps without explicit user approval.
IMPORTANT MUST ATTENTION for every step/sub-skill call: set in_progress before execution, set completed after execution.
IMPORTANT MUST ATTENTION every skipped step MUST include an explicit reason; every completed step MUST include concise evidence.
IMPORTANT MUST ATTENTION if Task tools are unavailable, maintain an equivalent step-by-step plan tracker with synchronized statuses.
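When Task tools are unavailable, the equivalent tracker needs to enforce the same rules: declared order, one in_progress step at a time, evidence on completion, and a reason on skip. A minimal in-memory sketch; the class and method names are hypothetical.

```python
from typing import Optional

class StepTracker:
    """Ordered steps, one in_progress at a time, evidence or reason required."""

    def __init__(self, steps: list[str]):
        self.steps = steps
        self.status = {s: "pending" for s in steps}
        self.notes: dict[str, str] = {}

    def _next(self) -> Optional[str]:
        return next((s for s in self.steps if self.status[s] == "pending"), None)

    def start(self, step: str) -> None:
        if "in_progress" in self.status.values():
            raise RuntimeError("another step is already in_progress")
        if step != self._next():
            raise RuntimeError(f"out of order: expected {self._next()!r}")
        self.status[step] = "in_progress"

    def complete(self, step: str, evidence: str) -> None:
        if self.status[step] != "in_progress":
            raise RuntimeError("only an in_progress step can be completed")
        if not evidence.strip():
            raise ValueError("completed steps need evidence")
        self.status[step] = "completed"
        self.notes[step] = evidence

    def skip(self, step: str, reason: str) -> None:
        # The skill also requires explicit user approval to skip; record it in the reason.
        if "in_progress" in self.status.values():
            raise RuntimeError("finish the in_progress step before skipping another")
        if step != self._next():
            raise RuntimeError(f"out of order: expected {self._next()!r}")
        if not reason.strip():
            raise ValueError("skipped steps need an explicit reason")
        self.status[step] = "skipped"
        self.notes[step] = reason
```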

Closing Reminders

MANDATORY MUST ATTENTION Nested Task Expansion Contract — when invoked inside a workflow, STILL expand internal phases via TaskCreate with [N.M] $skill-name — phase prefix and TaskUpdate(parentTaskId, addBlockedBy: [childIds]) linkage. Workflow row is container, not substitute.
MANDATORY MUST ATTENTION break work into small todo tasks using TaskCreate BEFORE starting
MANDATORY MUST ATTENTION validate decisions with user via AskUserQuestion — never auto-decide
MANDATORY MUST ATTENTION add final review task to verify work quality
MANDATORY MUST ATTENTION search for project-specific reference docs BEFORE reviewing (coding standards, architecture, test conventions)
MANDATORY MUST ATTENTION Phase 0: detect change type FIRST — route auth/perf files to specialized sub-agents before general review
MANDATORY MUST ATTENTION run /why-review after completing this review to validate design rationale, alternatives considered, and risk assessment

[TASK-PLANNING] Before acting, analyze task scope and systematically break it into small todo tasks and sub-tasks using TaskCreate.
