Agent Skill
2/7/2026

sb-review-tests

Review tests for completeness, usefulness, isolation, and readability. Use when asked to review tests, check test quality, audit test coverage, or validate test files.

L
lumberbarons
0GitHub Stars
1Views
npx skills add lumberbarons/lumber-mart

SKILL.md

Namesb-review-tests
DescriptionReview tests for completeness, usefulness, isolation, and readability. Use when asked to review tests, check test quality, audit test coverage, or validate test files.

name: review-tests description: Review tests for completeness, usefulness, isolation, and readability. Use when asked to review tests, check test quality, audit test coverage, or validate test files.

Tests Review

Review tests in the specified path for quality issues.

[!IMPORTANT] Consult REFERENCE.md for the expected output format and level of detail.

User Input

$ARGUMENTS

You MUST consider the user input before proceeding (if not empty).

Argument Parsing

Parse $ARGUMENTS for:

  • Path: Required path to review (file or directory containing tests)
  • --create-beads: Create beads for findings (default: report only)

Workflow

Step 1 — Discover test files

Find all test files in the specified path using language-appropriate patterns: *_test.go, test_*.py, *_test.py, *.test.ts, *.test.js, *.spec.ts, *.spec.js, __tests__/**, etc. Record the full file list and count.

Step 2 — Choose execution strategy

  • 1–2 files → Direct mode: Read the files, evaluate against the Quality Criteria below, then proceed to Deduplication and Bead Check.
  • 3+ files → Parallel mode: Batch files, spawn subagents, collect results, merge, then proceed to Deduplication and Bead Check.

Parallel Review Mode

Use this mode when 3 or more test files are discovered.

Batching

Group files into batches based on total file count:

Total filesFiles per batch~Subagents
3–1013–10
11–2026–10
21+37–10

Spawn subagents

For each batch, use Agent(subagent_type="general-purpose"). Spawn all subagents in a single message so they run in parallel.

Each subagent prompt MUST include:

  1. The file paths in its batch (instruct the subagent to read them)
  2. The Quality Criteria section from this skill — copy it verbatim into the prompt
  3. The Severity section from this skill — copy it verbatim into the prompt
  4. The structured output format below
  5. The explicit instruction: "Do NOT use the Bash tool. Do NOT run any shell commands. Use only Read, Grep, and Glob tools. Do NOT create beads, do NOT run bd commands. Return findings only."
  6. The explicit instruction: "For every P3 finding, you MUST state a concrete falsifiability claim in the explanation field: 'if <specific code change> were made, this test would incorrectly pass.' Omit P3 findings that lack this claim."

Instruct each subagent to return findings in this exact delimited format (one block per finding):

---FINDING---
priority: P<1|2|3>
location: <file:line>
title: <short title>
category: <Completeness|Usefulness|Output Validation|Isolation|Readability|Integration Test Specifics>
explanation: <what is wrong or missing and why it matters>
fix: <concrete prescription>
done_when: <verifiable criterion>
---END---

If the subagent finds no issues for its batch, it should return ---NO-FINDINGS---.

Collect and merge

After all subagents return:

  1. Parse each subagent's structured findings
  2. Combine into a single list, sorted by priority (P1 first)
  3. Deduplicate: if two findings share the same location (file:line) AND the same category, keep only the one with the highest priority

Error fallback

If a subagent fails or returns unparseable output, review those files directly (as in direct mode) and include a note in the report: Note: Files [list] were reviewed directly due to subagent failure.

Deduplication and Bead Check

Both direct mode and parallel mode flow into this step.

Before creating a bead, check if one already exists:

  • Run bd list --status=open to see all open issues
  • Run bd list --status=closed to see previously resolved issues
  • Compare your findings against existing bead titles and descriptions
  • Skip creating beads for issues that already have matching open beads
  • Skip findings where a closed bead's fix introduced the pattern you're flagging. Use bd show <id> on relevant closed beads. If the current code is the intentional resolution of a prior bug, do not re-raise it.
  • When in doubt, show the user the potential duplicate rather than creating

Quality Criteria

Completeness

  • Edge cases covered
  • Error paths tested
  • Boundary conditions checked
  • Happy path and failure scenarios both present

Usefulness

  • Tests catch real bugs, not just chase coverage
  • Tests would fail if code broke
  • Tests validate behavior, not implementation details

Output Validation

  • Assertions check actual results
  • Not just "no error thrown"
  • Expected values are meaningful, not arbitrary

Isolation

  • No shared mutable state between tests
  • Tests can run in any order
  • Tests can run in parallel
  • External dependencies mocked/stubbed appropriately

Readability

  • Clear, descriptive test names
  • Obvious arrange-act-assert structure
  • Test intent immediately clear
  • Setup/teardown not hiding important context

Integration Test Specifics

  • Proper cleanup of test data
  • Appropriate use of test fixtures
  • Reasonable timeouts configured
  • Clear distinction from unit tests

Severity

  • P1: Shared mutable state, tests that never fail, tests masking real bugs
  • P2: Missing assertions on return values, unclear test names, no isolation
  • P3: Missing edge cases or assertions where you can state a concrete falsifiability claim — i.e., "if <specific code change> were made, this test would incorrectly pass." Do not raise P3 for suggestions that only make a test more thorough without identifying a realistic false-pass scenario.

Output

You MUST produce a report following the exact structure shown in REFERENCE.md. When using parallel mode, the lead assembles the unified report from subagent findings. The report format is identical regardless of execution mode.

--create-beads mode: Use bug for test quality failures; use task for missing coverage that needs to be added.

Each bead description MUST be structured in three parts:

<file:line>

<Explanation of the problem: what is wrong or missing and why it matters.>

Fix: <Concrete prescription. For quality bugs, specify exactly what must change
(e.g., "replace the shared `db` package-level var with a per-test instance
constructed in each test's setup"). For coverage tasks, specify exactly what
scenario or assertion is missing (e.g., "add a test case where the input
slice is nil; the current table has only empty-slice and non-empty cases").>

Done when: <A verifiable criterion. Must be checkable by reading the test file.
Example: "TestFoo has no package-level mutable state; all state is initialized
inside t.Run or TestFoo itself."
NOT: "The test is properly isolated.">
  • Quality bugs (shared state, assertions that never fail, tests that mask bugs):

    bd create --type=bug --priority=<1-2> --title="Test: <issue>" \
      --description="<file:line>
    
    <explanation>
    
    Fix: <concrete prescription>
    
    Done when: <verifiable criterion>"
    
  • Missing coverage (edge cases, error paths, boundary conditions):

    bd create --type=task --priority=<2-3> --title="Test: <what to add>" \
      --description="<file:line>
    
    <explanation>
    
    Fix: <concrete prescription>
    
    Done when: <verifiable criterion>"
    
Skills Info
Original Name:sb-review-testsAuthor:lumberbarons