sb-review-tests
Review tests for completeness, usefulness, isolation, and readability. Use when asked to review tests, check test quality, audit test coverage, or validate test files.
SKILL.md
| Name | sb-review-tests |
| Description | Review tests for completeness, usefulness, isolation, and readability. Use when asked to review tests, check test quality, audit test coverage, or validate test files. |
name: review-tests description: Review tests for completeness, usefulness, isolation, and readability. Use when asked to review tests, check test quality, audit test coverage, or validate test files.
Tests Review
Review tests in the specified path for quality issues.
[!IMPORTANT] Consult REFERENCE.md for the expected output format and level of detail.
User Input
$ARGUMENTS
You MUST consider the user input before proceeding (if not empty).
Argument Parsing
Parse $ARGUMENTS for:
- Path: Required path to review (file or directory containing tests)
--create-beads: Create beads for findings (default: report only)
Workflow
Step 1 — Discover test files
Find all test files in the specified path using language-appropriate patterns: *_test.go, test_*.py, *_test.py, *.test.ts, *.test.js, *.spec.ts, *.spec.js, __tests__/**, etc. Record the full file list and count.
Step 2 — Choose execution strategy
- 1–2 files → Direct mode: Read the files, evaluate against the Quality Criteria below, then proceed to Deduplication and Bead Check.
- 3+ files → Parallel mode: Batch files, spawn subagents, collect results, merge, then proceed to Deduplication and Bead Check.
Parallel Review Mode
Use this mode when 3 or more test files are discovered.
Batching
Group files into batches based on total file count:
| Total files | Files per batch | ~Subagents |
|---|---|---|
| 3–10 | 1 | 3–10 |
| 11–20 | 2 | 6–10 |
| 21+ | 3 | 7–10 |
Spawn subagents
For each batch, use Agent(subagent_type="general-purpose"). Spawn all subagents in a single message so they run in parallel.
Each subagent prompt MUST include:
- The file paths in its batch (instruct the subagent to read them)
- The Quality Criteria section from this skill — copy it verbatim into the prompt
- The Severity section from this skill — copy it verbatim into the prompt
- The structured output format below
- The explicit instruction: "Do NOT use the Bash tool. Do NOT run any shell commands. Use only Read, Grep, and Glob tools. Do NOT create beads, do NOT run
bdcommands. Return findings only." - The explicit instruction: "For every P3 finding, you MUST state a concrete falsifiability claim in the
explanationfield: 'if <specific code change> were made, this test would incorrectly pass.' Omit P3 findings that lack this claim."
Instruct each subagent to return findings in this exact delimited format (one block per finding):
---FINDING---
priority: P<1|2|3>
location: <file:line>
title: <short title>
category: <Completeness|Usefulness|Output Validation|Isolation|Readability|Integration Test Specifics>
explanation: <what is wrong or missing and why it matters>
fix: <concrete prescription>
done_when: <verifiable criterion>
---END---
If the subagent finds no issues for its batch, it should return ---NO-FINDINGS---.
Collect and merge
After all subagents return:
- Parse each subagent's structured findings
- Combine into a single list, sorted by priority (P1 first)
- Deduplicate: if two findings share the same
location(file:line) AND the samecategory, keep only the one with the highest priority
Error fallback
If a subagent fails or returns unparseable output, review those files directly (as in direct mode) and include a note in the report: Note: Files [list] were reviewed directly due to subagent failure.
Deduplication and Bead Check
Both direct mode and parallel mode flow into this step.
Before creating a bead, check if one already exists:
- Run
bd list --status=opento see all open issues - Run
bd list --status=closedto see previously resolved issues - Compare your findings against existing bead titles and descriptions
- Skip creating beads for issues that already have matching open beads
- Skip findings where a closed bead's fix introduced the pattern you're flagging. Use
bd show <id>on relevant closed beads. If the current code is the intentional resolution of a prior bug, do not re-raise it. - When in doubt, show the user the potential duplicate rather than creating
Quality Criteria
Completeness
- Edge cases covered
- Error paths tested
- Boundary conditions checked
- Happy path and failure scenarios both present
Usefulness
- Tests catch real bugs, not just chase coverage
- Tests would fail if code broke
- Tests validate behavior, not implementation details
Output Validation
- Assertions check actual results
- Not just "no error thrown"
- Expected values are meaningful, not arbitrary
Isolation
- No shared mutable state between tests
- Tests can run in any order
- Tests can run in parallel
- External dependencies mocked/stubbed appropriately
Readability
- Clear, descriptive test names
- Obvious arrange-act-assert structure
- Test intent immediately clear
- Setup/teardown not hiding important context
Integration Test Specifics
- Proper cleanup of test data
- Appropriate use of test fixtures
- Reasonable timeouts configured
- Clear distinction from unit tests
Severity
- P1: Shared mutable state, tests that never fail, tests masking real bugs
- P2: Missing assertions on return values, unclear test names, no isolation
- P3: Missing edge cases or assertions where you can state a concrete falsifiability claim — i.e., "if <specific code change> were made, this test would incorrectly pass." Do not raise P3 for suggestions that only make a test more thorough without identifying a realistic false-pass scenario.
Output
You MUST produce a report following the exact structure shown in REFERENCE.md. When using parallel mode, the lead assembles the unified report from subagent findings. The report format is identical regardless of execution mode.
--create-beads mode: Use bug for test quality failures; use task for missing coverage that needs to be added.
Each bead description MUST be structured in three parts:
<file:line>
<Explanation of the problem: what is wrong or missing and why it matters.>
Fix: <Concrete prescription. For quality bugs, specify exactly what must change
(e.g., "replace the shared `db` package-level var with a per-test instance
constructed in each test's setup"). For coverage tasks, specify exactly what
scenario or assertion is missing (e.g., "add a test case where the input
slice is nil; the current table has only empty-slice and non-empty cases").>
Done when: <A verifiable criterion. Must be checkable by reading the test file.
Example: "TestFoo has no package-level mutable state; all state is initialized
inside t.Run or TestFoo itself."
NOT: "The test is properly isolated.">
-
Quality bugs (shared state, assertions that never fail, tests that mask bugs):
bd create --type=bug --priority=<1-2> --title="Test: <issue>" \ --description="<file:line> <explanation> Fix: <concrete prescription> Done when: <verifiable criterion>" -
Missing coverage (edge cases, error paths, boundary conditions):
bd create --type=task --priority=<2-3> --title="Test: <what to add>" \ --description="<file:line> <explanation> Fix: <concrete prescription> Done when: <verifiable criterion>"