strong-inference
This skill should be used when the user wants to "investigate a bug", "debug an issue", "figure out why something is happening", "find the root cause", "troubleshoot", "強い推論で調査", "仮説を立てて検証", "原因を特定", "バグの原因調査", "なぜ動かないか調べて", "問題を切り分け", "原因不明", "デバッグ", or mentions systematic hypothesis-driven debugging. NOTE: Use this for investigating UNKNOWN causes, not for validating design proposals (use devils-advocate for that).
SKILL.md
| Name | strong-inference |
| Description | This skill should be used when the user wants to "investigate a bug", "debug an issue", "figure out why something is happening", "find the root cause", "troubleshoot", "強い推論で調査", "仮説を立てて検証", "原因を特定", "バグの原因調査", "なぜ動かないか調べて", "問題を切り分け", "原因不明", "デバッグ", or mentions systematic hypothesis-driven debugging. NOTE: Use this for investigating UNKNOWN causes, not for validating design proposals (use devils-advocate for that). |
name: Strong Inference description: This skill should be used when the user wants to "investigate a bug", "debug an issue", "figure out why something is happening", "find the root cause", "troubleshoot", "強い推論で調査", "仮説を立てて検証", "原因を特定", "バグの原因調査", "なぜ動かないか調べて", "問題を切り分け", "原因不明", "デバッグ", or mentions systematic hypothesis-driven debugging. NOTE: Use this for investigating UNKNOWN causes, not for validating design proposals (use devils-advocate for that).
Strong Inference Skill
Apply the "Strong Inference" methodology to investigate problems systematically through competing hypotheses and decisive experiments.
Overview
Strong Inference is a scientific method that accelerates problem-solving by:
- Generating multiple competing hypotheses
- Designing experiments that eliminate hypotheses
- Iterating until the most likely explanation remains
This skill helps developers investigate bugs, performance issues, and unexpected behaviors using a structured, hypothesis-driven approach.
Key Feature: In codex mode, this skill can optionally leverage Codex for hypothesis generation and review while Claude handles verification execution.
Prerequisites
- The user has a problem, bug, or unexpected behavior to investigate
- Relevant code context is available
- For Codex collaboration mode: Codex CLI available (
codex exec)
Workflow Phases
Phase 1: Problem Definition
When the user presents a problem:
-
Collect information:
- Error messages, logs, stack traces
- Steps to reproduce
- Expected vs actual behavior
- Recent changes that might be related
-
Clarify scope:
- Which components are involved?
- When did it start happening?
- Is it reproducible consistently?
Phase 2: Hypothesis Generation
Generate 2-4 competing hypotheses that are:
- Mutually exclusive: If H1 is true, H2 cannot be true
- Testable: Can be verified or eliminated with evidence
- Specific: Clear enough to design a decisive test
Example hypotheses for "API returns 500 intermittently":
H1: Database connection pool exhausted under load
H2: Race condition in cache update causing stale data
H3: External service timeout not handled properly
H4: Memory leak causing OOM conditions
Phase 3: Verification Design
For each hypothesis, design a "killer experiment" that:
- Can eliminate the hypothesis if the result is negative
- Requires minimal effort for maximum information gain
- Is safe to execute (no production impact)
Prioritize experiments by:
- Ease of execution (quick wins first)
- Discriminating power (eliminates multiple hypotheses)
- Risk level (non-destructive first)
Phase 4: Verification Execution
Execute verifications in priority order:
- Code inspection (reading files, checking logic)
- Log analysis (searching for patterns)
- Test execution (running specific tests)
- Instrumentation (adding debug output)
Safety guards:
- Confirm before any file modifications
- Set timeouts for long-running operations
- Log all executed commands
Phase 5: Analysis and Iteration
After each verification:
- Record evidence: What was observed?
- Update hypothesis status:
[X]Eliminated (evidence contradicts)[?]Pending (not yet tested)[!]Supported (evidence aligns)
- Refine remaining hypotheses based on new information
- Generate new hypotheses if all were eliminated
Phase 6: Conclusion
When one hypothesis has strong supporting evidence:
- Summarize findings: Evidence trail and reasoning
- Propose solution: Based on confirmed hypothesis
- Suggest prevention: How to avoid similar issues
Role Distribution
| Mode | Hypothesis Gen | Verification Design | Execution | Review |
|---|---|---|---|---|
codex | Codex | Claude | Claude | Codex |
claude-only | Claude | Claude | Claude | Claude |
- Default mode:
codex(when Codex CLI available) - Fallback:
claude-only(automatic when Codex unavailable)
Hypothesis Tree File
Investigation state is persisted to tmp/strong-inference/<task-id>.md:
---
schema: strong-inference/v1
task_id: abc123
created: 2026-02-02T12:00:00Z
problem: "API returns 500 intermittently"
mode: codex
---
# Investigation: API returns 500 intermittently
## Hypotheses
### H1: Database connection pool exhausted
- Status: [X] Eliminated
- Evidence: Connection count stable at 5/20 during error window
- Verified: 2026-02-02T12:15:00Z
### H2: Race condition in cache update
- Status: [?] Pending
- Test: Add mutex logging to CacheManager.update()
- Priority: High (matches timing pattern)
### H3: External service timeout
- Status: [!] Supported
- Evidence: Errors correlate with ExternalAPI latency spikes
- Next: Verify timeout handling in ApiClient.fetch()
## Verification Log
| Time | Action | Result |
|------|--------|--------|
| 12:05 | Read db/pool.go | Found pool size config |
| 12:10 | Check connection metrics | Stable at 5/20 |
| 12:15 | Eliminated H1 | Evidence contradicts |
Safety Guards
Before executing verification commands:
- Confirm destructive operations: File changes, test execution
- Set timeout: Default 60 seconds per operation
- Log all commands: Record in verification log
Stop conditions:
- All hypotheses eliminated (request new hypotheses)
max_iterationsreached (default: 10)- User requests stop
Output Format
Progress Display
Strong Inference Investigation
==============================
Problem: API returns 500 error intermittently
Hypotheses:
[X] H1: Database connection pool exhausted
Evidence: Connection count normal (eliminated)
[!] H2: Race condition in cache update
Evidence: Timing matches error pattern (supported)
[?] H3: External service timeout
Evidence: Pending verification
Current: Designing test for H2
Completion Report
Investigation Complete
======================
Problem: API returns 500 error intermittently
Root Cause: Race condition in CacheManager.update()
Confidence: High (3 supporting evidence points)
Evidence Trail:
1. Errors occur only during cache refresh window
2. Adding mutex eliminated the error
3. Race condition visible in thread dump
Recommended Fix:
- Add mutex lock in CacheManager.update() line 45
- Consider using sync.RWMutex for better concurrency
Prevention:
- Add race detector to CI pipeline
- Review other cache operations for similar patterns
Invoking the Skill
Use the /strong-inference command:
# Basic usage - investigate a problem
/strong-inference API sometimes returns 500 errors
# With mode selection
/strong-inference --mode claude-only Why is the test flaky?
# Japanese
/strong-inference このバグの原因を調査して
References
Detailed templates in references/:
hypothesis-template.md- Template for Codex hypothesis generationverification-patterns.md- Common verification strategies