Agent Skill
2/7/2026

error-analysis

This skill should be used when analyzing errors, stack traces, and logs to identify root causes and implement fixes.

J
jayteealao
1GitHub Stars
1Views
npx skills add jayteealao/Tassel

SKILL.md

Nameerror-analysis
DescriptionThis skill should be used when analyzing errors, stack traces, and logs to identify root causes and implement fixes.

name: error-analysis description: This skill should be used when analyzing errors, stack traces, and logs to identify root causes and implement fixes.

Error Analysis Skill

Systematically analyze errors and logs to find root causes.

When to Use

  • Debugging production errors
  • Analyzing stack traces
  • Investigating log patterns
  • Performing root cause analysis
  • Categorizing error types

Reference Documents

Analysis Workflow

1. Gather Information

## Error Report

### Error Message

TypeError: Cannot read property 'id' of undefined at UserService.getUser (src/services/user.ts:45:23) at async Router.handle (src/api/routes.ts:67:12)


### Context
- **Environment:** Production
- **Time:** 2024-01-15 14:30 UTC
- **Frequency:** 15 occurrences in last hour
- **Affected Users:** ~5% of requests
- **Recent Changes:** Deploy at 14:00 UTC

2. Categorize Error

## Error Classification

**Type:** Runtime Error
**Category:** Null/Undefined Reference
**Severity:** High (user-facing)

### Common Causes
1. Missing data validation
2. Race condition
3. API contract change
4. Data migration issue

3. Root Cause Analysis

## Root Cause Investigation

### Timeline
- 14:00 - Deploy to production
- 14:25 - First error reported
- 14:30 - Error rate increased

### Hypothesis 1: Deploy introduced bug
- Check: git diff between versions
- Result: New code path added

### Hypothesis 2: Data issue
- Check: Query for affected users
- Result: All have specific condition

### Root Cause
Deploy introduced code that assumes `user.profile` exists,
but 5% of users don't have profiles (legacy accounts).

4. Implement Fix

## Fix Implementation

### Short-term (Hotfix)
```typescript
// Add null check
const userId = user?.profile?.id ?? user.id;

Long-term

  1. Add data validation at API boundary
  2. Migrate legacy accounts
  3. Add regression test

## Error Categories

### By Origin

| Category | Description | Example |
|----------|-------------|---------|
| Client | User/browser errors | Invalid input |
| Server | Application errors | Null reference |
| Infrastructure | System errors | Connection timeout |
| External | Third-party errors | API rate limit |

### By Severity

| Level | Impact | Response |
|-------|--------|----------|
| Critical | System down | Immediate |
| High | Major feature broken | Hours |
| Medium | Degraded experience | This sprint |
| Low | Minor inconvenience | Backlog |

## Stack Trace Analysis

### Reading Stack Traces

```markdown
## Stack Trace Components

Error: Connection refused ← Error type and message at Database.connect ← Where error was thrown (src/db/connection.ts:23) ← File and line at async initialize ← Call chain (src/server.ts:45) at async main ← Entry point (src/index.ts:12)


### Key Questions
1. Where was the error thrown? (top of stack)
2. What called that code? (stack trace)
3. What was the input/state? (logs)
4. What changed recently? (git history)

Common Patterns

## Null Reference

TypeError: Cannot read property 'x' of undefined

**Check:** Variable existence, API response shape

## Connection Error

Error: ECONNREFUSED 127.0.0.1:5432

**Check:** Service running, network config, credentials

## Timeout

Error: Operation timed out after 30000ms

**Check:** Service health, query performance, network

## Memory

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed

**Check:** Memory leaks, large data processing

Log Analysis

Correlation

## Correlating Logs

### Using Request ID
```bash
grep "req-12345" application.log

Timeline Reconstruction

14:30:00.123 [req-12345] User login started
14:30:00.456 [req-12345] Fetching user profile
14:30:00.789 [req-12345] ERROR: Profile not found
14:30:00.790 [req-12345] TypeError: Cannot read 'id'

Pattern Identification

# Count error types
grep "ERROR" app.log | cut -d: -f2 | sort | uniq -c | sort -rn

# Find error spikes
grep "ERROR" app.log | cut -d' ' -f1 | uniq -c

## Resolution Tracking

### Fix Verification

```markdown
## Fix Verification Checklist

- [ ] Error no longer reproducible locally
- [ ] Unit test added for fix
- [ ] Integration test added
- [ ] Deployed to staging
- [ ] Verified in staging
- [ ] Deployed to production
- [ ] Monitoring error rate
- [ ] Error rate returned to baseline

Post-mortem Template

# Incident Post-mortem: [Title]

## Summary
Brief description of what happened.

## Timeline
- HH:MM - Event
- HH:MM - Detection
- HH:MM - Resolution

## Impact
- Users affected: X
- Duration: Y hours
- Revenue impact: $Z

## Root Cause
Detailed explanation.

## Resolution
What was done to fix it.

## Lessons Learned
1. What went well
2. What went poorly
3. Where we got lucky

## Action Items
- [ ] Prevent: [Action]
- [ ] Detect: [Action]
- [ ] Respond: [Action]
Skills Info
Original Name:error-analysisAuthor:jayteealao