alerts
System health and alerts reporting skill. This skill should be used at session start to surface pending alerts, or anytime the user wants to check system health. Triggers: "alerts", "check alerts", "what needs attention", "system health", "show warnings", "pending issues". Default mode (--limited) checks Code Health, Security, and Session Context. Use --full for comprehensive reporting including Documentation Health and Roadmap/Planning.
SKILL.md
name: alerts
description: |
  Lightweight health signal with scoring, benchmarks, trends, and interactive alert-by-alert workflow. Default mode (--limited) checks 18 categories. Use --full for comprehensive reporting with all 42 categories.
Alerts -- Lightweight Health Signal
Ecosystem role: Consumer of health-ecosystem-audit ecosystem (D#17).
/health-ecosystem-audit OWNS the ecosystem. /ecosystem-health and /alerts
CONSUME it. /audit-health audits the audit system, not health.
Critical Rules
- MUST run the checker script before presenting any data
- MUST present every alert to the user (no silent filtering)
- MUST recommend an action per alert based on severity + trend
- MUST log every decision to session JSONL immediately (per-alert, not batched)
- MUST NOT present a health score without running checkers first
- MUST surface previous learnings from alerts-history.jsonl at warm-up
When to Use
- User explicitly invokes /alerts
- User asks "check alerts", "what needs attention", "system health", "show warnings", "pending issues"
- Called from session-begin with --limited mode
When NOT to Use
| Need | Use Instead | Why |
|---|---|---|
| Domain-specific audit | /audit-code, etc. | Deeper analysis per domain |
| Audit system meta-check | /audit-health | Checks audit infra, not health |
| Full health dashboard | /ecosystem-health | 8 weighted categories, deep triage |
| Skill validation | /skill-audit | Quality audit for individual skill |
Routing Decision Tree
- "Just show me warnings" (30s) -> /alerts --limited
- "Full health picture" (15min) -> /ecosystem-health
- "Health system seems broken" (30min) -> /health-ecosystem-audit
- "Is my audit system healthy?" (5min) -> /audit-health
What This Skill Does NOT Do
- Infrastructure monitoring, uptime checks, or server health
- CI/CD status monitoring (use /gh-fix-ci for that)
- Live application metrics or APM
Overview
Intelligence = trend detection, cross-category grouping, suppression management, and self-audit of checker coverage. This skill measures project/codebase health, not infrastructure or server health.
Note: The checker script handles Phase 1 (data collection). Phases 2-7 are orchestrated by the skill caller (Claude). Session JSONL is created and populated by the caller.
Alert Object Schema
{
"severity": "error | warning | info",
"message": "string",
"details": "string | array of strings | null",
"action": "string | null"
}
Note: category is the key in results.categories{}, not a field in the alert
object. trend is in category.context, not per-alert. Context objects vary
per category (benchmarks, ratings, totals, sparklines, groups).
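As a rough illustration of the note above, the sketch below flattens a results object into a single alert list, attaching the category key and the category-level trend to each alert. The sample data, the category names, and the `alerts` array field under each category are assumptions made for illustration; the actual v2 JSON layout is defined in REFERENCE.md.

```javascript
// Hypothetical sample shaped like the schema above. Category names, messages,
// and the per-category `alerts` array are illustrative assumptions.
const results = {
  categories: {
    "code-health": {
      context: { trend: "declining" },
      alerts: [
        { severity: "error", message: "Lint errors found", details: ["src/a.js"], action: "Run the linter" },
      ],
    },
    security: {
      context: { trend: null },
      alerts: [
        { severity: "warning", message: "Outdated dependency", details: null, action: null },
      ],
    },
  },
};

// Flatten categories into one list. The category comes from the object key and
// the trend from category.context, since neither is a field on the alert itself.
function flattenAlerts(results) {
  const flat = [];
  for (const [category, data] of Object.entries(results.categories)) {
    // A null or missing trend is treated as stable.
    const trend = (data.context && data.context.trend) || "stable";
    for (const alert of data.alerts || []) {
      flat.push({ category, trend, ...alert });
    }
  }
  return flat;
}
```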
Usage
/alerts # Limited mode (default) - 18 categories
/alerts --full # Full mode - 42 categories
/alerts --limited # Explicit limited (used by session-begin)
Context-aware mode: When called from session-begin, MUST use --limited.
When standalone, ask user which mode if not specified.
Workflow
Detailed phase workflows, schemas, benchmarks, and weights are in REFERENCE.md.
Warm-Up (MUST)
- Mode: limited (18 categories, ~15-30s) or full (42 categories, ~30-60s)
- What to expect: Dashboard overview, then interactive alert-by-alert walkthrough
- Starting signal: "Starting health checks... (this may take 15-30s)"
- Previous learnings (MUST): Read last 3 entries from alerts-history.jsonl. Surface insights if they exist.
- Resume check (SHOULD): If progress file exists and is <2h old, offer resume.
Phase 1: Run & Parse (MUST)
Run: node .claude/skills/alerts/scripts/run-alerts.js --limited (or --full).
Parse v2 JSON. Create session log. Load suppressions. Check for duplicate run
(<30min). See REFERENCE.md for error handling details.
Phase 2: Dashboard Overview (MUST)
Scoring: 100 - (30 per error + 10 per warning). Grades: A=90+, B=80+, C=70+, D=60+, F=<60.
- Dashboard: table sorted worst-first, with cross-session trends from alerts-history.jsonl.
- Benchmark mapping: Poor -> error, Average -> warning, Good -> no alert.
- Ecosystem stress: if 3+ categories are declining simultaneously, flag as "ecosystem stress" and recommend /ecosystem-health. If trend=null, treat as stable.
- Zero alerts: skip Phases 3-5, go to Phase 6.
- Volume spike guard: if the current run has >2x the alert count of the previous run (from alerts-history.jsonl), flag: "Alert volume spike: N alerts (previous: M). This may indicate new warning sources. Review by category before walkthrough?"
- Trend data integrity (CL): before computing trends, verify that alerts-history.jsonl has entries, that the most recent entry is <7 days old, and that the count is reasonable. If any check fails, mark trends as "unverified."
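The Phase 2 rules can be sketched as a few small functions. This is a minimal sketch, not the checker's actual code: it applies the scoring formula to a single pair of counts (category weights live in REFERENCE.md and are not reproduced here), clamping the score at 0 is an assumption, and the `"declining"` trend label is assumed.

```javascript
// Score = 100 - (30 per error + 10 per warning).
// Clamping at 0 is an assumption; the source does not state a floor.
function healthScore(errors, warnings) {
  return Math.max(0, 100 - (30 * errors + 10 * warnings));
}

// Grade bands: A=90+, B=80+, C=70+, D=60+, F=<60.
function grade(score) {
  if (score >= 90) return "A";
  if (score >= 80) return "B";
  if (score >= 70) return "C";
  if (score >= 60) return "D";
  return "F";
}

// Volume spike: the current run has more than 2x the previous run's alert count.
function isVolumeSpike(currentCount, previousCount) {
  return currentCount > 2 * previousCount;
}

// Ecosystem stress: 3 or more categories declining at once.
// The "declining" label is an assumed trend value.
function isEcosystemStress(trends) {
  return trends.filter((t) => t === "declining").length >= 3;
}
```

For example, one error and one warning give `healthScore(1, 1) === 60`, which `grade` maps to "D".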
Phase 3: Alert-by-Alert Loop (MUST)
- Ordering: sort errors first. Group 3+ alerts with the same category+severity.
- Progress: "Alert N of M". See REFERENCE.md for context card format, decision semantics, and example dialogue.
- Escape hatch: offered every 10 alerts (10, 20, 30...). Remaining alerts auto-defer with confirmation.
- Scope guard: >5min or >3 files -> pause. 2-5min -> defer to appropriate skill. <2min -> inline (follow CLAUDE.md Section 6).
- Logging: decisions are logged per-alert, immediately.
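One way to picture the ordering and grouping rules is the sketch below. It assumes each alert already carries its category (for example after flattening the per-category results); the `type`, `alerts`, and `alert` field names on the walkthrough items are hypothetical.

```javascript
const SEVERITY_ORDER = { error: 0, warning: 1, info: 2 };

// Sort errors first, then collapse any category+severity bucket holding
// 3 or more alerts into a single grouped walkthrough item.
function buildWalkthrough(alerts) {
  const sorted = [...alerts].sort(
    (a, b) => SEVERITY_ORDER[a.severity] - SEVERITY_ORDER[b.severity]
  );

  // Map keeps insertion order, so buckets stay in severity order.
  const buckets = new Map();
  for (const alert of sorted) {
    const key = `${alert.category}|${alert.severity}`;
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(alert);
  }

  const items = [];
  for (const bucket of buckets.values()) {
    if (bucket.length >= 3) {
      const { category, severity } = bucket[0];
      items.push({ type: "group", category, severity, alerts: bucket });
    } else {
      for (const alert of bucket) items.push({ type: "single", alert });
    }
  }
  return items;
}

// The escape hatch is offered after the 10th, 20th, 30th... alert.
function shouldOfferEscape(alertNumber) {
  return alertNumber > 0 && alertNumber % 10 === 0;
}
```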
Phase 4: Decision Review (MUST, skip if zero alerts)
Review all decisions. Contradiction check: same alert with conflicting decisions. Revision: show numbered list, user selects alerts to change. DONE-WHEN: User confirms "Proceed" or completes revisions.
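The contradiction check could look roughly like the sketch below, which flags any alert that has more than one distinct decision recorded. The `alertId` and `decision` field names are hypothetical; the real session log schema is in REFERENCE.md.

```javascript
// Returns the ids of alerts that have conflicting decisions recorded.
// `alertId` and `decision` are assumed field names for a decision entry.
function findContradictions(decisions) {
  const byAlert = new Map();
  for (const { alertId, decision } of decisions) {
    if (!byAlert.has(alertId)) byAlert.set(alertId, new Set());
    byAlert.get(alertId).add(decision);
  }
  return [...byAlert]
    .filter(([, seen]) => seen.size > 1)
    .map(([alertId]) => alertId);
}
```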
Phase 5: Batch Execution + Fix Verification (MUST)
- Execute fix_now items inline or via skills.
- Deferred items: log and suggest /add-debt.
- On failure: report the error, then offer retry or skip.
- Update session JSONL with results.
- Convergence loop (SHOULD): re-run affected checkers after fixes to verify resolution. If an alert persists, flag it as "fix failed" and present it to the user.
DONE-WHEN: All selected fixes attempted and verified.
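The fix verification step can be sketched with the checker passed in as a function, so the logic stays independent of how checkers are actually run. Matching a persisting alert by its `message` within the same category is an assumption, as is the `rerunChecker` callback.

```javascript
// Re-run the checker for each fixed alert's category and report whether the
// alert is still present. `rerunChecker(category)` is a hypothetical callback
// that returns the category's current alert list.
function verifyFixes(fixedAlerts, rerunChecker) {
  return fixedAlerts.map((alert) => {
    const current = rerunChecker(alert.category);
    // Assumption: an alert "persists" if the same message is reported again.
    const persists = current.some((a) => a.message === alert.message);
    return { alert, status: persists ? "fix failed" : "verified" };
  });
}
```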
Phase 6: Self-Audit + Feedback (MUST)
- Audit: checker coverage, suppression health (>50% -> WARNING), score integrity, decision balance, trend health, process gaps, baseline staleness (>7d).
- Checker escalation: no_data for 3+ consecutive runs -> WARNING.
- User feedback (SHOULD): "Any observations about checker quality?" Save to alerts-history.jsonl.
DONE-WHEN: All findings presented and decided.
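Two of the self-audit thresholds are simple enough to sketch. The source gives only the thresholds, so the inputs here are assumptions: suppression health is read as the share of alerts that are suppressed, and checker escalation as a streak of `no_data` results at the end of a checker's run history.

```javascript
// Suppression health: WARNING when more than 50% of alerts are suppressed.
// Reading ">50%" as suppressed/total is an assumption.
function suppressionWarning(suppressedCount, totalCount) {
  if (totalCount === 0) return false;
  return suppressedCount / totalCount > 0.5;
}

// Checker escalation: WARNING when the most recent 3 or more runs all
// reported no_data. `statuses` is ordered oldest to newest.
function noDataEscalation(statuses) {
  let streak = 0;
  for (let i = statuses.length - 1; i >= 0 && statuses[i] === "no_data"; i--) {
    streak++;
  }
  return streak >= 3;
}
```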
Phase 7: Cleanup & Verification (MUST)
- Write artifacts, save suppressions (reason MUST be present), clear resolved warnings, and append to alerts-history.jsonl with learnings.
- Sync statusline: run node scripts/sync-warnings-ack.js to bump lastCleared if all warning types are now acknowledged (keeps the statusline unacked count accurate).
- Offer re-run (loops to Phase 1, same mode, delta display).
- Closure: "Alert review complete. Health: {before} -> {after}. Session log: [path]."
DONE-WHEN: Files written and closure summary presented.
Phase separators: Use --- between all phases in output.
Dependencies
Hard: run-alerts.js, .claude/state/alert-suppressions.json
Soft: alerts-baseline.json (computed first run), alerts-history.jsonl (no trends if missing), health-ecosystem-audit-history.jsonl (omit Test Health if missing/stale >1 day)
Compaction Resilience
State: .claude/tmp/alerts-progress-{YYYY-MM-DD}.json. On start: check if <2h
old, offer resume. After every 5 decisions: update currentAlertIndex. On
compaction: re-read state, skip completed alerts.
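The resume logic might be sketched as follows. The path template and the 2-hour window come from the text above. Formatting the date in UTC, passing the save time in as a millisecond timestamp, and treating `currentAlertIndex` as the index of the next alert to present are all assumptions.

```javascript
const TWO_HOURS_MS = 2 * 60 * 60 * 1000;

// Build the progress file path for a given date.
// Uses the UTC calendar date; the source does not say which time zone applies.
function progressPath(date) {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD
  return `.claude/tmp/alerts-progress-${day}.json`;
}

// Offer resume only if the saved state is less than 2 hours old.
function canResume(savedAtMs, nowMs) {
  return nowMs - savedAtMs < TWO_HOURS_MS;
}

// After compaction, skip alerts that were already handled.
// Assumes currentAlertIndex points at the next alert to present.
function remainingAlerts(alerts, state) {
  return alerts.slice(state.currentAlertIndex);
}
```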
Delegation Protocol
- Triggers: "you decide", "all", "auto", "apply all", "go ahead", "yes" at delegation prompt.
- Scoped requests ("fix errors only", "except item 3"): fall back to per-alert loop.
- If >15 alerts: "Apply to all N? [Apply All / Next 15 / Revise]"
- Show preview before execution: "Will apply: X Fix Now, Y Defer. Confirm? [Y/n]"
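A rough sketch of the delegation handling is shown below. Exact, case-insensitive matching of the trigger phrases is an assumption (the caller may interpret replies more loosely), and the `"fix_now"` and `"defer"` decision labels are assumed values.

```javascript
const DELEGATION_TRIGGERS = new Set([
  "you decide", "all", "auto", "apply all", "go ahead", "yes",
]);

// Blanket triggers delegate everything; anything else (including scoped
// requests such as "fix errors only") falls back to the per-alert loop.
function classifyReply(reply) {
  return DELEGATION_TRIGGERS.has(reply.trim().toLowerCase())
    ? "delegate"
    : "per-alert";
}

// More than 15 alerts triggers the "Apply All / Next 15 / Revise" prompt.
function needsBatchPrompt(alertCount) {
  return alertCount > 15;
}

// Build the preview line shown before execution.
function previewLine(decisions) {
  const fixNow = decisions.filter((d) => d === "fix_now").length;
  const defer = decisions.filter((d) => d === "defer").length;
  return `Will apply: ${fixNow} Fix Now, ${defer} Defer. Confirm? [Y/n]`;
}
```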
Suppression Reason Validation
Reason minimum: 15 characters. Must reference alert category or ticket/decision. Reject vague reasons ("skip", "not needed") — ask user to retry.
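The validation rules above could be approximated as in the sketch below. The vague-phrase list contains only the two examples given, and the ticket/decision pattern (`PROJ-123` style ids or `D#17` style decision references) is a hypothetical format chosen for illustration.

```javascript
// Only the two vague examples named above; a real list would likely be longer.
const VAGUE_REASONS = new Set(["skip", "not needed"]);

// Hypothetical reference formats: "PROJ-123" style tickets or "D#17" style decisions.
const REFERENCE_PATTERN = /[A-Z]+-\d+|D#\d+/;

function validateReason(reason, category) {
  const text = reason.trim();
  if (VAGUE_REASONS.has(text.toLowerCase())) {
    return { ok: false, error: "Reason is too vague" };
  }
  if (text.length < 15) {
    return { ok: false, error: "Reason must be at least 15 characters" };
  }
  const mentionsCategory = text.toLowerCase().includes(category.toLowerCase());
  if (!mentionsCategory && !REFERENCE_PATTERN.test(text)) {
    return { ok: false, error: "Reason must reference the alert category or a ticket/decision" };
  }
  return { ok: true };
}
```

On a failed validation, the caller would show the error and ask the user to retry rather than saving the suppression.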
Integration
See REFERENCE.md for file schemas, benchmarks, weights, hook drill-down, scripts, learning loop details, and example walkthrough.
Neighbors: /audit-health, /ecosystem-health, /health-ecosystem-audit,
session-begin
Skill routing: Code -> /audit-code, Security -> /audit-security, Reviews
-> /pr-retro, Debt -> /add-debt, Hooks -> /hook-ecosystem-audit, Passive
Surfacing -> /session-ecosystem-audit or /hook-ecosystem-audit
Scope boundary with session-begin: Session-begin gates on infrastructure
failures (.claude/state/session-start-failures.json,
.claude/state/pending-test-registry.json). /alerts reviews health trends
from .claude/hook-warnings.json and other data sources. When called after
session-begin in the same session, /alerts skips items already acknowledged by
session-begin (uses lastCleared timestamp in
.claude/state/hook-warnings-ack.json as shared boundary).
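The shared boundary can be illustrated with a small filter that keeps only warnings newer than lastCleared. The per-warning `timestamp` field and the use of ISO 8601 strings are assumptions; the actual file schemas are in REFERENCE.md.

```javascript
// Keep only warnings raised after the lastCleared boundary.
// `timestamp` on each warning is a hypothetical field name.
function unacknowledged(warnings, lastCleared) {
  const boundary = Date.parse(lastCleared);
  // With no usable boundary, nothing is treated as acknowledged.
  if (Number.isNaN(boundary)) return warnings;
  return warnings.filter((w) => Date.parse(w.timestamp) > boundary);
}
```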
Convergence loops: Phase 5 fix verification and Phase 2 trend data integrity. Not used for initial checker execution (deterministic, single-pass).
Version History
| Version | Date | Description |
|---|---|---|
| 3.1 | 2026-03-24 | Focused audit (11 decisions): PS boundary, volume spike, CL scope |
| 3.0 | 2026-03-19 | Skill audit v2: 56 decisions, REFERENCE.md extraction, CL fixes |
| 2.0 | 2026-03-07 | Full rewrite: 18 limited + 24 full checkers, 7-phase workflow |
| 1.1 | 2026-02-25 | Trim to <500 lines: condense benchmark tables and weights |
| 1.0 | 2026-02-25 | Initial implementation |