name: alerts description: | Lightweight health signal with scoring, benchmarks, trends, and interactive alert-by-alert workflow. Default mode (--limited) checks 18 categories. Use --full for comprehensive reporting with all 42 categories.

Alerts -- Lightweight Health Signal

Ecosystem role: Consumer of health-ecosystem-audit ecosystem (D#17). /health-ecosystem-audit OWNS the ecosystem. /ecosystem-health and /alerts CONSUME it. /audit-health audits the audit system, not health.

Critical Rules

MUST run the checker script before presenting any data
MUST present every alert to the user (no silent filtering)
MUST recommend an action per alert based on severity + trend
MUST log every decision to session JSONL immediately (per-alert, not batched)
MUST NOT present a health score without running checkers first
MUST surface previous learnings from alerts-history.jsonl at warm-up

When to Use

User explicitly invokes /alerts
User asks "check alerts", "what needs attention", "system health", "show warnings", "pending issues"
Called from session-begin with --limited mode

When NOT to Use

Need	Use Instead	Why
Domain-specific audit	`/audit-code`, etc.	Deeper analysis per domain
Audit system meta-check	`/audit-health`	Checks audit infra, not health
Full health dashboard	`/ecosystem-health`	8 weighted categories, deep triage
Skill validation	`/skill-audit`	Quality audit for individual skill

Routing Decision Tree

"Just show me warnings" (30s) -> /alerts --limited
"Full health picture" (15min) -> /ecosystem-health
"Health system seems broken" (30min) -> /health-ecosystem-audit
"Is my audit system healthy?" (5min) -> /audit-health

What This Skill Does NOT Do

Infrastructure monitoring, uptime checks, or server health
CI/CD status monitoring (use /gh-fix-ci for that)
Live application metrics or APM

Overview

Intelligence = trend detection, cross-category grouping, suppression management, and self-audit of checker coverage. This skill measures project/codebase health, not infrastructure or server health.

Note: The checker script handles Phase 1 (data collection). Phases 2-7 are orchestrated by the skill caller (Claude). Session JSONL is created and populated by the caller.

Alert Object Schema

{
  "severity": "error | warning | info",
  "message": "string",
  "details": "string | array of strings | null",
  "action": "string | null"
}

Note: category is the key in results.categories{}, not a field in the alert object. trend is in category.context, not per-alert. Context objects vary per category (benchmarks, ratings, totals, sparklines, groups).

Usage

/alerts           # Limited mode (default) - 18 categories
/alerts --full    # Full mode - 42 categories
/alerts --limited # Explicit limited (used by session-begin)

Context-aware mode: When called from session-begin, MUST use --limited. When standalone, ask user which mode if not specified.

Workflow

Detailed phase workflows, schemas, benchmarks, and weights are in REFERENCE.md.

Warm-Up (MUST)

Mode: limited (18 categories, ~15-30s) or full (42 categories, ~30-60s)
What to expect: Dashboard overview, then interactive alert-by-alert walkthrough
Starting signal: "Starting health checks... (this may take 15-30s)"
Previous learnings (MUST): Read last 3 entries from alerts-history.jsonl. Surface insights if they exist.
Resume check (SHOULD): If progress file exists and is <2h old, offer resume.

Phase 1: Run & Parse (MUST)

Run: node .claude/skills/alerts/scripts/run-alerts.js --limited (or --full). Parse v2 JSON. Create session log. Load suppressions. Check for duplicate run (<30min). See REFERENCE.md for error handling details.

Phase 2: Dashboard Overview (MUST)

Scoring: 100 - (30/error + 10/warning). Grades: A=90+, B=80+, C=70+, D=60+, F=<60. Dashboard table sorted worst-first. Cross-session trends from alerts-history.jsonl. If 3+ categories declining simultaneously, flag as "ecosystem stress" and recommend /ecosystem-health. If trend=null, treat as stable. Zero alerts: skip Phases 3-5, go to Phase 6. Benchmark mapping: Poor -> error, Average -> warning, Good -> no alert. Volume spike guard: If current run has >2x the alert count of the previous run (from alerts-history.jsonl), flag: "Alert volume spike: N alerts (previous: M). This may indicate new warning sources. Review by category before walkthrough?" Trend data integrity (CL): Before computing trends, verify alerts-history.jsonl has entries, most recent is <7 days old, and count is reasonable. If any fail, mark trends as "unverified."

Phase 3: Alert-by-Alert Loop (MUST)

Sort errors first. Group 3+ same category+severity. Progress: "Alert N of M". See REFERENCE.md for context card format, decision semantics, example dialogue. Escape hatch every 10 alerts (10, 20, 30...). Remaining auto-defer with confirmation. Scope guard: >5min or >3 files -> pause. 2-5min -> defer to appropriate skill. <2min -> inline (follow CLAUDE.md Section 6). Decisions logged per-alert immediately.

Phase 4: Decision Review (MUST, skip if zero alerts)

Review all decisions. Contradiction check: same alert with conflicting decisions. Revision: show numbered list, user selects alerts to change. DONE-WHEN: User confirms "Proceed" or completes revisions.

Phase 5: Batch Execution + Fix Verification (MUST)

Execute fix_now items inline or via skills. Defer items: log and suggest /add-debt. On failure: report error, offer retry or skip. Update session JSONL with results. Convergence loop (SHOULD): Re-run affected checkers after fixes to verify resolution. If alert persists, flag as "fix failed" and present to user. DONE-WHEN: All selected fixes attempted and verified.

Phase 6: Self-Audit + Feedback (MUST)

Checker coverage, suppression health (>50% -> WARNING), score integrity, decision balance, trend health, process gaps, baseline staleness (>7d). Checker escalation: no_data for 3+ consecutive runs -> WARNING. User feedback (SHOULD): "Any observations about checker quality?" Save to alerts-history.jsonl. DONE-WHEN: All findings presented and decided.

Phase 7: Cleanup & Verification (MUST)

Write artifacts, save suppressions (reason MUST be present), clear resolved warnings, append to alerts-history.jsonl with learnings. Sync statusline: Run node scripts/sync-warnings-ack.js to bump lastCleared if all warning types are now acknowledged (keeps statusline unacked count accurate). Offer re-run (loops to Phase 1, same mode, delta display). Closure: "Alert review complete. Health: {before} -> {after}. Session log: [path]." DONE-WHEN: Files written and closure summary presented.

Phase separators: Use --- between all phases in output.

Dependencies

Hard: run-alerts.js, .claude/state/alert-suppressions.json Soft: alerts-baseline.json (computed first run), alerts-history.jsonl (no trends if missing), health-ecosystem-audit-history.jsonl (omit Test Health if missing/stale >1 day)

Compaction Resilience

State: .claude/tmp/alerts-progress-{YYYY-MM-DD}.json. On start: check if <2h old, offer resume. After every 5 decisions: update currentAlertIndex. On compaction: re-read state, skip completed alerts.

Delegation Protocol

Triggers: "you decide", "all", "auto", "apply all", "go ahead", "yes" at delegation prompt. For scoped requests ("fix errors only", "except item 3"), fall back to per-alert loop. If >15 alerts: "Apply to all N? [Apply All / Next 15 / Revise]". Show preview before execution: "Will apply: X Fix Now, Y Defer. Confirm? [Y/n]"

Suppression Reason Validation

Reason minimum: 15 characters. Must reference alert category or ticket/decision. Reject vague reasons ("skip", "not needed") — ask user to retry.

Integration

See REFERENCE.md for file schemas, benchmarks, weights, hook drill-down, scripts, learning loop details, and example walkthrough.

Neighbors: /audit-health, /ecosystem-health, /health-ecosystem-audit, session-begin

Skill routing: Code -> /audit-code, Security -> /audit-security, Reviews -> /pr-retro, Debt -> /add-debt, Hooks -> /hook-ecosystem-audit, Passive Surfacing -> /session-ecosystem-audit or /hook-ecosystem-audit

Scope boundary with session-begin: Session-begin gates on infrastructure failures (.claude/state/session-start-failures.json, .claude/state/pending-test-registry.json). /alerts reviews health trends from .claude/hook-warnings.json and other data sources. When called after session-begin in the same session, /alerts skips items already acknowledged by session-begin (uses lastCleared timestamp in .claude/state/hook-warnings-ack.json as shared boundary).

Convergence loops: Phase 5 fix verification and Phase 2 trend data integrity. Not used for initial checker execution (deterministic, single-pass).

Version History

Version	Date	Description
3.1	2026-03-24	Focused audit (11 decisions): PS boundary, volume spike, CL scope
3.0	2026-03-19	Skill audit v2: 56 decisions, REFERENCE.md extraction, CL fixes
2.0	2026-03-07	Full rewrite: 18 limited + 24 full checkers, 7-phase workflow
1.1	2026-02-25	Trim to <500 lines: condense benchmark tables and weights
1.0	2026-02-25	Initial implementation

alerts

SKILL.md