Agent Skill
2/7/2026

alerts

System health and alerts reporting skill. This skill should be used at session start to surface pending alerts, or anytime the user wants to check system health. Triggers: "alerts", "check alerts", "what needs attention", "system health", "show warnings", "pending issues". Default mode (--limited) checks Code Health, Security, and Session Context. Use --full for comprehensive reporting including Documentation Health and Roadmap/Planning.

J
jasonmichaelbell78
2GitHub Stars
3Views
npx skills add jasonmichaelbell78-creator/sonash-v0

SKILL.md

Namealerts
DescriptionSystem health and alerts reporting skill. This skill should be used at session start to surface pending alerts, or anytime the user wants to check system health. Triggers: "alerts", "check alerts", "what needs attention", "system health", "show warnings", "pending issues". Default mode (--limited) checks Code Health, Security, and Session Context. Use --full for comprehensive reporting including Documentation Health and Roadmap/Planning.

name: alerts description: | Lightweight health signal with scoring, benchmarks, trends, and interactive alert-by-alert workflow. Default mode (--limited) checks 18 categories. Use --full for comprehensive reporting with all 42 categories.

Alerts -- Lightweight Health Signal

Ecosystem role: Consumer of health-ecosystem-audit ecosystem (D#17). /health-ecosystem-audit OWNS the ecosystem. /ecosystem-health and /alerts CONSUME it. /audit-health audits the audit system, not health.

Critical Rules

  1. MUST run the checker script before presenting any data
  2. MUST present every alert to the user (no silent filtering)
  3. MUST recommend an action per alert based on severity + trend
  4. MUST log every decision to session JSONL immediately (per-alert, not batched)
  5. MUST NOT present a health score without running checkers first
  6. MUST surface previous learnings from alerts-history.jsonl at warm-up

When to Use

  • User explicitly invokes /alerts
  • User asks "check alerts", "what needs attention", "system health", "show warnings", "pending issues"
  • Called from session-begin with --limited mode

When NOT to Use

NeedUse InsteadWhy
Domain-specific audit/audit-code, etc.Deeper analysis per domain
Audit system meta-check/audit-healthChecks audit infra, not health
Full health dashboard/ecosystem-health8 weighted categories, deep triage
Skill validation/skill-auditQuality audit for individual skill

Routing Decision Tree

  • "Just show me warnings" (30s) -> /alerts --limited
  • "Full health picture" (15min) -> /ecosystem-health
  • "Health system seems broken" (30min) -> /health-ecosystem-audit
  • "Is my audit system healthy?" (5min) -> /audit-health

What This Skill Does NOT Do

  • Infrastructure monitoring, uptime checks, or server health
  • CI/CD status monitoring (use /gh-fix-ci for that)
  • Live application metrics or APM

Overview

Intelligence = trend detection, cross-category grouping, suppression management, and self-audit of checker coverage. This skill measures project/codebase health, not infrastructure or server health.

Note: The checker script handles Phase 1 (data collection). Phases 2-7 are orchestrated by the skill caller (Claude). Session JSONL is created and populated by the caller.

Alert Object Schema

{
  "severity": "error | warning | info",
  "message": "string",
  "details": "string | array of strings | null",
  "action": "string | null"
}

Note: category is the key in results.categories{}, not a field in the alert object. trend is in category.context, not per-alert. Context objects vary per category (benchmarks, ratings, totals, sparklines, groups).

Usage

/alerts           # Limited mode (default) - 18 categories
/alerts --full    # Full mode - 42 categories
/alerts --limited # Explicit limited (used by session-begin)

Context-aware mode: When called from session-begin, MUST use --limited. When standalone, ask user which mode if not specified.


Workflow

Detailed phase workflows, schemas, benchmarks, and weights are in REFERENCE.md.

Warm-Up (MUST)

  • Mode: limited (18 categories, ~15-30s) or full (42 categories, ~30-60s)
  • What to expect: Dashboard overview, then interactive alert-by-alert walkthrough
  • Starting signal: "Starting health checks... (this may take 15-30s)"
  • Previous learnings (MUST): Read last 3 entries from alerts-history.jsonl. Surface insights if they exist.
  • Resume check (SHOULD): If progress file exists and is <2h old, offer resume.

Phase 1: Run & Parse (MUST)

Run: node .claude/skills/alerts/scripts/run-alerts.js --limited (or --full). Parse v2 JSON. Create session log. Load suppressions. Check for duplicate run (<30min). See REFERENCE.md for error handling details.

Phase 2: Dashboard Overview (MUST)

Scoring: 100 - (30/error + 10/warning). Grades: A=90+, B=80+, C=70+, D=60+, F=<60. Dashboard table sorted worst-first. Cross-session trends from alerts-history.jsonl. If 3+ categories declining simultaneously, flag as "ecosystem stress" and recommend /ecosystem-health. If trend=null, treat as stable. Zero alerts: skip Phases 3-5, go to Phase 6. Benchmark mapping: Poor -> error, Average -> warning, Good -> no alert. Volume spike guard: If current run has >2x the alert count of the previous run (from alerts-history.jsonl), flag: "Alert volume spike: N alerts (previous: M). This may indicate new warning sources. Review by category before walkthrough?" Trend data integrity (CL): Before computing trends, verify alerts-history.jsonl has entries, most recent is <7 days old, and count is reasonable. If any fail, mark trends as "unverified."

Phase 3: Alert-by-Alert Loop (MUST)

Sort errors first. Group 3+ same category+severity. Progress: "Alert N of M". See REFERENCE.md for context card format, decision semantics, example dialogue. Escape hatch every 10 alerts (10, 20, 30...). Remaining auto-defer with confirmation. Scope guard: >5min or >3 files -> pause. 2-5min -> defer to appropriate skill. <2min -> inline (follow CLAUDE.md Section 6). Decisions logged per-alert immediately.

Phase 4: Decision Review (MUST, skip if zero alerts)

Review all decisions. Contradiction check: same alert with conflicting decisions. Revision: show numbered list, user selects alerts to change. DONE-WHEN: User confirms "Proceed" or completes revisions.

Phase 5: Batch Execution + Fix Verification (MUST)

Execute fix_now items inline or via skills. Defer items: log and suggest /add-debt. On failure: report error, offer retry or skip. Update session JSONL with results. Convergence loop (SHOULD): Re-run affected checkers after fixes to verify resolution. If alert persists, flag as "fix failed" and present to user. DONE-WHEN: All selected fixes attempted and verified.

Phase 6: Self-Audit + Feedback (MUST)

Checker coverage, suppression health (>50% -> WARNING), score integrity, decision balance, trend health, process gaps, baseline staleness (>7d). Checker escalation: no_data for 3+ consecutive runs -> WARNING. User feedback (SHOULD): "Any observations about checker quality?" Save to alerts-history.jsonl. DONE-WHEN: All findings presented and decided.

Phase 7: Cleanup & Verification (MUST)

Write artifacts, save suppressions (reason MUST be present), clear resolved warnings, append to alerts-history.jsonl with learnings. Sync statusline: Run node scripts/sync-warnings-ack.js to bump lastCleared if all warning types are now acknowledged (keeps statusline unacked count accurate). Offer re-run (loops to Phase 1, same mode, delta display). Closure: "Alert review complete. Health: {before} -> {after}. Session log: [path]." DONE-WHEN: Files written and closure summary presented.


Phase separators: Use --- between all phases in output.

Dependencies

Hard: run-alerts.js, .claude/state/alert-suppressions.json Soft: alerts-baseline.json (computed first run), alerts-history.jsonl (no trends if missing), health-ecosystem-audit-history.jsonl (omit Test Health if missing/stale >1 day)

Compaction Resilience

State: .claude/tmp/alerts-progress-{YYYY-MM-DD}.json. On start: check if <2h old, offer resume. After every 5 decisions: update currentAlertIndex. On compaction: re-read state, skip completed alerts.

Delegation Protocol

Triggers: "you decide", "all", "auto", "apply all", "go ahead", "yes" at delegation prompt. For scoped requests ("fix errors only", "except item 3"), fall back to per-alert loop. If >15 alerts: "Apply to all N? [Apply All / Next 15 / Revise]". Show preview before execution: "Will apply: X Fix Now, Y Defer. Confirm? [Y/n]"

Suppression Reason Validation

Reason minimum: 15 characters. Must reference alert category or ticket/decision. Reject vague reasons ("skip", "not needed") — ask user to retry.

Integration

See REFERENCE.md for file schemas, benchmarks, weights, hook drill-down, scripts, learning loop details, and example walkthrough.

Neighbors: /audit-health, /ecosystem-health, /health-ecosystem-audit, session-begin

Skill routing: Code -> /audit-code, Security -> /audit-security, Reviews -> /pr-retro, Debt -> /add-debt, Hooks -> /hook-ecosystem-audit, Passive Surfacing -> /session-ecosystem-audit or /hook-ecosystem-audit

Scope boundary with session-begin: Session-begin gates on infrastructure failures (.claude/state/session-start-failures.json, .claude/state/pending-test-registry.json). /alerts reviews health trends from .claude/hook-warnings.json and other data sources. When called after session-begin in the same session, /alerts skips items already acknowledged by session-begin (uses lastCleared timestamp in .claude/state/hook-warnings-ack.json as shared boundary).

Convergence loops: Phase 5 fix verification and Phase 2 trend data integrity. Not used for initial checker execution (deterministic, single-pass).


Version History

VersionDateDescription
3.12026-03-24Focused audit (11 decisions): PS boundary, volume spike, CL scope
3.02026-03-19Skill audit v2: 56 decisions, REFERENCE.md extraction, CL fixes
2.02026-03-07Full rewrite: 18 limited + 24 full checkers, 7-phase workflow
1.12026-02-25Trim to <500 lines: condense benchmark tables and weights
1.02026-02-25Initial implementation
Skills Info
Original Name:alertsAuthor:jasonmichaelbell78