Agent Skill
2/7/2026

ai-agent-design-skill

**Domain**: AI/ML Architecture

F
fabioc
2GitHub Stars
1Views
npx skills add fabioc-aloha/Alex_Plug_In

SKILL.md

Nameai-agent-design-skill
Description**Domain**: AI/ML Architecture

name: ai-agent-design description: Design autonomous AI agents that reason, plan, and execute tasks tier: standard applyTo: '/agent,/ai-agent,/orchestrat,/multi-agent'

AI Agent Design Skill

Patterns for designing AI agents—autonomous systems that use LLMs to reason, plan, and execute multi-step tasks.

Agent vs Chatbot vs Workflow

AspectChatbotWorkflowAgent
AutonomyLowNoneHigh
PlanningNonePredefinedDynamic
Tool UseLimitedFixedFlexible
MemorySessionNonePersistent
Error RecoveryRetryFailReason & adapt

Core Patterns

ReAct (Reasoning + Acting)

1. Thought: Reason about the task
2. Action: Choose and execute a tool
3. Observation: Process tool output
4. Repeat until complete

Example:

Thought: Need Seattle weather to answer umbrella question
Action: weather_api(location="Seattle")
Observation: {"temp": 52, "condition": "rain", "precipitation": 80%}
Thought: Raining with 80% precipitation. Recommend umbrella.

Plan-and-Execute

For complex multi-step tasks:

  1. Planner: Create high-level plan
  2. Executor: Execute each step
  3. Replanner: Adjust based on results

Use when order matters and partial failures need recovery.

Reflexion

Self-improvement through reflection:

  1. Attempt task
  2. Evaluate outcome
  3. Generate reflection on failures
  4. Store reflection in memory
  5. Retry with reflection context

Multi-Agent Patterns

Supervisor

Central coordinator delegates to specialists:

       Supervisor
      /    |    \
Research Writer Reviewer

Hierarchical Teams

Nested supervisors for complex organizations:

      Top Supervisor
       /         \
Research Lead  Writing Lead
   /    \         /    \
Web   Paper   Draft   Edit

Debate/Adversarial

Multiple agents argue to reduce hallucination:

Agent A (Pro) <--argue--> Agent B (Con)
              \    |    /
               Judge

Tool Design

{
  "name": "search_database",
  "description": "Search products. Use for availability/pricing queries.",
  "parameters": {
    "query": { "type": "string", "description": "Search terms" },
    "max_results": { "type": "integer", "default": 10 }
  }
}

Principles:

  • Clear names (verb + noun)
  • Rich descriptions with when/what
  • Sensible defaults
  • Structured error returns

Tool Selection by Scale

ToolsStrategy
< 10Direct selection
10-50Categorize first
50+Embed and retrieve

Memory Architecture

Working Memory    → Current context (in prompt)
Short-Term Memory → Session state (key-value)
Long-Term Memory  → Facts, history (vector DB + graph)

Memory Types

TypeStorageUse Case
EpisodicVector DBPast conversations
SemanticGraph DBFacts, relationships
ProceduralCode/promptsHow to do tasks
WorkingPromptCurrent task

Memory Management

  • Summarization: Compress old conversations
  • Forgetting: Score by recency × importance × access
  • Consolidation: Merge similar memories

Error Recovery Ladder

  1. Retry: Same action with backoff
  2. Rephrase: Different query, same goal
  3. Alternative: Different tool, same goal
  4. Partial: Return partial results
  5. Escalate: Ask human
  6. Abort: Cannot complete, explain why

Loop Detection

def detect_loop(history, window=5, threshold=0.8):
    recent = history[-window:]
    previous = history[-window*2:-window]
    return similarity(recent, previous) > threshold

Recovery: reflection prompt, force tool change, replan, escalate.

Human-in-the-Loop

Require approval for high-risk actions:

  • Financial transactions
  • Data deletion
  • External communications
  • Permission changes
  • Irreversible operations

Production Considerations

Observability

Log: LLM calls, tool calls, state transitions, errors, recovery attempts.

Cost Control

StrategyImplementation
Token budgetsMax tokens per task
Step limitsMax N actions
Tiered modelsGPT-4 plan, 3.5 execute
CachingCache tool/LLM results
Early terminationStop when good enough

Safety Guardrails

  • Input: Injection detection, PII filtering, rate limiting
  • Action: Parameter sanitization, permission checks
  • Output: Policy compliance, hallucination detection

Framework Comparison

FrameworkBest For
LangChainRapid prototyping
LangGraphComplex multi-agent
AutoGenResearch, code gen
CrewAIBusiness workflows
Semantic KernelMicrosoft stack

Anti-Patterns

  • Over-autonomous: No approval checkpoints
  • Unbounded loops: No termination conditions
  • Tool explosion: Too many tools confuse agent
  • Memory bloat: No pruning strategy
  • Monolithic: One agent does everything

Checklist

  • Clear agent persona and capabilities
  • Minimal, well-described tool set
  • Appropriate memory architecture
  • Human-in-the-loop for high-risk
  • Observability (logging, tracing)
  • Safety guardrails
  • Adversarial input testing
  • Cost control and scaling plan

When to Use

Good: Open-ended research, multi-step workflows, tool orchestration ❌ Poor: Simple Q&A (use RAG), deterministic flows (use code), no human oversight

Skills Info
Original Name:ai-agent-design-skillAuthor:fabioc