coordination-patterns
This skill should be used when the user asks about "agent coordination", "MAS architecture", "blackboard pattern", "orchestrator pattern", "how agents communicate", "multi-agent workflow", "event-driven agents", "context engineering", "control flow", "stateless reducers", or needs to design how multiple agents work together. Covers patterns from Planner-Executor-Verifier to event-driven architectures, plus 12 Factor Agents principles.
version: 1.1.0
Coordination Patterns
Pattern Selection Guide
| Pattern | When to Use | Complexity |
|---|---|---|
| Planner→Executor→Verifier | Default starting point | Low |
| Blackboard | Multiple agents, shared state | Medium |
| Orchestrator-Worker | Dynamic task assignment | Medium |
| Hierarchical | Deep delegation chains | High |
| Event-Driven | High reliability needs | High |
| Market-Based | Dynamic load balancing | High |
Pattern 1: Planner → Executor → Verifier (Minimum Viable MAS)
The baseline pattern that works most often.
```
User Request
     │
     ▼
┌─────────┐
│ Planner │ ──► Task Graph
└─────────┘
     │
     ▼
┌──────────┐
│ Executor │ ──► Results
└──────────┘
     │
     ▼
┌──────────┐
│ Verifier │ ──► PASS/FAIL
└──────────┘
     │
     ├─► PASS ──► Return Result
     │
     └─► FAIL ──► Re-plan with Feedback
```
Implementation
```markdown
## Coordination Protocol
1. Planner receives requirements, outputs task graph
2. Executor processes tasks sequentially or in parallel
3. Verifier checks all outputs against requirements
4. On FAIL: Planner receives feedback, re-plans
5. Max 3 iterations before escalation
```
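The protocol above maps onto a short control loop. This is a minimal sketch, assuming the Planner, Executor, and Verifier are plain Python callables (each would typically wrap an LLM call); `run_pev`, `Verdict`, and `EscalationRequired` are hypothetical names, and the task graph is simplified to a flat list.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    passed: bool
    feedback: str = ""


class EscalationRequired(Exception):
    """Raised when no PASS is reached within the iteration budget."""


def run_pev(
    request: str,
    plan: Callable[[str, str], list],
    execute: Callable[[str], str],
    verify: Callable[[str, list], Verdict],
    max_iterations: int = 3,
) -> list:
    feedback = ""
    for _ in range(max_iterations):
        tasks = plan(request, feedback)        # 1. Planner: requirements (+ feedback) -> tasks
        results = [execute(t) for t in tasks]  # 2. Executor: run tasks (sequentially here)
        verdict = verify(request, results)     # 3. Verifier: check outputs against request
        if verdict.passed:
            return results
        feedback = verdict.feedback            # 4. On FAIL: re-plan with the feedback
    # 5. Iteration budget exhausted: hand off to a human or higher-level agent
    raise EscalationRequired(f"no PASS after {max_iterations} iterations: {feedback}")
```

Because the loop, not a model, enforces the iteration cap, escalation happens predictably after the third FAIL.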
When to Use
- Starting a new MAS project
- Tasks have clear decomposition
- Verification is important but not adversarial
Limitations
- Sequential bottleneck when tasks are independent
- Single verifier may miss issues
- No internal checkpoints
Pattern 2: Blackboard Architecture
All agents read from and write to shared state.
```
┌───────────────────────────────────┐
│            BLACKBOARD             │
│ ┌─────────┐ ┌─────────┐ ┌───────┐ │
│ │  Plan   │ │ Results │ │ State │ │
│ └─────────┘ └─────────┘ └───────┘ │
└───────────────────────────────────┘
       ▲           ▲          ▲
       │           │          │
   ┌───┴───┐  ┌────┴────┐ ┌───┴────┐
   │Planner│  │Executors│ │Verifier│
   └───────┘  └─────────┘ └────────┘
```
Implementation
```json
{
  "blackboard": {
    "sections": {
      "plan": {
        "owner": "Planner",
        "writers": ["Planner"],
        "readers": ["Executor", "Verifier"]
      },
      "results": {
        "owner": "Executor",
        "writers": ["Executor"],
        "readers": ["Verifier", "Orchestrator"]
      },
      "verdicts": {
        "owner": "Verifier",
        "writers": ["Verifier"],
        "readers": ["Planner", "Orchestrator"]
      }
    }
  }
}
```
Key Rules
- No direct overwrites: Agents cannot modify others' sections
- Versioned updates: Every write increments version
- Read permissions: Explicit per section
- Conflict-free: Writers have exclusive sections
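These rules can be enforced in code rather than by convention. A minimal in-memory sketch, assuming the section config has the same shape as the JSON above; the `Blackboard` class and `PermissionDenied` exception are hypothetical, and a real deployment would back this with a shared store. Writers may also read their own section, an assumption the JSON leaves implicit.

```python
from copy import deepcopy


class PermissionDenied(Exception):
    pass


class Blackboard:
    def __init__(self, sections: dict):
        # sections mirrors the JSON config: {"plan": {"writers": [...], "readers": [...]}}
        self._acl = sections
        self._data = {name: None for name in sections}
        self._version = {name: 0 for name in sections}
        self.audit_log = []  # (agent, section, version) for every accepted write

    def write(self, agent: str, section: str, value) -> int:
        # Conflict-free: only listed writers may touch a section
        if agent not in self._acl[section]["writers"]:
            raise PermissionDenied(f"{agent} cannot write {section}")
        self._version[section] += 1  # versioned updates: every write bumps the version
        self._data[section] = deepcopy(value)
        self.audit_log.append((agent, section, self._version[section]))
        return self._version[section]

    def read(self, agent: str, section: str):
        # Read permissions are explicit per section
        acl = self._acl[section]
        if agent not in acl["readers"] and agent not in acl["writers"]:
            raise PermissionDenied(f"{agent} cannot read {section}")
        # Return a copy so readers cannot overwrite shared state in place
        return deepcopy(self._data[section]), self._version[section]
```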
When to Use
- Multiple agents need shared context
- Want to avoid direct agent-to-agent communication
- Need audit trail of all state changes
Pattern 3: Orchestrator-Worker
Central orchestrator assigns tasks to worker agents.
```
        ┌──────────────┐
        │ Orchestrator │
        └──────────────┘
          /    |    \
    ▼          ▼          ▼
┌────────┐ ┌────────┐ ┌────────┐
│Worker 1│ │Worker 2│ │Worker 3│
└────────┘ └────────┘ └────────┘
```
Implementation
```markdown
## Orchestrator Responsibilities
- Receive user request
- Decompose into tasks
- Assign to available workers
- Collect results
- Handle failures and retries
- Return final result

## Worker Protocol
- Poll for tasks or receive them via push
- Execute assigned task
- Report result to orchestrator
- No direct worker-to-worker communication
```
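A minimal sketch of the orchestrator loop using the standard library's `concurrent.futures` thread pool. The `orchestrate` function and its parameters are hypothetical, workers are modeled as interchangeable callables, and task identifiers are assumed unique. Workers return results only to the orchestrator, which retries failed tasks a bounded number of times.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def orchestrate(
    tasks: list,
    worker: Callable[[str], str],
    num_workers: int = 3,
    max_retries: int = 2,
) -> dict:
    """Fan tasks out to interchangeable workers, collect results, retry failures."""
    results = {}
    pending = list(tasks)
    for _ in range(max_retries + 1):
        if not pending:
            break
        # Leaving the with-block waits until every submitted task has finished
        with ThreadPoolExecutor(max_workers=num_workers) as pool:
            futures = {task: pool.submit(worker, task) for task in pending}
        failed = []
        for task, fut in futures.items():
            if fut.exception() is None:
                results[task] = fut.result()  # workers report only to the orchestrator
            else:
                failed.append(task)           # schedule for another attempt
        pending = failed
    if pending:
        raise RuntimeError(f"tasks failed after {max_retries} retries: {pending}")
    return results
```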
When to Use
- Dynamic workload distribution
- Workers are interchangeable
- Need central control point
Pattern 4: Hierarchical Agent
Layered delegation with parent-child relationships.
```
        ┌───────────┐
        │  Manager  │
        └───────────┘
          /       \
       ▼             ▼
  ┌──────────┐  ┌──────────┐
  │ TeamLead │  │ TeamLead │
  └──────────┘  └──────────┘
     /    \          |
   ▼        ▼        ▼
┌──────┐ ┌──────┐ ┌──────┐
│Worker│ │Worker│ │Worker│
└──────┘ └──────┘ └──────┘
```
Implementation
```markdown
## Delegation Rules
- Parents decompose tasks for children
- Children report completion to parent only
- No cross-branch communication
- Escalation goes up the hierarchy

## Span of Control
- Optimal: 3-5 direct reports per agent
- Max: 7 (coordination overhead grows with each additional report)
```
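The delegation and span-of-control rules can be checked structurally. A sketch under stated assumptions: `Agent` is a hypothetical dataclass, `MAX_SPAN` mirrors the limit of 7 above, and escalation is modeled as walking parent links, so a node can never reach a sibling branch.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MAX_SPAN = 7  # hard limit from the rules above; 3-5 is the target


@dataclass
class Agent:
    name: str
    # Excluded from repr/eq to avoid infinite recursion through parent <-> children
    parent: Agent | None = field(default=None, repr=False, compare=False)
    children: list = field(default_factory=list)

    def add_child(self, child: Agent) -> Agent:
        """Delegate to a new direct report, refusing to exceed the span of control."""
        if len(self.children) >= MAX_SPAN:
            raise ValueError(f"{self.name} already has {MAX_SPAN} direct reports")
        child.parent = self
        self.children.append(child)
        return child

    def escalation_path(self) -> list:
        """Escalation goes up the hierarchy only, never across branches."""
        path, node = [], self.parent
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path
```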
When to Use
- Complex domains with natural hierarchy
- Different abstraction levels needed
- Clear chains of responsibility
Pattern 5: Event-Driven Architecture
Agents react to events rather than direct calls.
```
┌────────────────────────────────────────┐
│               EVENT BUS                │
└────────────────────────────────────────┘
     ▲          ▲          ▲          ▲
     │ publish  │ publish  │ publish  │ publish
     │          │          │          │
┌────┴──┐  ┌────┴──┐  ┌────┴──┐  ┌────┴──┐
│Agent A│  │Agent B│  │Agent C│  │Agent D│
└────┬──┘  └────┬──┘  └────┬──┘  └────┬──┘
     │          │          │          │
     ▼ sub      ▼ sub      ▼ sub      ▼ sub
┌────────────────────────────────────────┐
│               EVENT BUS                │
└────────────────────────────────────────┘
```
Event Types
```json
{
  "event_types": {
    "TaskCreated": { "triggers": ["Executor"] },
    "TaskCompleted": { "triggers": ["Verifier", "Logger"] },
    "VerificationFailed": { "triggers": ["Planner"] },
    "SystemError": { "triggers": ["AlertHandler"] }
  }
}
```
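An in-process sketch of an event bus whose subscriptions can be wired to mirror the `event_types` table above. `EventBus` and its methods are hypothetical; a production system would typically use a durable message broker, but the shape is the same: publishers never reference subscribers, and the append-only log supports audit and replay.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """In-process bus: publishers and subscribers never reference each other."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)  # event type -> list of handlers
        self.log = []                          # append-only history for audit and replay

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, data: dict) -> None:
        event = {"seq": len(self.log) + 1, "type": event_type, "data": data}
        self.log.append(event)
        for handler in self._subscribers[event_type]:
            handler(event)

    def replay(self) -> None:
        """Re-deliver every logged event, e.g. to rebuild a restarted agent's state."""
        for event in list(self.log):
            for handler in self._subscribers[event["type"]]:
                handler(event)
```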
Benefits
- Reliable coordination: Replayable events enable recovery
- Loose coupling: Agents don't know about each other
- Scalable: Easy to add new agents
- Auditable: Complete event history
When to Use
- High reliability requirements
- Need fault tolerance
- Complex event dependencies
- Async processing beneficial
Pattern 6: Escalation Over Consensus
When agents disagree, escalate—don't vote.
```
┌─────────┐     ┌─────────┐
│ Agent A │     │ Agent B │
└─────────┘     └─────────┘
     │               │
     ▼               ▼
 Output A        Output B
     │               │
     └───────┬───────┘
             │ (conflict?)
             ▼
       ┌───────────┐
       │ Escalator │ ──► Final Decision
       └───────────┘
```
Why Not Vote?
- Averaging dilutes correctness
- Majority can be wrong
- Loses minority insight
Escalation Protocol
1. Detect conflict (outputs differ significantly)
2. Preserve both outputs with reasoning
3. Escalate to higher-authority agent
4. Authority decides based on evidence, not popularity
5. Log decision rationale for learning
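A sketch of the escalation protocol. Assumptions: outputs are strings, `difflib.SequenceMatcher` similarity below a hypothetical 0.8 threshold stands in for "outputs differ significantly" (a real system might compare outputs semantically), and `escalate` is a hypothetical callable representing the higher-authority agent. Note there is no vote: the authority sees both candidates with their reasoning.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable


@dataclass
class Candidate:
    agent: str
    output: str
    reasoning: str


def resolve(
    a: Candidate,
    b: Candidate,
    escalate: Callable[[Candidate, Candidate], Candidate],
    decision_log: list,
    threshold: float = 0.8,
) -> Candidate:
    # 1. Detect conflict (assumption: character similarity approximates agreement)
    similarity = SequenceMatcher(None, a.output, b.output).ratio()
    if similarity >= threshold:
        return a  # outputs agree closely enough; nothing to escalate
    # 2-4. Preserve BOTH outputs with reasoning and let the authority decide on evidence
    winner = escalate(a, b)
    # 5. Log the decision rationale for later learning
    decision_log.append({
        "candidates": [a.agent, b.agent],
        "similarity": round(similarity, 2),
        "chosen": winner.agent,
        "rationale": winner.reasoning,
    })
    return winner
```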
Anti-Patterns to Avoid
Anti-Pattern 1: Synchronous Blocking Chains
Bad: Agent A calls Agent B calls Agent C, each waiting.
Impact: Latency accumulates, one failure blocks all.
Fix: Use async message passing or events.
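One way to break a blocking chain is to connect agents with queues so each stage starts work as soon as input arrives, instead of each caller waiting on the next. A minimal `asyncio` sketch; the `agent` and `pipeline` functions are hypothetical and `asyncio.sleep` stands in for a model or tool call. It addresses the latency side only; failure isolation (timeouts, dead-letter handling) is omitted.

```python
import asyncio


async def agent(name: str, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Process messages as they arrive; an unbounded put() never waits on the consumer."""
    while (msg := await inbox.get()) is not None:
        await asyncio.sleep(0.01)          # stand-in for an LLM or tool call
        await outbox.put(f"{msg}->{name}")
    await outbox.put(None)                 # propagate the shutdown sentinel downstream


async def pipeline(messages: list) -> list:
    q_a, q_b, q_c, done = (asyncio.Queue() for _ in range(4))
    # All three stages run concurrently: A can handle message 2 while B handles message 1
    stages = [
        asyncio.create_task(agent("A", q_a, q_b)),
        asyncio.create_task(agent("B", q_b, q_c)),
        asyncio.create_task(agent("C", q_c, done)),
    ]
    for m in messages:
        await q_a.put(m)
    await q_a.put(None)
    results = []
    while (item := await done.get()) is not None:
        results.append(item)
    await asyncio.gather(*stages)
    return results
```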
Anti-Pattern 2: Implicit State Sharing
Bad: Agents assume shared context without explicit state.
Impact: Race conditions, state corruption.
Fix: Use blackboard with explicit read/write permissions.
Anti-Pattern 3: Perfect Harmony
Bad: System designed for agents to always agree.
Impact: Groupthink, missed errors.
Fix: Add controlled friction (critics, independent verification).
12 Factor Agents: Control Flow and State
The 12 Factor Agents framework provides engineering principles for coordination.
Factor 3: Own Your Context Building
Principle: Everything that makes agents good is context engineering. Understand what happens at the token level.
Context building components:
- System prompt - Agent identity and instructions
- RAG results - Retrieved relevant information
- Memory - Episodic and semantic recall
- Agentic history - Previous steps in this workflow
- Structured output instructions - Format requirements
Explicit Context Building Pattern:
```python
def build_context(agent_id: str, task: dict) -> list[dict]:
    """Explicit context assembly - no magic.

    AGENT_PROMPTS, retrieve, format_docs, recall, format_memories,
    get_workflow_history and format_history are assumed to be defined elsewhere.
    """
    context = []
    # 1. System prompt (Factor 2 - own your prompts)
    context.append({"role": "system", "content": AGENT_PROMPTS[agent_id]})
    # 2. RAG - retrieve relevant documents
    relevant_docs = retrieve(task["query"], top_k=3)
    context.append({"role": "system", "content": format_docs(relevant_docs)})
    # 3. Memory - recall from past
    memories = recall(agent_id, task["context"])
    context.append({"role": "system", "content": format_memories(memories)})
    # 4. Agentic history - what happened so far
    history = get_workflow_history(task["workflow_id"])
    context.append({"role": "system", "content": format_history(history)})
    # 5. Current task
    context.append({"role": "user", "content": task["input"]})
    return context
```
Context Budget:
| Component | Token Budget | Purpose |
|---|---|---|
| System prompt | 500-1000 | Agent identity |
| RAG results | 1000-2000 | Relevant knowledge |
| Memory | 500-1000 | Past experiences |
| History | 500-1000 | Workflow context |
| Task input | Variable | Current request |
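The budget table can be enforced mechanically before each call. A minimal sketch, assuming the upper end of each range as the limit and a rough 4-characters-per-token heuristic in place of a real tokenizer; `estimate_tokens`, `BUDGETS`, and `fit_to_budget` are hypothetical names, and the task input is deliberately left unbudgeted.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token for English text
    return max(1, len(text) // 4) if text else 0


# Upper bounds from the Context Budget table; "task" is intentionally absent
BUDGETS = {"system": 1000, "rag": 2000, "memory": 1000, "history": 1000}


def fit_to_budget(components: dict, budgets: dict = BUDGETS) -> dict:
    """Trim each budgeted component to its token limit; unbudgeted parts pass through."""
    fitted = {}
    for name, text in components.items():
        limit = budgets.get(name)
        if limit is None or estimate_tokens(text) <= limit:
            fitted[name] = text
        else:
            fitted[name] = text[: limit * 4]  # keep the head, under the same 4-char heuristic
    return fitted
```

Truncation keeps the head of each component; for history you might keep the most recent tail instead, or summarize it.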
Key insight: If you don't understand what happens at the token level, you miss optimization opportunities.
Factor 5/6: Unified Execution and Business State
Principle: Enable Launch/Pause/Resume with simple APIs. Unify what's happening (execution) with what's happened (business).
Unified State Schema:
```json
{
  "workflow_id": "uuid",
  "status": "running|paused|completed|failed",
  "execution_state": {
    "current_step": "step_name",
    "next_step": "step_name|null",
    "waiting_for": "human_input|external_api|null",
    "retry_config": {
      "attempts": 2,
      "max_attempts": 3,
      "backoff_ms": 1000
    }
  },
  "business_state": {
    "messages": [],
    "tool_calls": [],
    "tool_results": [],
    "decisions_made": [],
    "artifacts_produced": []
  },
  "timestamps": {
    "created": "ISO8601",
    "last_updated": "ISO8601",
    "paused_at": "ISO8601|null",
    "resumed_at": "ISO8601|null"
  }
}
```
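A sketch of a `create_initial_state` helper that produces the schema above, with the same `(workflow_id, initial_input)` argument order used by `launch()` in the controller code. The initial step names (`"start"`, `"plan"`) and the `is_resumable` helper are hypothetical.

```python
from datetime import datetime, timezone


def create_initial_state(workflow_id: str, initial_input: dict) -> dict:
    """Build a fresh unified state: execution and business state in one record."""
    ts = datetime.now(timezone.utc).isoformat()
    return {
        "workflow_id": workflow_id,
        "status": "running",
        "execution_state": {
            "current_step": "start",  # hypothetical step names
            "next_step": "plan",
            "waiting_for": None,
            "retry_config": {"attempts": 0, "max_attempts": 3, "backoff_ms": 1000},
        },
        "business_state": {
            "messages": [initial_input],
            "tool_calls": [],
            "tool_results": [],
            "decisions_made": [],
            "artifacts_produced": [],
        },
        "timestamps": {"created": ts, "last_updated": ts, "paused_at": None, "resumed_at": None},
    }


def is_resumable(state: dict) -> bool:
    """A paused workflow can resume only if it has a next step and retries left."""
    retry = state["execution_state"]["retry_config"]
    return (
        state["status"] == "paused"
        and state["execution_state"]["next_step"] is not None
        and retry["attempts"] < retry["max_attempts"]
    )
```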
Launch/Pause/Resume API:
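```python
from datetime import datetime, timezone


def now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


class WorkflowController:
    # create_initial_state() and execute_next_step() are assumed to be defined elsewhere;
    # state_store is any object with save(workflow_id, state) and load(workflow_id).
    def __init__(self, state_store):
        self.state_store = state_store

    def launch(self, workflow_id: str, initial_input: dict) -> str:
        """Start workflow, return workflow_id."""
        state = create_initial_state(workflow_id, initial_input)
        self.state_store.save(workflow_id, state)
        self.execute_next_step(workflow_id)
        return workflow_id

    def pause(self, workflow_id: str, reason: str) -> bool:
        """Pause workflow, preserving all state."""
        state = self.state_store.load(workflow_id)
        if state["status"] != "running":
            return False  # only a running workflow can be paused
        state["status"] = "paused"
        state["timestamps"]["paused_at"] = now()
        state["execution_state"]["pause_reason"] = reason
        self.state_store.save(workflow_id, state)
        return True

    def resume(self, workflow_id: str) -> bool:
        """Resume from exactly where we left off."""
        state = self.state_store.load(workflow_id)
        if state["status"] != "paused":
            return False  # never restart a completed or failed workflow
        state["status"] = "running"
        state["timestamps"]["resumed_at"] = now()
        self.state_store.save(workflow_id, state)
        self.execute_next_step(workflow_id)
        return True
```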
See references/state-management.md for detailed implementation.
Factor 8: Own Your Control Flow
Principle: Don't let the LLM control the entire DAG. If you own control flow, you can Break, Switch, Summarize, Judge.
Control Flow Operations:
| Operation | Purpose | When to Use |
|---|---|---|
| Break | Stop agent loop early | Error threshold, timeout, explicit stop signal |
| Switch | Route to different agent | Based on output classification |
| Summarize | Compress context | Approaching token limit |
| Judge | Evaluate quality | Before committing results |
Anti-pattern: LLM-Controlled DAG
```python
# BAD: LLM decides what to do next autonomously
response = llm.call("You have full control. What should we do next?")
next_action = parse_action(response)  # LLM controls flow
```
Pattern: Code-Controlled DAG
```python
# GOOD: Code owns the control flow
# (execute_step and transition are assumed to be defined elsewhere)
def workflow_step(state: dict) -> dict:
    # 1. Execute current step
    result = execute_step(state["current_step"], state)
    # 2. Code decides next step (not LLM)
    if result["needs_human"]:
        return transition(state, "human_input")  # BREAK for human
    elif result["context_tokens"] > 6000:
        return transition(state, "summarize")  # SUMMARIZE
    elif result["quality_uncertain"]:
        return transition(state, "verify")  # JUDGE
    elif result["category"] == "complex":
        return transition(state, "specialist")  # SWITCH
    else:
        return transition(state, result["next"])  # CONTINUE
```
Smaller Focused Prompts Beat Long Autonomous Runs:
Instead of:
```
"Do everything: plan, execute, verify, report"
```
Use:
```
Step 1: "Create a plan" (focused prompt)
[Code evaluates plan quality]
Step 2: "Execute task X" (focused prompt)
[Code checks result]
Step 3: "Verify result" (focused prompt)
[Code decides next step]
```
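The focused-step approach can be written as ordinary code around three small prompts. A sketch, assuming `llm` is any hypothetical function that takes one prompt string and returns text; `run_focused`, the `max_plan_steps` limit, and the PASS/FAIL reply convention are illustrative assumptions.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: one prompt in, text out


def run_focused(task: str, llm: LLM, max_plan_steps: int = 5) -> dict:
    """Three small prompts; ordinary code validates each output before moving on."""
    # Step 1: focused planning prompt
    plan = [ln.strip() for ln in llm(f"Create a plan for: {task}").splitlines() if ln.strip()]
    if not plan or len(plan) > max_plan_steps:  # code evaluates plan quality
        return {"status": "replan", "reason": f"plan has {len(plan)} steps"}
    # Step 2: one focused execution prompt per plan step
    results = [llm(f"Execute task: {step}") for step in plan]
    if any(not r.strip() for r in results):     # code checks results
        return {"status": "retry", "reason": "empty result"}
    # Step 3: focused verification prompt; code, not the model, picks the next step
    verdict = llm("Verify result. Answer PASS or FAIL.\n" + "\n".join(results))
    status = "done" if verdict.strip().upper().startswith("PASS") else "replan"
    return {"status": status, "plan": plan, "results": results}
```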
Factor 12: Stateless Reducers
Principle: Agent logic as pure functions that reduce (state, event) → new_state. Enables replay, debugging, and reasoning about behavior.
Reducer Pattern:
```python
from copy import deepcopy


def agent_reducer(state: dict, event: dict) -> dict:
    """
    Pure function: no side effects, deterministic output.

    Args:
        state: Current workflow state
        event: What just happened (user input, tool result, etc.)
    Returns:
        New state (never mutates input)
    """
    new_state = deepcopy(state)
    business = new_state["business_state"]
    execution = new_state["execution_state"]
    match event["type"]:  # match statement requires Python 3.10+
        case "USER_INPUT":
            business["messages"].append(event["data"])
            execution["next_step"] = "plan"
        case "PLAN_CREATED":
            business["plan"] = event["data"]
            execution["next_step"] = "execute"
        case "TASK_COMPLETED":
            business.setdefault("results", []).append(event["data"])
            remaining = get_remaining_tasks(new_state)  # assumed helper
            execution["next_step"] = "execute" if remaining else "verify"
        case "VERIFICATION_FAILED":
            execution["retry_config"]["attempts"] += 1
            execution["next_step"] = "replan"
    # Take the time from the event, not the wall clock, so replay stays deterministic
    new_state["timestamps"]["last_updated"] = event["timestamp"]
    return new_state
```
Benefits of Reducer Pattern:
- Replay: Feed same events, get same state
- Debugging: Inspect state at any point
- Testing: Pure functions are easy to test
- Time travel: Rollback by replaying subset of events
Event Log for Replay:
```json
{
  "workflow_id": "uuid",
  "events": [
    {"seq": 1, "type": "USER_INPUT", "data": {...}, "timestamp": "..."},
    {"seq": 2, "type": "PLAN_CREATED", "data": {...}, "timestamp": "..."},
    {"seq": 3, "type": "TASK_COMPLETED", "data": {...}, "timestamp": "..."}
  ]
}
```
To replay: final_state = functools.reduce(agent_reducer, events, initial_state), folding the logged events in seq order.
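A self-contained sketch of replay and time travel with `functools.reduce`. To keep it runnable, it uses a hypothetical toy reducer (`counter_reducer`) with the same (state, event) -> new_state shape as `agent_reducer`; `replay` and its `up_to_seq` parameter are also hypothetical.

```python
from copy import deepcopy
from functools import reduce
from typing import Optional


def counter_reducer(state: dict, event: dict) -> dict:
    """Toy reducer: pure, never mutates its input."""
    new_state = deepcopy(state)
    if event["type"] == "TASK_COMPLETED":
        new_state["results"].append(event["data"])
    elif event["type"] == "VERIFICATION_FAILED":
        new_state["attempts"] += 1
    new_state["last_seq"] = event["seq"]
    return new_state


def replay(events: list, initial: dict, up_to_seq: Optional[int] = None) -> dict:
    """Fold events in seq order; stopping at up_to_seq rebuilds an earlier state."""
    ordered = sorted(events, key=lambda e: e["seq"])
    if up_to_seq is not None:
        ordered = [e for e in ordered if e["seq"] <= up_to_seq]
    return reduce(counter_reducer, ordered, initial)
```

Replaying the full log twice yields identical states, and replaying a prefix yields the state as it was at that point, which is the rollback described above.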
Additional Resources
Reference Files
For detailed implementation patterns:
- references/event-driven-details.md - Complete event-driven implementation
- references/state-management.md - State synchronization strategies (includes unified state, reducers)
- ../agent-specification/references/twelve-factor-agents.md - Quick reference for all 12 factors
Related Skills
- agent-specification - Define each agent properly (Factors 1, 2, 4, 7)
- production-readiness - Add observability and error handling (Factors 9, 11)
- mas-decision-gate - Decide if multi-agent is needed (Factor 10)