langfuse-instrumentation-setup
Use this skill when users want to add Langfuse tracing to their code, set up observability, instrument LLM calls, or add scoring to their pipelines. Helps users correctly structure traces, spans, and generations.
SKILL.md
| Name | langfuse-instrumentation-setup |
| Description | Use this skill when users want to add Langfuse tracing to their code, set up observability, instrument LLM calls, or add scoring to their pipelines. Helps users correctly structure traces, spans, and generations. |
name: Langfuse Instrumentation Setup description: Use this skill when users want to add Langfuse tracing to their code, set up observability, instrument LLM calls, or add scoring to their pipelines. Helps users correctly structure traces, spans, and generations. version: 1.0.0
Langfuse Instrumentation Setup
This skill helps you correctly instrument Python code with Langfuse tracing. It addresses common misunderstandings about the tracing model and provides best-practice patterns.
When to Use This Skill
Activate this skill when users ask to:
- "Set up Langfuse tracing"
- "Instrument my code with Langfuse"
- "Add observability to my pipeline"
- "Trace my LLM calls"
- "Add scoring to my traces"
- "Debug my Langfuse setup"
Interactive Workflow
Follow these steps in order. Do not skip the exploration phase - understanding the user's actual code is critical.
Step 1: Validate Environment
First, check if the user's environment is properly configured.
Run the setup validator:
python3 ${CLAUDE_PLUGIN_ROOT}/skills/instrumentation-setup/helpers/setup_validator.py check
If setup is incomplete:
- Langfuse SDK not installed → Suggest:
pip install langfuse - Missing API keys → Ask user for their Langfuse project keys
- Wrong host → Check if they're using self-hosted Langfuse
Once environment is ready, proceed to Step 2.
Step 2: Explore the User's Pipeline
This is the most important step. You must understand the user's code structure before recommending instrumentation.
Ask the user:
"Where is your main pipeline or agent code? Point me to the entry file or function."
Then read the relevant files and identify:
- Entry point: Where does a request come in?
- LLM calls: What client are they using? (OpenAI, Anthropic, etc.)
- Tool/API calls: Any external services, databases, or tools?
- Multi-step logic: Loops, chains, conditional flows?
- Existing instrumentation: Any Langfuse code already present?
Document what you find before proceeding.
Step 3: Read Core References
Always read the tracing model reference first:
Read: ${CLAUDE_PLUGIN_ROOT}/skills/instrumentation-setup/references/tracing-model.md
Then read the anti-patterns reference:
Read: ${CLAUDE_PLUGIN_ROOT}/skills/instrumentation-setup/references/anti-patterns.md
These establish the foundation for correct instrumentation.
Step 4: Classify Pipeline Type
Based on your exploration, classify the pipeline:
| Type | Characteristics | Template |
|---|---|---|
| Simple | Single LLM call, minimal preprocessing | basic-pipeline.py |
| RAG | Embedding + retrieval + generation | rag-pipeline.py |
| Agentic | LLM with tool loop, autonomous decisions | agentic-pipeline.py |
| Multi-model | Chain of LLM calls (summarize→translate) | multi-model-pipeline.py |
| Hybrid | Combination of patterns | Combine templates |
Step 5: Read Relevant References
Based on the pipeline type, read additional references:
For pipelines with LLM calls (all types):
Read: ${CLAUDE_PLUGIN_ROOT}/skills/instrumentation-setup/references/llm-instrumentation.md
For pipelines with tool calls:
Read: ${CLAUDE_PLUGIN_ROOT}/skills/instrumentation-setup/references/tool-instrumentation.md
For agent workflows:
Read: ${CLAUDE_PLUGIN_ROOT}/skills/instrumentation-setup/references/agent-instrumentation.md
If user asks about decorators vs context managers:
Read: ${CLAUDE_PLUGIN_ROOT}/skills/instrumentation-setup/references/decorator-vs-manual.md
Step 6: Select and Adapt Template
Read the most appropriate template:
Simple pipeline:
Read: ${CLAUDE_PLUGIN_ROOT}/skills/instrumentation-setup/templates/basic-pipeline.py
RAG pipeline:
Read: ${CLAUDE_PLUGIN_ROOT}/skills/instrumentation-setup/templates/rag-pipeline.py
Agentic pipeline:
Read: ${CLAUDE_PLUGIN_ROOT}/skills/instrumentation-setup/templates/agentic-pipeline.py
Multi-model pipeline:
Read: ${CLAUDE_PLUGIN_ROOT}/skills/instrumentation-setup/templates/multi-model-pipeline.py
Step 7: Generate Custom Instrumentation
Now adapt the template to the user's specific code:
-
Map their functions to observation types:
- LLM call functions →
generation - Tool/API functions →
tool - Preprocessing/postprocessing →
span - Important events →
event
- LLM call functions →
-
Show exactly where to add instrumentation:
- Point to specific lines in their code
- Provide before/after examples
- Highlight what should be captured (input, output, metadata)
-
Remind them of anti-patterns to avoid:
- One trace per logical request (not per function)
- Always use
as_type="generation"for LLM calls - Don't forget to flush in short-lived processes
Step 8: Optional - Add Scoring
Only if the user requests automated scoring, read the scoring module:
Read: ${CLAUDE_PLUGIN_ROOT}/skills/instrumentation-setup/templates/scoring-module.py
Scoring options:
- Automatic metrics: Latency, token count, cost
- LLM-as-judge: Quality, safety, relevance evaluation
- Categorical scores: Intent classification
- Boolean scores: PII detection, other checks
Step 9: Verify Setup
After instrumentation is added, suggest testing:
python3 ${CLAUDE_PLUGIN_ROOT}/skills/instrumentation-setup/helpers/setup_validator.py test-trace
Or have them run their instrumented code and check the Langfuse dashboard.
Key Principles
One Trace Per Request
The most common mistake is creating multiple traces when there should be one. A trace represents one logical unit of work:
- One user request
- One pipeline execution
- One agent task
Everything else goes inside the trace as observations.
Observation Type Selection
| Type | Use For | Example |
|---|---|---|
generation | LLM API calls | OpenAI completion, Anthropic message |
span | Non-LLM work | Preprocessing, validation |
tool | Tool executions | API calls, database queries |
event | Point-in-time | Errors, important state changes |
Context Propagation
When using context managers, observations automatically nest correctly:
with langfuse.start_as_current_observation(name="pipeline") as trace:
with langfuse.start_as_current_observation(as_type="generation") as gen:
# gen is automatically a child of trace
pass
Required Data for Generations
Always capture for LLM calls:
model: Which model was usedinput: The messages/prompt sentoutput: The response receivedusage_details: Token counts (input/output)
Quick Reference Commands
Check environment:
python3 ${CLAUDE_PLUGIN_ROOT}/skills/instrumentation-setup/helpers/setup_validator.py check
Test connection:
python3 ${CLAUDE_PLUGIN_ROOT}/skills/instrumentation-setup/helpers/setup_validator.py test-trace
Template Summary
| Template | Pattern | Best For |
|---|---|---|
basic-pipeline.py | Input → LLM → Output | Simple chatbots, Q&A |
rag-pipeline.py | Query → Retrieve → Generate | Document Q&A, search |
agentic-pipeline.py | Think → Act → Observe loop | Tool-using agents |
multi-model-pipeline.py | LLM1 → LLM2 → ... | Translation, refinement |
scoring-module.py | Add scores to traces | Quality monitoring |
Reference Summary
| Reference | Content |
|---|---|
tracing-model.md | Core concepts: trace vs span vs generation |
llm-instrumentation.md | How to trace LLM calls correctly |
tool-instrumentation.md | How to trace tool/API calls |
agent-instrumentation.md | Multi-step agent patterns |
decorator-vs-manual.md | When to use each approach |
anti-patterns.md | Common mistakes and how to avoid |