docker-model-runner
Skills for using Docker Model Runner to run local LLM inference
Docker Model Runner
Docker Model Runner (DMR) runs AI models locally using Docker. This skill covers how to use Docker Model Runner for local LLM inference in a development workflow.
Workflow
When helping users with local LLM inference using Docker Model Runner:
- Check if Docker Model Runner is available by running `docker model version`
- List available models with `docker model list` to see what's already pulled
- Search for models on Docker Hub or HuggingFace:
  - `docker model search <query>` to find models
  - Popular models include: `ai/gemma3`, `ai/llama3.2`, `ai/smollm2`, `ai/qwen3`
- Pull models before running: `docker model pull <model>`
- Run models for inference:
  - One-time prompt: `docker model run ai/smollm2 "Your prompt here"`
  - Interactive chat: `docker model run ai/smollm2`
  - Pre-load model: `docker model run --detach ai/smollm2`
- Use the OpenAI-compatible API for programmatic access:
  - Endpoint: `http://localhost:12434/engines/llama.cpp/v1/chat/completions`
  - This is compatible with OpenAI client libraries
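The check, list, and pull steps of the workflow above can be scripted. The following is a minimal sketch, not part of Docker Model Runner itself: the helper names `dmr_available` and `ensure_model` and the injectable `run` parameter are hypothetical, and it assumes that `docker model list` prints the name of each pulled model somewhere in its output (the exact output format is not specified here).

```python
import subprocess


def dmr_available(run=subprocess.run):
    """Return True if `docker model version` exits successfully."""
    try:
        result = run(["docker", "model", "version"], capture_output=True, text=True)
    except FileNotFoundError:
        # The docker CLI itself is not installed or not on PATH.
        return False
    return result.returncode == 0


def ensure_model(model, run=subprocess.run):
    """Pull `model` unless `docker model list` already mentions it.

    Assumption: the model name appears verbatim in the list output.
    Returns True if a pull was issued, False if the model was already present.
    """
    listing = run(["docker", "model", "list"], capture_output=True, text=True)
    if model in listing.stdout:
        return False
    run(["docker", "model", "pull", model], check=True)
    return True
```

A script would typically call `dmr_available()` first and only then `ensure_model("ai/smollm2")`. The `run` parameter exists so the helpers can be exercised with a stand-in for `subprocess.run` on machines without Docker.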
API Usage
Docker Model Runner exposes an OpenAI-compatible REST API:
# Chat completions
curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ]
  }'
For Python with the OpenAI library:
from openai import OpenAI

# Point the client at the local Docker Model Runner endpoint
client = OpenAI(
    base_url="http://localhost:12434/engines/llama.cpp/v1",
    api_key="not-needed",  # API key not required for local inference
)

response = client.chat.completions.create(
    model="ai/smollm2",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
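Where the `openai` package is not installed, the same endpoint can be reached with the Python standard library alone. The sketch below is illustrative only: `build_chat_request`, `extract_reply`, and `chat` are hypothetical helper names, and it assumes the response follows the usual OpenAI chat-completions shape, with the reply text at `choices[0].message.content`.

```python
import json
import urllib.request

# Endpoint from the section above; assumes the runner is listening on localhost:12434.
ENDPOINT = "http://localhost:12434/engines/llama.cpp/v1/chat/completions"


def build_chat_request(model, prompt, system=None, url=ENDPOINT):
    """Build (but do not send) a POST request for a chat completion."""
    messages = []
    if system is not None:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_json):
    """Return the assistant text from an OpenAI-style chat completion dict."""
    return response_json["choices"][0]["message"]["content"]


def chat(model, prompt, system=None):
    """Send the request and return the reply text (needs a running model runner)."""
    request = build_chat_request(model, prompt, system)
    with urllib.request.urlopen(request) as response:
        return extract_reply(json.load(response))
```

Keeping request construction and response parsing separate from the network call means both can be checked without a model loaded; only `chat("ai/smollm2", "Hello!")` actually contacts the local endpoint.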
Key Commands
| Command | Description |
|---|---|
| `docker model run <model> [prompt]` | Run a model with optional prompt |
| `docker model pull <model>` | Pull a model from registry |
| `docker model list` | List downloaded models |
| `docker model search <query>` | Search for models |
| `docker model ps` | Show running models |
| `docker model rm <model>` | Remove a model |
| `docker model inspect <model>` | Show model details |
Best Practices
- Use smaller models (like `ai/smollm2`) for faster responses during development
- Pre-load models with `--detach` for better performance in scripts
- Models stay loaded until another model is requested or the inactivity timeout (5 min) is reached
- Use the OpenAI-compatible API for integration with existing tools
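The pre-loading advice above can be sketched as a small batch helper. This is an illustration under stated assumptions, not an official pattern: `run_batch` and its injectable `run` parameter are hypothetical, and it assumes a one-time `docker model run <model> "<prompt>"` writes the reply to standard output.

```python
import subprocess


def run_batch(model, prompts, run=subprocess.run):
    """Pre-load `model`, then send each prompt as a one-time `docker model run`."""
    # Load the model in the background so the first prompt does not pay the load cost.
    run(["docker", "model", "run", "--detach", model], check=True)
    replies = []
    for prompt in prompts:
        result = run(
            ["docker", "model", "run", model, prompt],
            capture_output=True,
            text=True,
            check=True,
        )
        # Assumption: the reply text is whatever the command printed to stdout.
        replies.append(result.stdout.strip())
    return replies
```

Because a loaded model is evicted when another model is requested, a batch like this works best when every prompt targets the same model.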
References
See references/docker-model-guide.md for detailed documentation.