machina-docker
Docker development environment for machina-meta workspace. Use for container management, development stacks, database services, health checks, volume management, and infrastructure. The single authoritative source for all Docker operations.
Docker Skill
Comprehensive Docker environment management for the machina-meta workspace. This is the single authoritative source for all Docker operations.
When to Use This Skill
Development Environment Management
- Starting/stopping the development stack
- Checking service status and health
- Managing database containers (PostgreSQL, Neo4j, Redis, Qdrant)
- Running full-stack development (frontend + backend + databases)
Service Operations
- Building Docker images
- Running specific service containers
- Debugging container issues
- Log extraction and analysis
Database Operations
- Managing development databases
- Snapshot/restore operations (Qdrant)
- Volume management and data persistence
- Database migrations in containers
Specialized Stacks
- Langfuse observability (local and production)
- Local LLM inference (Ollama, vLLM with GPU)
- Remote debugging tools (pgAdmin, Neo4j Browser)
- gcloud-admin DevOps container
Infrastructure & Deployment
- Gateway infrastructure (ngrok, nginx)
- Monitoring stacks (Langfuse + OTEL)
- LLM experiment environments
Kubernetes Environment Configuration
- Adding new environment variables to k8s deployments
- Integrating .env.example changes into Kustomize manifests
- Managing secrets via ExternalSecrets + Google Secret Manager
- Per-environment configuration via overlay patches
Quick Reference
Primary Development Commands
All Docker commands should be run from the machina-meta workspace root.
Note: Justfile rules in repos/dem2, repos/dem2-webui, and repos/medical-catalog are deprecated. Use workspace-level commands instead.
Workspace Level (machina-meta root):
| Command | Purpose | Underlying Operation |
|---|---|---|
| just dev-up | Start full stack (all services) | ./scripts/dev_stack.py up |
| just dev-down | Stop all services | ./scripts/dev_stack.py down |
| just dev-status | Check all service health | ./scripts/dev_stack.py status |
| just dev-restart | Rebuild and restart | docker compose --profile dev up -d --build |
| just dev-check | Run sanity check tests | Non-destructive verification suite |
gcloud-admin DevOps:
| Command | Purpose | Underlying Operation |
|---|---|---|
| just gcloud-admin::shell | Interactive DevOps shell | docker compose run --rm gcloud-admin |
| just gcloud-admin::kubectl <args> | Run kubectl | docker compose run --rm gcloud-admin kubectl <args> |
| just gcloud-admin::helm <args> | Run helm | docker compose run --rm gcloud-admin helm <args> |
| just gcloud-admin::k9s | Cluster TUI | docker compose run --rm gcloud-admin k9s |
| just gcloud-admin::argocd <args> | Run ArgoCD CLI | docker compose run --rm gcloud-admin argocd <args> |
| just gcloud-admin::preview-info <identifier> | Get preview deployment info | Shows tags, commits, PR status, ArgoCD health |
Preview Environment Info:
Check the current state of a preview deployment using any identifier:
# By GKE namespace
just gcloud-admin::preview-info --gke-namespace tusdi-preview-92
# By git tag
just gcloud-admin::preview-info --git-tag preview-dbeal-docproc-dev
# By ArgoCD app name
just gcloud-admin::preview-info --argocd-app preview-pr-92
# By infra branch
just gcloud-admin::preview-info --infra-branch preview/dbeal-docproc-dev
# By infra PR number
just gcloud-admin::preview-info --pr 92
# By git branch
just gcloud-admin::preview-info --git-branch feature/dbeal-docproc-dev
# Output formats: terminal, json, markdown (default: markdown)
just gcloud-admin::preview-info --gke-namespace tusdi-preview-92 --format json
Output includes:
- Preview ID resolved from identifier
- Backend/Frontend tag commits and dates
- Infrastructure branch and PR status
- ArgoCD sync and health status
- GitHub workflow status for all repos
Updating Preview Deployments (Deploy a Branch Commit to Preview):
This is the complete, authoritative process for deploying code changes to a GKE preview environment. Use this whenever the user says "deploy to preview-XX".
How the Preview CI/CD Pipeline Works
The deployment pipeline is fully automated once a preview tag is pushed:
1. Developer pushes tag `preview-<branch>` to dem2 (or dem2-webui)
↓
2. GitHub Actions `pr-preview-build.yml` triggers on `push: tags: preview-*`
↓
3. CI builds Docker image, pushes to Artifact Registry with tag:
`<tag-name>-<full-commit-sha>`
Example: `preview-alen-dev-1-ae99b3911e7a564c1470a116e3097792342f665b`
↓
4. CI dispatches `preview-update` event to dem2-infra repository
↓
5. dem2-infra workflow updates `kustomization.yaml` with new image tag
↓
6. ArgoCD detects the infra change and syncs the deployment
↓
7. New pod starts with the updated image in tusdi-preview-XX namespace
Image tag format: <git-tag-name>-<full-40-char-commit-sha>
- For tag pushes: preview-alen-dev-1-ae99b3911e7a564c1470a116e3097792342f665b
- For PR builds: pr-92-ae99b3911e7a564c1470a116e3097792342f665b
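The tag format above can be reproduced locally to predict which image CI should publish. This is a minimal sketch; image_tag is a hypothetical helper name and is not part of the workspace.

```shell
# Compose an image tag in the documented format:
#   <git-tag-name>-<full-40-char-commit-sha>
# image_tag is a hypothetical helper, not a command from the repo.
image_tag() {
  tag_name=$1
  sha=$2
  # Reject anything that is not lowercase hex.
  case "$sha" in
    *[!0-9a-f]*) echo "not a hex SHA: $sha" >&2; return 1 ;;
  esac
  # Reject abbreviated SHAs; the format uses the full 40 characters.
  if [ "${#sha}" -ne 40 ]; then
    echo "expected a 40-character SHA, got ${#sha} characters" >&2
    return 1
  fi
  printf '%s-%s\n' "$tag_name" "$sha"
}

# Usage (inside repos/dem2, after tagging HEAD):
#   image_tag preview-alen-dev-1 "$(git rev-parse HEAD)"
```

Comparing this value with the newTag that later appears in dem2-infra is a quick way to confirm that the deployed image matches the commit you tagged.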
Step-by-Step: Deploy Backend (tusdi-api) to Preview
# Step 1: Ensure your code is committed and pushed to the branch
# (Use machina-git skill for this)
(cd repos/dem2 && git log -1 --oneline)
# Verify HEAD is the commit you want to deploy
# Step 2: Tag current HEAD with preview-<branch-name> and force-push
# This triggers the CI pipeline automatically
(cd repos/dem2 && git tag -f preview-<branch-name> && git push origin preview-<branch-name> --force)
# Example for alen-dev-1 branch:
(cd repos/dem2 && git tag -f preview-alen-dev-1 && git push origin preview-alen-dev-1 --force)
Shortcut using justfile:
# Equivalent to the above (tags + force-pushes in one command)
(cd repos/dem2 && just preview <branch-name>)
# Example:
(cd repos/dem2 && just preview alen-dev-1)
Step-by-Step: Deploy Frontend (tusdi-webui) to Preview
# Same pattern as backend
cd repos/dem2-webui && git tag -f preview-<branch-name> && git push origin preview-<branch-name> --force
# Or using justfile:
(cd repos/dem2-webui && just preview <branch-name>)
Step-by-Step: Deploy Both Backend and Frontend
# Deploy both at once
(cd repos/dem2 && just preview-both <branch-name>)
# Or manually:
(cd repos/dem2 && git tag -f preview-<branch-name> && git push origin preview-<branch-name> --force)
(cd repos/dem2-webui && git tag -f preview-<branch-name> && git push origin preview-<branch-name> --force)
Step 3: Monitor Deployment Progress
# Check CI workflow status (wait for image build to complete)
(cd repos/dem2 && gh run list --limit 3)
# Watch a specific run
(cd repos/dem2 && gh run watch <run-id>)
# Check ArgoCD sync status
just gcloud-admin::preview-info --gke-namespace tusdi-preview-<XX>
# Watch pods for rollout
kubectl get pods -n tusdi-preview-<XX> -w
# Check pod logs after deployment
kubectl logs -n tusdi-preview-<XX> -l app.kubernetes.io/component=api --tail=50
Common Deployment Scenarios
| Scenario | Command |
|---|---|
| Deploy backend only | cd repos/dem2 && git tag -f preview-<branch> && git push origin preview-<branch> --force |
| Deploy frontend only | cd repos/dem2-webui && git tag -f preview-<branch> && git push origin preview-<branch> --force |
| Deploy both | (cd repos/dem2 && just preview-both <branch>) |
| Check build status | cd repos/dem2 && gh run list --limit 3 |
| Check deployment health | kubectl get pods -n tusdi-preview-<XX> |
| Force ArgoCD sync | just gcloud-admin::argocd app sync argocd/preview-pr-<XX> |
Troubleshooting Deployments
| Issue | Solution |
|---|---|
| CI not triggered | Verify tag was pushed: cd repos/dem2 && git ls-remote --tags origin \| grep preview-<branch> |
| Image not found | Check CI run logs: cd repos/dem2 && gh run list --limit 3 |
| ArgoCD not syncing | Manual sync: just gcloud-admin::argocd app sync argocd/preview-pr-<XX> |
| Pod CrashLoopBackOff | Check logs: kubectl logs -n tusdi-preview-<XX> -l app.kubernetes.io/component=api --tail=100 |
| Old image still running | Delete deployment to force recreation: kubectl delete deployment tusdi-api -n tusdi-preview-<XX> then ArgoCD sync |
Important Notes
- Only backend changes need backend deployment. If you only changed dem2, you only need to tag/push dem2.
- The tag must match the branch name pattern used when the preview was originally created.
- Force-push (--force) is expected for preview tags: they are mutable pointers.
- CI build takes ~3-5 minutes. ArgoCD sync adds ~1-2 minutes after that.
- The kustomization.yaml in dem2-infra is updated automatically by the CI dispatch — you do NOT need to manually edit it.
Syncing Local dem2-infra After CI Dispatch
When the CI pipeline updates kustomization.yaml in dem2-infra, it force-pushes to the preview/<branch> branch on the remote. This means your local preview/<branch> branch diverges from the remote.
You MUST sync your local branch before making any further infra changes:
# After CI build completes and dispatches to dem2-infra:
cd repos/dem2-infra && git pull --rebase origin preview/<branch-name>
# Example for alen-dev-1:
cd repos/dem2-infra && git pull --rebase origin preview/alen-dev-1
Why --rebase is required:
- The CI force-pushes a new commit to the remote branch with the updated image tag
- A regular git pull (merge) would create unnecessary merge commits
- git pull --rebase replays any local commits on top of the CI's update
- If your only local commit was a previous image tag update, it will be skipped automatically (no conflict)
Common scenarios:
| Scenario | What Happens |
|---|---|
| No local changes to infra | git pull --rebase fast-forwards cleanly |
| Local infra changes (env vars, patches) | Local commits are rebased on top of CI's image tag update |
| Local image tag edit conflicts with CI | Rebase conflict: resolve by keeping the CI version, since the CI tag is correct (during a rebase the remote side is "ours" and your replayed local commit is "theirs") |
Do NOT manually edit kustomization.yaml image tags — always let the CI handle it via the tag-push workflow. If you need to verify the current image tag after pull:
cd repos/dem2-infra && grep newTag k8s/overlays/preview/kustomization.yaml
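When an overlay pins more than one image, a bare grep does not show which tag belongs to which image. The sketch below pairs them up, assuming the standard Kustomize layout in which each entry under images: has a name: line followed by a newTag: line. list_image_tags is a hypothetical helper name.

```shell
# Print "<image name> <newTag>" for each image entry in a kustomization.yaml.
# Assumes the standard Kustomize `images:` layout (name: then newTag:).
# list_image_tags is a hypothetical helper, not a command from the repo.
list_image_tags() {
  awk '
    # Remember the most recent image name (with or without a leading "- ").
    /^[[:space:]]*-?[[:space:]]*name:/   { name = $NF }
    # Emit the pair when the tag for that image appears.
    /^[[:space:]]*-?[[:space:]]*newTag:/ { print name, $NF }
  ' "$1"
}

# Usage (from the workspace root):
#   list_image_tags repos/dem2-infra/k8s/overlays/preview/kustomization.yaml
```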
Deprecated Commands (in child repos)
The following patterns are deprecated and should be migrated to workspace-level commands:
| Deprecated Command | Replacement |
|---|---|
| (cd repos/dem2 && just dev-env-up) | just dev-up (starts full stack) |
| (cd repos/dem2 && just dev-env-down) | just dev-down |
| (cd repos/dem2 && just med-api-up) | just dev-up |
| (cd repos/medical-catalog && just dev-env-up) | just dev-up |
| (cd repos/medical-catalog && just docker-build) | (pending migration) |
Docker Compose Files Map
| File | Location | Purpose | Profile |
|---|---|---|---|
| docker-compose.yaml | Root | Main workspace stack (primary) | dev |
| docker-compose.yaml | repos/dem2/infrastructure/ | Backend dev environment | dev, test |
| docker-compose.langfuse.local.yaml | repos/dem2/infrastructure/ | Local Langfuse observability | - |
| docker-compose.yml | repos/dem2/infrastructure/remote-debug/ | pgAdmin, Neo4j debug | - |
| docker-compose.qdrant.yaml | repos/dem2/services/indicators-catalog/ | Standalone Qdrant | - |
| docker-compose.yaml | repos/medical-catalog/infra/ | Catalog Qdrant (port 16333) | dev |
| docker-compose.yaml | gcloud-admin/ | DevOps admin container | - |
| docker-compose.ngrok-nginx.yaml | repos/dem2-infra/.../gateway/ | Public gateway tunnel | - |
| docker-compose.langfuse.yaml | repos/dem2-infra/.../monitoring/ | Production Langfuse + OTEL | monitoring |
| docker-compose.ollama.yml | repos/dem2-infra/.../experiment-setup/ | Local LLM with GPU | - |
| docker-compose.vllm.yml | repos/dem2-infra/.../experiment-setup/ | vLLM inference server | - |
See references/compose-files.md for detailed documentation of each file.
Profile System
All services use the unified dev profile:
| Profile | Services | Use Case |
|---|---|---|
| dev | All services (databases + backend + frontend + catalog) | Full stack development |
Starting Services:
# Full stack (all services)
docker compose --profile dev up -d
# Or use the just command (recommended)
just dev-up
Note: The --profile dev flag is required for up, down, restart, and build operations. It is NOT required for querying running containers (ps, logs, stats, exec).
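The profile rule in the note can be captured in a small wrapper so the flag is never forgotten. This is a sketch only: compose_args is a hypothetical helper that prints the command it would run rather than executing it.

```shell
# Build a compose command line, adding --profile dev only for the lifecycle
# subcommands that the note says require it (up, down, restart, build).
# compose_args is a hypothetical helper; it prints instead of executing.
compose_args() {
  case "$1" in
    up|down|restart|build) echo "docker compose --profile dev $*" ;;
    *)                     echo "docker compose $*" ;;
  esac
}

# Usage:
#   compose_args up -d          # prints the command with --profile dev
#   compose_args logs backend   # prints the command without a profile
#   eval "$(compose_args ps)"   # run the printed command
```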
See references/profiles.md for detailed profile documentation.
Port Mappings
Development Ports (localhost)
| Port | Service | Stack | Health Check |
|---|---|---|---|
| 3000 | Frontend (Next.js) | machina-meta | http://localhost:3000 |
| 5432 | PostgreSQL | machina-meta | pg_isready -h localhost |
| 5540 | RedisInsight | machina-meta | http://localhost:5540 |
| 6333 | Qdrant REST API | machina-meta | http://localhost:6333/healthz |
| 6334 | Qdrant Web UI | machina-meta | http://localhost:6334/dashboard |
| 6379 | Redis | machina-meta | redis-cli ping |
| 7474 | Neo4j Browser | machina-meta | http://localhost:7474 |
| 7687 | Neo4j Bolt | machina-meta | (used by applications) |
| 8000 | Backend API | machina-meta | http://localhost:8000/docs |
| 8001 | Medical Catalog | machina-meta | http://localhost:8001/health |
Langfuse Stack Ports
| Port | Service | Purpose |
|---|---|---|
| 3003 | Langfuse Web | Observability UI |
| 3030 | Langfuse Worker | Background processing |
| 5433 | Langfuse PostgreSQL | Langfuse database |
| 6380 | Langfuse Redis | Langfuse cache |
| 8123 | ClickHouse HTTP | Analytics queries |
| 9090 | MinIO API | S3-compatible storage |
| 9091 | MinIO Console | MinIO admin UI |
Specialized Ports
| Port | Service | Purpose |
|---|---|---|
| 5050 | pgAdmin | PostgreSQL admin UI (remote-debug) |
| 11434 | Ollama | Local LLM inference |
| 16333 | Qdrant (catalog) | Medical catalog vector store |
| 17474 | Neo4j (remote) | Remote debugging Neo4j |
See references/ports.md for complete port documentation.
Common Usage Patterns
Pattern 1: Full Stack Development (Recommended)
Start everything from workspace root:
# Start full stack
just dev-up
# Check status
just dev-status
# When done
just dev-down
Pattern 2: Backend Development Only
Start databases, run backend locally:
# Start full stack (includes databases)
just dev-up
# Stop the containerized backend to run locally
docker stop machina-meta-backend-1
# Run backend locally
(cd repos/dem2 && just run)
# When done
just dev-down
Pattern 3: Frontend Development Only
# Start full stack
just dev-up
# Or just start frontend locally if backend is running
(cd repos/dem2-webui && pnpm dev)
Pattern 4: Clean Restart with Fresh Data
Reset databases and reload fixtures:
# Stop and remove volumes
just dev-down
docker volume prune -f # Warning: removes unused volumes host-wide; on Docker 23.0+ add --all to include named volumes
# Start fresh
just dev-up
Pattern 5: Local LLM Inference (Ollama)
# Start Ollama with GPU
(cd repos/dem2-infra/infrastructure/docker/google_cloud/experiment-setup && \
docker compose -f docker-compose.ollama.yml up -d)
# Pull a model
docker exec -it ollama ollama pull llama3.2
# Test
curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "prompt": "Hello"}'
Pattern 6: Local Langfuse Observability
# Start Langfuse stack
(cd repos/dem2/infrastructure && docker compose -f docker-compose.langfuse.local.yaml up -d)
# Access UI at http://localhost:3003
# Stop
(cd repos/dem2/infrastructure && docker compose -f docker-compose.langfuse.local.yaml down)
Pattern 7: Remote Debugging Tools
# Start pgAdmin and Neo4j browser
(cd repos/dem2/infrastructure/remote-debug && docker compose up -d)
# pgAdmin: http://localhost:5050
# Neo4j: http://localhost:17474
Pattern 8: gcloud-admin DevOps
# First-time setup
just gcloud-admin::setup-nonprod
# Interactive shell
just gcloud-admin::shell
# Run kubectl commands
just gcloud-admin::kubectl get pods -n argocd
# Interactive cluster UI
just gcloud-admin::k9s
Pattern 9: Remote GKE Environment Access
This is the unified workflow for accessing remote GKE environments (preview, dev, staging).
CRITICAL: Local Docker vs Remote GKE — Mutually Exclusive
Port-forwarded remote services bind to the SAME ports as local Docker containers (5432, 7474, 8000, etc.). You MUST choose one:
| Mode | Description | Port Forwarding Required |
|---|---|---|
| Local Docker | just dev-up running | NO — services already on localhost |
| Remote GKE | Point to a preview/dev/staging namespace | YES — must just dev-down first |
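Before switching modes it can help to confirm that the shared ports are actually free. The sketch below filters ss -ltn output for ports that are already listening. The port list is an assumption taken from the development port table, and busy_shared_ports is a hypothetical helper name.

```shell
# Ports that local Docker and port-forwarded GKE services may both bind.
# Assumption: taken from the development port table; adjust to your stack.
SHARED_PORTS="3000 5432 6333 6379 7474 7687 8000 8001"

# Read `ss -ltn` output on stdin and print each shared port that is
# already listening (once per port, even if bound on IPv4 and IPv6).
busy_shared_ports() {
  awk -v ports="$SHARED_PORTS" '
    BEGIN { n = split(ports, p, " "); for (i = 1; i <= n; i++) want[p[i]] = 1 }
    NR > 1 {
      # Column 4 is the local address, e.g. 0.0.0.0:5432 or [::]:5432;
      # the port is whatever follows the last colon.
      m = split($4, parts, ":")
      port = parts[m]
      if ((port in want) && !(port in seen)) { seen[port] = 1; print port }
    }
  '
}

# Usage:
#   ss -ltn | busy_shared_ports
```

If the command prints anything, something is still holding those ports (typically the local stack), so run just dev-down before starting port-forwards.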
Step 1: Determine current mode
# Check if local Docker containers are running
just dev-status
- If local containers are running and you want local dev: no port forwarding needed, skip to Step 5 for API usage.
- If you need remote GKE access: proceed to Step 2.
Step 2: Stop local Docker (required for remote access)
# Stop all local containers to free ports
just dev-down
Step 3: Import environment from Kubernetes
Extract all environment variables (configs, ConfigMaps, Secrets) from a remote deployment and generate a local .env file with K8s hostnames rewritten to localhost for port-forwarded access.
# Import environment from a k8s deployment
./scripts/import_k8s_environment.py import -n <NAMESPACE> -d <DEPLOYMENT>
# Examples:
./scripts/import_k8s_environment.py import -n tusdi-preview-99 -d tusdi-api
./scripts/import_k8s_environment.py import -n tusdi-staging -d tusdi-api
./scripts/import_k8s_environment.py import -n tusdi-dev -d tusdi-api
Output: Creates .env.<NAMESPACE>.<DEPLOYMENT> (e.g., .env.tusdi-preview-99.tusdi-api) with:
- All direct values, ConfigMap values, and Secret values resolved
- K8s service hostnames (postgres, neo4j, redis, etc.) rewritten to localhost
- URL variables (redis://redis:6379, bolt://neo4j:7687) rewritten to localhost equivalents
- File permissions set to 600 (owner read/write only)
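To make the hostname rewriting concrete, here is an illustration of the kind of transformation described above. It is not the logic of import_k8s_environment.py: the host list, the sed rules, and the helper name rewrite_hosts are all assumptions made for this example.

```shell
# Rewrite in-cluster hostnames to localhost in .env-style lines on stdin.
# Illustration only: the host list and rules are assumptions, not the
# actual implementation inside import_k8s_environment.py.
rewrite_hosts() {
  sed -E \
    -e 's#^([A-Za-z_][A-Za-z0-9_]*=)(postgres|neo4j|redis)$#\1localhost#' \
    -e 's#(://([^/@]*@)?)(postgres|neo4j|redis)(:[0-9]+|/|$)#\1localhost\4#'
}

# The first rule handles bare host values (NAME=postgres).
# The second rule handles URLs, keeping scheme, credentials, and port.
# Usage:
#   rewrite_hosts < some-env-file
```

Hosts that merely start with one of these names (for example redis-cache) are left alone, because the second rule requires a port, a slash, or the end of the line right after the hostname.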
Switch active environment:
.current_env is a symlink that points to the active environment file. It acts as a context pointer — switching it changes which GKE environment all tools (curl_api, neo4j-query.py, etc.) operate against.
# Point .current_env to the imported environment (overwrites any previous symlink)
ln -sfn .env.tusdi-preview-99.tusdi-api .current_env
If the .env.<NAMESPACE>.<DEPLOYMENT> file doesn't exist yet (or is out of date), run the import command shown earlier in this step first, then create the symlink.
Switching between environments:
# Switch to preview-99
ln -sfn .env.tusdi-preview-99.tusdi-api .current_env
# Switch to staging
ln -sfn .env.tusdi-staging.tusdi-api .current_env
# Verify which environment is active
ls -la .current_env
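Because ln -sfn will happily create a dangling symlink, a small guard can refuse to switch to an environment that has not been imported yet. This is a sketch; use_env is a hypothetical helper name.

```shell
# Point .current_env at an imported environment file, refusing to create
# a dangling symlink. use_env is a hypothetical helper, not a repo command.
use_env() {
  env_file=".env.$1.$2"   # .env.<NAMESPACE>.<DEPLOYMENT>
  if [ ! -f "$env_file" ]; then
    echo "missing $env_file; import it first" >&2
    return 1
  fi
  # -n replaces an existing symlink instead of descending into its target.
  ln -sfn "$env_file" .current_env
}

# Usage (from the workspace root):
#   use_env tusdi-preview-99 tusdi-api
#   use_env tusdi-staging tusdi-api
```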
Compare environments (detect drift after re-import):
./scripts/import_k8s_environment.py compare .env.old .env.new
./scripts/import_k8s_environment.py compare .env.old .env.new --markdown
./scripts/import_k8s_environment.py compare .env.old .env.new --json
Additional import options:
| Flag | Purpose |
|---|---|
| -c <name> | Select specific container (default: first) |
| -o <path> | Custom output path |
| --no-comments | Omit source comments |
| --no-localhost | Disable hostname rewriting |
| --json | Output as JSON instead of .env file |
| -q | Quiet mode |
Step 4: Port-forward all services
ALWAYS use port_forward_service.py — NEVER run individual kubectl port-forward commands.
Launch the port-forward script in a dedicated tmux pane so its output stays visible without cluttering the main terminal:
# Forward ALL services in a namespace (recommended)
# Creates a small (~10 line) pane at the top, focus stays on current pane
# Capture the pane ID for later use (e.g., to kill it)
PF_PANE=$(tmux split-window -d -b -v -l 10 -P -F '#{pane_id}' \
"./scripts/port_forward_service.py '{\"port_forward\": [{\"namespace\": \"<NAMESPACE>\", \"service_name\": \".*\"}]}'")
# Example: forward all tusdi-preview-99 services
PF_PANE=$(tmux split-window -d -b -v -l 10 -P -F '#{pane_id}' \
"./scripts/port_forward_service.py '{\"port_forward\": [{\"namespace\": \"tusdi-preview-99\", \"service_name\": \".*\"}]}'")
# Forward specific services only
PF_PANE=$(tmux split-window -d -b -v -l 10 -P -F '#{pane_id}' \
"./scripts/port_forward_service.py '{\"port_forward\": [{\"namespace\": \"tusdi-preview-99\", \"service_name\": \"neo4j\"}, {\"namespace\": \"tusdi-preview-99\", \"service_name\": \"postgres\"}]}'")
tmux flags: -d keeps focus on the current pane, -b places the new pane above (top), -v splits vertically, -l 10 sets height to 10 lines, -P -F '#{pane_id}' prints the new pane ID (captured into $PF_PANE).
The <NAMESPACE> should match the namespace shown in the header of .current_env.
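The escaped JSON in the tmux commands is easy to mistype. The sketch below builds the same payload from a namespace and an optional list of service patterns. The field names come from the examples above, while pf_payload itself is a hypothetical helper.

```shell
# Build the JSON argument for port_forward_service.py.
# Field names (port_forward, namespace, service_name) come from the
# examples above; pf_payload is a hypothetical helper name.
# Note: values are not JSON-escaped, so avoid quotes or backslashes.
pf_payload() {
  namespace=$1
  shift
  # Default to the regex that matches every service in the namespace.
  [ "$#" -gt 0 ] || set -- '.*'
  entries=""
  for svc in "$@"; do
    entries="${entries:+$entries, }{\"namespace\": \"$namespace\", \"service_name\": \"$svc\"}"
  done
  printf '{"port_forward": [%s]}\n' "$entries"
}

# Usage:
#   ./scripts/port_forward_service.py "$(pf_payload tusdi-preview-99)"
#   ./scripts/port_forward_service.py "$(pf_payload tusdi-preview-99 neo4j postgres)"
```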
Script details:
- service_name supports regex patterns (".*" matches all services in the namespace)
- Multi-port services (e.g., Neo4j: 7474 + 7687) forward all ports automatically
- Automatic restart when port-forwards drop
- Uses K8s targetPort as the local port (standard ports: 5432, 7474, 8000, etc.)
- View JSON schema: ./scripts/port_forward_service.py --schema
State file: While running, the script writes /var/run/user/<uid>/port_forward_service.json containing the PID and all forwarded services. The file is automatically removed on exit (including tmux kill-pane). Read it to check which services are currently forwarded:
cat /var/run/user/$(id -u)/port_forward_service.json | jq
Stopping port-forwards:
# Kill the port-forward pane using the captured pane ID
tmux kill-pane -t "$PF_PANE"
# Or kill the process directly (from any pane)
pkill -f port_forward_service
Step 5: Use the environment
# Source .current_env and call backend API
(. .current_env && cd repos/dem2 && just curl_api '{"function": "list_documents"}') | jq
# Query observations
(. .current_env && cd repos/dem2 && just curl_api '{"function": "get_observations_grouped", "per_type_values_limit": 3}') | jq
# Filter specific biomarker
(. .current_env && cd repos/dem2 && just curl_api '{"function": "get_observations_grouped", "per_type_values_limit": 3}') | jq '.items[]|select(.observation_type.display_name=="Folate").values'
# Query remote Neo4j directly (credentials come from .current_env)
(. .current_env && ./scripts/neo4j-query.py --format json "MATCH (n:PatientNode) RETURN n.first_name, n.last_name LIMIT 5")
Authentication scripts (called automatically by curl_api):
- user_manager.py: generates JWT tokens for API authentication
  - uv run scripts/user_manager.py user token --export: outputs export AUTH_HEADER="Bearer ..."
  - uv run scripts/user_manager.py user token --export-cookie: for UI/browser authentication
- curl_api.sh: JSON dispatch system; handles JWT tokens, patient context headers, respects BACKEND_URL
Common namespaces:
| Namespace | Environment |
|---|---|
| tusdi-dev | Development |
| tusdi-staging | Staging |
| tusdi-preview-<id> | Preview (e.g., tusdi-preview-99) |
Note: Always run port-forwarding on the host machine (via port_forward_service.py), not inside the gcloud-admin container. The container has networking limitations that prevent port-forwarded ports from being accessible on the host.
Pattern 10: ArgoCD Application Sync and Management
For syncing, monitoring, and troubleshooting ArgoCD deployments on GKE preview environments.
Prerequisites:
- gcloud-admin container setup completed (just gcloud-admin::setup-nonprod)
- ArgoCD authentication (just gcloud-admin::auth-argocd-login)
Key Concepts:
- ArgoCD app name: You MUST list apps first to find the correct name — do NOT guess app names
- Kubernetes namespace: Format is tusdi-<env> (e.g., tusdi-preview-92, tusdi-staging)
- App names and namespace names are DIFFERENT and not always predictable
Check ArgoCD Authentication Status:
# Verify you're logged in
just gcloud-admin::auth-argocd-status
# If not logged in, authenticate (interactive SSO - requires browser)
# Linux:
just gcloud-admin::auth-argocd-login
# macOS (--network host doesn't work, use import instead):
# 1. argocd login <server> --sso --grpc-web (on host)
# 2. just gcloud-admin::auth-argocd-import
List ArgoCD Applications (ALWAYS do this first):
# IMPORTANT: Always list apps first to find the correct app name
# Do NOT guess app names - they may not follow predictable patterns
just gcloud-admin::argocd app list
# Filter preview apps
just gcloud-admin::argocd app list | grep preview
Check Application Status:
# Get app summary (use the EXACT app name from 'app list' output above)
just gcloud-admin::argocd app get <app-name-from-list>
# Get status as JSON for scripting
just gcloud-admin::argocd app get <app-name-from-list> --output json | jq '{
sync_status: .status.sync.status,
health_status: .status.health.status,
operation_state: .status.operationState.phase
}'
Sync Application (Trigger Deployment):
# Basic sync - applies manifests from git
just gcloud-admin::argocd app sync <app-name-from-list>
# Force sync - override any drift
just gcloud-admin::argocd app sync <app-name-from-list> --force
# Sync with prune - remove resources not in git
just gcloud-admin::argocd app sync <app-name-from-list> --prune
Handle Stuck Operations:
If sync fails with "another operation is already in progress":
# Check current operation state
just gcloud-admin::argocd app get <app-name-from-list> --output json | jq '.status.operationState'
# Terminate stuck operation
just gcloud-admin::argocd app terminate-op <app-name-from-list>
# Then retry sync
just gcloud-admin::argocd app sync <app-name-from-list>
Monitor Deployment Progress:
# Watch pods in the namespace
just gcloud-admin::kubectl get pods -n tusdi-preview-92 -w
# Check specific deployment
just gcloud-admin::kubectl get pods -n tusdi-preview-92 | grep tusdi-api
# Get pod logs
just gcloud-admin::kubectl logs -n tusdi-preview-92 <pod-name> --tail=100
# Get recent events
just gcloud-admin::kubectl get events -n tusdi-preview-92 --sort-by='.lastTimestamp' | tail -20
Troubleshooting Unhealthy Deployments:
# Check deployment status
just gcloud-admin::kubectl describe deployment tusdi-api -n tusdi-preview-92
# Check pod status and restart count
just gcloud-admin::kubectl get pods -n tusdi-preview-92 -o wide
# Check pod events for errors
just gcloud-admin::kubectl describe pod <pod-name> -n tusdi-preview-92
# Check pod logs for startup errors (like ImportError)
just gcloud-admin::kubectl logs -n tusdi-preview-92 <pod-name> --tail=200
Force Redeploy (Delete and Let ArgoCD Recreate):
If a deployment is corrupted or stuck:
# Delete the deployment (ArgoCD will recreate it)
just gcloud-admin::kubectl delete deployment tusdi-api -n tusdi-preview-92
# Sync to trigger recreation
just gcloud-admin::argocd app sync <app-name-from-list>
# Monitor recreation
just gcloud-admin::kubectl get pods -n tusdi-preview-92 -w
Common Scenarios:
| Problem | Solution |
|---|---|
| ImportError on startup | Push code fix, ArgoCD auto-syncs, or manual sync |
| Deployment deleted accidentally | argocd app sync recreates from manifests |
| Sync stuck "in progress" | argocd app terminate-op then retry |
| Pod CrashLoopBackOff | Check logs, fix code/config, push, sync |
| Image pull error | Verify image exists in registry, check secrets |
Workflow: Fix Code Error on Preview:
- Identify error (e.g., ImportError in pod logs)
- Fix locally and commit the fix
- Push to branch that the preview tracks
- Wait for CI/CD to build new image (GitHub Actions)
- ArgoCD auto-syncs or manually trigger: just gcloud-admin::argocd app sync <app-name-from-list>
- Monitor pod until healthy: just gcloud-admin::kubectl get pods -n tusdi-preview-92 -w
- Verify fix by checking pod logs: just gcloud-admin::kubectl logs -n tusdi-preview-92 <pod-name> --tail=50
Volume Management
Named Volumes
| Volume | Service | Purpose |
|---|---|---|
| postgres_data | PostgreSQL | User data, sessions |
| neo4j_data | Neo4j | Graph data |
| neo4j_logs | Neo4j | Database logs |
| redis_data | Redis | Cache persistence |
| qdrant_storage | Qdrant | Vector embeddings |
| redisinsight_data | RedisInsight | UI settings |
Qdrant Snapshot Restore
The dev_stack.py script automatically restores Qdrant snapshots on first startup:
# Manual restore if needed
(cd repos/medical-catalog && just snapshot-restore-all)
# Check Qdrant collections
curl http://localhost:6333/collections
Clearing Volumes
# Stop services and remove volumes
docker compose --profile dev down -v
# Remove specific volume
docker volume rm machina-meta_postgres_data
# Remove all unused volumes on the host (DANGEROUS; on Docker 23.0+ named volumes are only included with --all)
docker volume prune -f
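A narrower alternative to pruning is to remove only this project's named volumes. The sketch below prints the commands as a dry run instead of executing them. It assumes the Compose project name is machina-meta, as in the machina-meta_postgres_data example above, and volume_rm_commands is a hypothetical helper name.

```shell
# Named volumes from the Named Volumes table.
DEV_VOLUMES="postgres_data neo4j_data neo4j_logs redis_data qdrant_storage redisinsight_data"

# Print (do not run) one `docker volume rm` command per project volume.
# Compose prefixes volume names with the project name, which is assumed
# to be machina-meta unless another name is passed as the first argument.
volume_rm_commands() {
  project=${1:-machina-meta}
  for vol in $DEV_VOLUMES; do
    echo "docker volume rm ${project}_${vol}"
  done
}

# Usage (after `just dev-down`):
#   volume_rm_commands        # review the commands
#   volume_rm_commands | sh   # execute them
```

Volumes that belong to other projects on the same host are never touched, which is the main advantage over docker volume prune.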
See references/volumes.md for detailed volume documentation.
Health Checks
dev_stack.py Orchestration
The scripts/dev_stack.py script provides intelligent health monitoring:
- 90-second timeout for service startup
- HTTP endpoint validation for web services
- Container health status from Docker
- Automatic error analysis from logs
- Qdrant snapshot detection and restore
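The timeout behaviour listed above can be approximated in plain shell when dev_stack.py is not available. This is a sketch under stated assumptions, not the script's implementation: wait_until is a hypothetical helper, and it counts only sleep time, so the real elapsed time is slightly longer.

```shell
# Run a check command repeatedly until it succeeds or the timeout expires.
# wait_until is a hypothetical helper, not the logic inside dev_stack.py.
# Arguments: <timeout seconds> <interval seconds> <command...>
wait_until() {
  timeout_s=$1
  interval_s=$2
  shift 2
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    # Give up once the accumulated sleep time reaches the timeout.
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
  return 0
}

# Usage (90-second budget, polling every 2 seconds):
#   wait_until 90 2 curl -fsS http://localhost:6333/healthz
#   wait_until 90 2 pg_isready -h localhost -p 5432
```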
Service Health Endpoints
| Service | Health Endpoint |
|---|---|
| Backend | GET /api/v1/health |
| Medical Catalog | GET /health |
| Qdrant | GET /healthz |
| Neo4j | GET :7474 (browser) |
| PostgreSQL | pg_isready |
| Redis | redis-cli ping |
Sanity Check Suite
Run the comprehensive sanity check before committing Docker changes:
just dev-check
This runs 6 non-destructive verification tests:
- Service Status - ./scripts/dev_stack.py status
- Container Status - docker compose ps
- Health Checks - Container health inspection
- Resource Usage - docker stats snapshot
- Volume Status - List machina volumes
- Endpoint Health - HTTP checks for backend, catalog, and Qdrant
Manual Health Checks
# Check all container status
docker compose ps
# Check specific service logs
docker compose logs backend --tail 50
# Check container health
docker inspect --format='{{json .State.Health}}' machina-meta-backend-1
Troubleshooting
Port Already in Use
# Find what's using the port
lsof -i :8000
# or
ss -tulpn | grep 8000
# Kill process or stop container
docker ps | grep 8000
docker stop <container-name>
Container Won't Start
# Check logs
docker compose logs <service> --tail 100
# Check health status
docker inspect --format='{{json .State.Health}}' <container-name>
# Check events
docker events --since 5m
Database Connection Failed
# Verify containers are healthy
just dev-status
# Check network connectivity
docker network ls
docker network inspect machina-meta_default
# Test connection from host
pg_isready -h localhost -p 5432
redis-cli -h localhost -p 6379 ping
Neo4j Won't Start
# Check Neo4j logs
docker compose logs neo4j --tail 100
# Common issue: memory limits
# Edit docker-compose.yaml NEO4J_dbms_memory_* settings
Qdrant Snapshots Not Restored
# Check if volume exists
docker volume ls | grep qdrant
# Manual restore
(cd repos/medical-catalog && just snapshot-restore-all)
# Verify collections
curl http://localhost:6333/collections
Log Extraction
# Extract backend logs
(cd repos/dem2 && ./scripts/extract_backend_logs.sh -s backend -l ERROR --since 10m)
# Follow all logs
docker compose logs -f
# Follow specific service
docker compose logs -f backend
See references/troubleshooting.md for comprehensive troubleshooting guide.
API Testing with curl_api
The curl_api justfile rule in repos/dem2 provides a convenient JSON-based interface for testing backend APIs without writing code.
Overview
Location: repos/dem2/justfile (rule: curl_api)
Backend Script: repos/dem2/scripts/curl_api.sh
Purpose: Call backend API functions using JSON dispatch for development and testing
How It Works
The curl_api rule uses a JSON dispatch system that:
- Accepts a JSON payload with a function field and arguments
- Routes the call to a registered bash function in curl_api.sh
- Handles authentication automatically (JWT tokens + patient context)
- Executes the API call and returns the result
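The dispatch idea can be sketched in a few lines of shell. This is not the code in curl_api.sh: the api_* handlers are hypothetical stand-ins, and the sed-based field extraction is a simplification (a real implementation would more likely use a proper JSON parser).

```shell
# Hypothetical handlers standing in for the functions registered in curl_api.sh.
api_list_documents() { echo "would GET the documents list"; }
api_list_tasks()     { echo "would GET the task list"; }

# Extract the "function" field and route to the matching api_<name> handler.
# Simplified sketch: sed is used here instead of a real JSON parser.
dispatch() {
  payload=$1
  fn=$(printf '%s' "$payload" |
    sed -n 's/.*"function"[[:space:]]*:[[:space:]]*"\([A-Za-z_][A-Za-z0-9_]*\)".*/\1/p')
  if [ -z "$fn" ]; then
    echo 'payload needs a "function" field' >&2
    return 2
  fi
  # Only call names with a registered handler; never eval the payload.
  case "$fn" in
    list_documents|list_tasks) "api_$fn" "$payload" ;;
    *) echo "unknown function: $fn" >&2; return 1 ;;
  esac
}

# Usage:
#   dispatch '{"function": "list_documents"}'
```

Routing through an explicit allow-list keeps arbitrary payload text from ever being executed as a command.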
Basic Usage
# From the workspace root (the subshell changes into repos/dem2)
(cd repos/dem2 && just curl_api '{"function": "function_name", "arg1": "value1", ...}')
Authentication:
- Automatically handles JWT token generation via user_manager.py
- Sets patient context header (X-Patient-Context-ID)
- Default user: dbeal@numberone.ai
- Default patient: Stuart McClure, DOB: 1969-03-03
Available Function Categories
Document Management
List documents:
(cd repos/dem2 && just curl_api '{"function": "list_documents"}')
Upload a file:
(cd repos/dem2 && just curl_api '{"function": "upload_file", "path": "datasets/documents/test.pdf"}')
Process a specific document:
(cd repos/dem2 && just curl_api '{"function": "process_document", "file_id": "uuid-here"}')
Process all uploaded documents:
(cd repos/dem2 && just curl_api '{"function": "process_all_documents"}')
Delete all documents:
(cd repos/dem2 && just curl_api '{"function": "delete_all_documents"}')
Task Management
List all document processing tasks:
(cd repos/dem2 && just curl_api '{"function": "list_tasks"}')
Get specific task details:
(cd repos/dem2 && just curl_api '{"function": "get_task", "task_id": "uuid-here"}')
List failed tasks only:
(cd repos/dem2 && just curl_api '{"function": "list_failed_tasks"}')
Patient Management
List all patients:
(cd repos/dem2 && just curl_api '{"function": "list_patients"}')
Agent Session Management
Create a new agent session:
(cd repos/dem2 && just curl_api '{"function": "create_session", "name": "My Session"}')
Set/update session name:
(cd repos/dem2 && just curl_api '{"function": "set_session_name", "session_id": "uuid-here", "name": "New Name"}')
List all sessions:
(cd repos/dem2 && just curl_api '{"function": "list_sessions"}')
Agent Query
Query the medical agent:
(cd repos/dem2 && just curl_api '{"function": "query_agent", "query": "What is my cholesterol?"}')
Query with specific session:
(cd repos/dem2 && just curl_api '{"function": "query_agent", "query": "What is my cholesterol?", "session_id": "uuid-here"}')
Agent Diagnostics
Check agent dependencies:
(cd repos/dem2 && just curl_api '{"function": "check_agent_dependencies"}')
Validate agent configuration:
(cd repos/dem2 && just curl_api '{"function": "validate_agent_config"}')
Medical Catalog (Biomarker Enrichment)
Enrich biomarkers:
(cd repos/dem2 && just curl_api '{"function": "catalog_enrich", "names": ["ApoA-1", "Factor II"]}')
Check enrichment status:
(cd repos/dem2 && just curl_api '{"function": "catalog_enrich_status", "task_id": "uuid-here"}')
Search for biomarkers:
(cd repos/dem2 && just curl_api '{"function": "catalog_search", "names": ["cholesterol"], "limit": 5}')
Search by alias groups:
(cd repos/dem2 && just curl_api '{"function": "catalog_search_by_alias", "alias_groups": [["LDL"], ["HDL", "HDL-C"]]}')
Search derivative biomarkers (ratios, calculated values):
(cd repos/dem2 && just curl_api '{"function": "catalog_search_derivatives", "names": ["ApoB/ApoA-1"]}')
Enrich derivatives (ratios, sums, percentages):
(cd repos/dem2 && just curl_api '{"function": "catalog_enrich_derivatives", "names": ["ApoB/ApoA-1", "TC/HDL-C"]}')
List all derivatives (with pagination):
(cd repos/dem2 && just curl_api '{"function": "catalog_list_derivatives", "limit": 100, "offset": 0}')
Debug
Debug JSON argument structure:
(cd repos/dem2 && just curl_api '{"function": "debug_args", "names": ["test1", "test2"]}')
Common Workflows
List Available Test Documents
Before uploading documents for testing, list available test documents in the repository:
# List all test documents with full paths
just list-test-docs
# Output example:
# [
# "pdf_tests/medical_records/.../Boston Heart July 2021.pdf",
# "pdf_tests/medical_records/.../Dutch cortisol 9-01-25.pdf",
# ...
# ]
Use this to:
- Find available test documents before testing document processing
- Get correct paths for upload functions
- Identify specific documents for debugging (e.g., Dutch cortisol document for "Estrone (E1)" testing)
Upload and Process a Document
# First, list available test documents
just list-test-docs
# Upload document using path from list-test-docs
(cd repos/dem2 && just curl_api '{"function": "upload_file", "path": "pdf_tests/medical_records/Stuart Mcclure Medical Records (PRIVATE)/Dutch cortisol 9-01-25.pdf"}')
# Process the uploaded document (use the file_id from upload response)
(cd repos/dem2 && just curl_api '{"function": "process_document", "file_id": "file-id-from-upload"}')
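Test document paths can contain spaces, parentheses, and occasionally quotes, which makes hand-written JSON payloads easy to break. The sketch below builds the payload from plain arguments. curl_api_payload and json_escape are hypothetical helpers, not part of the repository, and the path in the example is a placeholder.

```shell
# json_escape STRING: escape backslashes and double quotes for use inside a JSON string.
# Control characters (newlines, tabs) are not handled in this sketch.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# curl_api_payload FUNCTION [KEY VALUE]...
# Hypothetical helper: builds a flat JSON object with string values only.
curl_api_payload() {
    payload=$(printf '{"function": "%s"' "$(json_escape "$1")")
    shift
    while [ "$#" -ge 2 ]; do
        payload=$(printf '%s, "%s": "%s"' "$payload" "$(json_escape "$1")" "$(json_escape "$2")")
        shift 2
    done
    printf '%s}\n' "$payload"
}

curl_api_payload upload_file path "pdf_tests/medical_records/example.pdf"
# Prints: {"function": "upload_file", "path": "pdf_tests/medical_records/example.pdf"}

# Usage with the justfile rule (not run here):
#   (cd repos/dem2 && just curl_api "$(curl_api_payload upload_file path "$doc")")
```

Array-valued arguments such as names or alias_groups are outside the scope of this helper and still need a hand-written payload.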
Query Agent About Health Markers
# Query agent
(cd repos/dem2 && just curl_api '{"function": "query_agent", "query": "What is my latest cholesterol level?"}')
Check Task Processing Status
# List all tasks to find IDs
(cd repos/dem2 && just curl_api '{"function": "list_tasks"}')
# Get specific task details
(cd repos/dem2 && just curl_api '{"function": "get_task", "task_id": "abc-123-def"}')
Enrich and Validate Biomarkers
# Enrich biomarkers in catalog
(cd repos/dem2 && just curl_api '{"function": "catalog_enrich", "names": ["Total Cholesterol", "LDL", "HDL"]}')
# Search to verify they exist
(cd repos/dem2 && just curl_api '{"function": "catalog_search", "names": ["Total Cholesterol"], "limit": 5}')
How It Differs from Direct curl_api.sh Usage
Using just curl_api (recommended):
(cd repos/dem2 && just curl_api '{"function": "list_documents"}')
Direct script usage (lower-level):
(cd repos/dem2 && bash -c 'source scripts/curl_api.sh && dispatch "{\"function\": \"list_documents\"}"')
Benefits of just curl_api:
- ✅ Cleaner syntax (no need to source or call dispatch)
- ✅ Proper error handling via justfile
- ✅ Consistent working directory handling
- ✅ Part of documented justfile interface
Environment Variables
Default settings (defined in scripts/curl_api.sh):
PATIENT_FIRST_NAME=Stuart
PATIENT_LAST_NAME=McClure
PATIENT_DATE_OF_BIRTH=1969-03-03
AUTH_EMAIL=dbeal@numberone.ai
BACKEND_URL=http://localhost:8000/api/v1
FRONTEND_URL=http://localhost:3000
Override patient context:
(cd repos/dem2 && PATIENT_FIRST_NAME=John PATIENT_LAST_NAME=Doe PATIENT_DATE_OF_BIRTH=1990-01-01 \
just curl_api '{"function": "list_documents"}')
Enable verbose curl output (for debugging):
(cd repos/dem2 && CURL_VERBOSE=1 just curl_api '{"function": "list_documents"}')
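An assignment prefix only takes effect when it sits directly in front of a simple command such as just; the shell then exports it to that one process. The sketch below shows the mechanism with a stand-in script. Whether scripts/curl_api.sh applies its defaults with this exact parameter-expansion idiom is an assumption.

```shell
# Stand-in for a script that falls back to defaults when a variable is unset.
# The default values mirror the settings listed above.
demo='echo "${PATIENT_FIRST_NAME:-Stuart} ${PATIENT_LAST_NAME:-McClure}"'

# No override: the defaults apply.
sh -c "$demo"

# Assignment prefix: exported to this one child process only.
PATIENT_FIRST_NAME=John PATIENT_LAST_NAME=Doe sh -c "$demo"

# The calling shell is unchanged afterwards, so the defaults apply again.
sh -c "$demo"
```

For an override that should last for a whole session, export the variable instead of prefixing each command.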
Error Handling
If a function doesn't exist:
$ (cd repos/dem2 && just curl_api '{"function": "nonexistent"}')
# ERROR: Unknown function: nonexistent
# Available functions: list_documents, upload_file, process_document, ...
If required arguments are missing:
$ (cd repos/dem2 && just curl_api '{"function": "upload_file"}')
# ERROR: Missing 'path' field in JSON
# Usage: {"function": "upload_file", "path": "path/to/file.pdf"}
When to Use curl_api
Use curl_api for:
- ✅ Quick API testing during development
- ✅ One-off administrative tasks (upload, delete, etc.)
- ✅ Debugging API endpoints and responses
- ✅ Validating authentication and patient context
- ✅ Scripting batch operations
Don't use curl_api for:
- ❌ Production operations (use proper API clients)
- ❌ Performance testing (use dedicated load testing tools)
- ❌ Automated testing (use pytest with proper fixtures)
Related Commands
Low-level curl_api.sh functions (not dispatched, but useful):
# Get patient ID and set context
(cd repos/dem2 && bash -c 'source scripts/curl_api.sh && _export_patient_context_id_internal && declare -p X_PATIENT_CONTEXT_ID')
# Call backend API with auth
(cd repos/dem2 && bash -c 'source scripts/curl_api.sh && _export_patient_context_id_internal && auth_backend "/graph-memory/medical/observations/grouped"')
See also:
- repos/dem2/scripts/curl_api.sh - Complete function implementations
- repos/dem2/justfile (lines 271-347) - curl_api rule documentation
- .claude/skills/machina-ui/SKILL.md - UI debugging with curl_api examples
Environment Variables
Authoritative Configuration
The single source of truth for environment variables is machina-meta/.env at the workspace root.
machina-meta/.env # THE authoritative .env file
Important: Any .env files in subdirectories (repos/dem2/.env, repos/medical-catalog/.env, etc.) are temporary symlinks to the root .env file. Do not treat them as separate configuration files.
Environment File Hierarchy
| File | Purpose | Scope |
|---|---|---|
| repos/dem2/.env.example | Template for all backend env vars (committed to git) | Reference |
| machina-meta/.env | Active local config (never committed) | Local dev |
| .env.localhost | Basic local reference (~29 exports) | Local dev |
| .env.localhost.alen | Dev reference with Langfuse/MCP (~72 exports) | Local dev |
| .env.localhost.alen-dev-1 | Most complete reference (~108 exports) | Local dev |
| .env.<namespace>.<deployment> | Generated by import_k8s_environment.py from K8s | Remote debug |
| .current_env | Symlink to active .env.<namespace>.<deployment> | Remote debug |
When adding new environment variables:
- Add to repos/dem2/.env.example with documentation comments (blank secrets)
- Add to your active .env file with real values
- Integrate into k8s manifests (see Kubernetes Environment Configuration below)
- If sensitive, add to Google Secret Manager + ExternalSecrets
Database Defaults (Docker Compose)
# PostgreSQL
POSTGRES_USER: postgres
POSTGRES_PASSWORD: demodemo
POSTGRES_DB: demodemo
# Neo4j
NEO4J_AUTH: neo4j/demodemo
# Redis
# No auth by default in dev
Kubernetes Environment Configuration
Architecture: Kustomize Base + Overlays
The k8s environment configuration uses Kustomize with a base + overlay pattern:
repos/dem2-infra/k8s/
├── base/                         # Shared across all environments
│   ├── app.yaml                  # Deployments (tusdi-api, tusdi-webui) + Services
│   ├── config.yaml               # ConfigMaps (app-config, postgres-config)
│   ├── external-secrets.yaml     # ExternalSecrets (GSM → k8s Secrets)
│   └── kustomization.yaml        # Base resource list
└── overlays/
    ├── dev/
    │   ├── env-vars-patch.yaml   # ENV_FOR_DYNACONF=development, LANGFUSE_PROMPTS_LABEL=development
    │   └── kustomization.yaml    # namespace: tusdi-dev
    ├── staging/
    │   ├── env-vars-patch.yaml   # ENV_FOR_DYNACONF=staging, LANGFUSE_PROMPTS_LABEL=staging
    │   └── kustomization.yaml    # namespace: tusdi-staging
    └── preview/
        ├── env-vars-patch.yaml   # ENVIRONMENT=preview, auth URLs, LANGFUSE_PROMPTS_LABEL=preview-${PR_NUMBER}
        └── kustomization.yaml    # namespace: tusdi-preview-$PR_NUMBER
How Environment Variables Are Organized
Three mechanisms for injecting env vars into pods:
| Mechanism | File | Use For | Example |
|---|---|---|---|
| Direct in Deployment | base/app.yaml | Non-sensitive config shared across envs | DYNACONF_REDIS_DB__HOST, TZ |
| ExternalSecrets | base/external-secrets.yaml | Sensitive values from Google Secret Manager | API keys, passwords |
| Overlay Patches | overlays/<env>/env-vars-patch.yaml | Per-environment overrides | ENV_FOR_DYNACONF, LANGFUSE_PROMPTS_LABEL |
Complete tusdi-api Environment Variables
Direct values (base/app.yaml):
| Variable | Value | Category |
|---|---|---|
| FORWARDED_ALLOW_IPS | * | Network |
| ENVIRONMENT | ${ENVIRONMENT} | App |
| DYNACONF_PG_DB__HOST | postgres | PostgreSQL |
| DYNACONF_PG_DB__PORT | 5432 | PostgreSQL |
| DYNACONF_PG_DB__NAME | tusdi_${ENVIRONMENT} | PostgreSQL |
| DYNACONF_PG_DB__USER | tusdi | PostgreSQL |
| DYNACONF_NEO4J_DB__HOST | neo4j | Neo4j |
| DYNACONF_NEO4J_DB__PORT | 7687 | Neo4j |
| DYNACONF_NEO4J_DB__USER | neo4j | Neo4j |
| DYNACONF_NEO4J_DB__NAME | neo4j | Neo4j |
| NEO4J_URI | bolt://neo4j:7687 | Neo4j |
| NEO4J_USER | neo4j | Neo4j |
| DYNACONF_REDIS_DB__HOST | redis://redis:6379 | Redis |
| DYNACONF_AUTH__FRONTEND_URL | https://${ENVIRONMENT}.${DNS_DOMAIN} | Auth |
| DYNACONF_AUTH__REDIRECT_URL | https://${ENVIRONMENT}.${DNS_DOMAIN}/api/v1/auth/google/callback | Auth |
| DYNACONF_AUTH__JWT_SECRET_KEY | xBkVoNf... | Auth |
| IMPERSONATE_SA | tusdi-nonprod-app@... | GCP |
| DYNACONF_TRACING__ENABLED | true | Tracing |
| DYNACONF_TRACING__OTEL_EXPORTER_OTLP_ENDPOINT | http://otel-collector.langfuse:4318 | Tracing |
| DYNACONF_LANGFUSE__ENABLED | true | Langfuse |
| DYNACONF_LANGFUSE__TRACING__ENABLED | true | Langfuse |
| DYNACONF_LANGFUSE__BASE__HOST | http://langfuse-web.langfuse:3000 | Langfuse |
| LANGFUSE_HOST | http://langfuse-web.langfuse:3000 | Langfuse SDK |
| DYNACONF_LANGFUSE__HOST | http://langfuse-web.langfuse:3000 | Langfuse (dynaconf) |
| LANGFUSE_PROMPTS_SOURCE | hybrid | Prompt Mgmt |
| LANGFUSE_PROMPTS_LABEL | production | Prompt Mgmt |
| LANGFUSE_PROMPTS_FALLBACK | true | Prompt Mgmt |
| TZ | UTC | Misc |
| ENV_FOR_DYNACONF | production | App |
| OTEL_PYTHON_URLLIB3_EXCLUDED_URLS | .*sentry\\.io.* | Tracing |
From ExternalSecrets (Google Secret Manager):
| Variable | K8s Secret | GSM Key Pattern | Category |
|---|---|---|---|
| DYNACONF_PG_DB__PASSWORD | postgres-secrets | tusdi-${ENVIRONMENT}-postgres-password | PostgreSQL |
| DYNACONF_NEO4J_DB__PASSWORD | neo4j-secrets | tusdi-${ENVIRONMENT}-neo4j-password | Neo4j |
| NEO4J_PASSWORD | neo4j-secrets | tusdi-${ENVIRONMENT}-neo4j-password | Neo4j |
| OPENAI_API_KEY | api-secrets | tusdi-${ENVIRONMENT}-openai-key | LLM |
| GEMINI_API_KEY | api-secrets | tusdi-${ENVIRONMENT}-gemini-key | LLM |
| SERPER_API_KEY | api-secrets | tusdi-${ENVIRONMENT}-serper-api-key | Search |
| VISION_AGENT_API_KEY | api-secrets | tusdi-${ENVIRONMENT}-vision-agent-api-key | LLM |
| GOOGLE_SEARCH_API_KEY | api-secrets | tusdi-${ENVIRONMENT}-google-search-api-key | Search |
| DYNACONF_GOOGLE_AUTH__CLIENT_ID | backend-google-oauth-secrets | (managed separately) | Auth |
| DYNACONF_GOOGLE_AUTH__CLIENT_SECRET | backend-google-oauth-secrets | (managed separately) | Auth |
| DYNACONF_CALENDAR__OAUTH_CLIENT_ID | backend-google-oauth-secrets | (managed separately) | Auth |
| DYNACONF_CALENDAR__OAUTH_CLIENT_SECRET | backend-google-oauth-secrets | (managed separately) | Auth |
| LANGFUSE_PUBLIC_KEY | langfuse-secrets | langfuse-public-key (global) | Langfuse |
| LANGFUSE_SECRET_KEY | langfuse-secrets | langfuse-secret-key (global) | Langfuse |
| DYNACONF_LANGFUSE__PUBLIC_KEY | langfuse-secrets | langfuse-public-key (global) | Langfuse |
| DYNACONF_LANGFUSE__SECRET_KEY | langfuse-secrets | langfuse-secret-key (global) | Langfuse |
Per-environment overrides (overlay patches):
| Variable | Dev | Staging | Preview |
|---|---|---|---|
| ENV_FOR_DYNACONF | development | staging | (base: production) |
| LANGFUSE_PROMPTS_LABEL | development | staging | preview-${PR_NUMBER} |
| ENVIRONMENT | (base) | (base) | preview |
| DYNACONF_AUTH__FRONTEND_URL | (base) | (base) | https://preview-${PR_NUMBER}.${DNS_DOMAIN} |
| DYNACONF_AUTH__REDIRECT_URL | (base) | (base) | https://oauth-callback.${DNS_DOMAIN}/callback |
| OAUTH_PR_NUMBER | - | - | pr-${PR_NUMBER} |
Langfuse Variable Naming
The backend reads Langfuse configuration through two parallel paths:
- Langfuse SDK env vars (LANGFUSE_*) - Read directly by the Langfuse Python SDK and CLI tools
- Dynaconf env vars (DYNACONF_LANGFUSE__*) - Read by dynaconf into the config.toml structure, then propagated to the SDK by LangfuseIntegration
Both must be set in k8s for full compatibility:
- LANGFUSE_HOST + DYNACONF_LANGFUSE__HOST + DYNACONF_LANGFUSE__BASE__HOST
- LANGFUSE_PUBLIC_KEY + DYNACONF_LANGFUSE__PUBLIC_KEY
- LANGFUSE_SECRET_KEY + DYNACONF_LANGFUSE__SECRET_KEY
The DYNACONF_LANGFUSE__BASE__HOST is the legacy path (pre-refactor); DYNACONF_LANGFUSE__HOST is the newer config path. Both are set to the same value for compatibility.
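Because each value has to be set under two or three names, a missing or diverging variable is easy to overlook. The sketch below checks the pairs listed above in the current shell environment. check_pair and check_langfuse_env are hypothetical helpers, and only variable names are printed, never values.

```shell
# check_pair NAME_A NAME_B
# Succeeds only if both variables are set to the same non-empty value.
check_pair() {
    # Indirect lookup; the names come from the fixed list below, not from user input.
    eval "a=\${$1:-}"
    eval "b=\${$2:-}"
    if [ -z "$a" ] || [ -z "$b" ]; then
        echo "MISSING: $1 / $2"
        return 1
    fi
    if [ "$a" != "$b" ]; then
        echo "MISMATCH: $1 / $2"
        return 1
    fi
    echo "OK: $1 / $2"
}

# Check every SDK/dynaconf pairing; the return status is non-zero if any pair fails.
check_langfuse_env() {
    status=0
    check_pair LANGFUSE_HOST DYNACONF_LANGFUSE__HOST || status=1
    check_pair LANGFUSE_HOST DYNACONF_LANGFUSE__BASE__HOST || status=1
    check_pair LANGFUSE_PUBLIC_KEY DYNACONF_LANGFUSE__PUBLIC_KEY || status=1
    check_pair LANGFUSE_SECRET_KEY DYNACONF_LANGFUSE__SECRET_KEY || status=1
    return "$status"
}
```

Run check_langfuse_env in a shell where the environment under test has been loaded, for example after sourcing the active .env file.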
Hybrid Prompt Management per Environment
| Environment | LANGFUSE_PROMPTS_SOURCE | LANGFUSE_PROMPTS_LABEL | Behavior |
|---|---|---|---|
| Base (production) | hybrid | production | Try Langfuse production prompts, fallback to local |
| Dev | hybrid | development | Try Langfuse development prompts, fallback to local |
| Staging | hybrid | staging | Try Langfuse staging prompts, fallback to local |
| Preview | hybrid | preview-${PR_NUMBER} | Try Langfuse PR-specific prompts, fallback to local |
LANGFUSE_PROMPTS_FALLBACK=true ensures all environments gracefully degrade to local config.yml files if Langfuse is unreachable.
How to Add New Environment Variables
Non-Sensitive Variables (configs, feature flags)
- Add to .env.example in repos/dem2/.env.example with comments
- Add to base app.yaml in repos/dem2-infra/k8s/base/app.yaml:
      - name: MY_NEW_CONFIG
        value: "default-value"
- If environment-specific, add overrides in overlays/<env>/env-vars-patch.yaml
Sensitive Variables (API keys, passwords, tokens)
- Add to .env.example with blank value
- Create secret in Google Secret Manager following naming convention: tusdi-${ENVIRONMENT}-<descriptive-key-name>
  e.g., tusdi-dev-openai-key, tusdi-staging-openai-key
  Note: Some secrets are global (e.g., langfuse-public-key); not all follow the per-env pattern
- Add ExternalSecret entry in repos/dem2-infra/k8s/base/external-secrets.yaml:
      ---
      apiVersion: external-secrets.io/v1
      kind: ExternalSecret
      metadata:
        name: my-new-secrets
      spec:
        refreshInterval: 1m
        secretStoreRef:
          name: gcpsm-secret-store
          kind: SecretStore
        target:
          name: my-new-secrets
          creationPolicy: Owner
        data:
          - secretKey: MY_SECRET_KEY
            remoteRef:
              key: tusdi-${ENVIRONMENT}-my-secret-key
- Reference in app.yaml:
      - name: MY_SECRET_KEY
        valueFrom:
          secretKeyRef:
            name: my-new-secrets
            key: MY_SECRET_KEY
- Update supporting files (noted in external-secrets.yaml header):
  - terraform/secrets.tf - Add the secret resource
  - deploy-k8s.sh - Add secret to validate_secrets() function
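The per-environment naming convention can be generated rather than typed, which avoids one environment ending up with a slightly different key name. gsm_key is a hypothetical helper. Its lowercase-and-hyphen check mirrors the example names above and is this sketch's own restriction, not a documented rule.

```shell
# gsm_key ENVIRONMENT NAME
# Build a per-environment key following tusdi-${ENVIRONMENT}-<descriptive-key-name>.
gsm_key() {
    case $1 in
        ''|*[!a-z0-9-]*) echo "invalid environment: $1" >&2; return 1 ;;
    esac
    case $2 in
        ''|*[!a-z0-9-]*) echo "invalid key name: $2" >&2; return 1 ;;
    esac
    printf 'tusdi-%s-%s\n' "$1" "$2"
}

for e in dev staging; do
    gsm_key "$e" my-secret-key
done
# Prints:
#   tusdi-dev-my-secret-key
#   tusdi-staging-my-secret-key

# Creating the secret itself (not run here; assumes an authenticated gcloud CLI):
#   printf '%s' "$SECRET_VALUE" | gcloud secrets create "$(gsm_key dev my-secret-key)" --data-file=-
```

Global secrets such as langfuse-public-key do not follow this pattern and should not go through the helper.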
Checklist: .env.example → k8s Integration
When .env.example changes, follow this workflow:
□ Identify new variables from git diff on .env.example
□ Classify each as sensitive (secret) or non-sensitive (config)
□ For secrets: create GSM entries + ExternalSecret + secretKeyRef in app.yaml
□ For configs: add directly to app.yaml with appropriate default
□ Determine if variable needs per-environment overrides → overlay patches
□ Verify variable naming matches what the application code reads
□ Check for dual naming (e.g., LANGFUSE_* + DYNACONF_LANGFUSE__*)
□ Update terraform/secrets.tf and deploy-k8s.sh if adding new secrets
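The first checklist item can also be approached by comparing files directly instead of reading a diff. The sketch below lists variable names that appear in a template but not in another env file. env_keys and missing_keys are hypothetical helpers, and the sketch assumes simple KEY=value or export KEY=value lines.

```shell
# env_keys FILE: print the sorted, unique variable names defined in an env file.
# Accepts "KEY=value" and "export KEY=value"; comments and blank lines are skipped.
env_keys() {
    sed -n 's/^[[:space:]]*\(export[[:space:]]\{1,\}\)\{0,1\}\([A-Za-z_][A-Za-z0-9_]*\)=.*/\2/p' "$1" |
        sort -u
}

# missing_keys TEMPLATE ACTIVE: names present in TEMPLATE but absent from ACTIVE.
missing_keys() {
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || return 1
    env_keys "$1" >"$tmp_a"
    env_keys "$2" >"$tmp_b"
    # comm -23 keeps lines that appear only in the first (sorted) file.
    comm -23 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}

# Example usage from the workspace root (not run here):
#   missing_keys repos/dem2/.env.example .env
```

The output is only a list of names; classifying each as secret or config is still a manual step.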
ExternalSecret Groups
| K8s Secret Name | GSM Key Prefix | Variables |
|---|---|---|
| postgres-secrets | tusdi-${ENV}-postgres-* | POSTGRES_PASSWORD |
| neo4j-secrets | tusdi-${ENV}-neo4j-* | NEO4J_PASSWORD |
| api-secrets | tusdi-${ENV}-*-key | OPENAI_API_KEY, GEMINI_API_KEY, SERPER_API_KEY, VISION_AGENT_API_KEY, GOOGLE_SEARCH_API_KEY |
| backend-google-oauth-secrets | (managed separately) | client-id, client-secret |
| langfuse-secrets | langfuse-* (global, not per-env) | LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY |
Best Practices
Starting Development
- Always use workspace-level just commands from machina-meta root
- Use just dev-status to verify services before running code
- Check for port conflicts before starting: lsof -i :8000
Stopping Development
- Use just dev-down to stop all services cleanly
- Don't use docker kill - it doesn't run shutdown hooks
- Check nothing is left running: docker ps
Resetting Data
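- Stop with just dev-down
- Remove volumes: docker volume prune -f (this affects unused volumes from every project on the host; on Docker 23.0 and later, named volumes are only removed when --all is added)
- Start fresh: just dev-up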
Updating Images
# Pull latest images
docker compose pull
# Rebuild local images
docker compose build --no-cache
# Update and restart
just dev-restart
Integration with Other Skills
machina-git
Use machina-git for all git operations. Docker skill handles containers only.
kubernetes
Kubernetes skill handles production cluster operations. Docker images built here are deployed via Kubernetes.
machina-ui
For frontend debugging, use machina-ui skill which can interact with containerized services.
Pull Request Creation
Overview
This section provides the authoritative process for creating pull requests in the dem2, dem2-webui, and dem2-infra repositories.
Note: This section is in machina-docker (not machina-git) because PR creation uses the gh CLI for GitHub operations, while machina-git handles core git operations (commit, push, status). For git commands below, use the machina-git skill workflow.
PR Creation Process
Step 1: Identify merge base and examine ALL commits
# Find merge base (usually origin/dev for feature branches)
cd repos/dem2 && git fetch origin && git merge-base origin/dev HEAD
# Get comprehensive branch stats
BASE=$(git merge-base origin/dev HEAD)
echo "=== Branch Stats ==="
echo "Commits: $(git rev-list --count $BASE..HEAD)"
echo "Files: $(git diff --stat $BASE..HEAD | tail -1)"
echo "Date range: $(git log --format='%as' $BASE..HEAD | tail -1) to $(git log --format='%as' -1 HEAD)"
# List ALL commits from merge base to HEAD
git log --oneline $BASE..HEAD
# Get the FULL diff (CRITICAL - examine ALL changes, not just commit messages)
git diff $BASE..HEAD
Step 2: Analyze changes by category
Group all changes into categories:
- Bug Fixes: Error corrections, crash fixes
- Features: New functionality
- Refactoring: Code improvements without behavior change
- Performance: Speed/resource optimizations
- Documentation: Docs, comments, README updates
- Testing: Test additions/modifications
- Infrastructure: CI/CD, build, deployment changes
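A first pass at grouping can be automated from commit subjects. The sketch assumes commits use conventional-commit style prefixes (feat:, fix:, and so on), and the prefix-to-category mapping is its own choice. Both helper names are hypothetical.

```shell
# commit_category SUBJECT: map a conventional-commit style prefix to a PR category.
commit_category() {
    case $1 in
        "fix:"*|"fix("*) echo "Bug Fixes" ;;
        "feat:"*|"feat("*) echo "Features" ;;
        "refactor:"*|"refactor("*) echo "Refactoring" ;;
        "perf:"*|"perf("*) echo "Performance" ;;
        "docs:"*|"docs("*) echo "Documentation" ;;
        "test:"*|"test("*) echo "Testing" ;;
        "ci:"*|"ci("*|"build:"*|"build("*) echo "Infrastructure" ;;
        *) echo "Uncategorized" ;;
    esac
}

# Read commit subjects on stdin and print "category<TAB>subject", sorted by category.
categorize_subjects() {
    while IFS= read -r subject; do
        printf '%s\t%s\n' "$(commit_category "$subject")" "$subject"
    done | sort
}

# Example usage inside a repository (not run here):
#   git log --format='%s' "$BASE..HEAD" | categorize_subjects
```

Commit subjects are only a starting point. The categories in the final summary should still come from reading the full diff, as Step 1 requires.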
Step 3: Generate PR using gh CLI
cd repos/dem2 && gh pr create --base dev --head <branch-name> --title "<title>" --body "<body>"
PR Summary Template
Use this exact structure for PR summaries:
## Summary
<One paragraph overview describing the primary purpose and key achievements of this PR.>
### Key Changes
| Category | Description |
|----------|-------------|
| **<Category 1>** | <Brief description of what changed> |
| **<Category 2>** | <Brief description of what changed> |
---
### Key Changes
#### <Feature/Fix Area 1>
- ✅ <Completed item with specific detail>
- ✅ <Completed item with specific detail>
#### <Feature/Fix Area 2>
- ✅ <Completed item with specific detail>
### Technical Details
<If applicable, include tables for:>
- Performance improvements (Before/After/Impact)
- Architectural decisions (Decision/Rationale/Outcome)
- Breaking changes (if any)
## Test Plan
- [x] <Specific test performed>
- [x] <Specific test performed>
- [x] <Verification method used>
## Related Issues
- Fixes #<issue> (if applicable)
- Related to #<issue> (if applicable)
PR Summary Generation Prompt
When generating a PR summary, follow these rules:
- Read ALL diffs: Examine every file changed from merge base to HEAD
- Categorize changes: Group by feature area, not by file
- Be specific: Include actual function names, file paths, metrics
- Use tables: For comparisons, metrics, architectural decisions
- Include evidence: Reference specific commits, test results, measurements
- Quantify impact: "94% cost reduction" not "significant savings"
- Test plan must be real: Only include tests actually performed
- No fluff: Every line should convey information
DO NOT:
- Include generic statements like "improved code quality"
- List files without explaining what changed
- Include untested items in the test plan
- Add performance claims without measurements
- Use vague language like "various improvements"
DO:
- Start with the most important/impactful changes
- Group related changes together
- Include specific function/class names
- Reference commit hashes for key changes
- Explain WHY architectural decisions were made
Example Workflow
# 1. Get merge base and stats
cd repos/dem2 && git fetch origin
BASE=$(git merge-base origin/dev HEAD)
echo "Commits: $(git rev-list --count $BASE..HEAD)"
echo "Files: $(git diff --stat $BASE..HEAD | tail -1)"
echo "Date range: $(git log --format='%as' $BASE..HEAD | tail -1) to $(git log --format='%as' -1 HEAD)"
# 2. List all commits (READ each one for context)
git log --oneline $BASE..HEAD
# 3. Get full diff (CRITICAL - analyze the actual code changes)
git diff $BASE..HEAD
# 4. Create PR with summary
gh pr create --base dev --title "feat: <concise title>" --body "$(cat <<'EOF'
## Summary
<summary based on diff analysis>
...
EOF
)"
Creating PRs for Different Repos
dem2 (Backend):
cd repos/dem2 && gh pr create --base dev --head <branch>
dem2-webui (Frontend):
cd repos/dem2-webui && gh pr create --base dev --head <branch>
dem2-infra (Infrastructure):
cd repos/dem2-infra && gh pr create --base main --head <branch>
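The three commands differ only in directory and base branch, so the mapping can be captured in one place. pr_base_branch and create_pr are hypothetical helpers. create_pr assumes it is run from the machina-meta root with an authenticated gh CLI and reads the PR body from a file.

```shell
# pr_base_branch REPO: base branch used for PRs in each repository, per the commands above.
pr_base_branch() {
    case $1 in
        dem2|dem2-webui) echo dev ;;
        dem2-infra) echo main ;;
        *) echo "unknown repo: $1" >&2; return 1 ;;
    esac
}

# create_pr REPO BRANCH TITLE BODY_FILE
# BODY_FILE must be an absolute path, because gh runs inside repos/REPO.
create_pr() {
    base=$(pr_base_branch "$1") || return 1
    (cd "repos/$1" && gh pr create --base "$base" --head "$2" --title "$3" --body-file "$4")
}

pr_base_branch dem2-infra
# Prints: main
```

The same mapping can drive the merge-base lookup in Step 1, for example git merge-base "origin/$(pr_base_branch dem2-infra)" HEAD, so that the dem2-infra diff is taken against main instead of dev.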
Reference Files
This skill includes detailed documentation in references/:
- compose-files.md - Complete documentation of all 13 docker-compose files
- profiles.md - Profile system (dev, test, prod) detailed guide
- ports.md - All port mappings with conflict detection
- volumes.md - Volume management, backup, restore
- troubleshooting.md - Common issues and solutions
- scripts.md - Helper script documentation (dev_stack.py, etc.)
Use Read to access specific reference files when detailed information is needed.
Additional references:
- user-workflow-preferences.md - .current_env for API testing, kubectl locally, preview environment mappings