name: machina-docker
description: Docker development environment for machina-meta workspace. Use for container management, development stacks, database services, health checks, volume management, and infrastructure. The single authoritative source for all Docker operations.

Docker Skill

Comprehensive Docker environment management for the machina-meta workspace. This is the single authoritative source for all Docker operations.

When to Use This Skill

Development Environment Management

Starting/stopping the development stack
Checking service status and health
Managing database containers (PostgreSQL, Neo4j, Redis, Qdrant)
Running full-stack development (frontend + backend + databases)

Service Operations

Building Docker images
Running specific service containers
Debugging container issues
Log extraction and analysis

Database Operations

Managing development databases
Snapshot/restore operations (Qdrant)
Volume management and data persistence
Database migrations in containers

Specialized Stacks

Langfuse observability (local and production)
Local LLM inference (Ollama, vLLM with GPU)
Remote debugging tools (pgAdmin, Neo4j Browser)
gcloud-admin DevOps container

Infrastructure & Deployment

Gateway infrastructure (ngrok, nginx)
Monitoring stacks (Langfuse + OTEL)
LLM experiment environments

Kubernetes Environment Configuration

Adding new environment variables to k8s deployments
Integrating .env.example changes into Kustomize manifests
Managing secrets via ExternalSecrets + Google Secret Manager
Per-environment configuration via overlay patches

Quick Reference

Primary Development Commands

All Docker commands should be run from the machina-meta workspace root.

Note: Justfile rules in repos/dem2, repos/dem2-webui, and repos/medical-catalog are deprecated. Use workspace-level commands instead.

Workspace Level (machina-meta root):

Command	Purpose	Underlying Operation
`just dev-up`	Start full stack (all services)	`./scripts/dev_stack.py up`
`just dev-down`	Stop all services	`./scripts/dev_stack.py down`
`just dev-status`	Check all service health	`./scripts/dev_stack.py status`
`just dev-restart`	Rebuild and restart	`docker compose --profile dev up -d --build`
`just dev-check`	Run sanity check tests	Non-destructive verification suite

gcloud-admin DevOps:

Command	Purpose	Underlying Operation
`just gcloud-admin::shell`	Interactive DevOps shell	`docker compose run --rm gcloud-admin`
`just gcloud-admin::kubectl <args>`	Run kubectl	`docker compose run --rm gcloud-admin kubectl <args>`
`just gcloud-admin::helm <args>`	Run helm	`docker compose run --rm gcloud-admin helm <args>`
`just gcloud-admin::k9s`	Cluster TUI	`docker compose run --rm gcloud-admin k9s`
`just gcloud-admin::argocd <args>`	Run ArgoCD CLI	`docker compose run --rm gcloud-admin argocd <args>`
`just gcloud-admin::preview-info <identifier>`	Get preview deployment info	Shows tags, commits, PR status, ArgoCD health

Preview Environment Info:

Check the current state of a preview deployment using any identifier:

# By GKE namespace
just gcloud-admin::preview-info --gke-namespace tusdi-preview-92

# By git tag
just gcloud-admin::preview-info --git-tag preview-dbeal-docproc-dev

# By ArgoCD app name
just gcloud-admin::preview-info --argocd-app preview-pr-92

# By infra branch
just gcloud-admin::preview-info --infra-branch preview/dbeal-docproc-dev

# By infra PR number
just gcloud-admin::preview-info --pr 92

# By git branch
just gcloud-admin::preview-info --git-branch feature/dbeal-docproc-dev

# Output formats: terminal, json, markdown (default: markdown)
just gcloud-admin::preview-info --gke-namespace tusdi-preview-92 --format json

Output includes:

Preview ID resolved from identifier
Backend/Frontend tag commits and dates
Infrastructure branch and PR status
ArgoCD sync and health status
GitHub workflow status for all repos

Updating Preview Deployments (Deploy a Branch Commit to Preview):

This is the complete, authoritative process for deploying code changes to a GKE preview environment. Use this whenever the user says "deploy to preview-XX".

How the Preview CI/CD Pipeline Works

The deployment pipeline is fully automated once a preview tag is pushed:

1. Developer pushes tag `preview-<branch>` to dem2 (or dem2-webui)
       ↓
2. GitHub Actions `pr-preview-build.yml` triggers on `push: tags: preview-*`
       ↓
3. CI builds Docker image, pushes to Artifact Registry with tag:
   `<tag-name>-<full-commit-sha>`
   Example: `preview-alen-dev-1-ae99b3911e7a564c1470a116e3097792342f665b`
       ↓
4. CI dispatches `preview-update` event to dem2-infra repository
       ↓
5. dem2-infra workflow updates `kustomization.yaml` with new image tag
       ↓
6. ArgoCD detects the infra change and syncs the deployment
       ↓
7. New pod starts with the updated image in tusdi-preview-XX namespace

Image tag format: <git-tag-name>-<full-40-char-commit-sha>

For tag pushes: preview-alen-dev-1-ae99b3911e7a564c1470a116e3097792342f665b
For PR builds: pr-92-ae99b3911e7a564c1470a116e3097792342f665b
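
The tag format above can be checked locally before looking for an image in the registry. The following is a minimal sketch, not part of the workspace tooling: `image_tag` is a hypothetical helper name, and it only assumes the documented `<git-tag-name>-<full-40-char-commit-sha>` layout.

```shell
# Hypothetical helper (not part of the workspace): compose the image tag
# in the documented <git-tag-name>-<full-40-char-commit-sha> format.
image_tag() {
  git_tag=$1
  sha=$2
  # Reject anything containing characters other than lowercase hex digits.
  case $sha in
    *[!0-9a-f]*) echo "error: SHA must be lowercase hex" >&2; return 1 ;;
  esac
  # Reject abbreviated (or empty) SHAs; the format uses the full 40 characters.
  if [ "${#sha}" -ne 40 ]; then
    echo "error: expected a full 40-character SHA" >&2
    return 1
  fi
  printf '%s-%s\n' "$git_tag" "$sha"
}
```

For example, `(cd repos/dem2 && image_tag preview-alen-dev-1 "$(git rev-parse HEAD)")` should print the tag the CI build is expected to push for the current commit.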

Step-by-Step: Deploy Backend (tusdi-api) to Preview

# Step 1: Ensure your code is committed and pushed to the branch
# (Use machina-git skill for this)
(cd repos/dem2 && git log -1 --oneline)
# Verify HEAD is the commit you want to deploy

# Step 2: Tag current HEAD with preview-<branch-name> and force-push
# This triggers the CI pipeline automatically
(cd repos/dem2 && git tag -f preview-<branch-name> && git push origin preview-<branch-name> --force)

# Example for alen-dev-1 branch:
(cd repos/dem2 && git tag -f preview-alen-dev-1 && git push origin preview-alen-dev-1 --force)

Shortcut using justfile:

# Equivalent to the above (tags + force-pushes in one command)
(cd repos/dem2 && just preview <branch-name>)

# Example:
(cd repos/dem2 && just preview alen-dev-1)

Step-by-Step: Deploy Frontend (tusdi-webui) to Preview

# Same pattern as backend
cd repos/dem2-webui && git tag -f preview-<branch-name> && git push origin preview-<branch-name> --force

# Or using justfile:
(cd repos/dem2-webui && just preview <branch-name>)

Step-by-Step: Deploy Both Backend and Frontend

# Deploy both at once
(cd repos/dem2 && just preview-both <branch-name>)

# Or manually (subshells keep each cd relative to the workspace root):
(cd repos/dem2 && git tag -f preview-<branch-name> && git push origin preview-<branch-name> --force)
(cd repos/dem2-webui && git tag -f preview-<branch-name> && git push origin preview-<branch-name> --force)

Step 3: Monitor Deployment Progress

# Check CI workflow status (wait for image build to complete)
(cd repos/dem2 && gh run list --limit 3)

# Watch a specific run
(cd repos/dem2 && gh run watch <run-id>)

# Check ArgoCD sync status
just gcloud-admin::preview-info --gke-namespace tusdi-preview-<XX>

# Watch pods for rollout
kubectl get pods -n tusdi-preview-<XX> -w

# Check pod logs after deployment
kubectl logs -n tusdi-preview-<XX> -l app.kubernetes.io/component=api --tail=50

Common Deployment Scenarios

Scenario	Command
Deploy backend only	`cd repos/dem2 && git tag -f preview-<branch> && git push origin preview-<branch> --force`
Deploy frontend only	`cd repos/dem2-webui && git tag -f preview-<branch> && git push origin preview-<branch> --force`
Deploy both	`(cd repos/dem2 && just preview-both <branch>)`
Check build status	`cd repos/dem2 && gh run list --limit 3`
Check deployment health	`kubectl get pods -n tusdi-preview-<XX>`
Force ArgoCD sync	`just gcloud-admin::argocd app sync argocd/preview-pr-<XX>`

Troubleshooting Deployments

Issue	Solution
CI not triggered	Verify tag was pushed: `cd repos/dem2 && git ls-remote --tags origin | grep preview-<branch>`
Image not found	Check CI run logs: `cd repos/dem2 && gh run list --limit 3`
ArgoCD not syncing	Manual sync: `just gcloud-admin::argocd app sync argocd/preview-pr-<XX>`
Pod CrashLoopBackOff	Check logs: `kubectl logs -n tusdi-preview-<XX> -l app.kubernetes.io/component=api --tail=100`
Old image still running	Delete deployment to force recreation: `kubectl delete deployment tusdi-api -n tusdi-preview-<XX>` then ArgoCD sync

Important Notes

Deploy only the repository you changed. If you only changed dem2, you only need to tag and push dem2.
The tag must match the branch name pattern used when the preview was originally created.
Force-push (--force) is expected for preview tags — they are mutable pointers.
CI build takes ~3-5 minutes. ArgoCD sync adds ~1-2 minutes after that.
The kustomization.yaml in dem2-infra is updated automatically by the CI dispatch — you do NOT need to manually edit it.

Syncing Local dem2-infra After CI Dispatch

When the CI pipeline updates kustomization.yaml in dem2-infra, it force-pushes to the preview/<branch> branch on the remote. This means your local preview/<branch> branch diverges from the remote.

You MUST sync your local branch before making any further infra changes:

# After CI build completes and dispatches to dem2-infra:
cd repos/dem2-infra && git pull --rebase origin preview/<branch-name>

# Example for alen-dev-1:
cd repos/dem2-infra && git pull --rebase origin preview/alen-dev-1

Why --rebase is required:

The CI force-pushes a new commit to the remote branch with the updated image tag
A regular git pull (merge) would create unnecessary merge commits
git pull --rebase replays any local commits on top of the CI's update
If your only local commit was a previous image tag update, it will be skipped automatically (no conflict)

Common scenarios:

Scenario	What Happens
No local changes to infra	`git pull --rebase` fast-forwards cleanly
Local infra changes (env vars, patches)	Local commits are rebased on top of CI's image tag update
Local image tag edit conflicts with CI	Rebase conflict. Resolve by keeping the CI version, since the CI tag is correct. Note that during a rebase the CI commit is the `--ours` side and your replayed local commit is `--theirs`.

Do NOT manually edit kustomization.yaml image tags — always let the CI handle it via the tag-push workflow. If you need to verify the current image tag after pull:

cd repos/dem2-infra && grep newTag k8s/overlays/preview/kustomization.yaml
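
The `grep` above prints whole lines. When the value has to be compared against an expected tag, a small extraction helper can be handy. This is a sketch under assumptions: `image_tags_in` is a hypothetical name, and it assumes each tag sits on its own `newTag: <value>` line, optionally quoted, with no trailing comment.

```shell
# Hypothetical helper: print only the newTag values from a kustomization file.
# Assumes one "newTag: <value>" entry per line, optionally quoted.
image_tags_in() {
  # Strip the key and leading whitespace, then drop any surrounding quotes.
  sed -n 's/^[[:space:]]*newTag:[[:space:]]*//p' "$1" | tr -d "\"'"
}
```

Usage: `(cd repos/dem2-infra && image_tags_in k8s/overlays/preview/kustomization.yaml)`.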

Deprecated Commands (in child repos)

The following patterns are deprecated and should be migrated to workspace-level commands:

Deprecated Command	Replacement
`(cd repos/dem2 && just dev-env-up)`	`just dev-up` (starts full stack)
`(cd repos/dem2 && just dev-env-down)`	`just dev-down`
`(cd repos/dem2 && just med-api-up)`	`just dev-up`
`(cd repos/medical-catalog && just dev-env-up)`	`just dev-up`
`(cd repos/medical-catalog && just docker-build)`	(pending migration)

Docker Compose Files Map

File	Location	Purpose	Profile
`docker-compose.yaml`	Root	Main workspace stack (primary)	`dev`
`docker-compose.yaml`	`repos/dem2/infrastructure/`	Backend dev environment	`dev`, `test`
`docker-compose.langfuse.local.yaml`	`repos/dem2/infrastructure/`	Local Langfuse observability	-
`docker-compose.yml`	`repos/dem2/infrastructure/remote-debug/`	pgAdmin, Neo4j debug	-
`docker-compose.qdrant.yaml`	`repos/dem2/services/indicators-catalog/`	Standalone Qdrant	-
`docker-compose.yaml`	`repos/medical-catalog/infra/`	Catalog Qdrant (port 16333)	`dev`
`docker-compose.yaml`	`gcloud-admin/`	DevOps admin container	-
`docker-compose.ngrok-nginx.yaml`	`repos/dem2-infra/.../gateway/`	Public gateway tunnel	-
`docker-compose.langfuse.yaml`	`repos/dem2-infra/.../monitoring/`	Production Langfuse + OTEL	`monitoring`
`docker-compose.ollama.yml`	`repos/dem2-infra/.../experiment-setup/`	Local LLM with GPU	-
`docker-compose.vllm.yml`	`repos/dem2-infra/.../experiment-setup/`	vLLM inference server	-

See references/compose-files.md for detailed documentation of each file.

Profile System

All services in the main workspace stack use the unified dev profile:

Profile	Services	Use Case
`dev`	All services (databases + backend + frontend + catalog)	Full stack development

Starting Services:

# Full stack (all services)
docker compose --profile dev up -d

# Or use the just command (recommended)
just dev-up

Note: The --profile dev flag is required for up, down, restart, and build operations. It is NOT required for querying running containers (ps, logs, stats, exec).
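
The rule in the note can be captured in a small wrapper. The sketch below is a dry run only: `compose_cmd` is a hypothetical name, and it prints the command line instead of executing it so the rule is easy to inspect.

```shell
# Hypothetical dry-run wrapper: print the docker compose command line,
# adding --profile dev only for the subcommands the note lists.
compose_cmd() {
  case $1 in
    up|down|restart|build) echo "docker compose --profile dev $*" ;;
    *)                     echo "docker compose $*" ;;
  esac
}
```

For example, `compose_cmd up -d` prints `docker compose --profile dev up -d`, while `compose_cmd ps` prints `docker compose ps`.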

See references/profiles.md for detailed profile documentation.

Port Mappings

Development Ports (localhost)

Port	Service	Stack	Health Check
3000	Frontend (Next.js)	machina-meta	http://localhost:3000
5432	PostgreSQL	machina-meta	`pg_isready -h localhost`
5540	RedisInsight	machina-meta	http://localhost:5540
6333	Qdrant REST API	machina-meta	http://localhost:6333/healthz
6334	Qdrant Web UI	machina-meta	http://localhost:6334/dashboard
6379	Redis	machina-meta	`redis-cli ping`
7474	Neo4j Browser	machina-meta	http://localhost:7474
7687	Neo4j Bolt	machina-meta	(used by applications)
8000	Backend API	machina-meta	http://localhost:8000/docs
8001	Medical Catalog	machina-meta	http://localhost:8001/health

Langfuse Stack Ports

Port	Service	Purpose
3003	Langfuse Web	Observability UI
3030	Langfuse Worker	Background processing
5433	Langfuse PostgreSQL	Langfuse database
6380	Langfuse Redis	Langfuse cache
8123	ClickHouse HTTP	Analytics queries
9090	MinIO API	S3-compatible storage
9091	MinIO Console	MinIO admin UI

Specialized Ports

Port	Service	Purpose
5050	pgAdmin	PostgreSQL admin UI (remote-debug)
11434	Ollama	Local LLM inference
16333	Qdrant (catalog)	Medical catalog vector store
17474	Neo4j (remote)	Remote debugging Neo4j

See references/ports.md for complete port documentation.
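
When a port conflict shows up, it helps to know which stack service normally owns the port. The lookup below is a sketch that simply transcribes the tables above; `service_for_port` is a hypothetical helper name, and the table in references/ports.md remains the authority.

```shell
# Hypothetical lookup transcribed from the port tables above.
service_for_port() {
  case $1 in
    3000)  echo "Frontend (Next.js)" ;;
    3003)  echo "Langfuse Web" ;;
    3030)  echo "Langfuse Worker" ;;
    5050)  echo "pgAdmin" ;;
    5432)  echo "PostgreSQL" ;;
    5433)  echo "Langfuse PostgreSQL" ;;
    5540)  echo "RedisInsight" ;;
    6333)  echo "Qdrant REST API" ;;
    6334)  echo "Qdrant Web UI" ;;
    6379)  echo "Redis" ;;
    6380)  echo "Langfuse Redis" ;;
    7474)  echo "Neo4j Browser" ;;
    7687)  echo "Neo4j Bolt" ;;
    8000)  echo "Backend API" ;;
    8001)  echo "Medical Catalog" ;;
    8123)  echo "ClickHouse HTTP" ;;
    9090)  echo "MinIO API" ;;
    9091)  echo "MinIO Console" ;;
    11434) echo "Ollama" ;;
    16333) echo "Qdrant (catalog)" ;;
    17474) echo "Neo4j (remote)" ;;
    *)     echo "unknown"; return 1 ;;   # not a port documented above
  esac
}
```

Usage: `service_for_port 8000` prints `Backend API`; an undocumented port prints `unknown` and returns a non-zero status.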

Common Usage Patterns

Pattern 1: Full Stack Development (Recommended)

Start everything from workspace root:

# Start full stack
just dev-up

# Check status
just dev-status

# When done
just dev-down

Pattern 2: Backend Development Only

Start databases, run backend locally:

# Start full stack (includes databases)
just dev-up

# Stop the containerized backend to run locally
docker stop machina-meta-backend-1

# Run backend locally
(cd repos/dem2 && just run)

# When done
just dev-down

Pattern 3: Frontend Development Only

# Start full stack
just dev-up

# Or just start frontend locally if backend is running
(cd repos/dem2-webui && pnpm dev)

Pattern 4: Clean Restart with Fresh Data

Reset databases and reload fixtures:

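# Stop and remove volumes
just dev-down
# Warning: removes unused volumes. On Docker Engine 23.0+ only anonymous
# volumes are pruned by default; add --all to include named volumes.
docker volume prune -f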

# Start fresh
just dev-up

Pattern 5: Local LLM Inference (Ollama)

# Start Ollama with GPU
(cd repos/dem2-infra/infrastructure/docker/google_cloud/experiment-setup && \
  docker compose -f docker-compose.ollama.yml up -d)

# Pull a model
docker exec -it ollama ollama pull llama3.2

# Test
curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "prompt": "Hello"}'

Pattern 6: Local Langfuse Observability

# Start Langfuse stack
(cd repos/dem2/infrastructure && docker compose -f docker-compose.langfuse.local.yaml up -d)

# Access UI at http://localhost:3003

# Stop
(cd repos/dem2/infrastructure && docker compose -f docker-compose.langfuse.local.yaml down)

Pattern 7: Remote Debugging Tools

# Start pgAdmin and Neo4j browser
(cd repos/dem2/infrastructure/remote-debug && docker compose up -d)

# pgAdmin: http://localhost:5050
# Neo4j: http://localhost:17474

Pattern 8: gcloud-admin DevOps

# First-time setup
just gcloud-admin::setup-nonprod

# Interactive shell
just gcloud-admin::shell

# Run kubectl commands
just gcloud-admin::kubectl get pods -n argocd

# Interactive cluster UI
just gcloud-admin::k9s

Pattern 9: Remote GKE Environment Access

This is the unified workflow for accessing remote GKE environments (preview, dev, staging).

CRITICAL: Local Docker vs Remote GKE — Mutually Exclusive

Port-forwarded remote services bind to the SAME ports as local Docker containers (5432, 7474, 8000, etc.). You MUST choose one:

Mode	Description	Port Forwarding Required
Local Docker	`just dev-up` running	NO — services already on localhost
Remote GKE	Point to a preview/dev/staging namespace	YES — must `just dev-down` first

Step 1: Determine current mode

# Check if local Docker containers are running
just dev-status

If local containers are running and you want local dev: no port forwarding needed, skip to Step 5 for API usage.
If you need remote GKE access: proceed to Step 2.

Step 2: Stop local Docker (required for remote access)

# Stop all local containers to free ports
just dev-down

Step 3: Import environment from Kubernetes

Extract all environment variables (configs, ConfigMaps, Secrets) from a remote deployment and generate a local .env file with K8s hostnames rewritten to localhost for port-forwarded access.

# Import environment from a k8s deployment
./scripts/import_k8s_environment.py import -n <NAMESPACE> -d <DEPLOYMENT>

# Examples:
./scripts/import_k8s_environment.py import -n tusdi-preview-99 -d tusdi-api
./scripts/import_k8s_environment.py import -n tusdi-staging -d tusdi-api
./scripts/import_k8s_environment.py import -n tusdi-dev -d tusdi-api

Output: Creates .env.<NAMESPACE>.<DEPLOYMENT> (e.g., .env.tusdi-preview-99.tusdi-api) with:

All direct values, ConfigMap values, and Secret values resolved
K8s service hostnames (postgres, neo4j, redis, etc.) rewritten to localhost
URL variables (redis://redis:6379, bolt://neo4j:7687) rewritten to localhost equivalents
File permissions set to 600 (owner read/write only)
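
To make the hostname rewriting concrete, here is a rough illustration using `sed`. It is not the logic of import_k8s_environment.py: `rewrite_hosts` is a hypothetical name, the host list is limited to the three hostnames named above, and the `*_HOST` variable naming is an assumption.

```shell
# Illustration only: rewrite K8s service hostnames to localhost.
# Assumptions: hosts are postgres, neo4j, or redis; plain host variables
# are named <SOMETHING>_HOST. The real script may cover more cases.
rewrite_hosts() {
  sed -E \
    -e 's#(://([^/@]*@)?)(postgres|neo4j|redis)([:/]|$)#\1localhost\4#' \
    -e 's#^([A-Za-z_][A-Za-z0-9_]*_HOST=)(postgres|neo4j|redis)$#\1localhost#'
}
```

The first expression handles URL values, including an optional `user:password@` part, and leaves the URL scheme alone. For example, `echo 'REDIS_URL=redis://redis:6379' | rewrite_hosts` prints `REDIS_URL=redis://localhost:6379`.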

Switch active environment:

.current_env is a symlink that points to the active environment file. It acts as a context pointer — switching it changes which GKE environment all tools (curl_api, neo4j-query.py, etc.) operate against.

# Point .current_env to the imported environment (overwrites any previous symlink)
ln -sfn .env.tusdi-preview-99.tusdi-api .current_env

If the .env.<NAMESPACE>.<DEPLOYMENT> file doesn't exist yet (or is out of date), import it first using the import command above, then create the symlink.

Switching between environments:

# Switch to preview-99
ln -sfn .env.tusdi-preview-99.tusdi-api .current_env

# Switch to staging
ln -sfn .env.tusdi-staging.tusdi-api .current_env

# Verify which environment is active
ls -la .current_env
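
Because `ln -sfn` happily creates a dangling symlink, a guard that checks for the imported file first can prevent pointing tools at an environment that was never imported. A minimal sketch, with `use_env` as a hypothetical helper name:

```shell
# Hypothetical helper: point .current_env at an imported environment file,
# refusing to create a dangling symlink.
use_env() {
  env_file=".env.$1.$2"
  if [ ! -f "$env_file" ]; then
    echo "error: $env_file not found; run the import first" >&2
    return 1
  fi
  # Same command as above; -n replaces an existing symlink in place.
  ln -sfn "$env_file" .current_env
}
```

Usage from the workspace root: `use_env tusdi-preview-99 tusdi-api`.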

Compare environments (detect drift after re-import):

./scripts/import_k8s_environment.py compare .env.old .env.new
./scripts/import_k8s_environment.py compare .env.old .env.new --markdown
./scripts/import_k8s_environment.py compare .env.old .env.new --json

Additional import options:

Flag	Purpose
`-c <name>`	Select specific container (default: first)
`-o <path>`	Custom output path
`--no-comments`	Omit source comments
`--no-localhost`	Disable hostname rewriting
`--json`	Output as JSON instead of .env file
`-q`	Quiet mode

Step 4: Port-forward all services

ALWAYS use port_forward_service.py — NEVER run individual kubectl port-forward commands.

Launch the port-forward script in a dedicated tmux pane so its output stays visible without cluttering the main terminal:

# Forward ALL services in a namespace (recommended)
# Creates a small (~10 line) pane at the top, focus stays on current pane
# Capture the pane ID for later use (e.g., to kill it)
PF_PANE=$(tmux split-window -d -b -v -l 10 -P -F '#{pane_id}' \
  "./scripts/port_forward_service.py '{\"port_forward\": [{\"namespace\": \"<NAMESPACE>\", \"service_name\": \".*\"}]}'")

# Example: forward all tusdi-preview-99 services
PF_PANE=$(tmux split-window -d -b -v -l 10 -P -F '#{pane_id}' \
  "./scripts/port_forward_service.py '{\"port_forward\": [{\"namespace\": \"tusdi-preview-99\", \"service_name\": \".*\"}]}'")

# Forward specific services only
PF_PANE=$(tmux split-window -d -b -v -l 10 -P -F '#{pane_id}' \
  "./scripts/port_forward_service.py '{\"port_forward\": [{\"namespace\": \"tusdi-preview-99\", \"service_name\": \"neo4j\"}, {\"namespace\": \"tusdi-preview-99\", \"service_name\": \"postgres\"}]}'")

tmux flags: -d keeps focus on the current pane, -b places the new pane above (top), -v splits vertically, -l 10 sets height to 10 lines, -P -F '#{pane_id}' prints the new pane ID (captured into $PF_PANE).

The <NAMESPACE> should match the namespace shown in the header of .current_env.

Script details:

service_name supports regex patterns (".*" matches all services in the namespace)
Multi-port services (e.g., Neo4j: 7474 + 7687) forward all ports automatically
Automatic restart when port-forwards drop
Uses K8s targetPort as the local port (standard ports: 5432, 7474, 8000, etc.)
View JSON schema: ./scripts/port_forward_service.py --schema
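
The escaped JSON in the commands above is easy to mistype. One option is to build the payload with a small helper, sketched below. `pf_payload` is a hypothetical name; it performs no JSON escaping, so it assumes the namespace and pattern contain no double quotes or backslashes.

```shell
# Hypothetical helper: build the JSON argument for port_forward_service.py.
# No JSON escaping is done, so avoid double quotes and backslashes in inputs.
pf_payload() {
  # $1 = namespace, $2 = service_name regex (defaults to ".*", all services)
  printf '{"port_forward": [{"namespace": "%s", "service_name": "%s"}]}\n' \
    "$1" "${2:-.*}"
}
```

For example, `pf_payload tusdi-preview-99 neo4j` prints a single-entry payload, and `./scripts/port_forward_service.py "$(pf_payload tusdi-preview-99)"` would pass the all-services payload directly. Output can be compared against `./scripts/port_forward_service.py --schema` to confirm the shape.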

State file: While running, the script writes /var/run/user/<uid>/port_forward_service.json containing the PID and all forwarded services. The file is automatically removed on exit (including tmux kill-pane). Read it to check which services are currently forwarded:

jq . "/var/run/user/$(id -u)/port_forward_service.json"

Stopping port-forwards:

# Kill the port-forward pane using the captured pane ID
tmux kill-pane -t "$PF_PANE"

# Or kill the process directly (from any pane)
pkill -f port_forward_service

Step 5: Use the environment

# Source .current_env and call backend API
(. .current_env && cd repos/dem2 && just curl_api '{"function": "list_documents"}') | jq

# Query observations
(. .current_env && cd repos/dem2 && just curl_api '{"function": "get_observations_grouped", "per_type_values_limit": 3}') | jq

# Filter specific biomarker
(. .current_env && cd repos/dem2 && just curl_api '{"function": "get_observations_grouped", "per_type_values_limit": 3}') | jq '.items[]|select(.observation_type.display_name=="Folate").values'

# Query remote Neo4j directly (credentials come from .current_env)
(. .current_env && ./scripts/neo4j-query.py --format json "MATCH (n:PatientNode) RETURN n.first_name, n.last_name LIMIT 5")

Authentication scripts (called automatically by curl_api):

user_manager.py — Generates JWT tokens for API authentication
- uv run scripts/user_manager.py user token --export — outputs export AUTH_HEADER="Bearer ..."
- uv run scripts/user_manager.py user token --export-cookie — for UI/browser authentication
curl_api.sh — JSON dispatch system; handles JWT tokens, patient context headers, respects BACKEND_URL

Common namespaces:

Namespace	Environment
`tusdi-dev`	Development
`tusdi-staging`	Staging
`tusdi-preview-<id>`	Preview (e.g., `tusdi-preview-99`)

Note: Always use kubectl directly on the host machine for port-forwarding. The gcloud-admin container has networking limitations that prevent port-forwarded ports from being accessible on the host.

Pattern 10: ArgoCD Application Sync and Management

For syncing, monitoring, and troubleshooting ArgoCD deployments on GKE preview environments.

Prerequisites:

gcloud-admin container setup completed (just gcloud-admin::setup-nonprod)
ArgoCD authentication (just gcloud-admin::auth-argocd-login)

Key Concepts:

ArgoCD app name: You MUST list apps first to find the correct name — do NOT guess app names
Kubernetes namespace: Format is tusdi-<env> (e.g., tusdi-preview-92, tusdi-staging)
App names and namespace names are DIFFERENT and not always predictable

Check ArgoCD Authentication Status:

# Verify you're logged in
just gcloud-admin::auth-argocd-status

# If not logged in, authenticate (interactive SSO - requires browser)
# Linux:
just gcloud-admin::auth-argocd-login
# macOS (--network host doesn't work, use import instead):
# 1. argocd login <server> --sso --grpc-web  (on host)
# 2. just gcloud-admin::auth-argocd-import

List ArgoCD Applications (ALWAYS do this first):

# IMPORTANT: Always list apps first to find the correct app name
# Do NOT guess app names - they may not follow predictable patterns
just gcloud-admin::argocd app list

# Filter preview apps
just gcloud-admin::argocd app list | grep preview

Check Application Status:

# Get app summary (use the EXACT app name from 'app list' output above)
just gcloud-admin::argocd app get <app-name-from-list>

# Get status as JSON for scripting
just gcloud-admin::argocd app get <app-name-from-list> --output json | jq '{
  sync_status: .status.sync.status,
  health_status: .status.health.status,
  operation_state: .status.operationState.phase
}'

Sync Application (Trigger Deployment):

# Basic sync - applies manifests from git
just gcloud-admin::argocd app sync <app-name-from-list>

# Force sync - override any drift
just gcloud-admin::argocd app sync <app-name-from-list> --force

# Sync with prune - remove resources not in git
just gcloud-admin::argocd app sync <app-name-from-list> --prune

Handle Stuck Operations:

If sync fails with "another operation is already in progress":

# Check current operation state
just gcloud-admin::argocd app get <app-name-from-list> --output json | jq '.status.operationState'

# Terminate stuck operation
just gcloud-admin::argocd app terminate-op <app-name-from-list>

# Then retry sync
just gcloud-admin::argocd app sync <app-name-from-list>

Monitor Deployment Progress:

# Watch pods in the namespace
just gcloud-admin::kubectl get pods -n tusdi-preview-92 -w

# Check specific deployment
just gcloud-admin::kubectl get pods -n tusdi-preview-92 | grep tusdi-api

# Get pod logs
just gcloud-admin::kubectl logs -n tusdi-preview-92 <pod-name> --tail=100

# Get recent events
just gcloud-admin::kubectl get events -n tusdi-preview-92 --sort-by='.lastTimestamp' | tail -20

Troubleshooting Unhealthy Deployments:

# Check deployment status
just gcloud-admin::kubectl describe deployment tusdi-api -n tusdi-preview-92

# Check pod status and restart count
just gcloud-admin::kubectl get pods -n tusdi-preview-92 -o wide

# Check pod events for errors
just gcloud-admin::kubectl describe pod <pod-name> -n tusdi-preview-92

# Check pod logs for startup errors (like ImportError)
just gcloud-admin::kubectl logs -n tusdi-preview-92 <pod-name> --tail=200

Force Redeploy (Delete and Let ArgoCD Recreate):

If a deployment is corrupted or stuck:

# Delete the deployment (ArgoCD will recreate it)
just gcloud-admin::kubectl delete deployment tusdi-api -n tusdi-preview-92

# Sync to trigger recreation
just gcloud-admin::argocd app sync <app-name-from-list>

# Monitor recreation
just gcloud-admin::kubectl get pods -n tusdi-preview-92 -w

Common Scenarios:

Problem	Solution
ImportError on startup	Push code fix, ArgoCD auto-syncs, or manual sync
Deployment deleted accidentally	`argocd app sync` recreates from manifests
Sync stuck "in progress"	`argocd app terminate-op` then retry
Pod CrashLoopBackOff	Check logs, fix code/config, push, sync
Image pull error	Verify image exists in registry, check secrets

Workflow: Fix Code Error on Preview:

Identify error (e.g., ImportError in pod logs)
Fix locally and commit the fix
Push to branch that the preview tracks
Wait for CI/CD to build new image (GitHub Actions)
ArgoCD auto-syncs or manually trigger: just gcloud-admin::argocd app sync <app-name-from-list>
Monitor pod until healthy: just gcloud-admin::kubectl get pods -n tusdi-preview-92 -w
Verify fix by checking pod logs: just gcloud-admin::kubectl logs -n tusdi-preview-92 <pod-name> --tail=50

Volume Management

Named Volumes

Volume	Service	Purpose
`postgres_data`	PostgreSQL	User data, sessions
`neo4j_data`	Neo4j	Graph data
`neo4j_logs`	Neo4j	Database logs
`redis_data`	Redis	Cache persistence
`qdrant_storage`	Qdrant	Vector embeddings
`redisinsight_data`	RedisInsight	UI settings

Qdrant Snapshot Restore

The dev_stack.py script automatically restores Qdrant snapshots on first startup:

# Manual restore if needed
(cd repos/medical-catalog && just snapshot-restore-all)

# Check Qdrant collections
curl http://localhost:6333/collections

Clearing Volumes

# Stop services and remove volumes (the dev profile flag is required for down)
docker compose --profile dev down -v

# Remove specific volume
docker volume rm machina-meta_postgres_data

# Remove unused volumes (DANGEROUS)
# On Docker Engine 23.0+ this prunes only anonymous volumes; add --all to include named volumes
docker volume prune -f

See references/volumes.md for detailed volume documentation.

Health Checks

dev_stack.py Orchestration

The scripts/dev_stack.py script provides intelligent health monitoring:

90-second timeout for service startup
HTTP endpoint validation for web services
Container health status from Docker
Automatic error analysis from logs
Qdrant snapshot detection and restore
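
The startup timeout can be approximated by hand when checking a single service outside dev_stack.py. The helper below is a sketch, not the script's implementation: `wait_for` is a hypothetical name, and it polls roughly once per second without counting the time the check itself takes.

```shell
# Hypothetical polling helper: run a check command about once per second
# until it succeeds or the timeout (in seconds) has been used up.
wait_for() {
  timeout=$1
  shift
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "error: timed out after ${timeout}s: $*" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))   # counts sleeps, not the check's own runtime
  done
}
```

Usage: `wait_for 90 curl -fsS -o /dev/null http://localhost:6333/healthz` waits up to about 90 seconds for Qdrant to answer its health endpoint.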

Service Health Endpoints

Service	Health Endpoint
Backend	`GET /api/v1/health`
Medical Catalog	`GET /health`
Qdrant	`GET /healthz`
Neo4j	`GET :7474` (browser)
PostgreSQL	`pg_isready`
Redis	`redis-cli ping`

Sanity Check Suite

Run the comprehensive sanity check before committing Docker changes:

just dev-check

This runs 6 non-destructive verification tests:

Service Status - ./scripts/dev_stack.py status
Container Status - docker compose ps
Health Checks - Container health inspection
Resource Usage - docker stats snapshot
Volume Status - List machina volumes
Endpoint Health - HTTP checks for backend, catalog, and Qdrant

Manual Health Checks

# Check all container status
docker compose ps

# Check specific service logs
docker compose logs backend --tail 50

# Check container health
docker inspect --format='{{json .State.Health}}' machina-meta-backend-1

Troubleshooting

Port Already in Use

# Find what's using the port
lsof -i :8000
# or
ss -tulpn | grep 8000

# Kill process or stop container
docker ps | grep 8000
docker stop <container-name>

Container Won't Start

# Check logs
docker compose logs <service> --tail 100

# Check health status
docker inspect --format='{{json .State.Health}}' <container-name>

# Check events
docker events --since 5m

Database Connection Failed

# Verify containers are healthy
just dev-status

# Check network connectivity
docker network ls
docker network inspect machina-meta_default

# Test connection from host
pg_isready -h localhost -p 5432
redis-cli -h localhost -p 6379 ping

Neo4j Won't Start

# Check Neo4j logs
docker compose logs neo4j --tail 100

# Common issue: memory limits
# Edit docker-compose.yaml NEO4J_dbms_memory_* settings

Qdrant Snapshots Not Restored

# Check if volume exists
docker volume ls | grep qdrant

# Manual restore
(cd repos/medical-catalog && just snapshot-restore-all)

# Verify collections
curl http://localhost:6333/collections

Log Extraction

# Extract backend logs
(cd repos/dem2 && ./scripts/extract_backend_logs.sh -s backend -l ERROR --since 10m)

# Follow all logs
docker compose logs -f

# Follow specific service
docker compose logs -f backend

See references/troubleshooting.md for comprehensive troubleshooting guide.

API Testing with curl_api

The curl_api justfile rule in repos/dem2 provides a convenient JSON-based interface for testing backend APIs without writing code.

Overview

Location: repos/dem2/justfile (rule: curl_api)
Backend Script: repos/dem2/scripts/curl_api.sh
Purpose: Call backend API functions using JSON dispatch for development and testing

How It Works

The curl_api rule uses a JSON dispatch system that:

Accepts a JSON payload with a function field and arguments
Routes the call to a registered bash function in curl_api.sh
Handles authentication automatically (JWT tokens + patient context)
Executes the API call and returns the result
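
The dispatch idea can be sketched in a few lines of shell. This is an illustration only and not the contents of curl_api.sh: the `api_` prefix, the `dispatch` name, and the sample handler are all hypothetical, and the field extraction is deliberately naive (a real implementation would more likely use a JSON parser and add the authentication step).

```shell
# Illustration only, not the real curl_api.sh.
# Hypothetical handler; a real one would call the backend API.
api_list_documents() { echo "list_documents called"; }

# Route a JSON payload to the shell function named api_<function>.
dispatch() {
  payload=$1
  # Naive extraction; assumes the value contains no escaped quotes.
  fn=$(printf '%s\n' "$payload" |
    sed -n 's/.*"function"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
  if [ -z "$fn" ]; then
    echo "error: payload has no \"function\" field" >&2
    return 2
  fi
  # Only identifier-like names may be dispatched.
  case $fn in
    *[!A-Za-z0-9_]*) echo "error: invalid function name: $fn" >&2; return 2 ;;
  esac
  if ! command -v "api_$fn" >/dev/null 2>&1; then
    echo "error: unknown function: $fn" >&2
    return 1
  fi
  "api_$fn" "$payload"
}
```

Restricting names to identifier characters before the lookup keeps the payload from selecting arbitrary commands, which is the main design concern with this kind of dispatch.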

Basic Usage

# From the workspace root (the subshell changes into repos/dem2)
(cd repos/dem2 && just curl_api '{"function": "function_name", "arg1": "value1", ...}')

Authentication:

Automatically handles JWT token generation via user_manager.py
Sets patient context header (X-Patient-Context-ID)
Default user: dbeal@numberone.ai
Default patient: Stuart McClure, DOB: 1969-03-03

Available Function Categories

Document Management

List documents:

(cd repos/dem2 && just curl_api '{"function": "list_documents"}')

Upload a file:

(cd repos/dem2 && just curl_api '{"function": "upload_file", "path": "datasets/documents/test.pdf"}')

Process a specific document:

(cd repos/dem2 && just curl_api '{"function": "process_document", "file_id": "uuid-here"}')

Process all uploaded documents:

(cd repos/dem2 && just curl_api '{"function": "process_all_documents"}')

Delete all documents:

(cd repos/dem2 && just curl_api '{"function": "delete_all_documents"}')

Task Management

List all document processing tasks:

(cd repos/dem2 && just curl_api '{"function": "list_tasks"}')

Get specific task details:

(cd repos/dem2 && just curl_api '{"function": "get_task", "task_id": "uuid-here"}')

List failed tasks only:

(cd repos/dem2 && just curl_api '{"function": "list_failed_tasks"}')

Patient Management

List all patients:

(cd repos/dem2 && just curl_api '{"function": "list_patients"}')

Agent Session Management

Create a new agent session:

(cd repos/dem2 && just curl_api '{"function": "create_session", "name": "My Session"}')

Set/update session name:

(cd repos/dem2 && just curl_api '{"function": "set_session_name", "session_id": "uuid-here", "name": "New Name"}')

List all sessions:

(cd repos/dem2 && just curl_api '{"function": "list_sessions"}')

Agent Query

Query the medical agent:

(cd repos/dem2 && just curl_api '{"function": "query_agent", "query": "What is my cholesterol?"}')

Query with specific session:

(cd repos/dem2 && just curl_api '{"function": "query_agent", "query": "What is my cholesterol?", "session_id": "uuid-here"}')

Agent Diagnostics

Check agent dependencies:

(cd repos/dem2 && just curl_api '{"function": "check_agent_dependencies"}')

Validate agent configuration:

(cd repos/dem2 && just curl_api '{"function": "validate_agent_config"}')

Medical Catalog (Biomarker Enrichment)

Enrich biomarkers:

(cd repos/dem2 && just curl_api '{"function": "catalog_enrich", "names": ["ApoA-1", "Factor II"]}')

Check enrichment status:

(cd repos/dem2 && just curl_api '{"function": "catalog_enrich_status", "task_id": "uuid-here"}')

Search for biomarkers:

(cd repos/dem2 && just curl_api '{"function": "catalog_search", "names": ["cholesterol"], "limit": 5}')

Search by alias groups:

(cd repos/dem2 && just curl_api '{"function": "catalog_search_by_alias", "alias_groups": [["LDL"], ["HDL", "HDL-C"]]}')

Search derivative biomarkers (ratios, calculated values):

(cd repos/dem2 && just curl_api '{"function": "catalog_search_derivatives", "names": ["ApoB/ApoA-1"]}')

Enrich derivatives (ratios, sums, percentages):

(cd repos/dem2 && just curl_api '{"function": "catalog_enrich_derivatives", "names": ["ApoB/ApoA-1", "TC/HDL-C"]}')

List all derivatives (with pagination):

(cd repos/dem2 && just curl_api '{"function": "catalog_list_derivatives", "limit": 100, "offset": 0}')

Debug

Debug JSON argument structure:

(cd repos/dem2 && just curl_api '{"function": "debug_args", "names": ["test1", "test2"]}')

Common Workflows

List Available Test Documents

Before uploading documents for testing, list available test documents in the repository:

# List all test documents with full paths
just list-test-docs

# Output example:
# [
#   "pdf_tests/medical_records/.../Boston Heart July 2021.pdf",
#   "pdf_tests/medical_records/.../Dutch cortisol 9-01-25.pdf",
#   ...
# ]

Use this to:

Find available test documents before testing document processing
Get correct paths for upload functions
Identify specific documents for debugging (e.g., Dutch cortisol document for "Estrone (E1)" testing)

Upload and Process a Document

# First, list available test documents
just list-test-docs

# Upload document using path from list-test-docs
(cd repos/dem2 && just curl_api '{"function": "upload_file", "path": "pdf_tests/medical_records/Stuart Mcclure Medical Records (PRIVATE)/Dutch cortisol 9-01-25.pdf"}')

# Process the uploaded document (use the file_id from upload response)
(cd repos/dem2 && just curl_api '{"function": "process_document", "file_id": "file-id-from-upload"}')
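Test document paths often contain spaces, parentheses, or quotes, which makes hand-written JSON error-prone. A small helper can build the upload_file payload instead; `upload_payload` is a hypothetical name and is not part of curl_api.sh.

```shell
# Sketch: build an upload_file payload, escaping the characters that would
# otherwise break a JSON string (backslash and double quote).
upload_payload() {
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"function": "upload_file", "path": "%s"}\n' "$escaped"
}

# Usage (from the workspace root):
#   (cd repos/dem2 && just curl_api "$(upload_payload "pdf_tests/medical_records/Stuart Mcclure Medical Records (PRIVATE)/Dutch cortisol 9-01-25.pdf")")
```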

Query Agent About Health Markers

# Query agent
(cd repos/dem2 && just curl_api '{"function": "query_agent", "query": "What is my latest cholesterol level?"}')

Check Task Processing Status

# List all tasks to find IDs
(cd repos/dem2 && just curl_api '{"function": "list_tasks"}')

# Get specific task details
(cd repos/dem2 && just curl_api '{"function": "get_task", "task_id": "abc-123-def"}')

Enrich and Validate Biomarkers

# Enrich biomarkers in catalog
(cd repos/dem2 && just curl_api '{"function": "catalog_enrich", "names": ["Total Cholesterol", "LDL", "HDL"]}')

# Search to verify they exist
(cd repos/dem2 && just curl_api '{"function": "catalog_search", "names": ["Total Cholesterol"], "limit": 5}')

How It Differs from Direct curl_api.sh Usage

Using just curl_api (recommended):

(cd repos/dem2 && just curl_api '{"function": "list_documents"}')

Direct script usage (lower-level):

(cd repos/dem2 && bash -c 'source scripts/curl_api.sh && dispatch "{\"function\": \"list_documents\"}"')

Benefits of just curl_api:

✅ Cleaner syntax (no need to source or call dispatch)
✅ Proper error handling via justfile
✅ Consistent working directory handling
✅ Part of documented justfile interface

Environment Variables

Default settings (defined in scripts/curl_api.sh):

PATIENT_FIRST_NAME=Stuart
PATIENT_LAST_NAME=McClure
PATIENT_DATE_OF_BIRTH=1969-03-03
AUTH_EMAIL=dbeal@numberone.ai
BACKEND_URL=http://localhost:8000/api/v1
FRONTEND_URL=http://localhost:3000

Override patient context:

# Assignments must sit directly before the command, inside the subshell
(cd repos/dem2 && PATIENT_FIRST_NAME=John PATIENT_LAST_NAME=Doe PATIENT_DATE_OF_BIRTH=1990-01-01 \
  just curl_api '{"function": "list_documents"}')

Enable verbose curl output (for debugging):

(cd repos/dem2 && CURL_VERBOSE=1 just curl_api '{"function": "list_documents"}')
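Overrides like these usually rely on the shell's "use the environment value, else a default" expansion. The sketch below shows that pattern with the documented defaults; the function names are hypothetical, and whether scripts/curl_api.sh uses exactly this form is an assumption.

```shell
# Sketch: environment values win; otherwise fall back to the documented defaults.
resolve_backend_url() {
  printf '%s\n' "${BACKEND_URL:-http://localhost:8000/api/v1}"
}

resolve_patient() {
  printf '%s %s (%s)\n' \
    "${PATIENT_FIRST_NAME:-Stuart}" \
    "${PATIENT_LAST_NAME:-McClure}" \
    "${PATIENT_DATE_OF_BIRTH:-1969-03-03}"
}
```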

Error Handling

If a function doesn't exist:

$ (cd repos/dem2 && just curl_api '{"function": "nonexistent"}')
# ERROR: Unknown function: nonexistent
# Available functions: list_documents, upload_file, process_document, ...

If required arguments are missing:

$ (cd repos/dem2 && just curl_api '{"function": "upload_file"}')
# ERROR: Missing 'path' field in JSON
# Usage: {"function": "upload_file", "path": "path/to/file.pdf"}
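A required-argument check of this kind can be sketched as follows. `require_field` is a hypothetical helper, and it only looks for the key in a flat JSON object, so a string value that happens to contain the same key text would also pass. The real validation in curl_api.sh may be stricter.

```shell
# Sketch: fail with a usage hint when a required JSON field is absent.
require_field() {
  payload="$1"; field="$2"; usage="$3"
  if printf '%s\n' "$payload" | grep -q "\"${field}\"[[:space:]]*:"; then
    return 0
  fi
  echo "ERROR: Missing '${field}' field in JSON" >&2
  echo "Usage: ${usage}" >&2
  return 1
}
```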

When to Use curl_api

Use curl_api for:

✅ Quick API testing during development
✅ One-off administrative tasks (upload, delete, etc.)
✅ Debugging API endpoints and responses
✅ Validating authentication and patient context
✅ Scripting batch operations

Don't use curl_api for:

❌ Production operations (use proper API clients)
❌ Performance testing (use dedicated load testing tools)
❌ Automated testing (use pytest with proper fixtures)

Related Commands

Low-level curl_api.sh functions (not dispatched, but useful):

# Get patient ID and set context
(cd repos/dem2 && bash -c 'source scripts/curl_api.sh && _export_patient_context_id_internal && declare -p X_PATIENT_CONTEXT_ID')

# Call backend API with auth
(cd repos/dem2 && bash -c 'source scripts/curl_api.sh && _export_patient_context_id_internal && auth_backend "/graph-memory/medical/observations/grouped"')

See also:

repos/dem2/scripts/curl_api.sh - Complete function implementations
repos/dem2/justfile (lines 271-347) - curl_api rule documentation
.claude/skills/machina-ui/SKILL.md - UI debugging with curl_api examples

Environment Variables

Authoritative Configuration

The single source of truth for environment variables is machina-meta/.env at the workspace root.

machina-meta/.env          # THE authoritative .env file

Important: Any .env files in subdirectories (repos/dem2/.env, repos/medical-catalog/.env, etc.) are temporary symlinks to the root .env file. Do not treat them as separate configuration files.
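To confirm that a sub-repo .env really is a symlink to the root file rather than a stray copy, a check along these lines may help. `env_links_to_root` is a hypothetical helper name.

```shell
# Sketch: succeed only if the sub-repo .env is a symlink that resolves to the root .env.
env_links_to_root() {
  root_env="$1"; sub_env="$2"
  # Must be a symlink, not a regular file.
  [ -L "$sub_env" ] || return 1
  # -ef is true when both paths refer to the same file (symlinks are followed).
  [ "$sub_env" -ef "$root_env" ]
}

# Usage (from the workspace root):
#   env_links_to_root .env repos/dem2/.env || echo "repos/dem2/.env is not linked to the root .env"
```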

Environment File Hierarchy

File	Purpose	Scope
`repos/dem2/.env.example`	Template for all backend env vars (committed to git)	Reference
`machina-meta/.env`	Active local config (never committed)	Local dev
`.env.localhost`	Basic local reference (~29 exports)	Local dev
`.env.localhost.alen`	Dev reference with Langfuse/MCP (~72 exports)	Local dev
`.env.localhost.alen-dev-1`	Most complete reference (~108 exports)	Local dev
`.env.<namespace>.<deployment>`	Generated by `import_k8s_environment.py` from K8s	Remote debug
`.current_env`	Symlink to active `.env.<namespace>.<deployment>`	Remote debug

When adding new environment variables:

Add to repos/dem2/.env.example with documentation comments (blank secrets)
Add to your active .env file with real values
Integrate into k8s manifests (see Kubernetes Environment Configuration below)
If sensitive, add to Google Secret Manager + ExternalSecrets
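The first two steps are easy to get out of sync, so it can help to list the variables that exist in the template but not in the active file. The sketch below assumes lines of the form `NAME=value` or `export NAME=value`; `env_names` and `missing_env_vars` are hypothetical helper names.

```shell
# Sketch: print the variable names defined in an env file, sorted and de-duplicated.
env_names() {
  sed -n \
    -e 's/^[[:space:]]*export[[:space:]]\{1,\}//' \
    -e 's/^[[:space:]]*\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' "$1" | sort -u
}

# Sketch: list names present in the template ($1) but absent from the active file ($2).
missing_env_vars() {
  names_a=$(mktemp); names_b=$(mktemp)
  env_names "$1" > "$names_a"
  env_names "$2" > "$names_b"
  # comm -23 prints lines that appear only in the first sorted input.
  comm -23 "$names_a" "$names_b"
  rm -f "$names_a" "$names_b"
}

# Usage (from the workspace root):
#   missing_env_vars repos/dem2/.env.example .env
```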

Database Defaults (Docker Compose)

# PostgreSQL
POSTGRES_USER: postgres
POSTGRES_PASSWORD: demodemo
POSTGRES_DB: demodemo

# Neo4j
NEO4J_AUTH: neo4j/demodemo

# Redis
# No auth by default in dev

Kubernetes Environment Configuration

Architecture: Kustomize Base + Overlays

The k8s environment configuration uses Kustomize with a base + overlay pattern:

repos/dem2-infra/k8s/
├── base/                              # Shared across all environments
│   ├── app.yaml                       # Deployments (tusdi-api, tusdi-webui) + Services
│   ├── config.yaml                    # ConfigMaps (app-config, postgres-config)
│   ├── external-secrets.yaml          # ExternalSecrets (GSM → k8s Secrets)
│   └── kustomization.yaml            # Base resource list
└── overlays/
    ├── dev/
    │   ├── env-vars-patch.yaml        # ENV_FOR_DYNACONF=development, LANGFUSE_PROMPTS_LABEL=development
    │   └── kustomization.yaml         # namespace: tusdi-dev
    ├── staging/
    │   ├── env-vars-patch.yaml        # ENV_FOR_DYNACONF=staging, LANGFUSE_PROMPTS_LABEL=staging
    │   └── kustomization.yaml         # namespace: tusdi-staging
    └── preview/
        ├── env-vars-patch.yaml        # ENVIRONMENT=preview, auth URLs, LANGFUSE_PROMPTS_LABEL=preview-${PR_NUMBER}
        └── kustomization.yaml         # namespace: tusdi-preview-$PR_NUMBER

How Environment Variables Are Organized

Three mechanisms for injecting env vars into pods:

Mechanism	File	Use For	Example
Direct in Deployment	`base/app.yaml`	Non-sensitive config shared across envs	`DYNACONF_REDIS_DB__HOST`, `TZ`
ExternalSecrets	`base/external-secrets.yaml`	Sensitive values from Google Secret Manager	API keys, passwords
Overlay Patches	`overlays/<env>/env-vars-patch.yaml`	Per-environment overrides	`ENV_FOR_DYNACONF`, `LANGFUSE_PROMPTS_LABEL`

Complete tusdi-api Environment Variables

Direct values (base/app.yaml):

Variable	Value	Category
`FORWARDED_ALLOW_IPS`	`*`	Network
`ENVIRONMENT`	`${ENVIRONMENT}`	App
`DYNACONF_PG_DB__HOST`	`postgres`	PostgreSQL
`DYNACONF_PG_DB__PORT`	`5432`	PostgreSQL
`DYNACONF_PG_DB__NAME`	`tusdi_${ENVIRONMENT}`	PostgreSQL
`DYNACONF_PG_DB__USER`	`tusdi`	PostgreSQL
`DYNACONF_NEO4J_DB__HOST`	`neo4j`	Neo4j
`DYNACONF_NEO4J_DB__PORT`	`7687`	Neo4j
`DYNACONF_NEO4J_DB__USER`	`neo4j`	Neo4j
`DYNACONF_NEO4J_DB__NAME`	`neo4j`	Neo4j
`NEO4J_URI`	`bolt://neo4j:7687`	Neo4j
`NEO4J_USER`	`neo4j`	Neo4j
`DYNACONF_REDIS_DB__HOST`	`redis://redis:6379`	Redis
`DYNACONF_AUTH__FRONTEND_URL`	`https://${ENVIRONMENT}.${DNS_DOMAIN}`	Auth
`DYNACONF_AUTH__REDIRECT_URL`	`https://${ENVIRONMENT}.${DNS_DOMAIN}/api/v1/auth/google/callback`	Auth
`DYNACONF_AUTH__JWT_SECRET_KEY`	`xBkVoNf...`	Auth
`IMPERSONATE_SA`	`tusdi-nonprod-app@...`	GCP
`DYNACONF_TRACING__ENABLED`	`true`	Tracing
`DYNACONF_TRACING__OTEL_EXPORTER_OTLP_ENDPOINT`	`http://otel-collector.langfuse:4318`	Tracing
`DYNACONF_LANGFUSE__ENABLED`	`true`	Langfuse
`DYNACONF_LANGFUSE__TRACING__ENABLED`	`true`	Langfuse
`DYNACONF_LANGFUSE__BASE__HOST`	`http://langfuse-web.langfuse:3000`	Langfuse
`LANGFUSE_HOST`	`http://langfuse-web.langfuse:3000`	Langfuse SDK
`DYNACONF_LANGFUSE__HOST`	`http://langfuse-web.langfuse:3000`	Langfuse (dynaconf)
`LANGFUSE_PROMPTS_SOURCE`	`hybrid`	Prompt Mgmt
`LANGFUSE_PROMPTS_LABEL`	`production`	Prompt Mgmt
`LANGFUSE_PROMPTS_FALLBACK`	`true`	Prompt Mgmt
`TZ`	`UTC`	Misc
`ENV_FOR_DYNACONF`	`production`	App
`OTEL_PYTHON_URLLIB3_EXCLUDED_URLS`	`.sentry\\.io.`	Tracing

From ExternalSecrets (Google Secret Manager):

Variable	K8s Secret	GSM Key Pattern	Category
`DYNACONF_PG_DB__PASSWORD`	`postgres-secrets`	`tusdi-${ENVIRONMENT}-postgres-password`	PostgreSQL
`DYNACONF_NEO4J_DB__PASSWORD`	`neo4j-secrets`	`tusdi-${ENVIRONMENT}-neo4j-password`	Neo4j
`NEO4J_PASSWORD`	`neo4j-secrets`	`tusdi-${ENVIRONMENT}-neo4j-password`	Neo4j
`OPENAI_API_KEY`	`api-secrets`	`tusdi-${ENVIRONMENT}-openai-key`	LLM
`GEMINI_API_KEY`	`api-secrets`	`tusdi-${ENVIRONMENT}-gemini-key`	LLM
`SERPER_API_KEY`	`api-secrets`	`tusdi-${ENVIRONMENT}-serper-api-key`	Search
`VISION_AGENT_API_KEY`	`api-secrets`	`tusdi-${ENVIRONMENT}-vision-agent-api-key`	LLM
`GOOGLE_SEARCH_API_KEY`	`api-secrets`	`tusdi-${ENVIRONMENT}-google-search-api-key`	Search
`DYNACONF_GOOGLE_AUTH__CLIENT_ID`	`backend-google-oauth-secrets`	(managed separately)	Auth
`DYNACONF_GOOGLE_AUTH__CLIENT_SECRET`	`backend-google-oauth-secrets`	(managed separately)	Auth
`DYNACONF_CALENDAR__OAUTH_CLIENT_ID`	`backend-google-oauth-secrets`	(managed separately)	Auth
`DYNACONF_CALENDAR__OAUTH_CLIENT_SECRET`	`backend-google-oauth-secrets`	(managed separately)	Auth
`LANGFUSE_PUBLIC_KEY`	`langfuse-secrets`	`langfuse-public-key` (global)	Langfuse
`LANGFUSE_SECRET_KEY`	`langfuse-secrets`	`langfuse-secret-key` (global)	Langfuse
`DYNACONF_LANGFUSE__PUBLIC_KEY`	`langfuse-secrets`	`langfuse-public-key` (global)	Langfuse
`DYNACONF_LANGFUSE__SECRET_KEY`	`langfuse-secrets`	`langfuse-secret-key` (global)	Langfuse

Per-environment overrides (overlay patches):

Variable	Dev	Staging	Preview
`ENV_FOR_DYNACONF`	`development`	`staging`	(base: `production`)
`LANGFUSE_PROMPTS_LABEL`	`development`	`staging`	`preview-${PR_NUMBER}`
`ENVIRONMENT`	(base)	(base)	`preview`
`DYNACONF_AUTH__FRONTEND_URL`	(base)	(base)	`https://preview-${PR_NUMBER}.${DNS_DOMAIN}`
`DYNACONF_AUTH__REDIRECT_URL`	(base)	(base)	`https://oauth-callback.${DNS_DOMAIN}/callback`
`OAUTH_PR_NUMBER`	-	-	`pr-${PR_NUMBER}`

Langfuse Variable Naming

The backend reads Langfuse configuration through two parallel paths:

Langfuse SDK env vars (LANGFUSE_*) - Read directly by the Langfuse Python SDK and CLI tools
Dynaconf env vars (DYNACONF_LANGFUSE__*) - Read by dynaconf into config.toml structure, then propagated to SDK by LangfuseIntegration

Both must be set in k8s for full compatibility:

LANGFUSE_HOST + DYNACONF_LANGFUSE__HOST + DYNACONF_LANGFUSE__BASE__HOST
LANGFUSE_PUBLIC_KEY + DYNACONF_LANGFUSE__PUBLIC_KEY
LANGFUSE_SECRET_KEY + DYNACONF_LANGFUSE__SECRET_KEY

The DYNACONF_LANGFUSE__BASE__HOST is the legacy path (pre-refactor); DYNACONF_LANGFUSE__HOST is the newer config path. Both are set to the same value for compatibility.
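Because the same value has to appear under several names, a quick consistency check can catch a missing or diverging twin. The sketch below reads the current shell environment and prints only variable names, never values; `check_langfuse_pairs` is a hypothetical helper.

```shell
# Sketch: verify that each Langfuse SDK variable and its dynaconf twin are set and equal.
check_langfuse_pairs() {
  status=0
  for pair in \
    "LANGFUSE_HOST:DYNACONF_LANGFUSE__HOST" \
    "LANGFUSE_HOST:DYNACONF_LANGFUSE__BASE__HOST" \
    "LANGFUSE_PUBLIC_KEY:DYNACONF_LANGFUSE__PUBLIC_KEY" \
    "LANGFUSE_SECRET_KEY:DYNACONF_LANGFUSE__SECRET_KEY"
  do
    sdk_name=${pair%%:*}
    dyn_name=${pair#*:}
    # Indirect lookup of fixed, known names (no user input reaches eval).
    eval "sdk_val=\${$sdk_name-}; dyn_val=\${$dyn_name-}"
    if [ -z "$sdk_val" ] || [ "$sdk_val" != "$dyn_val" ]; then
      echo "MISMATCH: $sdk_name vs $dyn_name" >&2
      status=1
    fi
  done
  return "$status"
}
```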

Hybrid Prompt Management per Environment

Environment	`LANGFUSE_PROMPTS_SOURCE`	`LANGFUSE_PROMPTS_LABEL`	Behavior
Base (production)	`hybrid`	`production`	Try Langfuse production prompts, fallback to local
Dev	`hybrid`	`development`	Try Langfuse development prompts, fallback to local
Staging	`hybrid`	`staging`	Try Langfuse staging prompts, fallback to local
Preview	`hybrid`	`preview-${PR_NUMBER}`	Try Langfuse PR-specific prompts, fallback to local

LANGFUSE_PROMPTS_FALLBACK=true ensures all environments gracefully degrade to local config.yml files if Langfuse is unreachable.
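The table above reduces to a small lookup. The sketch below mirrors it; the function name and the environment keywords (`base`, `dev`, `staging`, `preview`) are illustrative choices, not identifiers taken from the manifests.

```shell
# Sketch: return the LANGFUSE_PROMPTS_LABEL for an environment.
# The second argument (PR number) is only used for preview.
langfuse_prompts_label() {
  case "$1" in
    base)    echo "production" ;;
    dev)     echo "development" ;;
    staging) echo "staging" ;;
    preview)
      [ -n "${2:-}" ] || { echo "ERROR: preview needs a PR number" >&2; return 1; }
      echo "preview-$2" ;;
    *)       echo "ERROR: unknown environment: $1" >&2; return 1 ;;
  esac
}
```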

How to Add New Environment Variables

Non-Sensitive Variables (configs, feature flags)

Add the variable to repos/dem2/.env.example with documentation comments
Add to base app.yaml in repos/dem2-infra/k8s/base/app.yaml:
```
- name: MY_NEW_CONFIG
  value: "default-value"
```
If environment-specific, add overrides in overlays/<env>/env-vars-patch.yaml

Sensitive Variables (API keys, passwords, tokens)

Add to .env.example with blank value
Create secret in Google Secret Manager following naming convention:
```
tusdi-${ENVIRONMENT}-<descriptive-key-name>
```
e.g., tusdi-dev-openai-key, tusdi-staging-openai-key

Note: Some secrets are global (e.g., langfuse-public-key); not all follow the per-env pattern.

Add ExternalSecret entry in repos/dem2-infra/k8s/base/external-secrets.yaml:

---
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: my-new-secrets
spec:
  refreshInterval: 1m
  secretStoreRef:
    name: gcpsm-secret-store
    kind: SecretStore
  target:
    name: my-new-secrets
    creationPolicy: Owner
  data:
  - secretKey: MY_SECRET_KEY
    remoteRef:
      key: tusdi-${ENVIRONMENT}-my-secret-key

Reference in app.yaml:

- name: MY_SECRET_KEY
  valueFrom:
    secretKeyRef:
      name: my-new-secrets
      key: MY_SECRET_KEY

Update supporting files (noted in external-secrets.yaml header):
- terraform/secrets.tf - Add the secret resource
- deploy-k8s.sh - Add secret to validate_secrets() function

Checklist: `.env.example` → k8s Integration

When .env.example changes, follow this workflow:

□ Identify new variables from git diff on .env.example
□ Classify each as sensitive (secret) or non-sensitive (config)
□ For secrets: create GSM entries + ExternalSecret + secretKeyRef in app.yaml
□ For configs: add directly to app.yaml with appropriate default
□ Determine if variable needs per-environment overrides → overlay patches
□ Verify variable naming matches what the application code reads
□ Check for dual naming (e.g., LANGFUSE_* + DYNACONF_LANGFUSE__*)
□ Update terraform/secrets.tf and deploy-k8s.sh if adding new secrets

ExternalSecret Groups

K8s Secret Name	GSM Key Prefix	Variables
`postgres-secrets`	`tusdi-${ENV}-postgres-*`	`POSTGRES_PASSWORD`
`neo4j-secrets`	`tusdi-${ENV}-neo4j-*`	`NEO4J_PASSWORD`
`api-secrets`	`tusdi-${ENV}-*-key`	`OPENAI_API_KEY`, `GEMINI_API_KEY`, `SERPER_API_KEY`, `VISION_AGENT_API_KEY`, `GOOGLE_SEARCH_API_KEY`
`backend-google-oauth-secrets`	(managed separately)	`client-id`, `client-secret`
`langfuse-secrets`	`langfuse-*` (global, not per-env)	`LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`

Best Practices

Starting Development

Always use workspace-level just commands from machina-meta root
Use just dev-status to verify services before running code
Check for port conflicts before starting: lsof -i :8000

Stopping Development

Use just dev-down to stop all services cleanly
Don't use docker kill - it sends SIGKILL by default, so containers cannot shut down gracefully
Check nothing is left running: docker ps

Resetting Data

Stop with just dev-down
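Remove volumes: docker volume prune -f (this prunes unused volumes for every project on the host, not only this workspace; on Docker 23.0 and later, named volumes are only pruned when --all is added)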
Start fresh: just dev-up

Updating Images

# Pull latest images
docker compose pull

# Rebuild local images
docker compose build --no-cache

# Update and restart
just dev-restart

Integration with Other Skills

machina-git

Use machina-git for all git operations. Docker skill handles containers only.

kubernetes

Kubernetes skill handles production cluster operations. Docker images built here are deployed via Kubernetes.

machina-ui

For frontend debugging, use machina-ui skill which can interact with containerized services.

Pull Request Creation

Overview

This section provides the authoritative process for creating pull requests in the dem2, dem2-webui, and dem2-infra repositories.

Note: This section is in machina-docker (not machina-git) because PR creation uses the gh CLI for GitHub operations, while machina-git handles core git operations (commit, push, status). For git commands below, use the machina-git skill workflow.

PR Creation Process

Step 1: Identify merge base and examine ALL commits

# Find merge base (usually origin/dev for feature branches)
cd repos/dem2 && git fetch origin && git merge-base origin/dev HEAD

# Get comprehensive branch stats
BASE=$(git merge-base origin/dev HEAD)
echo "=== Branch Stats ==="
echo "Commits: $(git rev-list --count $BASE..HEAD)"
echo "Files: $(git diff --stat $BASE..HEAD | tail -1)"
echo "Date range: $(git log --format='%as' $BASE..HEAD | tail -1) to $(git log --format='%as' -1 HEAD)"

# List ALL commits from merge base to HEAD
git log --oneline $BASE..HEAD

# Get the FULL diff (CRITICAL - examine ALL changes, not just commit messages)
git diff $BASE..HEAD

Step 2: Analyze changes by category

Group all changes into categories:

Bug Fixes: Error corrections, crash fixes
Features: New functionality
Refactoring: Code improvements without behavior change
Performance: Speed/resource optimizations
Documentation: Docs, comments, README updates
Testing: Test additions/modifications
Infrastructure: CI/CD, build, deployment changes
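A first pass over the commit subjects can pre-sort them into these categories. This is only a starting point, since the full diff remains the source of truth. The sketch assumes Conventional Commits style prefixes such as `feat:` or `fix(api):`, which this document only shows in the PR title example; `commit_category` is a hypothetical helper.

```shell
# Sketch: map a commit subject to a PR category by its prefix.
commit_category() {
  prefix=$(printf '%s\n' "$1" | sed -n 's/^\([a-z]*\)[(:!].*/\1/p')
  case "$prefix" in
    fix)      echo "Bug Fixes" ;;
    feat)     echo "Features" ;;
    refactor) echo "Refactoring" ;;
    perf)     echo "Performance" ;;
    docs)     echo "Documentation" ;;
    test)     echo "Testing" ;;
    ci|build) echo "Infrastructure" ;;
    *)        echo "Uncategorized" ;;
  esac
}

# Usage: count commits per category between the merge base and HEAD.
#   git log --format=%s "$BASE..HEAD" | while IFS= read -r s; do commit_category "$s"; done | sort | uniq -c
```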

Step 3: Generate PR using gh CLI

cd repos/dem2 && gh pr create --base dev --head <branch-name> --title "<title>" --body "<body>"

PR Summary Template

Use this exact structure for PR summaries:

## Summary

<One paragraph overview describing the primary purpose and key achievements of this PR.>

### Key Changes

| Category | Description |
|----------|-------------|
| **<Category 1>** | <Brief description of what changed> |
| **<Category 2>** | <Brief description of what changed> |

---

### Detailed Changes

#### <Feature/Fix Area 1>
- ✅ <Completed item with specific detail>
- ✅ <Completed item with specific detail>

#### <Feature/Fix Area 2>
- ✅ <Completed item with specific detail>

### Technical Details

<If applicable, include tables for:>
- Performance improvements (Before/After/Impact)
- Architectural decisions (Decision/Rationale/Outcome)
- Breaking changes (if any)

## Test Plan

- [x] <Specific test performed>
- [x] <Specific test performed>
- [x] <Verification method used>

## Related Issues

- Fixes #<issue> (if applicable)
- Related to #<issue> (if applicable)

PR Summary Generation Prompt

When generating a PR summary, follow these rules:

Read ALL diffs: Examine every file changed from merge base to HEAD
Categorize changes: Group by feature area, not by file
Be specific: Include actual function names, file paths, metrics
Use tables: For comparisons, metrics, architectural decisions
Include evidence: Reference specific commits, test results, measurements
Quantify impact: "94% cost reduction" not "significant savings"
Test plan must be real: Only include tests actually performed
No fluff: Every line should convey information

DO NOT:

Include generic statements like "improved code quality"
List files without explaining what changed
Include untested items in the test plan
Add performance claims without measurements
Use vague language like "various improvements"

DO:

Start with the most important/impactful changes
Group related changes together
Include specific function/class names
Reference commit hashes for key changes
Explain WHY architectural decisions were made

Example Workflow

# 1. Get merge base and stats
cd repos/dem2 && git fetch origin
BASE=$(git merge-base origin/dev HEAD)
echo "Commits: $(git rev-list --count $BASE..HEAD)"
echo "Files: $(git diff --stat $BASE..HEAD | tail -1)"
echo "Date range: $(git log --format='%as' $BASE..HEAD | tail -1) to $(git log --format='%as' -1 HEAD)"

# 2. List all commits (READ each one for context)
git log --oneline $BASE..HEAD

# 3. Get full diff (CRITICAL - analyze the actual code changes)
git diff $BASE..HEAD

# 4. Create PR with summary
gh pr create --base dev --title "feat: <concise title>" --body "$(cat <<'EOF'
## Summary
<summary based on diff analysis>
...
EOF
)"

Creating PRs for Different Repos

dem2 (Backend):

cd repos/dem2 && gh pr create --base dev --head <branch>

dem2-webui (Frontend):

cd repos/dem2-webui && gh pr create --base dev --head <branch>

dem2-infra (Infrastructure):

cd repos/dem2-infra && gh pr create --base main --head <branch>
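Since the base branch differs per repository, a small lookup avoids targeting the wrong one. `pr_base_branch` is a hypothetical helper that simply encodes the three commands above.

```shell
# Sketch: return the PR base branch for a repository name.
pr_base_branch() {
  case "$1" in
    dem2|dem2-webui) echo "dev" ;;
    dem2-infra)      echo "main" ;;
    *)               echo "ERROR: unknown repo: $1" >&2; return 1 ;;
  esac
}

# Usage (from the workspace root, with repo and branch set by the caller):
#   (cd "repos/$repo" && gh pr create --base "$(pr_base_branch "$repo")" --head "$branch")
```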

Reference Files

This skill includes detailed documentation in references/:

compose-files.md - Complete documentation of all 13 docker-compose files
profiles.md - Profile system (dev, test, prod) detailed guide
ports.md - All port mappings with conflict detection
volumes.md - Volume management, backup, restore
troubleshooting.md - Common issues and solutions
scripts.md - Helper script documentation (dev_stack.py, etc.)

Use Read to access specific reference files when detailed information is needed.

Additional references:

user-workflow-preferences.md — .current_env for API testing, kubectl locally, preview environment mappings
