Agent Skill
2/7/2026

machina-docker

Docker development environment for machina-meta workspace. Use for container management, development stacks, database services, health checks, volume management, and infrastructure. The single authoritative source for all Docker operations.

numberone
npx skills add NumberOne-AI/machina-meta

SKILL.md


name: machina-docker
description: Docker development environment for machina-meta workspace. Use for container management, development stacks, database services, health checks, volume management, and infrastructure. The single authoritative source for all Docker operations.

Docker Skill

Comprehensive Docker environment management for the machina-meta workspace. This is the single authoritative source for all Docker operations.

When to Use This Skill

Development Environment Management

  • Starting/stopping the development stack
  • Checking service status and health
  • Managing database containers (PostgreSQL, Neo4j, Redis, Qdrant)
  • Running full-stack development (frontend + backend + databases)

Service Operations

  • Building Docker images
  • Running specific service containers
  • Debugging container issues
  • Log extraction and analysis

Database Operations

  • Managing development databases
  • Snapshot/restore operations (Qdrant)
  • Volume management and data persistence
  • Database migrations in containers

Specialized Stacks

  • Langfuse observability (local and production)
  • Local LLM inference (Ollama, vLLM with GPU)
  • Remote debugging tools (pgAdmin, Neo4j Browser)
  • gcloud-admin DevOps container

Infrastructure & Deployment

  • Gateway infrastructure (ngrok, nginx)
  • Monitoring stacks (Langfuse + OTEL)
  • LLM experiment environments

Kubernetes Environment Configuration

  • Adding new environment variables to k8s deployments
  • Integrating .env.example changes into Kustomize manifests
  • Managing secrets via ExternalSecrets + Google Secret Manager
  • Per-environment configuration via overlay patches

Quick Reference

Primary Development Commands

All Docker commands should be run from the machina-meta workspace root.

Note: Docker-related Justfile rules in repos/dem2, repos/dem2-webui, and repos/medical-catalog are deprecated. Use workspace-level commands instead.

Workspace Level (machina-meta root):

| Command | Purpose | Underlying Operation |
| --- | --- | --- |
| `just dev-up` | Start full stack (all services) | `./scripts/dev_stack.py up` |
| `just dev-down` | Stop all services | `./scripts/dev_stack.py down` |
| `just dev-status` | Check all service health | `./scripts/dev_stack.py status` |
| `just dev-restart` | Rebuild and restart | `docker compose --profile dev up -d --build` |
| `just dev-check` | Run sanity check tests | Non-destructive verification suite |

gcloud-admin DevOps:

| Command | Purpose | Underlying Operation |
| --- | --- | --- |
| `just gcloud-admin::shell` | Interactive DevOps shell | `docker compose run --rm gcloud-admin` |
| `just gcloud-admin::kubectl <args>` | Run kubectl | `docker compose run --rm gcloud-admin kubectl <args>` |
| `just gcloud-admin::helm <args>` | Run helm | `docker compose run --rm gcloud-admin helm <args>` |
| `just gcloud-admin::k9s` | Cluster TUI | `docker compose run --rm gcloud-admin k9s` |
| `just gcloud-admin::argocd <args>` | Run ArgoCD CLI | `docker compose run --rm gcloud-admin argocd <args>` |
| `just gcloud-admin::preview-info <identifier>` | Get preview deployment info | Shows tags, commits, PR status, ArgoCD health |

Preview Environment Info:

Check the current state of a preview deployment using any identifier:

# By GKE namespace
just gcloud-admin::preview-info --gke-namespace tusdi-preview-92

# By git tag
just gcloud-admin::preview-info --git-tag preview-dbeal-docproc-dev

# By ArgoCD app name
just gcloud-admin::preview-info --argocd-app preview-pr-92

# By infra branch
just gcloud-admin::preview-info --infra-branch preview/dbeal-docproc-dev

# By infra PR number
just gcloud-admin::preview-info --pr 92

# By git branch
just gcloud-admin::preview-info --git-branch feature/dbeal-docproc-dev

# Output formats: terminal, json, markdown (default: markdown)
just gcloud-admin::preview-info --gke-namespace tusdi-preview-92 --format json

Output includes:

  • Preview ID resolved from identifier
  • Backend/Frontend tag commits and dates
  • Infrastructure branch and PR status
  • ArgoCD sync and health status
  • GitHub workflow status for all repos

Updating Preview Deployments (Deploy a Branch Commit to Preview):

This is the complete, authoritative process for deploying code changes to a GKE preview environment. Use this whenever the user says "deploy to preview-XX".

How the Preview CI/CD Pipeline Works

The deployment pipeline is fully automated once a preview tag is pushed:

1. Developer pushes tag `preview-<branch>` to dem2 (or dem2-webui)
       ↓
2. GitHub Actions `pr-preview-build.yml` triggers on `push: tags: preview-*`
       ↓
3. CI builds Docker image, pushes to Artifact Registry with tag:
   `<tag-name>-<full-commit-sha>`
   Example: `preview-alen-dev-1-ae99b3911e7a564c1470a116e3097792342f665b`
       ↓
4. CI dispatches `preview-update` event to dem2-infra repository
       ↓
5. dem2-infra workflow updates `kustomization.yaml` with new image tag
       ↓
6. ArgoCD detects the infra change and syncs the deployment
       ↓
7. New pod starts with the updated image in tusdi-preview-XX namespace

Image tag format: <git-tag-name>-<full-40-char-commit-sha>

  • For tag pushes: preview-alen-dev-1-ae99b3911e7a564c1470a116e3097792342f665b
  • For PR builds: pr-92-ae99b3911e7a564c1470a116e3097792342f665b
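
A minimal sketch of the tag format above, useful for working out which image tag to expect before looking in Artifact Registry. The function name image_tag is hypothetical and not part of the workspace tooling; it only joins the two parts and rejects anything that is not a full 40-character lowercase hex SHA.

```shell
# Hypothetical helper (not part of the workspace scripts):
# compose an image tag as <git-tag-name>-<full-40-char-commit-sha>.
image_tag() {
  tag_name=$1
  sha=$2
  # Reject any character that is not lowercase hex.
  case $sha in
    *[!0-9a-f]*)
      echo "error: commit SHA must be lowercase hex" >&2
      return 1
      ;;
  esac
  # Reject abbreviated SHAs; the format uses the full 40 characters.
  if [ "${#sha}" -ne 40 ]; then
    echo "error: commit SHA must be 40 characters" >&2
    return 1
  fi
  printf '%s-%s\n' "$tag_name" "$sha"
}
```

Typical use from repos/dem2 would be `image_tag preview-alen-dev-1 "$(git rev-parse HEAD)"`, since `git rev-parse HEAD` prints the full SHA.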

Step-by-Step: Deploy Backend (tusdi-api) to Preview

# Step 1: Ensure your code is committed and pushed to the branch
# (Use machina-git skill for this)
# Subshells keep your shell at the workspace root between commands
(cd repos/dem2 && git log -1 --oneline)
# Verify HEAD is the commit you want to deploy

# Step 2: Tag current HEAD with preview-<branch-name> and force-push
# This triggers the CI pipeline automatically
(cd repos/dem2 && git tag -f preview-<branch-name> && git push origin preview-<branch-name> --force)

# Example for alen-dev-1 branch:
(cd repos/dem2 && git tag -f preview-alen-dev-1 && git push origin preview-alen-dev-1 --force)

Shortcut using justfile:

# Equivalent to the above (tags + force-pushes in one command)
(cd repos/dem2 && just preview <branch-name>)

# Example:
(cd repos/dem2 && just preview alen-dev-1)

Step-by-Step: Deploy Frontend (tusdi-webui) to Preview

# Same pattern as backend
cd repos/dem2-webui && git tag -f preview-<branch-name> && git push origin preview-<branch-name> --force

# Or using justfile:
(cd repos/dem2-webui && just preview <branch-name>)

Step-by-Step: Deploy Both Backend and Frontend

# Deploy both at once
(cd repos/dem2 && just preview-both <branch-name>)

# Or manually (subshells so the second cd still starts from the workspace root):
(cd repos/dem2 && git tag -f preview-<branch-name> && git push origin preview-<branch-name> --force)
(cd repos/dem2-webui && git tag -f preview-<branch-name> && git push origin preview-<branch-name> --force)

Step 3: Monitor Deployment Progress

# Check CI workflow status (wait for image build to complete)
(cd repos/dem2 && gh run list --limit 3)

# Watch a specific run
(cd repos/dem2 && gh run watch <run-id>)

# Check ArgoCD sync status (run from the workspace root)
just gcloud-admin::preview-info --gke-namespace tusdi-preview-<XX>

# Watch pods for rollout
kubectl get pods -n tusdi-preview-<XX> -w

# Check pod logs after deployment
kubectl logs -n tusdi-preview-<XX> -l app.kubernetes.io/component=api --tail=50

Common Deployment Scenarios

| Scenario | Command |
| --- | --- |
| Deploy backend only | `cd repos/dem2 && git tag -f preview-<branch> && git push origin preview-<branch> --force` |
| Deploy frontend only | `cd repos/dem2-webui && git tag -f preview-<branch> && git push origin preview-<branch> --force` |
| Deploy both | `(cd repos/dem2 && just preview-both <branch>)` |
| Check build status | `cd repos/dem2 && gh run list --limit 3` |
| Check deployment health | `kubectl get pods -n tusdi-preview-<XX>` |
| Force ArgoCD sync | `just gcloud-admin::argocd app sync argocd/preview-pr-<XX>` |

Troubleshooting Deployments

| Issue | Solution |
| --- | --- |
| CI not triggered | Verify tag was pushed: `cd repos/dem2 && git ls-remote --tags origin \| grep preview-<branch>` |
| Image not found | Check CI run logs: `cd repos/dem2 && gh run list --limit 3` |
| ArgoCD not syncing | Manual sync: `just gcloud-admin::argocd app sync argocd/preview-pr-<XX>` |
| Pod CrashLoopBackOff | Check logs: `kubectl logs -n tusdi-preview-<XX> -l app.kubernetes.io/component=api --tail=100` |
| Old image still running | Delete deployment to force recreation: `kubectl delete deployment tusdi-api -n tusdi-preview-<XX>`, then ArgoCD sync |

Important Notes

  • Deploy only what changed. If you only changed dem2, you only need to tag and push dem2.
  • The tag must match the branch name pattern used when the preview was originally created.
  • Force-push (--force) is expected for preview tags — they are mutable pointers.
  • CI build takes ~3-5 minutes. ArgoCD sync adds ~1-2 minutes after that.
  • The kustomization.yaml in dem2-infra is updated automatically by the CI dispatch — you do NOT need to manually edit it.

Syncing Local dem2-infra After CI Dispatch

When the CI pipeline updates kustomization.yaml in dem2-infra, it force-pushes to the preview/<branch> branch on the remote. This means your local preview/<branch> branch diverges from the remote.

You MUST sync your local branch before making any further infra changes:

# After CI build completes and dispatches to dem2-infra:
cd repos/dem2-infra && git pull --rebase origin preview/<branch-name>

# Example for alen-dev-1:
cd repos/dem2-infra && git pull --rebase origin preview/alen-dev-1

Why --rebase is required:

  • The CI force-pushes a new commit to the remote branch with the updated image tag
  • A regular git pull (merge) would create unnecessary merge commits
  • git pull --rebase replays any local commits on top of the CI's update
  • If your only local commit was a previous image tag update, it will be skipped automatically (no conflict)

Common scenarios:

| Scenario | What Happens |
| --- | --- |
| No local changes to infra | `git pull --rebase` fast-forwards cleanly |
| Local infra changes (env vars, patches) | Local commits are rebased on top of CI's image tag update |
| Local image tag edit conflicts with CI | Rebase conflict: resolve by keeping the CI version, since the CI tag is correct. During a rebase the upstream (CI) commit is the "ours" side and your replayed local commit is "theirs" |

Do NOT manually edit kustomization.yaml image tags — always let the CI handle it via the tag-push workflow. If you need to verify the current image tag after pull:

cd repos/dem2-infra && grep newTag k8s/overlays/preview/kustomization.yaml

Deprecated Commands (in child repos)

The following patterns are deprecated and should be migrated to workspace-level commands:

| Deprecated Command | Replacement |
| --- | --- |
| `(cd repos/dem2 && just dev-env-up)` | `just dev-up` (starts full stack) |
| `(cd repos/dem2 && just dev-env-down)` | `just dev-down` |
| `(cd repos/dem2 && just med-api-up)` | `just dev-up` |
| `(cd repos/medical-catalog && just dev-env-up)` | `just dev-up` |
| `(cd repos/medical-catalog && just docker-build)` | (pending migration) |

Docker Compose Files Map

| File | Location | Purpose | Profile |
| --- | --- | --- | --- |
| docker-compose.yaml | Root | Main workspace stack (primary) | dev |
| docker-compose.yaml | repos/dem2/infrastructure/ | Backend dev environment | dev, test |
| docker-compose.langfuse.local.yaml | repos/dem2/infrastructure/ | Local Langfuse observability | - |
| docker-compose.yml | repos/dem2/infrastructure/remote-debug/ | pgAdmin, Neo4j debug | - |
| docker-compose.qdrant.yaml | repos/dem2/services/indicators-catalog/ | Standalone Qdrant | - |
| docker-compose.yaml | repos/medical-catalog/infra/ | Catalog Qdrant (port 16333) | dev |
| docker-compose.yaml | gcloud-admin/ | DevOps admin container | - |
| docker-compose.ngrok-nginx.yaml | repos/dem2-infra/.../gateway/ | Public gateway tunnel | - |
| docker-compose.langfuse.yaml | repos/dem2-infra/.../monitoring/ | Production Langfuse + OTEL | monitoring |
| docker-compose.ollama.yml | repos/dem2-infra/.../experiment-setup/ | Local LLM with GPU | - |
| docker-compose.vllm.yml | repos/dem2-infra/.../experiment-setup/ | vLLM inference server | - |

See references/compose-files.md for detailed documentation of each file.


Profile System

All services in the main workspace stack use the unified dev profile:

| Profile | Services | Use Case |
| --- | --- | --- |
| dev | All services (databases + backend + frontend + catalog) | Full stack development |

Starting Services:

# Full stack (all services)
docker compose --profile dev up -d

# Or use the just command (recommended)
just dev-up

Note: The --profile dev flag is required for up, down, restart, and build operations. It is NOT required for querying running containers (ps, logs, stats, exec).
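
The rule in the note above can be sketched as a small wrapper that adds the flag only for lifecycle subcommands. The name compose_cmd is hypothetical; the function prints the command line instead of running it, so it is safe to try without a running stack.

```shell
# Hypothetical helper: print the docker compose command line, adding
# --profile dev only for the subcommands that need it (up, down, restart, build).
compose_cmd() {
  case $1 in
    up|down|restart|build)
      printf 'docker compose --profile dev %s\n' "$*"
      ;;
    *)
      # Query-style subcommands (ps, logs, stats, exec) work without the profile.
      printf 'docker compose %s\n' "$*"
      ;;
  esac
}
```

For example, `compose_cmd up -d` prints `docker compose --profile dev up -d`, while `compose_cmd logs backend --tail 50` prints the command without the profile flag.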

See references/profiles.md for detailed profile documentation.


Port Mappings

Development Ports (localhost)

| Port | Service | Stack | Health Check |
| --- | --- | --- | --- |
| 3000 | Frontend (Next.js) | machina-meta | http://localhost:3000 |
| 5432 | PostgreSQL | machina-meta | `pg_isready -h localhost` |
| 5540 | RedisInsight | machina-meta | http://localhost:5540 |
| 6333 | Qdrant REST API | machina-meta | http://localhost:6333/healthz |
| 6334 | Qdrant Web UI | machina-meta | http://localhost:6334/dashboard |
| 6379 | Redis | machina-meta | `redis-cli ping` |
| 7474 | Neo4j Browser | machina-meta | http://localhost:7474 |
| 7687 | Neo4j Bolt | machina-meta | (used by applications) |
| 8000 | Backend API | machina-meta | http://localhost:8000/docs |
| 8001 | Medical Catalog | machina-meta | http://localhost:8001/health |

Langfuse Stack Ports

| Port | Service | Purpose |
| --- | --- | --- |
| 3003 | Langfuse Web | Observability UI |
| 3030 | Langfuse Worker | Background processing |
| 5433 | Langfuse PostgreSQL | Langfuse database |
| 6380 | Langfuse Redis | Langfuse cache |
| 8123 | ClickHouse HTTP | Analytics queries |
| 9090 | MinIO API | S3-compatible storage |
| 9091 | MinIO Console | MinIO admin UI |

Specialized Ports

| Port | Service | Purpose |
| --- | --- | --- |
| 5050 | pgAdmin | PostgreSQL admin UI (remote-debug) |
| 11434 | Ollama | Local LLM inference |
| 16333 | Qdrant (catalog) | Medical catalog vector store |
| 17474 | Neo4j (remote) | Remote debugging Neo4j |

See references/ports.md for complete port documentation.


Common Usage Patterns

Pattern 1: Full Stack Development (Recommended)

Start everything from workspace root:

# Start full stack
just dev-up

# Check status
just dev-status

# When done
just dev-down

Pattern 2: Backend Development Only

Start databases, run backend locally:

# Start full stack (includes databases)
just dev-up

# Stop the containerized backend to run locally
docker stop machina-meta-backend-1

# Run backend locally
(cd repos/dem2 && just run)

# When done
just dev-down

Pattern 3: Frontend Development Only

# Start full stack
just dev-up

# Or just start frontend locally if backend is running
(cd repos/dem2-webui && pnpm dev)

Pattern 4: Clean Restart with Fresh Data

Reset databases and reload fixtures:

# Stop the stack, then remove unused volumes
just dev-down
docker volume prune -f  # Warning: removes unused volumes; on Docker 23.0+ only anonymous volumes are pruned unless you add --all

# Start fresh
just dev-up

Pattern 5: Local LLM Inference (Ollama)

# Start Ollama with GPU
(cd repos/dem2-infra/infrastructure/docker/google_cloud/experiment-setup && \
  docker compose -f docker-compose.ollama.yml up -d)

# Pull a model
docker exec -it ollama ollama pull llama3.2

# Test
curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "prompt": "Hello"}'

Pattern 6: Local Langfuse Observability

# Start Langfuse stack
(cd repos/dem2/infrastructure && docker compose -f docker-compose.langfuse.local.yaml up -d)

# Access UI at http://localhost:3003

# Stop
(cd repos/dem2/infrastructure && docker compose -f docker-compose.langfuse.local.yaml down)

Pattern 7: Remote Debugging Tools

# Start pgAdmin and Neo4j browser
(cd repos/dem2/infrastructure/remote-debug && docker compose up -d)

# pgAdmin: http://localhost:5050
# Neo4j: http://localhost:17474

Pattern 8: gcloud-admin DevOps

# First-time setup
just gcloud-admin::setup-nonprod

# Interactive shell
just gcloud-admin::shell

# Run kubectl commands
just gcloud-admin::kubectl get pods -n argocd

# Interactive cluster UI
just gcloud-admin::k9s

Pattern 9: Remote GKE Environment Access

This is the unified workflow for accessing remote GKE environments (preview, dev, staging).

CRITICAL: Local Docker vs Remote GKE — Mutually Exclusive

Port-forwarded remote services bind to the SAME ports as local Docker containers (5432, 7474, 8000, etc.). You MUST choose one:

| Mode | Description | Port Forwarding Required |
| --- | --- | --- |
| Local Docker | `just dev-up` running | NO (services already on localhost) |
| Remote GKE | Point to a preview/dev/staging namespace | YES (must run `just dev-down` first) |
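
A quick way to see whether the shared ports are still taken before switching modes is to filter the listening sockets. This is only a sketch: the name conflicting_ports is hypothetical, and it assumes the Linux `ss -tln` column layout (State, Recv-Q, Send-Q, Local Address:Port, ...), so it will not work as-is with `ss -tuln` or on macOS.

```shell
# Hypothetical helper: read `ss -tln` output on stdin and print which of
# the given ports already have a TCP listener.
conflicting_ports() {
  awk -v wanted="$*" '
    BEGIN {
      n = split(wanted, list, " ")
      for (i = 1; i <= n; i++) want[list[i]] = 1
    }
    $1 == "LISTEN" {
      # The port is the last colon-separated piece of the local address.
      m = split($4, parts, ":")
      if (parts[m] in want) print parts[m]
    }
  ' | sort -n -u
}
```

Usage, with the development ports listed earlier: `ss -tln | conflicting_ports 3000 5432 6333 6379 7474 7687 8000 8001`. Empty output means none of those ports has a listener.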

Step 1: Determine current mode

# Check if local Docker containers are running
just dev-status

  • If local containers are running and you want local dev: no port forwarding needed, skip to Step 5 for API usage.
  • If you need remote GKE access: proceed to Step 2.

Step 2: Stop local Docker (required for remote access)

# Stop all local containers to free ports
just dev-down

Step 3: Import environment from Kubernetes

Extract all environment variables (configs, ConfigMaps, Secrets) from a remote deployment and generate a local .env file with K8s hostnames rewritten to localhost for port-forwarded access.

# Import environment from a k8s deployment
./scripts/import_k8s_environment.py import -n <NAMESPACE> -d <DEPLOYMENT>

# Examples:
./scripts/import_k8s_environment.py import -n tusdi-preview-99 -d tusdi-api
./scripts/import_k8s_environment.py import -n tusdi-staging -d tusdi-api
./scripts/import_k8s_environment.py import -n tusdi-dev -d tusdi-api

Output: Creates .env.<NAMESPACE>.<DEPLOYMENT> (e.g., .env.tusdi-preview-99.tusdi-api) with:

  • All direct values, ConfigMap values, and Secret values resolved
  • K8s service hostnames (postgres, neo4j, redis, etc.) rewritten to localhost
  • URL variables (redis://redis:6379, bolt://neo4j:7687) rewritten to localhost equivalents
  • File permissions set to 600 (owner read/write only)
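
To illustrate the hostname rewriting described above, here is a rough sketch using sed. It is not the logic of import_k8s_environment.py, which is not shown here; the function name rewrite_to_localhost is hypothetical, and it assumes simple service names without dots and only handles bare `KEY=host` values and `scheme://host` URLs.

```shell
# Hypothetical sketch: rewrite K8s service hostnames to localhost in
# KEY=VALUE lines read from stdin. Arguments are the hostnames to rewrite.
rewrite_to_localhost() {
  content=$(cat)
  for host in "$@"; do
    content=$(printf '%s\n' "$content" | sed \
      -e "s#^\([A-Za-z_][A-Za-z0-9_]*=\)$host\$#\1localhost#" \
      -e "s#://$host:#://localhost:#" \
      -e "s#://$host/#://localhost/#" \
      -e "s#://$host\$#://localhost#")
  done
  printf '%s\n' "$content"
}
```

Because the patterns are anchored on `KEY=` or `://`, a URL scheme that happens to match a service name (as in `redis://redis:6379`) keeps its scheme and only the host part changes.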

Switch active environment:

.current_env is a symlink that points to the active environment file. It acts as a context pointer — switching it changes which GKE environment all tools (curl_api, neo4j-query.py, etc.) operate against.

# Point .current_env to the imported environment (overwrites any previous symlink)
ln -sfn .env.tusdi-preview-99.tusdi-api .current_env

If the .env.<NAMESPACE>.<DEPLOYMENT> file doesn't exist yet (or is out of date), run the import command above first, then create the symlink.

Switching between environments:

# Switch to preview-99
ln -sfn .env.tusdi-preview-99.tusdi-api .current_env

# Switch to staging
ln -sfn .env.tusdi-staging.tusdi-api .current_env

# Verify which environment is active
ls -la .current_env
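
Since `ln -sfn` will happily create a dangling symlink, a small guard can refuse to switch when the environment file has not been imported yet. A minimal sketch, with the hypothetical name use_env:

```shell
# Hypothetical helper: point .current_env at an imported environment file,
# refusing to create a dangling symlink.
use_env() {
  env_file=$1
  if [ ! -f "$env_file" ]; then
    echo "error: $env_file not found; import it first" >&2
    return 1
  fi
  ln -sfn "$env_file" .current_env
  # Print the symlink target so the active environment is visible.
  readlink .current_env
}
```

Run it from the workspace root, for example `use_env .env.tusdi-staging.tusdi-api`.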

Compare environments (detect drift after re-import):

./scripts/import_k8s_environment.py compare .env.old .env.new
./scripts/import_k8s_environment.py compare .env.old .env.new --markdown
./scripts/import_k8s_environment.py compare .env.old .env.new --json

Additional import options:

| Flag | Purpose |
| --- | --- |
| `-c <name>` | Select specific container (default: first) |
| `-o <path>` | Custom output path |
| `--no-comments` | Omit source comments |
| `--no-localhost` | Disable hostname rewriting |
| `--json` | Output as JSON instead of .env file |
| `-q` | Quiet mode |

Step 4: Port-forward all services

ALWAYS use port_forward_service.py — NEVER run individual kubectl port-forward commands.

Launch the port-forward script in a dedicated tmux pane so its output stays visible without cluttering the main terminal:

# Forward ALL services in a namespace (recommended)
# Creates a small (~10 line) pane at the top, focus stays on current pane
# Capture the pane ID for later use (e.g., to kill it)
PF_PANE=$(tmux split-window -d -b -v -l 10 -P -F '#{pane_id}' \
  "./scripts/port_forward_service.py '{\"port_forward\": [{\"namespace\": \"<NAMESPACE>\", \"service_name\": \".*\"}]}'")

# Example: forward all tusdi-preview-99 services
PF_PANE=$(tmux split-window -d -b -v -l 10 -P -F '#{pane_id}' \
  "./scripts/port_forward_service.py '{\"port_forward\": [{\"namespace\": \"tusdi-preview-99\", \"service_name\": \".*\"}]}'")

# Forward specific services only
PF_PANE=$(tmux split-window -d -b -v -l 10 -P -F '#{pane_id}' \
  "./scripts/port_forward_service.py '{\"port_forward\": [{\"namespace\": \"tusdi-preview-99\", \"service_name\": \"neo4j\"}, {\"namespace\": \"tusdi-preview-99\", \"service_name\": \"postgres\"}]}'")

tmux flags: -d keeps focus on the current pane, -b places the new pane above (top), -v splits vertically, -l 10 sets height to 10 lines, -P -F '#{pane_id}' prints the new pane ID (captured into $PF_PANE).

The <NAMESPACE> should match the namespace shown in the header of .current_env.

Script details:

  • service_name supports regex patterns (".*" matches all services in the namespace)
  • Multi-port services (e.g., Neo4j: 7474 + 7687) forward all ports automatically
  • Automatic restart when port-forwards drop
  • Uses K8s targetPort as the local port (standard ports: 5432, 7474, 8000, etc.)
  • View JSON schema: ./scripts/port_forward_service.py --schema
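
Hand-escaping the JSON argument inside a tmux command string is error-prone, so it can help to build the payload separately. The sketch below follows the payload shape used in the examples above; the function name pf_payload is hypothetical, and it assumes namespace and service names contain no quotes or backslashes (no JSON escaping is done).

```shell
# Hypothetical helper: build the port_forward JSON payload for one namespace
# and one or more service names (or regex patterns such as '.*').
pf_payload() {
  namespace=$1
  shift
  entries=''
  for svc in "$@"; do
    entry=$(printf '{"namespace": "%s", "service_name": "%s"}' "$namespace" "$svc")
    if [ -n "$entries" ]; then
      entries="$entries, $entry"
    else
      entries=$entry
    fi
  done
  printf '{"port_forward": [%s]}\n' "$entries"
}
```

The result can then be passed as a single argument, e.g. `./scripts/port_forward_service.py "$(pf_payload tusdi-preview-99 neo4j postgres)"`.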

State file: While running, the script writes /var/run/user/<uid>/port_forward_service.json containing the PID and all forwarded services. The file is automatically removed on exit (including tmux kill-pane). Read it to check which services are currently forwarded:

cat /var/run/user/$(id -u)/port_forward_service.json | jq

Stopping port-forwards:

# Kill the port-forward pane using the captured pane ID
tmux kill-pane -t "$PF_PANE"

# Or kill the process directly (from any pane)
pkill -f port_forward_service

Step 5: Use the environment

# Source .current_env and call backend API
(. .current_env && cd repos/dem2 && just curl_api '{"function": "list_documents"}') | jq

# Query observations
(. .current_env && cd repos/dem2 && just curl_api '{"function": "get_observations_grouped", "per_type_values_limit": 3}') | jq

# Filter specific biomarker
(. .current_env && cd repos/dem2 && just curl_api '{"function": "get_observations_grouped", "per_type_values_limit": 3}') | jq '.items[]|select(.observation_type.display_name=="Folate").values'

# Query remote Neo4j directly (credentials come from .current_env)
(. .current_env && ./scripts/neo4j-query.py --format json "MATCH (n:PatientNode) RETURN n.first_name, n.last_name LIMIT 5")

Authentication scripts (called automatically by curl_api):

  • user_manager.py — Generates JWT tokens for API authentication
    • uv run scripts/user_manager.py user token --export — outputs export AUTH_HEADER="Bearer ..."
    • uv run scripts/user_manager.py user token --export-cookie — for UI/browser authentication
  • curl_api.sh — JSON dispatch system; handles JWT tokens, patient context headers, respects BACKEND_URL

Common namespaces:

| Namespace | Environment |
| --- | --- |
| tusdi-dev | Development |
| tusdi-staging | Staging |
| tusdi-preview-<id> | Preview (e.g., tusdi-preview-99) |

Note: Port-forwarding must run on the host machine with the host's kubectl, not through the gcloud-admin container. The gcloud-admin container has networking limitations that prevent port-forwarded ports from being accessible on the host.

Pattern 10: ArgoCD Application Sync and Management

For syncing, monitoring, and troubleshooting ArgoCD deployments on GKE preview environments.

Prerequisites:

  • gcloud-admin container setup completed (just gcloud-admin::setup-nonprod)
  • ArgoCD authentication (just gcloud-admin::auth-argocd-login)

Key Concepts:

  • ArgoCD app name: You MUST list apps first to find the correct name — do NOT guess app names
  • Kubernetes namespace: Format is tusdi-<env> (e.g., tusdi-preview-92, tusdi-staging)
  • App names and namespace names are DIFFERENT and not always predictable

Check ArgoCD Authentication Status:

# Verify you're logged in
just gcloud-admin::auth-argocd-status

# If not logged in, authenticate (interactive SSO - requires browser)
# Linux:
just gcloud-admin::auth-argocd-login
# macOS (--network host doesn't work, use import instead):
# 1. argocd login <server> --sso --grpc-web  (on host)
# 2. just gcloud-admin::auth-argocd-import

List ArgoCD Applications (ALWAYS do this first):

# IMPORTANT: Always list apps first to find the correct app name
# Do NOT guess app names - they may not follow predictable patterns
just gcloud-admin::argocd app list

# Filter preview apps
just gcloud-admin::argocd app list | grep preview

Check Application Status:

# Get app summary (use the EXACT app name from 'app list' output above)
just gcloud-admin::argocd app get <app-name-from-list>

# Get status as JSON for scripting
just gcloud-admin::argocd app get <app-name-from-list> --output json | jq '{
  sync_status: .status.sync.status,
  health_status: .status.health.status,
  operation_state: .status.operationState.phase
}'

Sync Application (Trigger Deployment):

# Basic sync - applies manifests from git
just gcloud-admin::argocd app sync <app-name-from-list>

# Force sync - override any drift
just gcloud-admin::argocd app sync <app-name-from-list> --force

# Sync with prune - remove resources not in git
just gcloud-admin::argocd app sync <app-name-from-list> --prune

Handle Stuck Operations:

If sync fails with "another operation is already in progress":

# Check current operation state
just gcloud-admin::argocd app get <app-name-from-list> --output json | jq '.status.operationState'

# Terminate stuck operation
just gcloud-admin::argocd app terminate-op <app-name-from-list>

# Then retry sync
just gcloud-admin::argocd app sync <app-name-from-list>

Monitor Deployment Progress:

# Watch pods in the namespace
just gcloud-admin::kubectl get pods -n tusdi-preview-92 -w

# Check specific deployment
just gcloud-admin::kubectl get pods -n tusdi-preview-92 | grep tusdi-api

# Get pod logs
just gcloud-admin::kubectl logs -n tusdi-preview-92 <pod-name> --tail=100

# Get recent events
just gcloud-admin::kubectl get events -n tusdi-preview-92 --sort-by='.lastTimestamp' | tail -20

Troubleshooting Unhealthy Deployments:

# Check deployment status
just gcloud-admin::kubectl describe deployment tusdi-api -n tusdi-preview-92

# Check pod status and restart count
just gcloud-admin::kubectl get pods -n tusdi-preview-92 -o wide

# Check pod events for errors
just gcloud-admin::kubectl describe pod <pod-name> -n tusdi-preview-92

# Check pod logs for startup errors (like ImportError)
just gcloud-admin::kubectl logs -n tusdi-preview-92 <pod-name> --tail=200

Force Redeploy (Delete and Let ArgoCD Recreate):

If a deployment is corrupted or stuck:

# Delete the deployment (ArgoCD will recreate it)
just gcloud-admin::kubectl delete deployment tusdi-api -n tusdi-preview-92

# Sync to trigger recreation
just gcloud-admin::argocd app sync <app-name-from-list>

# Monitor recreation
just gcloud-admin::kubectl get pods -n tusdi-preview-92 -w

Common Scenarios:

| Problem | Solution |
| --- | --- |
| ImportError on startup | Push code fix, ArgoCD auto-syncs, or manual sync |
| Deployment deleted accidentally | `argocd app sync` recreates from manifests |
| Sync stuck "in progress" | `argocd app terminate-op` then retry |
| Pod CrashLoopBackOff | Check logs, fix code/config, push, sync |
| Image pull error | Verify image exists in registry, check secrets |

Workflow: Fix Code Error on Preview:

  1. Identify error (e.g., ImportError in pod logs)
  2. Fix locally and commit the fix
  3. Push to branch that the preview tracks
  4. Wait for CI/CD to build new image (GitHub Actions)
  5. ArgoCD auto-syncs or manually trigger: just gcloud-admin::argocd app sync <app-name-from-list>
  6. Monitor pod until healthy: just gcloud-admin::kubectl get pods -n tusdi-preview-92 -w
  7. Verify fix by checking pod logs: just gcloud-admin::kubectl logs -n tusdi-preview-92 <pod-name> --tail=50

Volume Management

Named Volumes

| Volume | Service | Purpose |
| --- | --- | --- |
| postgres_data | PostgreSQL | User data, sessions |
| neo4j_data | Neo4j | Graph data |
| neo4j_logs | Neo4j | Database logs |
| redis_data | Redis | Cache persistence |
| qdrant_storage | Qdrant | Vector embeddings |
| redisinsight_data | RedisInsight | UI settings |

Qdrant Snapshot Restore

The dev_stack.py script automatically restores Qdrant snapshots on first startup:

# Manual restore if needed
(cd repos/medical-catalog && just snapshot-restore-all)

# Check Qdrant collections
curl http://localhost:6333/collections

Clearing Volumes

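# Stop services and remove volumes (the dev profile is required for down)
docker compose --profile dev down -v

# Remove specific volume
docker volume rm machina-meta_postgres_data

# Remove all unused volumes (DANGEROUS)
# On Docker 23.0+ this prunes only anonymous volumes unless --all is added
docker volume prune -f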

See references/volumes.md for detailed volume documentation.


Health Checks

dev_stack.py Orchestration

The scripts/dev_stack.py script provides intelligent health monitoring:

  • 90-second timeout for service startup
  • HTTP endpoint validation for web services
  • Container health status from Docker
  • Automatic error analysis from logs
  • Qdrant snapshot detection and restore
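
The timeout-based startup check can be approximated by hand with a simple polling loop. This is a sketch of the general technique, not the implementation in dev_stack.py; the function name wait_until is hypothetical, and the timeout is counted in one-second polls, so a slow check command makes it approximate.

```shell
# Hypothetical helper: retry a command once per second until it succeeds
# or roughly timeout_s seconds have passed.
wait_until() {
  timeout_s=$1
  shift
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1   # gave up: the check never succeeded
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}
```

For example, `wait_until 90 curl -fsS http://localhost:6333/healthz` waits up to about 90 seconds for the Qdrant health endpoint listed below, and `wait_until 90 pg_isready -h localhost -p 5432` does the same for PostgreSQL.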

Service Health Endpoints

| Service | Health Endpoint |
| --- | --- |
| Backend | `GET /api/v1/health` |
| Medical Catalog | `GET /health` |
| Qdrant | `GET /healthz` |
| Neo4j | `GET :7474` (browser) |
| PostgreSQL | `pg_isready` |
| Redis | `redis-cli ping` |

Sanity Check Suite

Run the comprehensive sanity check before committing Docker changes:

just dev-check

This runs 6 non-destructive verification tests:

  1. Service Status - ./scripts/dev_stack.py status
  2. Container Status - docker compose ps
  3. Health Checks - Container health inspection
  4. Resource Usage - docker stats snapshot
  5. Volume Status - List machina volumes
  6. Endpoint Health - HTTP checks for backend, catalog, and Qdrant

Manual Health Checks

# Check all container status
docker compose ps

# Check specific service logs
docker compose logs backend --tail 50

# Check container health
docker inspect --format='{{json .State.Health}}' machina-meta-backend-1

Troubleshooting

Port Already in Use

# Find what's using the port
lsof -i :8000
# or
ss -tulpn | grep 8000

# Kill process or stop container
docker ps | grep 8000
docker stop <container-name>

Container Won't Start

# Check logs
docker compose logs <service> --tail 100

# Check health status
docker inspect --format='{{json .State.Health}}' <container-name>

# Check events
docker events --since 5m

Database Connection Failed

# Verify containers are healthy
just dev-status

# Check network connectivity
docker network ls
docker network inspect machina-meta_default

# Test connection from host
pg_isready -h localhost -p 5432
redis-cli -h localhost -p 6379 ping

Neo4j Won't Start

# Check Neo4j logs
docker compose logs neo4j --tail 100

# Common issue: memory limits
# Edit docker-compose.yaml NEO4J_dbms_memory_* settings

Qdrant Snapshots Not Restored

# Check if volume exists
docker volume ls | grep qdrant

# Manual restore
(cd repos/medical-catalog && just snapshot-restore-all)

# Verify collections
curl http://localhost:6333/collections

Log Extraction

# Extract backend logs
(cd repos/dem2 && ./scripts/extract_backend_logs.sh -s backend -l ERROR --since 10m)

# Follow all logs
docker compose logs -f

# Follow specific service
docker compose logs -f backend

See references/troubleshooting.md for comprehensive troubleshooting guide.


API Testing with curl_api

The curl_api justfile rule in repos/dem2 provides a convenient JSON-based interface for testing backend APIs without writing code.

Overview

  • Location: repos/dem2/justfile (rule: curl_api)
  • Backend Script: repos/dem2/scripts/curl_api.sh
  • Purpose: Call backend API functions using JSON dispatch for development and testing

How It Works

The curl_api rule uses a JSON dispatch system that:

  1. Accepts a JSON payload with a function field and arguments
  2. Routes the call to a registered bash function in curl_api.sh
  3. Handles authentication automatically (JWT tokens + patient context)
  4. Executes the API call and returns the result
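
The dispatch idea in the steps above can be sketched in a few lines of shell. This is an illustration only and does not reproduce curl_api.sh: the handler names (api_list_documents, api_list_tasks) and the dispatch function are hypothetical, the handlers just echo instead of calling the API, and the function name is extracted with a simple sed pattern where a real script would more likely use a JSON parser such as jq.

```shell
# Hypothetical handlers standing in for registered API functions.
api_list_documents() { echo "list_documents handler called"; }
api_list_tasks()     { echo "list_tasks handler called"; }

# Hypothetical dispatcher: read the "function" field from the JSON payload
# and route to the matching handler, rejecting unknown names.
dispatch() {
  payload=$1
  fn=$(printf '%s\n' "$payload" |
    sed -n 's/.*"function"[[:space:]]*:[[:space:]]*"\([A-Za-z0-9_]*\)".*/\1/p')
  case $fn in
    list_documents|list_tasks)
      "api_$fn" "$payload"
      ;;
    '')
      echo 'error: payload has no "function" field' >&2
      return 2
      ;;
    *)
      echo "error: unknown function: $fn" >&2
      return 2
      ;;
  esac
}
```

The explicit allowlist in the case statement keeps an arbitrary payload from invoking any shell function that merely happens to exist.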

Basic Usage

# From the workspace root (the subshell runs the rule inside repos/dem2)
(cd repos/dem2 && just curl_api '{"function": "function_name", "arg1": "value1", ...}')

Authentication:

  • Automatically handles JWT token generation via user_manager.py
  • Sets patient context header (X-Patient-Context-ID)
  • Default user: dbeal@numberone.ai
  • Default patient: Stuart McClure, DOB: 1969-03-03

Available Function Categories

Document Management

List documents:

(cd repos/dem2 && just curl_api '{"function": "list_documents"}')

Upload a file:

(cd repos/dem2 && just curl_api '{"function": "upload_file", "path": "datasets/documents/test.pdf"}')

Process a specific document:

(cd repos/dem2 && just curl_api '{"function": "process_document", "file_id": "uuid-here"}')

Process all uploaded documents:

(cd repos/dem2 && just curl_api '{"function": "process_all_documents"}')

Delete all documents:

(cd repos/dem2 && just curl_api '{"function": "delete_all_documents"}')

Task Management

List all document processing tasks:

(cd repos/dem2 && just curl_api '{"function": "list_tasks"}')

Get specific task details:

(cd repos/dem2 && just curl_api '{"function": "get_task", "task_id": "uuid-here"}')

List failed tasks only:

(cd repos/dem2 && just curl_api '{"function": "list_failed_tasks"}')

Patient Management

List all patients:

(cd repos/dem2 && just curl_api '{"function": "list_patients"}')

Agent Session Management

Create a new agent session:

(cd repos/dem2 && just curl_api '{"function": "create_session", "name": "My Session"}')

Set/update session name:

(cd repos/dem2 && just curl_api '{"function": "set_session_name", "session_id": "uuid-here", "name": "New Name"}')

List all sessions:

(cd repos/dem2 && just curl_api '{"function": "list_sessions"}')

Agent Query

Query the medical agent:

(cd repos/dem2 && just curl_api '{"function": "query_agent", "query": "What is my cholesterol?"}')

Query with specific session:

(cd repos/dem2 && just curl_api '{"function": "query_agent", "query": "What is my cholesterol?", "session_id": "uuid-here"}')

Agent Diagnostics

Check agent dependencies:

(cd repos/dem2 && just curl_api '{"function": "check_agent_dependencies"}')

Validate agent configuration:

(cd repos/dem2 && just curl_api '{"function": "validate_agent_config"}')

Medical Catalog (Biomarker Enrichment)

Enrich biomarkers:

(cd repos/dem2 && just curl_api '{"function": "catalog_enrich", "names": ["ApoA-1", "Factor II"]}')

Check enrichment status:

(cd repos/dem2 && just curl_api '{"function": "catalog_enrich_status", "task_id": "uuid-here"}')

Search for biomarkers:

(cd repos/dem2 && just curl_api '{"function": "catalog_search", "names": ["cholesterol"], "limit": 5}')

Search by alias groups:

(cd repos/dem2 && just curl_api '{"function": "catalog_search_by_alias", "alias_groups": [["LDL"], ["HDL", "HDL-C"]]}')

Search derivative biomarkers (ratios, calculated values):

(cd repos/dem2 && just curl_api '{"function": "catalog_search_derivatives", "names": ["ApoB/ApoA-1"]}')

Enrich derivatives (ratios, sums, percentages):

(cd repos/dem2 && just curl_api '{"function": "catalog_enrich_derivatives", "names": ["ApoB/ApoA-1", "TC/HDL-C"]}')

List all derivatives (with pagination):

(cd repos/dem2 && just curl_api '{"function": "catalog_list_derivatives", "limit": 100, "offset": 0}')

Debug

Debug JSON argument structure:

(cd repos/dem2 && just curl_api '{"function": "debug_args", "names": ["test1", "test2"]}')

Common Workflows

List Available Test Documents

Before uploading documents for testing, list available test documents in the repository:

# List all test documents with full paths
just list-test-docs

# Output example:
# [
#   "pdf_tests/medical_records/.../Boston Heart July 2021.pdf",
#   "pdf_tests/medical_records/.../Dutch cortisol 9-01-25.pdf",
#   ...
# ]

Use this to:

  • Find available test documents before testing document processing
  • Get correct paths for upload functions
  • Identify specific documents for debugging (e.g., Dutch cortisol document for "Estrone (E1)" testing)

Upload and Process a Document

# First, list available test documents
just list-test-docs

# Upload document using path from list-test-docs
(cd repos/dem2 && just curl_api '{"function": "upload_file", "path": "pdf_tests/medical_records/Stuart Mcclure Medical Records (PRIVATE)/Dutch cortisol 9-01-25.pdf"}')

# Process the uploaded document (use the file_id from upload response)
(cd repos/dem2 && just curl_api '{"function": "process_document", "file_id": "file-id-from-upload"}')
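Test document paths contain spaces and parentheses, so building the payload in a helper can avoid quoting mistakes when several files are uploaded. The sketch below is an illustration under assumptions: json_escape and upload_payload are hypothetical helpers that do not exist in repos/dem2, and the escaping only covers backslashes and double quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of repos/dem2.

# Escape backslashes and double quotes so a path is safe inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the upload_file payload for one path.
upload_payload() {
  printf '{"function": "upload_file", "path": "%s"}\n' "$(json_escape "$1")"
}

# Print the payload for each document. To actually upload, pass each
# payload to: (cd repos/dem2 && just curl_api "$payload")
for doc in \
  "pdf_tests/medical_records/Stuart Mcclure Medical Records (PRIVATE)/Dutch cortisol 9-01-25.pdf"
do
  upload_payload "$doc"
done
```

Printing the payload first makes it easy to inspect the JSON before any request is sent.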

Query Agent About Health Markers

# Query agent
(cd repos/dem2 && just curl_api '{"function": "query_agent", "query": "What is my latest cholesterol level?"}')

Check Task Processing Status

# List all tasks to find IDs
(cd repos/dem2 && just curl_api '{"function": "list_tasks"}')

# Get specific task details
(cd repos/dem2 && just curl_api '{"function": "get_task", "task_id": "abc-123-def"}')

Enrich and Validate Biomarkers

# Enrich biomarkers in catalog
(cd repos/dem2 && just curl_api '{"function": "catalog_enrich", "names": ["Total Cholesterol", "LDL", "HDL"]}')

# Search to verify they exist
(cd repos/dem2 && just curl_api '{"function": "catalog_search", "names": ["Total Cholesterol"], "limit": 5}')

How It Differs from Direct curl_api.sh Usage

Using just curl_api (recommended):

(cd repos/dem2 && just curl_api '{"function": "list_documents"}')

Direct script usage (lower-level):

(cd repos/dem2 && bash -c 'source scripts/curl_api.sh && dispatch "{\"function\": \"list_documents\"}"')

Benefits of just curl_api:

  • ✅ Cleaner syntax (no need to source or call dispatch)
  • ✅ Proper error handling via justfile
  • ✅ Consistent working directory handling
  • ✅ Part of documented justfile interface

Environment Variables

Default settings (defined in scripts/curl_api.sh):

PATIENT_FIRST_NAME=Stuart
PATIENT_LAST_NAME=McClure
PATIENT_DATE_OF_BIRTH=1969-03-03
AUTH_EMAIL=dbeal@numberone.ai
BACKEND_URL=http://localhost:8000/api/v1
FRONTEND_URL=http://localhost:3000

Override patient context:

(cd repos/dem2 && PATIENT_FIRST_NAME=John PATIENT_LAST_NAME=Doe PATIENT_DATE_OF_BIRTH=1990-01-01 \
  just curl_api '{"function": "list_documents"}')

Enable verbose curl output (for debugging):

(cd repos/dem2 && CURL_VERBOSE=1 just curl_api '{"function": "list_documents"}')
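Overrides like these work when the script only falls back to its defaults for variables that are not already set. One common way to get that behavior is the ${VAR:-default} expansion. Whether scripts/curl_api.sh uses exactly this form is an assumption, and show_patient_context is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Sketch of environment-overridable defaults. The default values are the ones
# listed above; the function name show_patient_context is hypothetical.
show_patient_context() {
  first=${PATIENT_FIRST_NAME:-Stuart}
  last=${PATIENT_LAST_NAME:-McClure}
  dob=${PATIENT_DATE_OF_BIRTH:-1969-03-03}
  echo "$first $last ($dob)"
}

# No overrides: the defaults apply.
# Prints: Stuart McClure (1969-03-03)
(unset PATIENT_FIRST_NAME PATIENT_LAST_NAME PATIENT_DATE_OF_BIRTH
 show_patient_context)

# Values exported inside the subshell win over the defaults.
# Prints: John Doe (1990-01-01)
(export PATIENT_FIRST_NAME=John PATIENT_LAST_NAME=Doe PATIENT_DATE_OF_BIRTH=1990-01-01
 show_patient_context)
```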

Error Handling

If a function doesn't exist:

$ (cd repos/dem2 && just curl_api '{"function": "nonexistent"}')
# ERROR: Unknown function: nonexistent
# Available functions: list_documents, upload_file, process_document, ...

If required arguments are missing:

$ (cd repos/dem2 && just curl_api '{"function": "upload_file"}')
# ERROR: Missing 'path' field in JSON
# Usage: {"function": "upload_file", "path": "path/to/file.pdf"}

When to Use curl_api

Use curl_api for:

  • ✅ Quick API testing during development
  • ✅ One-off administrative tasks (upload, delete, etc.)
  • ✅ Debugging API endpoints and responses
  • ✅ Validating authentication and patient context
  • ✅ Scripting batch operations

Don't use curl_api for:

  • ❌ Production operations (use proper API clients)
  • ❌ Performance testing (use dedicated load testing tools)
  • ❌ Automated testing (use pytest with proper fixtures)

Related Commands

Low-level curl_api.sh functions (not dispatched, but useful):

# Get patient ID and set context
(cd repos/dem2 && bash -c 'source scripts/curl_api.sh && _export_patient_context_id_internal && declare -p X_PATIENT_CONTEXT_ID')

# Call backend API with auth
(cd repos/dem2 && bash -c 'source scripts/curl_api.sh && _export_patient_context_id_internal && auth_backend "/graph-memory/medical/observations/grouped"')

See also:

  • repos/dem2/scripts/curl_api.sh - Complete function implementations
  • repos/dem2/justfile (lines 271-347) - curl_api rule documentation
  • .claude/skills/machina-ui/SKILL.md - UI debugging with curl_api examples

Environment Variables

Authoritative Configuration

The single source of truth for environment variables is machina-meta/.env at the workspace root.

machina-meta/.env          # THE authoritative .env file

Important: Any .env files in subdirectories (repos/dem2/.env, repos/medical-catalog/.env, etc.) are temporary symlinks to the root .env file. Do not treat them as separate configuration files.
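A quick way to confirm that a sub-repo .env really is a symlink to the root file, rather than a stray copy, is sketched below. The function name env_is_root_symlink is hypothetical, and the loop assumes it is run from the machina-meta workspace root.

```shell
#!/usr/bin/env bash
# Hypothetical check; the function name is made up for illustration.

# Succeed only if <dir>/.env is a symlink that resolves to <root>/.env.
env_is_root_symlink() {
  root=$1
  dir=$2
  [ -L "$dir/.env" ] || return 1
  # -ef is true when both paths refer to the same file after resolving links.
  [ "$dir/.env" -ef "$root/.env" ]
}

# Run from the workspace root.
for repo in repos/dem2 repos/medical-catalog; do
  if env_is_root_symlink . "$repo"; then
    echo "$repo/.env -> root .env"
  else
    echo "$repo/.env is missing or is a separate file"
  fi
done
```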

Environment File Hierarchy

| File | Purpose | Scope |
|------|---------|-------|
| repos/dem2/.env.example | Template for all backend env vars (committed to git) | Reference |
| machina-meta/.env | Active local config (never committed) | Local dev |
| .env.localhost | Basic local reference (~29 exports) | Local dev |
| .env.localhost.alen | Dev reference with Langfuse/MCP (~72 exports) | Local dev |
| .env.localhost.alen-dev-1 | Most complete reference (~108 exports) | Local dev |
| .env.<namespace>.<deployment> | Generated by import_k8s_environment.py from K8s | Remote debug |
| .current_env | Symlink to active .env.<namespace>.<deployment> | Remote debug |

When adding new environment variables:

  1. Add to repos/dem2/.env.example with documentation comments (blank secrets)
  2. Add to your active .env file with real values
  3. Integrate into k8s manifests (see Kubernetes Environment Configuration below)
  4. If sensitive, add to Google Secret Manager + ExternalSecrets

Database Defaults (Docker Compose)

# PostgreSQL
POSTGRES_USER: postgres
POSTGRES_PASSWORD: demodemo
POSTGRES_DB: demodemo

# Neo4j
NEO4J_AUTH: neo4j/demodemo

# Redis
# No auth by default in dev

Kubernetes Environment Configuration

Architecture: Kustomize Base + Overlays

The k8s environment configuration uses Kustomize with a base + overlay pattern:

repos/dem2-infra/k8s/
├── base/                              # Shared across all environments
│   ├── app.yaml                       # Deployments (tusdi-api, tusdi-webui) + Services
│   ├── config.yaml                    # ConfigMaps (app-config, postgres-config)
│   ├── external-secrets.yaml          # ExternalSecrets (GSM → k8s Secrets)
│   └── kustomization.yaml            # Base resource list
└── overlays/
    ├── dev/
    │   ├── env-vars-patch.yaml        # ENV_FOR_DYNACONF=development, LANGFUSE_PROMPTS_LABEL=development
    │   └── kustomization.yaml         # namespace: tusdi-dev
    ├── staging/
    │   ├── env-vars-patch.yaml        # ENV_FOR_DYNACONF=staging, LANGFUSE_PROMPTS_LABEL=staging
    │   └── kustomization.yaml         # namespace: tusdi-staging
    └── preview/
        ├── env-vars-patch.yaml        # ENVIRONMENT=preview, auth URLs, LANGFUSE_PROMPTS_LABEL=preview-${PR_NUMBER}
        └── kustomization.yaml         # namespace: tusdi-preview-$PR_NUMBER

How Environment Variables Are Organized

Three mechanisms for injecting env vars into pods:

| Mechanism | File | Use For | Example |
|-----------|------|---------|---------|
| Direct in Deployment | base/app.yaml | Non-sensitive config shared across envs | DYNACONF_REDIS_DB__HOST, TZ |
| ExternalSecrets | base/external-secrets.yaml | Sensitive values from Google Secret Manager | API keys, passwords |
| Overlay Patches | overlays/<env>/env-vars-patch.yaml | Per-environment overrides | ENV_FOR_DYNACONF, LANGFUSE_PROMPTS_LABEL |

Complete tusdi-api Environment Variables

Direct values (base/app.yaml):

| Variable | Value | Category |
|----------|-------|----------|
| FORWARDED_ALLOW_IPS | * | Network |
| ENVIRONMENT | ${ENVIRONMENT} | App |
| DYNACONF_PG_DB__HOST | postgres | PostgreSQL |
| DYNACONF_PG_DB__PORT | 5432 | PostgreSQL |
| DYNACONF_PG_DB__NAME | tusdi_${ENVIRONMENT} | PostgreSQL |
| DYNACONF_PG_DB__USER | tusdi | PostgreSQL |
| DYNACONF_NEO4J_DB__HOST | neo4j | Neo4j |
| DYNACONF_NEO4J_DB__PORT | 7687 | Neo4j |
| DYNACONF_NEO4J_DB__USER | neo4j | Neo4j |
| DYNACONF_NEO4J_DB__NAME | neo4j | Neo4j |
| NEO4J_URI | bolt://neo4j:7687 | Neo4j |
| NEO4J_USER | neo4j | Neo4j |
| DYNACONF_REDIS_DB__HOST | redis://redis:6379 | Redis |
| DYNACONF_AUTH__FRONTEND_URL | https://${ENVIRONMENT}.${DNS_DOMAIN} | Auth |
| DYNACONF_AUTH__REDIRECT_URL | https://${ENVIRONMENT}.${DNS_DOMAIN}/api/v1/auth/google/callback | Auth |
| DYNACONF_AUTH__JWT_SECRET_KEY | xBkVoNf... | Auth |
| IMPERSONATE_SA | tusdi-nonprod-app@... | GCP |
| DYNACONF_TRACING__ENABLED | true | Tracing |
| DYNACONF_TRACING__OTEL_EXPORTER_OTLP_ENDPOINT | http://otel-collector.langfuse:4318 | Tracing |
| DYNACONF_LANGFUSE__ENABLED | true | Langfuse |
| DYNACONF_LANGFUSE__TRACING__ENABLED | true | Langfuse |
| DYNACONF_LANGFUSE__BASE__HOST | http://langfuse-web.langfuse:3000 | Langfuse |
| LANGFUSE_HOST | http://langfuse-web.langfuse:3000 | Langfuse SDK |
| DYNACONF_LANGFUSE__HOST | http://langfuse-web.langfuse:3000 | Langfuse (dynaconf) |
| LANGFUSE_PROMPTS_SOURCE | hybrid | Prompt Mgmt |
| LANGFUSE_PROMPTS_LABEL | production | Prompt Mgmt |
| LANGFUSE_PROMPTS_FALLBACK | true | Prompt Mgmt |
| TZ | UTC | Misc |
| ENV_FOR_DYNACONF | production | App |
| OTEL_PYTHON_URLLIB3_EXCLUDED_URLS | .*sentry\\.io.* | Tracing |

From ExternalSecrets (Google Secret Manager):

| Variable | K8s Secret | GSM Key Pattern | Category |
|----------|------------|-----------------|----------|
| DYNACONF_PG_DB__PASSWORD | postgres-secrets | tusdi-${ENVIRONMENT}-postgres-password | PostgreSQL |
| DYNACONF_NEO4J_DB__PASSWORD | neo4j-secrets | tusdi-${ENVIRONMENT}-neo4j-password | Neo4j |
| NEO4J_PASSWORD | neo4j-secrets | tusdi-${ENVIRONMENT}-neo4j-password | Neo4j |
| OPENAI_API_KEY | api-secrets | tusdi-${ENVIRONMENT}-openai-key | LLM |
| GEMINI_API_KEY | api-secrets | tusdi-${ENVIRONMENT}-gemini-key | LLM |
| SERPER_API_KEY | api-secrets | tusdi-${ENVIRONMENT}-serper-api-key | Search |
| VISION_AGENT_API_KEY | api-secrets | tusdi-${ENVIRONMENT}-vision-agent-api-key | LLM |
| GOOGLE_SEARCH_API_KEY | api-secrets | tusdi-${ENVIRONMENT}-google-search-api-key | Search |
| DYNACONF_GOOGLE_AUTH__CLIENT_ID | backend-google-oauth-secrets | (managed separately) | Auth |
| DYNACONF_GOOGLE_AUTH__CLIENT_SECRET | backend-google-oauth-secrets | (managed separately) | Auth |
| DYNACONF_CALENDAR__OAUTH_CLIENT_ID | backend-google-oauth-secrets | (managed separately) | Auth |
| DYNACONF_CALENDAR__OAUTH_CLIENT_SECRET | backend-google-oauth-secrets | (managed separately) | Auth |
| LANGFUSE_PUBLIC_KEY | langfuse-secrets | langfuse-public-key (global) | Langfuse |
| LANGFUSE_SECRET_KEY | langfuse-secrets | langfuse-secret-key (global) | Langfuse |
| DYNACONF_LANGFUSE__PUBLIC_KEY | langfuse-secrets | langfuse-public-key (global) | Langfuse |
| DYNACONF_LANGFUSE__SECRET_KEY | langfuse-secrets | langfuse-secret-key (global) | Langfuse |

Per-environment overrides (overlay patches):

| Variable | Dev | Staging | Preview |
|----------|-----|---------|---------|
| ENV_FOR_DYNACONF | development | staging | (base: production) |
| LANGFUSE_PROMPTS_LABEL | development | staging | preview-${PR_NUMBER} |
| ENVIRONMENT | (base) | (base) | preview |
| DYNACONF_AUTH__FRONTEND_URL | (base) | (base) | https://preview-${PR_NUMBER}.${DNS_DOMAIN} |
| DYNACONF_AUTH__REDIRECT_URL | (base) | (base) | https://oauth-callback.${DNS_DOMAIN}/callback |
| OAUTH_PR_NUMBER | - | - | pr-${PR_NUMBER} |
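A "(base)" entry means the overlay does not patch that variable, so the value from base/app.yaml applies. The lookup below only illustrates that precedence for two variables, using values copied from the tables above. Kustomize itself applies overlays as patches rather than shell lookups, and base_value, overlay_value, and effective_value are hypothetical names.

```shell
#!/usr/bin/env bash
# Illustration only: Kustomize applies overlays as patches, not shell lookups.
# Values are copied from the tables above; function names are hypothetical.

# Value set directly in base/app.yaml.
base_value() {
  case $1 in
    ENV_FOR_DYNACONF)       echo "production" ;;
    LANGFUSE_PROMPTS_LABEL) echo "production" ;;
    *) return 1 ;;
  esac
}

# Print the overlay's value for a variable, or fail if the overlay
# does not patch it.
overlay_value() {
  case "$1:$2" in
    dev:ENV_FOR_DYNACONF|dev:LANGFUSE_PROMPTS_LABEL)         echo "development" ;;
    staging:ENV_FOR_DYNACONF|staging:LANGFUSE_PROMPTS_LABEL) echo "staging" ;;
    preview:LANGFUSE_PROMPTS_LABEL)                          echo 'preview-${PR_NUMBER}' ;;
    *) return 1 ;;
  esac
}

# The overlay wins when it defines the variable; otherwise fall back to base.
effective_value() {
  overlay_value "$1" "$2" || base_value "$2"
}

for env_name in dev staging preview; do
  echo "$env_name ENV_FOR_DYNACONF=$(effective_value "$env_name" ENV_FOR_DYNACONF)"
done
```

For preview the loop prints production, because the preview overlay leaves ENV_FOR_DYNACONF at the base value.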

Langfuse Variable Naming

The backend reads Langfuse configuration through two parallel paths:

  1. Langfuse SDK env vars (LANGFUSE_*) - Read directly by the Langfuse Python SDK and CLI tools
  2. Dynaconf env vars (DYNACONF_LANGFUSE__*) - Read by dynaconf into config.toml structure, then propagated to SDK by LangfuseIntegration

Both must be set in k8s for full compatibility:

  • LANGFUSE_HOST + DYNACONF_LANGFUSE__HOST + DYNACONF_LANGFUSE__BASE__HOST
  • LANGFUSE_PUBLIC_KEY + DYNACONF_LANGFUSE__PUBLIC_KEY
  • LANGFUSE_SECRET_KEY + DYNACONF_LANGFUSE__SECRET_KEY

The DYNACONF_LANGFUSE__BASE__HOST is the legacy path (pre-refactor); DYNACONF_LANGFUSE__HOST is the newer config path. Both are set to the same value for compatibility.
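Because each setting exists under more than one name, a small consistency check can catch a pod or shell where the two paths disagree. The sketch below is one possible check, not something shipped in the repositories: pair_matches and check_langfuse_env are hypothetical names, and the variables must be exported for printenv to see them.

```shell
#!/usr/bin/env bash
# Hypothetical consistency check for the paired variables listed above.

# Succeed if both variables are exported, non-empty, and equal.
pair_matches() {
  a=$(printenv "$1") || return 1
  b=$(printenv "$2") || return 1
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Report every pair that is missing or inconsistent; fail if any is.
check_langfuse_env() {
  status=0
  for pair in \
    LANGFUSE_HOST:DYNACONF_LANGFUSE__HOST \
    LANGFUSE_HOST:DYNACONF_LANGFUSE__BASE__HOST \
    LANGFUSE_PUBLIC_KEY:DYNACONF_LANGFUSE__PUBLIC_KEY \
    LANGFUSE_SECRET_KEY:DYNACONF_LANGFUSE__SECRET_KEY
  do
    if ! pair_matches "${pair%%:*}" "${pair#*:}"; then
      echo "MISMATCH: ${pair%%:*} vs ${pair#*:}" >&2
      status=1
    fi
  done
  return $status
}

if check_langfuse_env; then
  echo "Langfuse variables are consistent"
else
  echo "Langfuse variables are missing or inconsistent"
fi
```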

Hybrid Prompt Management per Environment

| Environment | LANGFUSE_PROMPTS_SOURCE | LANGFUSE_PROMPTS_LABEL | Behavior |
|-------------|-------------------------|------------------------|----------|
| Base (production) | hybrid | production | Try Langfuse production prompts, fallback to local |
| Dev | hybrid | development | Try Langfuse development prompts, fallback to local |
| Staging | hybrid | staging | Try Langfuse staging prompts, fallback to local |
| Preview | hybrid | preview-${PR_NUMBER} | Try Langfuse PR-specific prompts, fallback to local |

LANGFUSE_PROMPTS_FALLBACK=true ensures all environments gracefully degrade to local config.yml files if Langfuse is unreachable.

How to Add New Environment Variables

Non-Sensitive Variables (configs, feature flags)

  1. Add to repos/dem2/.env.example with comments
  2. Add to base app.yaml in repos/dem2-infra/k8s/base/app.yaml:
    - name: MY_NEW_CONFIG
      value: "default-value"
    
  3. If environment-specific, add overrides in overlays/<env>/env-vars-patch.yaml

Sensitive Variables (API keys, passwords, tokens)

  1. Add to .env.example with blank value

  2. Create secret in Google Secret Manager following naming convention:

    tusdi-${ENVIRONMENT}-<descriptive-key-name>
    

    e.g., tusdi-dev-openai-key, tusdi-staging-openai-key

    Note: Some secrets are global (e.g., langfuse-public-key); not all follow the per-env pattern.

  3. Add ExternalSecret entry in repos/dem2-infra/k8s/base/external-secrets.yaml:

    ---
    apiVersion: external-secrets.io/v1
    kind: ExternalSecret
    metadata:
      name: my-new-secrets
    spec:
      refreshInterval: 1m
      secretStoreRef:
        name: gcpsm-secret-store
        kind: SecretStore
      target:
        name: my-new-secrets
        creationPolicy: Owner
      data:
      - secretKey: MY_SECRET_KEY
        remoteRef:
          key: tusdi-${ENVIRONMENT}-my-secret-key
    
  4. Reference in app.yaml:

    - name: MY_SECRET_KEY
      valueFrom:
        secretKeyRef:
          name: my-new-secrets
          key: MY_SECRET_KEY
    
  5. Update supporting files (noted in external-secrets.yaml header):

    • terraform/secrets.tf - Add the secret resource
    • deploy-k8s.sh - Add secret to validate_secrets() function
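The per-environment naming convention from step 2 can be captured in a small helper so the same name is used in Secret Manager, the ExternalSecret, and Terraform. This is a sketch with assumptions: gsm_secret_name is a hypothetical function, and restricting names to lowercase letters, digits, and hyphens is this example's choice to match the names shown above, not a documented rule.

```shell
#!/usr/bin/env bash
# Hypothetical helper that applies the per-environment naming convention.
gsm_secret_name() {
  # Both parts are required.
  [ -n "$1" ] && [ -n "$2" ] || { echo "ERROR: usage: gsm_secret_name <env> <key>" >&2; return 1; }
  # Keep to lowercase letters, digits, and hyphens (this sketch's assumption).
  case "$1$2" in
    *[!a-z0-9-]*) echo "ERROR: unexpected character in '$1' or '$2'" >&2; return 1 ;;
  esac
  printf 'tusdi-%s-%s\n' "$1" "$2"
}

# Prints tusdi-dev-openai-key, then tusdi-staging-openai-key.
for env_name in dev staging; do
  gsm_secret_name "$env_name" openai-key
done
```

Global secrets such as langfuse-public-key do not follow this pattern and would not go through the helper.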

Checklist: .env.example → k8s Integration

When .env.example changes, follow this workflow:

□ Identify new variables from git diff on .env.example
□ Classify each as sensitive (secret) or non-sensitive (config)
□ For secrets: create GSM entries + ExternalSecret + secretKeyRef in app.yaml
□ For configs: add directly to app.yaml with appropriate default
□ Determine if variable needs per-environment overrides → overlay patches
□ Verify variable naming matches what the application code reads
□ Check for dual naming (e.g., LANGFUSE_* + DYNACONF_LANGFUSE__*)
□ Update terraform/secrets.tf and deploy-k8s.sh if adding new secrets
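The first two checklist items can be partly automated. The sketch below pulls added variable names out of a unified diff and makes a rough name-based guess at whether each one is a secret. Both functions are hypothetical, and the classification is only a heuristic that still needs a manual check.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the first two checklist items.

# Read a unified diff on stdin and print the names of variables on added lines.
# Handles both "NAME=value" and "export NAME=value".
added_env_vars() {
  sed -n 's/^+\(export[[:space:]]\{1,\}\)\{0,1\}\([A-Za-z_][A-Za-z0-9_]*\)=.*/\2/p'
}

# Rough name-based guess; always confirm by hand.
classify_env_var() {
  case $1 in
    *KEY*|*SECRET*|*PASSWORD*|*TOKEN*) echo "secret" ;;
    *) echo "config" ;;
  esac
}

# In a repository the input would come from something like:
#   git diff origin/dev -- .env.example | added_env_vars
# Here a sample diff is used instead.
printf '%s\n' '+++ b/.env.example' '+# New settings' \
  '+MY_NEW_CONFIG=default-value' '+MY_SECRET_KEY=' |
  added_env_vars |
  while IFS= read -r name; do
    echo "$name: $(classify_env_var "$name")"
  done
```

With the sample diff, the loop prints "MY_NEW_CONFIG: config" and "MY_SECRET_KEY: secret".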

ExternalSecret Groups

| K8s Secret Name | GSM Key Prefix | Variables |
|-----------------|----------------|-----------|
| postgres-secrets | tusdi-${ENV}-postgres-* | POSTGRES_PASSWORD |
| neo4j-secrets | tusdi-${ENV}-neo4j-* | NEO4J_PASSWORD |
| api-secrets | tusdi-${ENV}-*-key | OPENAI_API_KEY, GEMINI_API_KEY, SERPER_API_KEY, VISION_AGENT_API_KEY, GOOGLE_SEARCH_API_KEY |
| backend-google-oauth-secrets | (managed separately) | client-id, client-secret |
| langfuse-secrets | langfuse-* (global, not per-env) | LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY |

Best Practices

Starting Development

  1. Always use workspace-level just commands from machina-meta root
  2. Use just dev-status to verify services before running code
  3. Check for port conflicts before starting: lsof -i :8000

Stopping Development

  1. Use just dev-down to stop all services cleanly
  2. Don't use docker kill - it doesn't run shutdown hooks
  3. Check nothing is left running: docker ps

Resetting Data

  1. Stop with just dev-down
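  2. Remove volumes: docker volume prune -f (this affects every unused volume on the host, not only this workspace's; on Docker Engine 23.0 and later, named volumes are removed only when --all is added)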
  3. Start fresh: just dev-up

Updating Images

# Pull latest images
docker compose pull

# Rebuild local images
docker compose build --no-cache

# Update and restart
just dev-restart

Integration with Other Skills

machina-git

Use machina-git for all git operations. Docker skill handles containers only.

kubernetes

Kubernetes skill handles production cluster operations. Docker images built here are deployed via Kubernetes.

machina-ui

For frontend debugging, use machina-ui skill which can interact with containerized services.


Pull Request Creation

Overview

This section provides the authoritative process for creating pull requests in the dem2, dem2-webui, and dem2-infra repositories.

Note: This section is in machina-docker (not machina-git) because PR creation uses the gh CLI for GitHub operations, while machina-git handles core git operations (commit, push, status). For git commands below, use the machina-git skill workflow.

PR Creation Process

Step 1: Identify merge base and examine ALL commits

# Find merge base (usually origin/dev for feature branches)
cd repos/dem2 && git fetch origin && git merge-base origin/dev HEAD

# Get comprehensive branch stats
BASE=$(git merge-base origin/dev HEAD)
echo "=== Branch Stats ==="
echo "Commits: $(git rev-list --count $BASE..HEAD)"
echo "Files: $(git diff --stat $BASE..HEAD | tail -1)"
echo "Date range: $(git log --format='%as' $BASE..HEAD | tail -1) to $(git log --format='%as' -1 HEAD)"

# List ALL commits from merge base to HEAD
git log --oneline $BASE..HEAD

# Get the FULL diff (CRITICAL - examine ALL changes, not just commit messages)
git diff $BASE..HEAD

Step 2: Analyze changes by category

Group all changes into categories:

  • Bug Fixes: Error corrections, crash fixes
  • Features: New functionality
  • Refactoring: Code improvements without behavior change
  • Performance: Speed/resource optimizations
  • Documentation: Docs, comments, README updates
  • Testing: Test additions/modifications
  • Infrastructure: CI/CD, build, deployment changes
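If commit subjects follow Conventional Commits prefixes (an assumption; the only hint in this document is the "feat:" title in the example workflow), a first-pass grouping can be scripted. categorize_subject is a hypothetical helper, and it only reads subject lines, so the full diff remains the source of truth.

```shell
#!/usr/bin/env bash
# Hypothetical first-pass categorizer keyed on Conventional Commits prefixes.
# It only reads the subject line; the diff remains the source of truth.
categorize_subject() {
  case $1 in
    fix:*|fix\(*)               echo "Bug Fixes" ;;
    feat:*|feat\(*)             echo "Features" ;;
    refactor:*|refactor\(*)     echo "Refactoring" ;;
    perf:*|perf\(*)             echo "Performance" ;;
    docs:*|docs\(*)             echo "Documentation" ;;
    test:*|test\(*)             echo "Testing" ;;
    ci:*|ci\(*|build:*|build\(*) echo "Infrastructure" ;;
    *)                          echo "Uncategorized" ;;
  esac
}

# In a repository the subjects would come from:
#   git log --format='%s' "$BASE"..HEAD
# Here sample subjects are used instead.
printf '%s\n' 'feat(api): add catalog search' 'fix: handle missing path' 'bump deps' |
  while IFS= read -r subject; do
    echo "$(categorize_subject "$subject"): $subject"
  done
```

Anything reported as Uncategorized needs to be placed by reading the change itself.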

Step 3: Generate PR using gh CLI

cd repos/dem2 && gh pr create --base dev --head <branch-name> --title "<title>" --body "<body>"

PR Summary Template

Use this exact structure for PR summaries:

## Summary

<One paragraph overview describing the primary purpose and key achievements of this PR.>

### Key Changes

| Category | Description |
|----------|-------------|
| **<Category 1>** | <Brief description of what changed> |
| **<Category 2>** | <Brief description of what changed> |

---

### Detailed Changes

#### <Feature/Fix Area 1>
- ✅ <Completed item with specific detail>
- ✅ <Completed item with specific detail>

#### <Feature/Fix Area 2>
- ✅ <Completed item with specific detail>

### Technical Details

<If applicable, include tables for:>
- Performance improvements (Before/After/Impact)
- Architectural decisions (Decision/Rationale/Outcome)
- Breaking changes (if any)

## Test Plan

- [x] <Specific test performed>
- [x] <Specific test performed>
- [x] <Verification method used>

## Related Issues

- Fixes #<issue> (if applicable)
- Related to #<issue> (if applicable)

PR Summary Generation Prompt

When generating a PR summary, follow these rules:

  1. Read ALL diffs: Examine every file changed from merge base to HEAD
  2. Categorize changes: Group by feature area, not by file
  3. Be specific: Include actual function names, file paths, metrics
  4. Use tables: For comparisons, metrics, architectural decisions
  5. Include evidence: Reference specific commits, test results, measurements
  6. Quantify impact: "94% cost reduction" not "significant savings"
  7. Test plan must be real: Only include tests actually performed
  8. No fluff: Every line should convey information

DO NOT:

  • Include generic statements like "improved code quality"
  • List files without explaining what changed
  • Include untested items in the test plan
  • Add performance claims without measurements
  • Use vague language like "various improvements"

DO:

  • Start with the most important/impactful changes
  • Group related changes together
  • Include specific function/class names
  • Reference commit hashes for key changes
  • Explain WHY architectural decisions were made

Example Workflow

# 1. Get merge base and stats
cd repos/dem2 && git fetch origin
BASE=$(git merge-base origin/dev HEAD)
echo "Commits: $(git rev-list --count $BASE..HEAD)"
echo "Files: $(git diff --stat $BASE..HEAD | tail -1)"
echo "Date range: $(git log --format='%as' $BASE..HEAD | tail -1) to $(git log --format='%as' -1 HEAD)"

# 2. List all commits (READ each one for context)
git log --oneline $BASE..HEAD

# 3. Get full diff (CRITICAL - analyze the actual code changes)
git diff $BASE..HEAD

# 4. Create PR with summary
gh pr create --base dev --title "feat: <concise title>" --body "$(cat <<'EOF'
## Summary
<summary based on diff analysis>
...
EOF
)"

Creating PRs for Different Repos

dem2 (Backend):

cd repos/dem2 && gh pr create --base dev --head <branch>

dem2-webui (Frontend):

cd repos/dem2-webui && gh pr create --base dev --head <branch>

dem2-infra (Infrastructure):

cd repos/dem2-infra && gh pr create --base main --head <branch>
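Since the base branch differs between repositories, a lookup helper can prevent opening a PR against the wrong branch. The sketch below encodes the three cases shown above. pr_base_branch is a hypothetical function, and feature/my-change is a placeholder branch name.

```shell
#!/usr/bin/env bash
# Hypothetical helper encoding the base branches shown above.
pr_base_branch() {
  case $1 in
    dem2|dem2-webui) echo "dev" ;;
    dem2-infra)      echo "main" ;;
    *) echo "ERROR: unknown repo: $1" >&2; return 1 ;;
  esac
}

# Print the command for review instead of running it.
# feature/my-change is a placeholder branch name.
repo=dem2-infra
branch=feature/my-change
# Prints: (cd repos/dem2-infra && gh pr create --base main --head feature/my-change)
echo "(cd repos/$repo && gh pr create --base $(pr_base_branch "$repo") --head $branch)"
```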

Reference Files

This skill includes detailed documentation in references/:

  • compose-files.md - Complete documentation of all 13 docker-compose files
  • profiles.md - Profile system (dev, test, prod) detailed guide
  • ports.md - All port mappings with conflict detection
  • volumes.md - Volume management, backup, restore
  • troubleshooting.md - Common issues and solutions
  • scripts.md - Helper script documentation (dev_stack.py, etc.)

Use Read to access specific reference files when detailed information is needed.

Skills Info
Original Name: machina-docker
Author: numberone