Agent Skill
2/7/2026

aiml-operations

This skill should be used when the user asks to "deploy ML model", "MLOps", "model serving", "feature store", "experiment tracking", "model registry", "ML pipeline", "model monitoring", "GPU inference", "TensorFlow Serving", "MLflow", or needs help with machine learning operations and model deployment.

qenex
npx skills add qenex-ai/devops-plugin

SKILL.md


name: AI/ML Operations
description: This skill should be used when the user asks to "deploy ML model", "MLOps", "model serving", "feature store", "experiment tracking", "model registry", "ML pipeline", "model monitoring", "GPU inference", "TensorFlow Serving", "MLflow", or needs help with machine learning operations and model deployment.
version: 1.0.0

AI/ML Operations

Comprehensive guidance for MLOps, model deployment, and ML infrastructure.

ML Pipeline Architecture

Data → Feature Engineering → Training → Evaluation → Registry → Serving → Monitoring
                                          ↑                            ↓
                                          └────── Retraining ──────────┘
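
The flow above can be sketched as a chain of plain functions, with a monitoring check that feeds back into training. This is only an illustration under loud assumptions: every function name, the `max_error` threshold, and the toy "model" (a constant predictor) are hypothetical and not part of any MLOps tool.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Model:
    version: int
    bias: float  # toy "model": always predicts this constant


def engineer_features(rows):
    # Feature engineering: keep only the numeric target, for brevity.
    return [float(r["y"]) for r in rows]


def train(targets, version):
    # Training: the constant that minimises squared error is the mean.
    return Model(version=version, bias=mean(targets))


def evaluate(model, targets):
    # Evaluation: mean absolute error of the constant prediction.
    return mean(abs(t - model.bias) for t in targets)


def run_pipeline(rows, registry, max_error):
    """Data -> features -> train -> evaluate -> register (if good enough)."""
    targets = engineer_features(rows)
    model = train(targets, version=len(registry) + 1)
    error = evaluate(model, targets)
    if error <= max_error:
        registry.append(model)  # Registry: the newest entry is what gets served
    return model, error


def monitor_and_retrain(live_rows, registry, max_error):
    """Monitoring: retrain when the served model's live error is too high."""
    served = registry[-1]
    live_error = evaluate(served, engineer_features(live_rows))
    if live_error > max_error:
        run_pipeline(live_rows, registry, max_error)  # the retraining arrow
    return live_error
```

In a real system each function would be a separate pipeline step and the list used as `registry` would be a model registry service; the sketch only shows where the retraining loop closes.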

Model Serving

TensorFlow Serving

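# Serve a SavedModel over REST on port 8501.
# /models/mymodel must contain numeric version subdirectories (e.g. 1/).
docker run -p 8501:8501 \
  -v /models/mymodel:/models/mymodel \
  -e MODEL_NAME=mymodel \
  tensorflow/serving

# Inference
curl -X POST http://localhost:8501/v1/models/mymodel:predict \
  -H "Content-Type: application/json" \
  -d '{"instances": [[1.0, 2.0, 3.0]]}'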
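
The same predict call can be made from Python. The sketch below uses only the standard library and assumes the server from the `docker run` command above is reachable; the helper names `build_predict_request` and `predict` are hypothetical, while the URL shape and the `instances` / `predictions` keys follow the REST call shown above.

```python
import json
import urllib.request


def build_predict_request(host, model_name, instances, port=8501):
    """Build (but do not send) a TensorFlow Serving REST predict request."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(host, model_name, instances, timeout=5.0):
    """Send the request and return the "predictions" list from the response."""
    request = build_predict_request(host, model_name, instances)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["predictions"]
```

With the container running, `predict("localhost", "mymodel", [[1.0, 2.0, 3.0]])` would issue the same request as the curl command. Splitting request construction from sending keeps the payload logic checkable without a live server.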

Triton Inference Server

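# Ports: 8000 = HTTP, 8001 = gRPC, 8002 = Prometheus metrics.
# /models is the model repository (one subdirectory per model).
docker run --gpus=all -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /models:/models \
  nvcr.io/nvidia/tritonserver:23.10-py3 \
  tritonserver --model-repository=/models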
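
Once the server is up, inference goes through Triton's HTTP endpoint on port 8000, which follows the KServe v2 inference protocol. A minimal standard-library sketch follows; the helper names are hypothetical, and the model name `mymodel` and input tensor name `INPUT0` in the usage note are assumptions that must match the model's own configuration.

```python
import json
import urllib.request


def build_infer_request(model_name, input_name, data, datatype="FP32",
                        host="localhost", port=8000):
    """Build (but do not send) a v2-protocol inference request for Triton."""
    url = f"http://{host}:{port}/v2/models/{model_name}/infer"
    payload = {
        "inputs": [
            {
                "name": input_name,
                "shape": [1, len(data)],  # one row, len(data) columns
                "datatype": datatype,
                "data": data,
            }
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_ready(host="localhost", port=8000, timeout=2.0):
    """Return True if the readiness endpoint answers with HTTP 200."""
    url = f"http://{host}:{port}/v2/health/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers connection errors and HTTP error responses
        return False
```

A typical call would be `build_infer_request("mymodel", "INPUT0", [1.0, 2.0, 3.0])`, sent with `urllib.request.urlopen` after `is_ready()` returns True. For production use, NVIDIA's own client libraries are likely a better fit than hand-built requests.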

MLflow

import mlflow

# Start experiment
mlflow.set_experiment("my-experiment")

with mlflow.start_run():
    # Log parameters
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("epochs", 100)

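    # Train model (train_model is a placeholder for your own training code)
    model = train_model(...)

    # Log metrics
    mlflow.log_metric("accuracy", 0.95)
    mlflow.log_metric("loss", 0.05)

    # Log model (scikit-learn flavor; use the flavor that matches your framework)
    mlflow.sklearn.log_model(model, "model")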

Feature Store

Feast

from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Get features for inference; each reference is "feature_view:feature_name"
features = store.get_online_features(
    features=["user_features:age", "user_features:income"],
    entity_rows=[{"user_id": 123}]
).to_dict()

Model Monitoring

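  • Data drift detection - Compare live input distributions against the training baseline
  • Prediction drift - Monitor output distributions for shifts over time
  • Performance degradation - Track accuracy over time as ground-truth labels arrive
  • Latency monitoring - Track inference time against SLOs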
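
As one way to make the drift checks above concrete, the sketch below computes the Population Stability Index (PSI) between a baseline sample and a live sample. PSI is a technique chosen here for illustration, not something the skill prescribes; the function name, bin count, and `eps` floor are assumptions. The same function applies to input features (data drift) and to model outputs (prediction drift).

```python
import math


def psi(baseline, live, bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant data

    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            # Clamp so values outside the baseline range land in the edge bins.
            idx = min(max(int((x - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor each fraction at eps so the logarithm below stays defined.
        return [max(c / len(sample), eps) for c in counts]

    expected, actual = fractions(baseline), fractions(live)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))
```

Identical samples give a PSI of zero, and the value grows as the live distribution moves away from the baseline. A commonly quoted rule of thumb treats values above roughly 0.25 as a significant shift, but the alert threshold should be tuned per feature.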

GPU Infrastructure

Kubernetes GPU Scheduling

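# Container-level: GPUs are requested under limits
resources:
  limits:
    nvidia.com/gpu: 1
# Pod-level: assumes nodes are labeled accelerator=nvidia-tesla-v100
nodeSelector:
  accelerator: nvidia-tesla-v100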
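
The two fragments above sit at different levels of a Pod spec: `resources` belongs to a container, `nodeSelector` to the Pod. The sketch below builds a complete manifest as a Python dict to show that placement. The function name and its arguments are hypothetical, and the `accelerator` label is assumed to have been applied to the GPU nodes by the cluster operator.

```python
import json


def gpu_pod_manifest(name, image, gpus=1, accelerator="nvidia-tesla-v100"):
    """Return a Pod manifest (as a dict) that requests NVIDIA GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Pod-level: only schedule onto nodes carrying this label.
            "nodeSelector": {"accelerator": accelerator},
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Container-level: GPUs are requested under limits.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


def to_json(manifest):
    """Serialise the manifest; kubectl accepts JSON as well as YAML."""
    return json.dumps(manifest, indent=2)
```

For example, the output of `to_json(gpu_pod_manifest("triton", "nvcr.io/nvidia/tritonserver:23.10-py3"))` can be piped to `kubectl apply -f -`.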

Additional Resources

Reference Files

  • references/mlops-tools.md - MLOps tool comparison
  • references/model-serving.md - Model serving patterns

Skills Info
Original Name: aiml-operations
Author: qenex