platform-engineer
**Master Skill**: Unified Platform, SRE & Release Engineering. Covers OpenShift 4.20+, GitOps (ArgoCD/Tekton), Container Hardening, Service Mesh, Feature Flags, Progressive Rollouts, Observability (LGTM Stack), Chaos Engineering, and Disaster Recovery.
SKILL.md
name: platform-engineer
version: 3.0.0
maturity: stable
updated: 2026-05-04
author: payu-platform-team
requires: []
tags: [devops, k8s, openshift, infrastructure, gitops, argocd, tekton, helm, sre, reliability, releases, feature-flags]
related: [cybersecurity-architect, integration-architect, finops-engineer]
description: "Master Skill: Unified Platform, SRE & Release Engineering. Covers OpenShift 4.20+, GitOps (ArgoCD/Tekton), Container Hardening, Service Mesh, Feature Flags, Progressive Rollouts, Observability (LGTM Stack), Chaos Engineering, and Disaster Recovery."
Reference Implementation Patterns
For detailed patterns and historical context on PayU infrastructure, see:
PayU Platform Architect Master Skill
You are the Lead Platform Engineer for the PayU Platform. You design and maintain the enterprise-grade automated delivery infrastructure on top of Red Hat OpenShift 4.20+.
2026 Platform Engineering Trends
- Internal Developer Portal (IDP): Backstage/Red Hat Developer Hub is the golden path interface.
- eBPF Observability: Using Pixie/Cilium for zero-instrumentation monitoring.
- GreenOps: Carbon-aware scheduling for batch jobs.
- Policy as Code: Kyverno/OPA for strict governance enforcement at the cluster level.
- Container Port Standardization: All 22 microservices MUST listen on internal port 8080 to simplify networking, healthchecks, and service mesh routing.
GitOps & Continuous Delivery (ArgoCD)
1. ApplicationSet for Multi-Environment
# infrastructure/platform/argocd-gitops/applicationsets/payu-applicationsets.yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: payu-environments
  namespace: openshift-gitops
spec:
  goTemplate: true
  goTemplateOptions:
    - missingkey=error
  generators:
    - list:
        elements:
          - name: dev
            path: infrastructure/workloads/overlays/dev
            namespace: payu-dev
            project: payu-dev
          - name: sit
            path: infrastructure/workloads/overlays/sit
            namespace: payu-sit
            project: payu-sit
          - name: uat
            path: infrastructure/workloads/overlays/uat
            namespace: payu-uat
            project: payu-uat
          - name: preprod
            path: infrastructure/workloads/overlays/preprod
            namespace: payu-preprod
            project: payu-preprod
          - name: prod
            path: infrastructure/workloads/overlays/prod
            namespace: payu
            project: payu
  template:
    metadata:
      name: "payu-{{.name}}"
    spec:
      project: "{{.project}}"
      source:
        repoURL: https://github.com/fajjarnr/payu.git
        targetRevision: main
        path: "{{.path}}"
      destination:
        server: https://kubernetes.default.svc
        namespace: "{{.namespace}}"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
          - RespectIgnoreDifferences=true
Repo alignment note:
- Stable environments are generated from infrastructure/workloads/overlays/{dev,sit,uat,preprod,prod}.
- Preview environments must override namespace to payu-dev-pr-* because the dev overlay hardcodes payu-dev.
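To make the list generator's expansion concrete, the following Python sketch substitutes each element into the template by hand. It is only an illustration: Argo CD renders Go templates itself, and the `render` helper and the trimmed `ELEMENTS` list (dev and prod only) are assumptions made for this example.

```python
"""Hand-rolled illustration of ApplicationSet list-generator expansion."""

# Element values copied from the ApplicationSet manifest (dev and prod only, for brevity).
ELEMENTS = [
    {"name": "dev", "path": "infrastructure/workloads/overlays/dev",
     "namespace": "payu-dev", "project": "payu-dev"},
    {"name": "prod", "path": "infrastructure/workloads/overlays/prod",
     "namespace": "payu", "project": "payu"},
]


def render(element: dict) -> dict:
    """Return the Application fields that one generator element would produce."""
    # A missing key raises KeyError, loosely mirroring goTemplateOptions missingkey=error.
    return {
        "name": f"payu-{element['name']}",
        "project": element["project"],
        "path": element["path"],
        "namespace": element["namespace"],
    }


apps = [render(e) for e in ELEMENTS]
for app in apps:
    print(app["name"], "->", app["namespace"])
```

The prod element yields an Application named payu-prod in namespace payu, which is the name the production sync windows target.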
2. Sync Windows for Production Safety
# infrastructure/platform/argocd-gitops/projects/payu-projects.yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: payu
  namespace: openshift-gitops
spec:
  sourceRepos:
    - https://github.com/fajjarnr/payu.git
  destinations:
    - namespace: payu
      server: https://kubernetes.default.svc
  syncWindows:
    - kind: allow
      schedule: "0 1 * * 1-5"
      duration: 8h
      applications:
        - payu-prod
      namespaces:
        - payu
    - kind: deny
      schedule: "0 0 * * 0,6"
      duration: 24h
      applications:
        - payu-prod
      namespaces:
        - payu
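As a reading aid for the two cron schedules, here is a small Python sketch that translates them by hand into time checks. It assumes the windows are evaluated in UTC (no timeZone is set in the manifest) and that a deny window takes precedence over an allow window. The function names are hypothetical and manual-sync overrides are not modelled.

```python
from datetime import datetime, timezone


def in_allow_window(ts: datetime) -> bool:
    """'0 1 * * 1-5' with duration 8h: Monday to Friday, 01:00 up to 09:00."""
    return ts.weekday() <= 4 and 1 <= ts.hour < 9


def in_deny_window(ts: datetime) -> bool:
    """'0 0 * * 0,6' with duration 24h: all of Saturday and Sunday."""
    return ts.weekday() >= 5


def sync_permitted(ts: datetime) -> bool:
    """An automated sync needs an active allow window and no active deny window."""
    return in_allow_window(ts) and not in_deny_window(ts)


monday_night = datetime(2026, 1, 5, 2, 0, tzinfo=timezone.utc)
monday_daytime = datetime(2026, 1, 5, 10, 0, tzinfo=timezone.utc)
saturday_night = datetime(2026, 1, 3, 2, 0, tzinfo=timezone.utc)

for ts in (monday_night, monday_daytime, saturday_night):
    print(ts.isoformat(), sync_permitted(ts))
```

Under these assumptions production syncs run only on weekdays between 01:00 and 09:00 UTC, so the weekend deny window mainly documents intent and guards against a future widening of the allow window.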
3. Automated Rollback
# infrastructure/platform/argocd-gitops/applicationsets/payu-applicationsets.yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
spec:
  template:
    spec:
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        retry:
          limit: 5
          backoff:
            duration: 5s
            factor: 2
            maxDuration: 3m
      ignoreDifferences:
        - group: apps
          kind: Deployment
          jqPathExpressions:
            - .spec.template.metadata.annotations
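The retry block is easier to reason about with the delays written out. The sketch below assumes the usual exponential reading of the fields: each retry waits duration multiplied by factor once per earlier failure, capped at maxDuration. The helper name `retry_delays` is hypothetical.

```python
def retry_delays(limit: int, duration_s: int, factor: int, max_duration_s: int) -> list[int]:
    """Seconds waited before each retry attempt, capped at max_duration_s."""
    return [min(duration_s * factor ** attempt, max_duration_s) for attempt in range(limit)]


# Values from the manifest: limit 5, duration 5s, factor 2, maxDuration 3m (180s).
delays = retry_delays(limit=5, duration_s=5, factor=2, max_duration_s=180)
print(delays)       # [5, 10, 20, 40, 80]
print(sum(delays))  # 155
```

With these settings the cap of 3 minutes is never reached, and a sync that keeps failing is abandoned after roughly two and a half minutes of cumulative waiting.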
Tekton CI/CD Pipelines
1. Modular Pipeline Structure
# infrastructure/platform/tekton-pipelines/build-pipeline.yaml
apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: payu-build-pipeline
  namespace: payu-cicd
spec:
  tasks:
    - name: fetch-repository
      taskRef:
        name: git-clone
    - name: secret-scan
      runAfter: [fetch-repository]
      taskRef:
        name: gitleaks
    - name: deep-secret-scan
      runAfter: [fetch-repository]
      taskRef:
        name: trufflehog
    - name: semgrep-scan
      runAfter: [fetch-repository]
      taskRef:
        name: semgrep
    - name: service-sast-sca
      runAfter: [semgrep-scan]
      taskRef:
        name: security-scan
    - name: build-image
      runAfter: [secret-scan, deep-secret-scan, service-sast-sca]
      taskRef:
        name: buildah
    - name: trivy-image-scan
      runAfter: [build-image]
      taskRef:
        name: trivy
    - name: rhacs-policy-check
      runAfter: [trivy-image-scan]
      taskRef:
        name: rhacs-image-check
    - name: generate-sbom
      runAfter: [rhacs-policy-check]
      taskRef:
        name: syft-sbom
    - name: grype-sbom-check
      runAfter: [generate-sbom]
      taskRef:
        name: grype-scan
    - name: sign-image
      runAfter: [grype-sbom-check]
      taskRef:
        name: cosign-sign
Repo alignment note:
- PayU build pipelines in payu-cicd enforce gitleaks -> trufflehog -> semgrep -> service SAST/SCA -> build -> trivy -> RHACS -> Syft -> Grype -> Cosign.
- Deploy pipelines gate environment promotion with Argo sync wait plus ZAP/Litmus in SIT, Schemathesis/k6 in UAT, and Cerberus/Kraken in preprod.
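The runAfter edges define a dependency graph, so the effective ordering can be derived rather than read off by eye. The sketch below copies the edges from the pipeline by hand and groups tasks into waves that could start together. It uses Python's standard graphlib and does not model Tekton's scheduler, so treat the waves as an approximation. The `execution_waves` helper is hypothetical.

```python
from graphlib import TopologicalSorter

# task name -> tasks it must run after (copied from the runAfter fields).
RUN_AFTER = {
    "fetch-repository": [],
    "secret-scan": ["fetch-repository"],
    "deep-secret-scan": ["fetch-repository"],
    "semgrep-scan": ["fetch-repository"],
    "service-sast-sca": ["semgrep-scan"],
    "build-image": ["secret-scan", "deep-secret-scan", "service-sast-sca"],
    "trivy-image-scan": ["build-image"],
    "rhacs-policy-check": ["trivy-image-scan"],
    "generate-sbom": ["rhacs-policy-check"],
    "grype-sbom-check": ["generate-sbom"],
    "sign-image": ["grype-sbom-check"],
}


def execution_waves(run_after: dict[str, list[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave has all prerequisites done."""
    sorter = TopologicalSorter(run_after)
    sorter.prepare()  # raises CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(RUN_AFTER)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

Read this way, the three source scans fan out in parallel after the clone, and everything from build-image onward is strictly sequential, so the arrows in the alignment note describe the gating order rather than a literal serial chain.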
2. Pipeline Trigger for Git Events
# tekton/triggers/github-push-trigger.yaml
apiVersion: triggers.tekton.dev/v1beta1
kind: TriggerTemplate
metadata:
  name: java-service-trigger
spec:
  params:
    - name: gitrevision
    - name: gitrepositoryurl
    - name: servicename
  resourcetemplates:
    - apiVersion: tekton.dev/v1beta1
      kind: PipelineRun
      metadata:
        generateName: "$(tt.params.servicename)-"
      spec:
        pipelineRef:
          name: java-service-pipeline
        params:
          - name: git-url
            value: $(tt.params.gitrepositoryurl)
          - name: git-revision
            value: $(tt.params.gitrevision)
          - name: service-name
            value: $(tt.params.servicename)
        workspaces:
          - name: source
            volumeClaimTemplate:
              spec:
                accessModes:
                  - ReadWriteOnce
                resources:
                  requests:
                    storage: 1Gi
---
apiVersion: triggers.tekton.dev/v1beta1
kind: EventListener
metadata:
  name: github-listener
spec:
  serviceAccountName: tekton-triggers-sa
  triggers:
    - name: github-push
      interceptors:
        - ref:
            name: github
          params:
            - name: secretRef
              value:
                secretName: github-webhook-secret
                secretKey: token
            - name: eventTypes
              value: ["push"]
      bindings:
        - ref: github-push-binding
      template:
        ref: java-service-trigger
Container Hardening (Podman/UBI9)
PayU uses Podman exclusively: its daemonless architecture and native rootless execution make it considerably more secure than Docker.
1. Production Containerfile Template
# Containerfile (Podman) - Multi-stage build for Java service
# Stage 1: Build
FROM registry.access.redhat.com/ubi9/openjdk-21:1.18 AS builder
WORKDIR /build
COPY pom.xml .
COPY src ./src
RUN mvn clean package -DskipTests -Dmaven.repo.local=/build/.m2
# Stage 2: Runtime (minimal)
FROM registry.access.redhat.com/ubi9/ubi-minimal:9.3
# Security: Create non-root user
RUN microdnf install -y java-21-openjdk-headless shadow-utils && \
    microdnf clean all && \
    groupadd -r payu -g 1001 && \
    useradd -r -g payu -u 1001 -d /app payu
WORKDIR /app
# Copy only the built artifact
COPY --from=builder --chown=payu:payu /build/target/*.jar app.jar
# Security: Run as non-root
USER 1001
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:8080/actuator/health/liveness || exit 1
# Security: dropping all capabilities, a read-only root filesystem, and
# no-new-privileges are runtime settings. They are enforced through the
# Kubernetes securityContext, not in the Containerfile.
EXPOSE 8080
ENTRYPOINT ["java", \
    "-XX:+UseContainerSupport", \
    "-XX:MaxRAMPercentage=75.0", \
    "-Djava.security.egd=file:/dev/./urandom", \
    "-jar", "app.jar"]
2. Security Context in Kubernetes
# deployment.yaml
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        runAsGroup: 1001
        fsGroup: 1001
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: app
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: logs
              mountPath: /app/logs
      volumes:
        - name: tmp
          emptyDir: {}
        - name: logs
          emptyDir: {}
3. SELinux Guardrails (Red Hat Best Practices)
The PayU platform relies on SELinux in Enforcing mode by default as a defense layer. Never disable SELinux (setenforce 0) in production environments.
Volume Labeling (:Z vs :z)
When mounting volumes with Podman, SELinux labels must be managed so that the container process is permitted to access them.
- :Z: Private, unshared volume. Prevents other containers from accessing the data. (Recommended.)
- :z: Shared volume. Can be accessed by multiple containers.
# Example: running rootless Podman with SELinux labeling
podman run -v /data/db:/var/lib/postgresql/data:Z postgres:16
OpenShift MCS (Multi-Category Security)
In OpenShift, each namespace is assigned a unique SELinux category pair (for example, s0:c12,c34). This prevents containers in Namespace A from accessing volumes in Namespace B even when their UIDs are the same.
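The category check can be illustrated with a deliberately simplified Python model: a process may touch a file only if it holds every category on the file's label. This is a sketch of the idea, not the kernel's policy engine. It ignores sensitivity levels and category ranges such as c0.c1023, and the second label (s0:c56,c78) is a made-up value for Namespace B.

```python
def parse_categories(level: str) -> frozenset[str]:
    """Split an MCS level such as 's0:c12,c34' into its set of categories."""
    _sensitivity, _sep, categories = level.partition(":")
    return frozenset(c for c in categories.split(",") if c)


def can_access(process_level: str, file_level: str) -> bool:
    """Simplified MCS rule: the process must hold every category on the file."""
    return parse_categories(file_level) <= parse_categories(process_level)


namespace_a = "s0:c12,c34"  # example level from the text
namespace_b = "s0:c56,c78"  # hypothetical level for another namespace

print(can_access(namespace_a, namespace_a))  # True
print(can_access(namespace_a, namespace_b))  # False
```

Because the decision depends on the label rather than on the UID, two pods running as the same numeric user in different namespaces still cannot read each other's volumes.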
Security Context Constraints (SCC)
Use the restricted-v2 SCC (the default in OCP 4.12+), which automatically:
- Allocates a unique UID from the namespace's range.
- Applies the SELinux type container_t.
- Enforces a seccompProfile of type RuntimeDefault.
Troubleshooting Commands
If Permission Denied occurs even though the file permissions on the (Linux) host are already 777:
- Check the audit log: ausearch -m avc -ts recent
- Inspect the file context: ls -Z /path/to/data
- Restore the label: restorecon -Rv /path/to/data
Platform Port Standardization
All PayU backend services follow the 8080 Standard for internal container networking. This reduces configuration complexity and aligns with OpenShift/Kubernetes networking patterns.
1. Port Mapping Principles
- Internal Port: Always 8080. All applications (Spring Boot, Quarkus, FastAPI) must listen on this port inside the container.
- External Port: Managed via docker-compose or podman-compose host mapping (e.g., 8001:8080).
- Service Discovery: Internal communication between containers uses the service name and port 8080 (e.g., http://account-service:8080).
2. Implementation Checklist
- Dockerfile: EXPOSE 8080.
- Application Config: server.port=8080 (Spring) or quarkus.http.port=8080.
- Health Check: Endpoint must be matched to port 8080 (e.g., http://localhost:8080/actuator/health).
- Gateway Routes: All ROUTES_URL must point to port 8080 of the target service.
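A check like the one in this checklist is easy to automate. The following Python sketch flags compose-style port mappings whose container side is not 8080. The service names and the dictionary layout are illustrative assumptions, not the structure of an actual PayU compose file.

```python
STANDARD_PORT = 8080


def container_port(mapping: str) -> int:
    """Return the container-side port of a mapping such as '8001:8080'."""
    # The container port is the last colon-separated field, which also covers
    # the 'ip:host:container' form; an optional '/tcp' suffix is dropped.
    return int(mapping.rsplit(":", 1)[-1].split("/")[0])


def violations(services: dict[str, list[str]]) -> list[str]:
    """Names of services with at least one mapping that misses the standard port."""
    return sorted(
        name
        for name, ports in services.items()
        if any(container_port(p) != STANDARD_PORT for p in ports)
    )


# Hypothetical input: one compliant service and one that breaks the standard.
example = {
    "account-service": ["8001:8080"],
    "legacy-service": ["8002:9090"],
}
print(violations(example))  # ['legacy-service']
```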
4. OCI & Metadata Standards (Legacy Container Engineer)
All PayU container images MUST carry standard metadata for auditability and traceability, following the OCI (Open Container Initiative) standard.
Containerfile Labels (Build Time)
# Build-time values: pass them with --build-arg, otherwise the labels below expand to empty strings
ARG VERSION
ARG BUILD_DATE
ARG GIT_COMMIT
# Standard OCI Labels
LABEL org.opencontainers.image.vendor="PayU Digital Banking" \
      org.opencontainers.image.authors="platform@payu.fajjjar.my.id" \
      org.opencontainers.image.title="Wallet Service" \
      org.opencontainers.image.description="Core ledger and balance management service" \
      org.opencontainers.image.licenses="Proprietary" \
      org.opencontainers.image.source="https://github.com/payu/wallet-service" \
      org.opencontainers.image.documentation="https://docs.payu.internal/services/wallet" \
      org.opencontainers.image.version="${VERSION}" \
      org.opencontainers.image.created="${BUILD_DATE}" \
      org.opencontainers.image.revision="${GIT_COMMIT}"
# PayU Specific Metadata
LABEL id.payu.service.tier="1" \
      id.payu.service.domain="transaction" \
      id.payu.compliance.pci-dss="true" \
      id.payu.security.scan-level="critical"
Kubernetes Annotations (Runtime)
metadata:
  annotations:
    # Build Info (the value must be valid JSON, so keys and strings use double quotes)
    image.openshift.io/triggers: '[{"from":{"kind":"ImageStreamTag","name":"wallet-service:latest"},"fieldPath":"spec.template.spec.containers[?(@.name==\"app\")].image"}]'
    # Ownership & Contact
    start.payu.fajjjar.my.id/owner: "Wallet Team <wallet@payu.fajjjar.my.id>"
    start.payu.fajjjar.my.id/slack-channel: "#dev-wallet"
    # Operational Metadata
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/actuator/prometheus"
    # Documentation
    link.argocd.argoproj.io/external-link: "https://docs.payu.internal/services/wallet"
Helm Chart Standards
1. Chart Structure
helm/
└── wallet-service/
    ├── Chart.yaml
    ├── values.yaml
    ├── values-dev.yaml
    ├── values-sit.yaml
    ├── values-prod.yaml
    ├── templates/
    │   ├── _helpers.tpl
    │   ├── deployment.yaml
    │   ├── service.yaml
    │   ├── configmap.yaml
    │   ├── secret.yaml
    │   ├── hpa.yaml
    │   ├── pdb.yaml
    │   ├── networkpolicy.yaml
    │   ├── servicemonitor.yaml
    │   └── NOTES.txt
    └── tests/
        └── test-connection.yaml
2. Values Schema
# values.yaml
replicaCount: 2
image:
  repository: registry.payu.internal/payu/wallet-service
  tag: "latest"
  pullPolicy: IfNotPresent
resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 1Gi
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilization: 70
  targetMemoryUtilization: 80
podDisruptionBudget:
  enabled: true
  minAvailable: 1
networkPolicy:
  enabled: true
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: payu-gateway
      ports:
        - port: 8080
monitoring:
  enabled: true
  path: /actuator/prometheus
  port: 8080
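To show what the autoscaling values imply, the sketch below applies the standard Horizontal Pod Autoscaler scaling rule (desired = ceil(current x currentUtilization / targetUtilization)) and clamps the result to minReplicas and maxReplicas. It assumes the chart passes these values straight through to an HPA, and it leaves out the tolerance band and stabilization windows the real controller applies.

```python
import math


def desired_replicas(current: int, current_util: float, target_util: float,
                     min_replicas: int = 2, max_replicas: int = 10) -> int:
    """Replica count suggested by the basic HPA formula, clamped to the bounds."""
    raw = math.ceil(current * current_util / target_util)
    return max(min_replicas, min(max_replicas, raw))


# 4 pods averaging 90% CPU against the 70% target from values.yaml.
print(desired_replicas(4, 90, 70))   # 6
# Low load never drops below minReplicas.
print(desired_replicas(2, 20, 70))   # 2
# Heavy load is capped at maxReplicas.
print(desired_replicas(8, 140, 70))  # 10
```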
Service Mesh (Istio)
1. Traffic Management
# VirtualService for Canary Deployment
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: wallet-service
spec:
  hosts:
    - wallet-service
  http:
    - match:
        - headers:
            x-canary:
              exact: "true"
      route:
        - destination:
            host: wallet-service
            subset: canary
          weight: 100
    - route:
        - destination:
            host: wallet-service
            subset: stable
          weight: 90
        - destination:
            host: wallet-service
            subset: canary
          weight: 10
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: wallet-service
spec:
  host: wallet-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: UPGRADE
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
  subsets:
    - name: stable
      labels:
        version: stable
    - name: canary
      labels:
        version: canary
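The two routing rules can be read as a small decision procedure: a request carrying x-canary: true always goes to the canary subset, and all other traffic is split 90/10. The Python sketch below simulates that reading. It is a model of the intent rather than Envoy's load-balancing code, and `pick_subset` is a hypothetical name.

```python
import random


def pick_subset(headers: dict[str, str], rng: random.Random) -> str:
    """Choose a subset following the VirtualService rules, evaluated top to bottom."""
    # Rule 1: explicit opt-in through the x-canary header.
    if headers.get("x-canary") == "true":
        return "canary"
    # Rule 2: weighted split for everything else.
    return rng.choices(["stable", "canary"], weights=[90, 10])[0]


rng = random.Random(42)  # fixed seed so the simulation is repeatable
sample = [pick_subset({}, rng) for _ in range(10_000)]
canary_share = sample.count("canary") / len(sample)

print(pick_subset({"x-canary": "true"}, rng))  # canary
print(f"canary share without header: {canary_share:.1%}")
```

The share printed for header-less traffic should land close to 10%, which is a quick sanity check to run before changing the weights during a progressive rollout.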
2. Mutual TLS (mTLS) Strict Mode
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: payu
spec:
  mtls:
    mode: STRICT
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: wallet-service-authz
  namespace: payu
spec:
  selector:
    matchLabels:
      app: wallet-service
  rules:
    - from:
        - source:
            principals:
              - cluster.local/ns/payu/sa/gateway-service
              - cluster.local/ns/payu/sa/transaction-service
      to:
        - operation:
            methods: ["GET", "POST", "PUT"]
            paths: ["/api/*"]
Multi-Region Disaster Recovery
1. Architecture Pattern
┌─────────────────────────────────────────────────────────────────┐
│                   Global Load Balancer (GSLB)                   │
│                    (Cloudflare/AWS Route53)                     │
└────────────────────────────────┬────────────────────────────────┘
                                 │
                 ┌───────────────┴───────────────┐
                 │                               │
                 ▼                               ▼
      ┌─────────────────────┐         ┌─────────────────────┐
      │  Region 1 (Active)  │         │ Region 2 (Standby)  │
      │     Jakarta DC      │         │    Singapore DC     │
      ├─────────────────────┤         ├─────────────────────┤
      │  OpenShift Cluster  │         │  OpenShift Cluster  │
      │  - All services     │────────▶│  - All services     │
      │  - Kafka (Primary)  │  Sync   │  - Kafka (Mirror)   │
      │  - PostgreSQL (RW)  │────────▶│  - PostgreSQL (RO)  │
      │  - Redis (Master)   │────────▶│  - Redis (Replica)  │
      └─────────────────────┘         └─────────────────────┘
2. Failover Configuration
# Multi-region Kafka MirrorMaker2
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaMirrorMaker2
metadata:
  name: payu-mm2
spec:
  version: 3.6.0
  replicas: 3
  connectCluster: "region-2"
  clusters:
    - alias: "region-1"
      bootstrapServers: kafka-region1.payu.internal:9092
    - alias: "region-2"
      bootstrapServers: kafka-region2.payu.internal:9092
  mirrors:
    - sourceCluster: "region-1"
      targetCluster: "region-2"
      sourceConnector:
        config:
          replication.factor: 3
          offset-syncs.topic.replication.factor: 3
      topicsPattern: "payu.*"
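One detail that matters during failover is what mirrored topics are called in Region 2. The sketch below assumes MirrorMaker 2's default replication policy, which prefixes the source cluster alias and a dot, and assumes topicsPattern is applied as a full regular-expression match. The manifest does not override the policy, but verify both assumptions against the deployed configuration. The topic names are invented for illustration.

```python
import re
from typing import Optional

TOPICS_PATTERN = re.compile(r"payu.*")  # from topicsPattern in the manifest
SOURCE_ALIAS = "region-1"               # from sourceCluster in the manifest


def mirrored_name(topic: str) -> Optional[str]:
    """Target-side topic name, or None when the topic is not selected for mirroring."""
    if TOPICS_PATTERN.fullmatch(topic) is None:
        return None
    return f"{SOURCE_ALIAS}.{topic}"


print(mirrored_name("payu.wallet.events"))  # region-1.payu.wallet.events
print(mirrored_name("audit.logs"))          # None
```

If that assumption holds, consumers that fail over to Region 2 need to subscribe to the prefixed names (or to a pattern covering both forms), otherwise they will see empty topics.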
Cloud FinOps
1. Resource Right-Sizing with VPA
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: wallet-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: wallet-service
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 100m
          memory: 256Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi
2. Cost Attribution Labels
# Required labels for all resources
metadata:
  labels:
    app.kubernetes.io/name: wallet-service
    app.kubernetes.io/version: "1.2.3"
    app.kubernetes.io/component: backend
    app.kubernetes.io/part-of: payu-platform
    cost-center: platform-team
    environment: prod
    owner: wallet-team
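Required labels are only useful for cost attribution if they are enforced. A minimal Python sketch of such a check follows. It operates on an already-parsed manifest dictionary, and the function name and sample manifests are assumptions for illustration. In the cluster itself this kind of rule would more likely live in a Kyverno or OPA policy.

```python
REQUIRED_LABELS = (
    "app.kubernetes.io/name",
    "app.kubernetes.io/version",
    "app.kubernetes.io/component",
    "app.kubernetes.io/part-of",
    "cost-center",
    "environment",
    "owner",
)


def missing_labels(manifest: dict) -> list[str]:
    """Return the required label keys that are absent or empty in a manifest."""
    labels = (manifest.get("metadata") or {}).get("labels") or {}
    return [key for key in REQUIRED_LABELS if not labels.get(key)]


complete = {"metadata": {"labels": {
    "app.kubernetes.io/name": "wallet-service",
    "app.kubernetes.io/version": "1.2.3",
    "app.kubernetes.io/component": "backend",
    "app.kubernetes.io/part-of": "payu-platform",
    "cost-center": "platform-team",
    "environment": "prod",
    "owner": "wallet-team",
}}}
incomplete = {"metadata": {"labels": {"app.kubernetes.io/name": "wallet-service"}}}

print(missing_labels(complete))    # []
print(missing_labels(incomplete))
```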
Container Build Debugging (Podman/UBI9)
Learned from: E2E test infrastructure setup - February 1, 2026
Common Build Failure Patterns
1. Parent POM Resolution Failure
Symptom: Maven build fails with Could not resolve dependencies or parent POM not found
Root Cause: Containerfile copies only service pom.xml, but Spring Boot services reference parent POM at ../pom.xml
# WRONG - Only copies service pom.xml
COPY pom.xml ./
RUN mvn dependency:go-offline -B
COPY src ./src
# CORRECT - Copies entire project for parent POM access
COPY . .
RUN mvn clean package -DskipTests
Fix: Change COPY pom.xml ./ to COPY . . in Containerfiles, and run the build with a context that contains the parent pom.xml (for example, the backend directory).
2. Maven Build Hanging (4+ hours)
Symptom: Maven build process hangs indefinitely during dependency download or compilation
Root Cause:
- Parallel builds (-T 1C) causing deadlock in certain services
- Network issues accessing Maven Central during container build
- Large dependency downloads timing out
Fix - Use Pre-Built JARs:
# Build stage: Skip Maven, use pre-built JAR
# Runtime stage only
FROM registry.access.redhat.com/ubi9/openjdk-21-runtime:1.24-2
# Copy pre-built JAR from local build
COPY target/*.jar /app/app.jar
USER 1001
ENTRYPOINT ["java", "-jar", "/app/app.jar"]
Build Strategy:
- Build all JARs first with Maven from the backend directory (omit -T 1C for services where the parallel build deadlocks):
  cd /home/ubuntu/payu/backend
  mvn clean package -DskipTests -T 1C
- Create runtime-only Containerfiles that copy pre-built JARs
- Build images much faster (minutes vs hours)
3. UBI9 Runtime Image Conflicts
Symptom: curl-minimal conflicts when trying to install curl
Root Cause: UBI9 runtime images have curl-minimal pre-installed, conflicts with installing regular curl
Fix: Remove the curl installation from Containerfiles, then either drop curl-based health checks or rely on the curl binary that the curl-minimal package already provides:
# WRONG - Tries to install curl (conflicts)
RUN microdnf install -y curl
# CORRECT - curl-minimal is already installed and provides the curl binary
HEALTHCHECK CMD curl --fail-with-body http://localhost:8080/actuator/health || exit 1
4. User Creation Conflicts (GID 185)
Symptom: groupadd: GID '185' already exists when creating non-root user
Root Cause: UBI9 images already have user jboss with GID 185
Fix: Use the existing jboss user (UID 185) instead of creating a new user:
# WRONG - Tries to create user with GID 185
RUN groupadd -r payu -g 185 && \
    useradd -r -g payu -u 185 -d /app payu
# CORRECT - Use existing jboss user
USER 185
5. Dockerfile Excludes Target Directory
Symptom: COPY target/*.jar /app/app.jar fails with "no such file or directory"
Root Cause: .dockerignore or .containerignore excludes target/ directory
Fix: Either:
- Build from parent directory with proper context
- Remove target/ from ignore files
- Use --ignorefile=.containerignore to bypass dockerignore
Debugging Commands
# Check if parent POM is accessible
cd backend/some-service
cat ../pom.xml # Should show parent POM content
# Check Maven can resolve parent
mvn help:evaluate -Dexpression=project.parent.groupId -q -DforceStdout
mvn help:evaluate -Dexpression=project.parent.artifactId -q -DforceStdout
# Check what's in target directory
ls -la target/ | grep -E "\.jar$"
# Test Maven build locally (without container)
mvn clean package -DskipTests
# Check dockerignore
cat .dockerignore | grep target
Build Performance Optimization
| Strategy | Build Time | Disk Space | Use When |
|---|---|---|---|
| Full container build | 10-30 min/service | High | Initial setup, CI/CD |
| Pre-built JARs | 1-2 min/service | Medium | Development, fast iteration |
| Multi-stage with cache | 5-10 min/service | Medium | Production, optimized |
| Runtime-only (local JAR) | <1 min/service | Low | Debugging, testing |
PayU Build Standards
- All Spring Boot services use payu-backend-parent (not direct spring-boot-starter-parent)
- Containerfiles use COPY . . for parent POM resolution
- Non-root user with UID 1001 or existing jboss user (185)
- UBI9 images: ubi9/openjdk-21:1.24-2 for build, ubi9/openjdk-21-runtime:1.24-2 for runtime
- Node.js images: ubi9/nodejs-20:9.7 for frontend
Known Working Services
| Service | Image | Build Method |
|---|---|---|
| account-service | payu-account-service:test | Pre-built JAR |
| auth-service | payu-auth-service:test | Pre-built JAR |
| wallet-service | payu-wallet-service:test | Pre-built JAR |
| transaction-service | payu-transaction-service:test | Pre-built JAR |
| investment-service | payu-investment-service:test | Pre-built JAR |
| gateway-service | payu-gateway-service:test | Pre-built JAR |
| bi-fast-simulator | payu-bifast-simulator:test | Pre-built JAR |
| dukcapil-simulator | payu-dukcapil-simulator:test | Full build |
| qris-simulator | payu-qris-simulator:test | Pre-built JAR |
Platform Integrity Checklist
Security
- Containerfile uses UBI9-minimal and a non-root USER
- Runs under rootless Podman (UID 1001)
- SecurityContext drops all capabilities
- NetworkPolicies isolate service traffic
- Secrets managed via Vault + External Secrets Operator (not Git)
Delivery
- Service deployed via ArgoCD (GitOps)
- Sync windows configured for production
- Automated rollback enabled
- Tekton pipeline includes secret scan, SAST/SCA, SBOM, vulnerability gating, and Cosign signing
Observability
- PodMonitor/ServiceMonitor configured
- Distributed tracing enabled (Jaeger/OpenTelemetry)
- Log aggregation configured (Loki)
- eBPF probes enabled for network visibility
Resilience
- PodDisruptionBudget defined
- HPA configured with appropriate thresholds
- Multi-region DR tested quarterly
- Chaos testing run in SIT automatically
References
Merged Skill References (Consolidated)
| Category | Topic | File |
|---|---|---|
| Releases | Feature Flags, Progressive Rollouts, Blue-Green/Canary | release-engineering.md |
| SRE | Observability, SLO/SLI, Chaos Engineering, DR | sre-practices.md |
| K8s | Kubernetes manifest generator patterns | k8s-manifest-generator.md |
External Documentation
- OpenShift Documentation
- ArgoCD Documentation
- Tekton Documentation
- Helm Documentation
- Istio Documentation
- Strimzi Kafka Operator
- UBI9 Container Guide
- Kubernetes Security Best Practices
- CNCF Landscape
- FinOps Foundation
- Google SRE Book
- LaunchDarkly Feature Flags
Last Updated: January 2026