Kubernetes at Scale: Running Production Workloads on AKS

By Hibba Limited · December 2025 · 10 min read

Kubernetes has become the de facto standard for container orchestration, but running it in production at scale is a very different challenge from running it in development. This guide shares our battle-tested patterns for deploying, scaling, securing, and observing production workloads on Azure Kubernetes Service (AKS).

Production Cluster Architecture

Enterprise AKS cluster topology:

- 🌐 Ingress: NGINX / Application Gateway
- 📦 Services: API / Web / Worker
- 🗃 Data Layer: DB / Cache / Queue
- 🛡 RBAC: AAD-integrated
- 📈 Monitoring: Prometheus
- 📜 Logging: Fluent Bit
- 🔒 Secrets: Key Vault CSI

Node Pool Strategy

Production clusters should use multiple node pools to isolate workloads by resource requirements and criticality:

- 💻 System Pool: CoreDNS, kube-proxy. Standard_D4s_v5, 3 nodes (fixed)
- 🚀 Application Pool: API + Web services. Standard_D8s_v5, 3-20 nodes (autoscaled)
- GPU / ML Pool: ML inference jobs. Standard_NC6s_v3, 0-5 nodes (spot)
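To ensure workloads actually land on the intended pool, pin them with a node selector. The fragment below is a minimal sketch; `apppool` is a hypothetical pool name (AKS labels each node with its pool name under the `agentpool` label):

```yaml
# Deployment fragment pinning application pods to the application pool.
# "apppool" is a hypothetical pool name; substitute your own.
spec:
  template:
    spec:
      nodeSelector:
        agentpool: apppool
```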

Autoscaling: Three Layers Deep

1. Horizontal Pod Autoscaler (HPA)

Scale pods based on CPU, memory, or custom metrics. Set target utilisation to around 70% for CPU-bound workloads. Use KEDA (Kubernetes Event-Driven Autoscaling) for queue-based workloads: scale to zero when idle, and scale up based on message count.
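For the queue-based case, a KEDA ScaledObject can drive a worker Deployment from queue depth. The sketch below assumes a hypothetical `queue-worker` Deployment, an `orders` Service Bus queue, and a `servicebus-auth` TriggerAuthentication:

```yaml
# Scale a worker Deployment on Azure Service Bus queue depth.
# queue-worker, orders, and servicebus-auth are hypothetical names.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-worker-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: queue-worker
  minReplicaCount: 0        # scale to zero when the queue is empty
  maxReplicaCount: 30
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: orders
        messageCount: "50"  # target messages per replica
      authenticationRef:
        name: servicebus-auth
```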

2. Vertical Pod Autoscaler (VPA)

Automatically right-size pod resource requests based on actual usage. Run VPA in "recommendation" mode first to understand resource patterns, then enable "auto" mode. This prevents over-provisioning and reduces cluster costs by 20-35%.
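Recommendation mode corresponds to `updateMode: "Off"` in the VPA spec. A minimal sketch, reusing the `api-gateway` Deployment from the sample HPA configuration later in this guide:

```yaml
# VPA in recommendation-only mode: it records suggested requests
# without evicting pods. Switch updateMode to "Auto" once you
# understand the resource patterns.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-gateway-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  updatePolicy:
    updateMode: "Off"
```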

3. Cluster Autoscaler

Automatically adds or removes nodes based on pending pod scheduling. Configure scale-down delay to 10 minutes to avoid thrashing. Use spot instances for non-critical workloads to cut compute costs by 60-80%.
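On AKS, spot node pools carry a `kubernetes.azure.com/scalesetpriority=spot` taint, so spot-eligible workloads need a matching toleration. A minimal Deployment fragment:

```yaml
# Deployment fragment targeting an AKS spot node pool.
# The label and taint below are applied automatically by AKS
# to spot-priority node pools.
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.azure.com/scalesetpriority: spot
      tolerations:
        - key: kubernetes.azure.com/scalesetpriority
          operator: Equal
          value: spot
          effect: NoSchedule
```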

Security Hardening Checklist
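One item that belongs near the top of any hardening checklist is a default-deny network policy per namespace, so that only explicitly allowed traffic flows between pods. A minimal sketch for the production namespace:

```yaml
# Default-deny for all pods in the namespace: blocks all ingress
# and egress until specific allow policies are added alongside it.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```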

GitOps Deployment Pipeline

Flux CD-based GitOps deployment flow

📝 Git Push (Helm values) → 🔧 CI Build (test + image) → 📦 ACR Push (container registry) → 🚀 Flux Sync (auto-deploy)
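In Flux terms, the sync step is typically a GitRepository source plus a Kustomization that applies a path from it. A minimal sketch with hypothetical names and repository URL:

```yaml
# Flux watches the config repo and applies the production path.
# platform-config and the URL below are hypothetical.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: platform-config
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/example/platform-config
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: production-apps
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: platform-config
  path: ./clusters/production
  prune: true   # delete cluster resources removed from Git
```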

Sample HPA Configuration

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-gateway-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300

"Kubernetes gives you superpowers, but with great power comes great YAML. Invest in your platform engineering team and they'll make every application team faster."

Need help with Kubernetes in production?

Our platform engineers design, deploy, and operate AKS clusters for enterprise workloads.

Talk to Our Platform Team