Kubernetes at Scale: Running Production Workloads on AKS

By Hibba Limited · December 2025 · 10 min read

Kubernetes has become the de facto standard for container orchestration, but running it in production at scale is a very different challenge from running it in development. This guide shares our battle-tested patterns for deploying, scaling, securing, and observing production workloads on Azure Kubernetes Service (AKS).

Production Cluster Architecture

Enterprise AKS cluster topology:

- 🌐 Ingress: NGINX / Application Gateway
- 📦 Services: API / Web / Worker
- 🗃 Data Layer: DB / Cache / Queue
- 🛡 RBAC: AAD-integrated
- 📈 Monitoring: Prometheus
- 📜 Logging: Fluent Bit
- 🔒 Secrets: Key Vault CSI

Node Pool Strategy

Production clusters should use multiple node pools to isolate workloads by resource requirements and criticality:

- 💻 System Pool: CoreDNS, kube-proxy. Standard_D4s_v5, 3 nodes (fixed)
- 🚀 Application Pool: API + Web services. Standard_D8s_v5, 3-20 nodes (autoscaled)
- GPU / ML Pool: ML inference jobs. Standard_NC6s_v3, 0-5 nodes (spot)
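To ensure workloads actually land on the intended pool, pin them with a node selector. The fragment below is a minimal sketch; `apppool` is a hypothetical pool name (AKS labels each node with its pool name under the `agentpool` label):

```yaml
# Deployment fragment pinning application pods to the application pool.
# "apppool" is a hypothetical pool name; substitute your own.
spec:
  template:
    spec:
      nodeSelector:
        agentpool: apppool
```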

Autoscaling: Three Layers Deep

1. Horizontal Pod Autoscaler (HPA)

Scale pods based on CPU, memory, or custom metrics. Set target utilisation to around 70% for CPU-bound workloads. Use KEDA (Kubernetes Event-Driven Autoscaling) for queue-based workloads: scale to zero when idle, and scale up based on message count.
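For the queue-based case, a KEDA ScaledObject can drive a worker Deployment from queue depth. The sketch below assumes a hypothetical `queue-worker` Deployment, an `orders` Service Bus queue, and a `servicebus-auth` TriggerAuthentication:

```yaml
# Scale a worker Deployment on Azure Service Bus queue depth.
# queue-worker, orders, and servicebus-auth are hypothetical names.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-worker-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: queue-worker
  minReplicaCount: 0        # scale to zero when the queue is empty
  maxReplicaCount: 30
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: orders
        messageCount: "50"  # target messages per replica
      authenticationRef:
        name: servicebus-auth
```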

2. Vertical Pod Autoscaler (VPA)

Automatically right-size pod resource requests based on actual usage. Run VPA in "recommendation" mode first to understand resource patterns, then enable "auto" mode. This prevents over-provisioning and reduces cluster costs by 20-35%.
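Recommendation mode corresponds to `updateMode: "Off"` in the VPA spec. A minimal sketch, reusing the `api-gateway` Deployment from the sample HPA configuration later in this guide:

```yaml
# VPA in recommendation-only mode: it records suggested requests
# without evicting pods. Switch updateMode to "Auto" once you
# understand the resource patterns.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-gateway-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  updatePolicy:
    updateMode: "Off"
```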

3. Cluster Autoscaler

Automatically adds or removes nodes based on pending pod scheduling. Configure scale-down delay to 10 minutes to avoid thrashing. Use spot instances for non-critical workloads to cut compute costs by 60-80%.
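On AKS, spot node pools carry a `kubernetes.azure.com/scalesetpriority=spot` taint, so spot-eligible workloads need a matching toleration. A minimal Deployment fragment:

```yaml
# Deployment fragment targeting an AKS spot node pool.
# The label and taint below are applied automatically by AKS
# to spot-priority node pools.
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.azure.com/scalesetpriority: spot
      tolerations:
        - key: kubernetes.azure.com/scalesetpriority
          operator: Equal
          value: spot
          effect: NoSchedule
```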

Security Hardening Checklist
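One item that belongs near the top of any hardening checklist is a default-deny network policy per namespace, so that only explicitly allowed traffic flows between pods. A minimal sketch for the production namespace:

```yaml
# Default-deny for all pods in the namespace: blocks all ingress
# and egress until specific allow policies are added alongside it.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```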

GitOps Deployment Pipeline

Flux CD-based GitOps deployment flow

📝 Git Push (Helm values) → 🔧 CI Build (test + image) → 📦 ACR Push (container registry) → 🚀 Flux Sync (auto-deploy)
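In Flux terms, the sync step is typically a GitRepository source plus a Kustomization that applies a path from it. A minimal sketch with hypothetical names and repository URL:

```yaml
# Flux watches the config repo and applies the production path.
# platform-config and the URL below are hypothetical.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: platform-config
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/example/platform-config
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: production-apps
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: platform-config
  path: ./clusters/production
  prune: true   # delete cluster resources removed from Git
```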

Sample HPA Configuration

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-gateway-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300

"Kubernetes gives you superpowers, but with great power comes great YAML. Invest in your platform engineering team and they'll make every application team faster."

Need help with Kubernetes in production?

Our platform engineers design, deploy, and operate AKS clusters for enterprise workloads.

Talk to Our Platform Team