Kubernetes Autoscaling Complete Guide — HPA, VPA, and KEDA
How to configure Kubernetes HPA, VPA, KEDA, and Cluster Autoscaler, and when to use each. From CPU/memory-based to custom metrics — with real-world configuration examples.
TestForge Team
Kubernetes Autoscaling Types
| Type | Role | Trigger |
|---|---|---|
| HPA | Adjusts Pod count | CPU, memory, custom metrics |
| VPA | Adjusts Pod resource requests | Actual usage |
| KEDA | Event-driven Pod scaling | Queue depth, DB row count, etc. |
| Cluster Autoscaler | Adjusts Node count | Unschedulable Pods |
1. HPA (Horizontal Pod Autoscaler)
Basic CPU-Based Setup
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60        # Target: keep CPU at 60%
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60    # Wait 1 minute before scaling up
      policies:
        - type: Pods
          value: 4                      # Add at most 4 Pods at once
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 minutes before scaling down
      policies:
        - type: Percent
          value: 20                     # Remove at most 20% at once
          periodSeconds: 60
```
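For intuition, the HPA controller's core formula (documented in the Kubernetes HPA docs) is `desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric)`. A minimal sketch, ignoring the controller's tolerance band and stabilization windows:

```python
import math

def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """HPA core formula: desired = ceil(current * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_value / target_value)

# 5 Pods at 90% average CPU against a 60% target -> scale out to 8
print(desired_replicas(5, 90, 60))  # → 8
```

This is why a lower `averageUtilization` target scales out earlier: the ratio `currentMetric / targetMetric` crosses 1 sooner.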
Important: resource-based HPA metrics require `resources.requests` to be set on every container in the target Pods — utilization is computed as a percentage of the request.
```yaml
resources:
  requests:
    cpu: "200m"       # HPA baseline reference
    memory: "256Mi"
  limits:
    cpu: "1000m"
    memory: "512Mi"
```
Custom Metrics (Prometheus Adapter)
```yaml
metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"   # Maintain 1000 req/sec per Pod
```
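For `http_requests_per_second` to exist in the custom metrics API, the Prometheus Adapter needs a rule that derives it. A sketch of the adapter's rule config, assuming your app exports an `http_requests_total` counter with `namespace` and `pod` labels:

```yaml
rules:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```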
2. VPA (Vertical Pod Autoscaler)
Unlike HPA, VPA automatically adjusts CPU/memory requests.
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # Recommendations only (Auto mode causes Pod restarts)
  resourcePolicy:
    containerPolicies:
      - containerName: my-app
        minAllowed:
          cpu: "100m"
          memory: "128Mi"
        maxAllowed:
          cpu: "4"
          memory: "4Gi"
```
```shell
# Check VPA recommendations
kubectl describe vpa my-app-vpa
# Recommendation:
#   Container Name: my-app
#   Lower Bound:  cpu: 100m  memory: 128Mi
#   Target:       cpu: 350m  memory: 512Mi   ← tune requests to this
#   Upper Bound:  cpu: 2     memory: 2Gi
```
Note: Running HPA and VPA against the same CPU metric causes conflicts — both controllers react to the same signal and fight each other.
→ Separate roles: HPA scales on custom metrics, VPA handles resource sizing.
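One way to enforce that separation is VPA's `controlledResources` field, which restricts VPA to memory so an HPA can own CPU. A sketch, reusing the container name from the example above:

```yaml
resourcePolicy:
  containerPolicies:
    - containerName: my-app
      controlledResources: ["memory"]   # VPA manages memory only; HPA owns CPU
```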
3. KEDA (Event-Driven Scaling)
```shell
# Install KEDA
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --namespace keda --create-namespace
```
Scale Based on Kafka Queue
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
spec:
  scaleTargetRef:
    name: kafka-consumer
  minReplicaCount: 1
  maxReplicaCount: 30
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: my-group
        topic: orders
        lagThreshold: "100"   # Keep lag below 100 per partition
```
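The same pattern works for other queues. A sketch of an AWS SQS trigger (the queue URL is a placeholder), which scales on visible message count rather than consumer lag:

```yaml
triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.ap-northeast-2.amazonaws.com/123456789012/orders
      queueLength: "50"        # Target ~50 messages per replica
      awsRegion: ap-northeast-2
```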
Cron-Based (Predictive Scaling)
```yaml
triggers:
  - type: cron
    metadata:
      timezone: Asia/Seoul
      start: "0 9 * * 1-5"      # Weekdays 9am
      end: "0 22 * * 1-5"       # Weekdays 10pm
      desiredReplicas: "10"     # 10 Pods during business hours
```
4. Cluster Autoscaler
Automatically adds Nodes by calling cloud APIs when capacity is insufficient.
```shell
# Cluster Autoscaler for EKS
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --set autoDiscovery.clusterName=my-cluster \
  --set awsRegion=ap-northeast-2 \
  --set extraArgs.balance-similar-node-groups=true \
  --set extraArgs.skip-nodes-with-system-pods=false \
  --namespace kube-system
```
On scale-down, the autoscaler respects PodDisruptionBudgets, so always configure a PDB for workloads that must stay available during node drains.
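A minimal PDB for the example Deployment might look like this (the label selector is assumed to match your Pods):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1          # Keep at least 1 Pod running during evictions
  selector:
    matchLabels:
      app: my-app
```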
Recommended Strategy by Traffic Pattern
API servers (CPU-proportional):
→ HPA (CPU 60% target) + Cluster Autoscaler
Batch processing (queue-connected):
→ KEDA (Kafka/SQS lag threshold) + Cluster Autoscaler
DB / cache (memory-intensive):
→ VPA in `Off` mode (recommendations only)
Predictable peak traffic:
→ KEDA Cron trigger to pre-scale ahead of the peak
Verifying Scaling Behavior
```shell
# HPA status
kubectl get hpa my-app-hpa
# NAME         REFERENCE           TARGETS   MINPODS   MAXPODS   REPLICAS
# my-app-hpa   Deployment/my-app   45%/60%   2         20        5

# Real-time watch
kubectl get hpa my-app-hpa -w

# Scaling events
kubectl describe hpa my-app-hpa | grep -A20 Events
```
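To watch the HPA react end to end, you can generate load against the Service (the Service name `my-app` is an assumption here), following the approach in the Kubernetes HPA walkthrough:

```shell
# Run a throwaway Pod that hammers the Service, then watch REPLICAS climb
kubectl run load-generator --rm -it --image=busybox:1.36 --restart=Never \
  -- /bin/sh -c "while true; do wget -q -O- http://my-app; done"
```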