TestForge Blog

Kubernetes Autoscaling Complete Guide — HPA, VPA, and KEDA

How to configure Kubernetes HPA, VPA, KEDA, and Cluster Autoscaler, and when to use each. From CPU/memory-based to custom metrics — with real-world configuration examples.

TestForge Team

Kubernetes Autoscaling Types

Type               | Role                          | Trigger
HPA                | Adjusts Pod count             | CPU, memory, custom metrics
VPA                | Adjusts Pod resource requests | Actual usage
KEDA               | Event-driven Pod scaling      | Queue depth, DB row count, etc.
Cluster Autoscaler | Adjusts Node count            | Unschedulable Pods

1. HPA (Horizontal Pod Autoscaler)

Basic CPU-Based Setup

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60  # Target: keep CPU at 60%
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60   # Wait 1 minute before scaling up
      policies:
      - type: Pods
        value: 4             # Add at most 4 Pods at once
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 minutes before scaling down
      policies:
      - type: Percent
        value: 20            # Remove at most 20% at once
        periodSeconds: 60

Important: HPA's Utilization targets are computed as a percentage of resources.requests, so every container in the target Pods must set requests.

resources:
  requests:
    cpu: "200m"     # HPA baseline reference
    memory: "256Mi"
  limits:
    cpu: "1000m"
    memory: "512Mi"
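Resource-based HPA also depends on the Metrics API being served. A quick sanity check, assuming metrics-server is deployed in kube-system (the default in most distributions):

```shell
# Verify the metrics pipeline HPA reads from
kubectl get deployment metrics-server -n kube-system

# If metrics are flowing, this returns per-Pod CPU/memory usage
kubectl top pods
```

If `kubectl top pods` errors, the HPA will report `<unknown>` targets and never scale.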

Custom Metrics (Prometheus Adapter)

metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "1000"  # Maintain 1000 req/sec per Pod
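For the Pods metric above to exist, the Prometheus Adapter needs a rule mapping a Prometheus series onto the custom metrics API. A minimal sketch, assuming the app exports a counter named http_requests_total with standard namespace/pod labels (the series and label names are assumptions, adjust to your exporter):

```yaml
# prometheus-adapter values.yaml fragment (series/label names are assumptions)
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"            # exposed as http_requests_per_second
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```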

2. VPA (Vertical Pod Autoscaler)

Unlike HPA, VPA automatically adjusts CPU/memory requests.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # Recommendations only (Auto mode causes Pod restarts)
  resourcePolicy:
    containerPolicies:
    - containerName: my-app
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "4"
        memory: "4Gi"
# Check VPA recommendations
kubectl describe vpa my-app-vpa
# Recommendation:
#   Container Name: my-app
#   Lower Bound:    cpu: 100m  memory: 128Mi
#   Target:         cpu: 350m  memory: 512Mi  ← tune requests to this
#   Upper Bound:    cpu: 2     memory: 2Gi
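The Target recommendation can also be pulled out programmatically, e.g. to feed a requests update in CI. A sketch reading the VPA status field:

```shell
# Extract the Target recommendation for the first container
kubectl get vpa my-app-vpa \
  -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'
```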

Note: Running HPA and VPA on the same resource metric (e.g. CPU) causes conflicts, since both react to the same signal.
→ Separate roles: HPA for custom metrics, VPA for resource sizing.

3. KEDA (Event-Driven Scaling)

# Install KEDA
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --namespace keda --create-namespace
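Before creating ScaledObjects, confirm the KEDA components came up. KEDA runs an operator plus a metrics API server that registers the external metrics API:

```shell
# Operator and metrics-apiserver Pods should be Running
kubectl get pods -n keda

# KEDA registers itself as the external metrics provider
kubectl get apiservice v1beta1.external.metrics.k8s.io
```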

Scale Based on Kafka Queue

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
spec:
  scaleTargetRef:
    name: kafka-consumer
  minReplicaCount: 1
  maxReplicaCount: 30
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka:9092
      consumerGroup: my-group
      topic: orders
      lagThreshold: "100"    # Keep lag below 100 per partition
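If the Kafka cluster requires SASL, credentials belong in a TriggerAuthentication rather than in the trigger metadata. A sketch, assuming a Secret named kafka-credentials (the Secret and key names are assumptions):

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: kafka-auth
spec:
  secretTargetRef:
  - parameter: sasl            # e.g. "plaintext"
    name: kafka-credentials
    key: sasl
  - parameter: username
    name: kafka-credentials
    key: username
  - parameter: password
    name: kafka-credentials
    key: password
```

The trigger then references it with `authenticationRef: {name: kafka-auth}`.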

Cron-Based (Predictive Scaling)

triggers:
- type: cron
  metadata:
    timezone: Asia/Seoul
    start: "0 9 * * 1-5"    # Weekdays 9am
    end: "0 22 * * 1-5"     # Weekdays 10pm
    desiredReplicas: "10"   # 10 Pods during business hours
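Triggers can be combined on a single ScaledObject; KEDA scales to the highest replica count any trigger demands. A cron floor plus a lag trigger covers both predictable and bursty load (values reuse the examples above):

```yaml
triggers:
- type: cron                  # guarantees 10 Pods during business hours
  metadata:
    timezone: Asia/Seoul
    start: "0 9 * * 1-5"
    end: "0 22 * * 1-5"
    desiredReplicas: "10"
- type: kafka                 # scales beyond 10 if lag builds up
  metadata:
    bootstrapServers: kafka:9092
    consumerGroup: my-group
    topic: orders
    lagThreshold: "100"
```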

4. Cluster Autoscaler

Adds Nodes through cloud provider APIs when Pods cannot be scheduled, and removes underutilized Nodes once their Pods can fit elsewhere.

# Cluster Autoscaler for EKS
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --set autoDiscovery.clusterName=my-cluster \
  --set awsRegion=ap-northeast-2 \
  --set extraArgs.balance-similar-node-groups=true \
  --set extraArgs.skip-nodes-with-system-pods=false \
  --namespace kube-system

On scale-down, PodDisruptionBudget is respected — always configure PDB.
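A minimal PDB for the Deployment used in the examples above (the selector label is an assumption; match it to your Pod template):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2          # never evict below 2 Pods during node scale-down
  selector:
    matchLabels:
      app: my-app
```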

Which Autoscaler for Which Workload?

API servers (CPU-proportional):
→ HPA (CPU 60% target) + Cluster Autoscaler

Batch processing (queue-connected):
→ KEDA (Kafka/SQS lag threshold) + Cluster Autoscaler

DB / Cache (memory-intensive):
→ VPA (Manual or Off mode for recommendations only)

Predictable peak traffic:
→ KEDA Cron trigger to pre-scale ahead of peak

Verifying Scaling Behavior

# HPA status
kubectl get hpa my-app-hpa
# NAME          REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS
# my-app-hpa    Deployment/my-app    45%/60%   2         20        5

# Real-time watch
kubectl get hpa my-app-hpa -w

# Scaling events
kubectl describe hpa my-app-hpa | grep -A20 Events
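To watch a scale-up actually happen, generate artificial load, as in the Kubernetes HPA walkthrough (the Service name my-app is an assumption):

```shell
# Hammer the Service from a throwaway Pod; Ctrl+C to stop
kubectl run load-generator --rm -it --image=busybox:1.28 --restart=Never \
  -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://my-app; done"

# In another terminal, watch replicas climb
kubectl get hpa my-app-hpa -w
```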