The Complete Guide to Configuring Kubernetes Autoscaling

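Types of Kubernetes Autoscaling

Autoscaler	Role	Scaling signal
HPA	Adjusts the number of Pods	CPU, memory, custom metrics
VPA	Adjusts Pod resource requests	Observed usage
KEDA	Event-driven adjustment of the number of Pods	Queue length, database row count, etc.
Cluster Autoscaler	Adjusts the number of Nodes	Pods that cannot be scheduled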

1. HPA (Horizontal Pod Autoscaler)

Basic resource-based configuration (CPU and memory)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60  # target 60% average CPU utilization, relative to requests
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60   # 1-minute stabilization window before scaling up
      policies:
      - type: Pods
        value: 4             # add at most 4 Pods per period
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300  # 5-minute stabilization window before scaling down
      policies:
      - type: Percent
        value: 20            # remove at most 20% of Pods per period
        periodSeconds: 60

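Important: Utilization targets are calculated as a percentage of each container's resources.requests, so the HPA above works only if requests are set for CPU and memory.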

resources:
  requests:
    cpu: "200m"     # baseline for the HPA utilization calculation
    memory: "256Mi"
  limits:
    cpu: "1000m"
    memory: "512Mi"
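
To see why requests matter, it may help to write out the replica calculation given in the Kubernetes documentation for the HPA. With a Utilization target, each Pod's usage is expressed as a percentage of its request, so with the 200m CPU request above, a 60% target corresponds to an average of 120m per Pod. The figures in the second line (5 replicas running at 90% average utilization) are hypothetical and only illustrate the arithmetic.

```latex
\[
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times
  \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]
\[
\left\lceil 5 \times \frac{90}{60} \right\rceil = \lceil 7.5 \rceil = 8
\]
```

When several metrics are configured, as in the CPU and memory example, the HPA computes a replica count for each metric and uses the largest one.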

Custom metric-based scaling (Prometheus Adapter)

metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "1000"  # target an average of 1000 requests per second per Pod
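
The http_requests_per_second metric is not built in; something has to expose it through the custom metrics API. Below is a minimal sketch of a Prometheus Adapter rule that might do this, assuming the application exports a counter named http_requests_total carrying namespace and pod labels. The counter name, the labels, and the 2-minute rate window are assumptions for illustration, not part of the original setup.

```yaml
rules:
# Assumed source series: a per-Pod request counter scraped by Prometheus.
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    # Expose http_requests_total as http_requests_per_second.
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  # Convert the counter into a per-second rate over an assumed 2-minute window.
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

Once the adapter is running, a query such as kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests_per_second" (assuming the default namespace) should show whether the metric is visible to the HPA.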

2. VPA (Vertical Pod Autoscaler)

Unlike HPA, which changes the number of Pods, VPA automatically adjusts each Pod's CPU and memory requests.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # recommendations only (Auto applies them by restarting Pods)
  resourcePolicy:
    containerPolicies:
    - containerName: my-app
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "4"
        memory: "4Gi"

# Check the VPA recommendation
kubectl describe vpa my-app-vpa
# Recommendation:
#   Container Name: my-app
#   Lower Bound:    cpu: 100m  memory: 128Mi
#   Target:         cpu: 350m  memory: 512Mi  ← set requests to these values
#   Upper Bound:    cpu: 2     memory: 2Gi

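Caution: HPA and VPA conflict when both act on the same resource metric (CPU or memory), because VPA changes the requests that HPA's utilization percentage is measured against.
→ Separate the roles: let HPA scale on custom metrics and let VPA handle resource requests.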
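
One way to sketch this separation, reusing the my-app names from the earlier examples: the HPA scales only on the custom request-rate metric, which leaves CPU and memory requests to the VPA. Treat this as an illustration under those assumptions rather than a tested configuration; the http_requests_per_second metric must already be served by a metrics adapter.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Pods            # no cpu/memory Resource metrics, so VPA owns those
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"    # VPA may evict Pods to apply new requests
```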

3. KEDA (Event-driven scaling)

# Install KEDA
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --namespace keda --create-namespace

Scaling on Kafka consumer lag

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
spec:
  scaleTargetRef:
    name: kafka-consumer
  minReplicaCount: 1
  maxReplicaCount: 30
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka:9092
      consumerGroup: my-group
      topic: orders
      lagThreshold: "100"    # target average lag of about 100 messages per replica

Cron-based (scheduled scaling for predictable peaks)

triggers:
- type: cron
  metadata:
    timezone: Asia/Seoul
    start: "0 9 * * 1-5"   # 09:00 on weekdays
    end: "0 22 * * 1-5"    # 22:00 on weekdays
    desiredReplicas: "10"  # 10 replicas during business hours
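
The trigger fragment above has to live inside a ScaledObject. A minimal sketch of the full resource follows, assuming a Deployment named my-app as the target and a floor of 2 replicas outside the window; the object name and both replica limits are hypothetical.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-business-hours   # hypothetical name
spec:
  scaleTargetRef:
    name: my-app                # assumed Deployment in the same namespace
  minReplicaCount: 2            # assumed floor outside the cron window
  maxReplicaCount: 20
  triggers:
  - type: cron
    metadata:
      timezone: Asia/Seoul
      start: "0 9 * * 1-5"      # window opens at 09:00 on weekdays
      end: "0 22 * * 1-5"       # window closes at 22:00 on weekdays
      desiredReplicas: "10"
```

Outside the window the cron trigger stops requesting replicas, so the workload falls back to minReplicaCount unless another trigger asks for more.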

4. Cluster Autoscaler

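When Pods cannot be scheduled because the cluster lacks capacity, Cluster Autoscaler calls the cloud provider's API to add Nodes automatically.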

# Cluster Autoscaler for EKS
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --set autoDiscovery.clusterName=my-cluster \
  --set awsRegion=ap-northeast-2 \
  --set extraArgs.balance-similar-node-groups=true \
  --set extraArgs.skip-nodes-with-system-pods=false \
  --namespace kube-system

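Cluster Autoscaler respects PodDisruptionBudgets when it scales down, so define a PDB for every workload that must stay available while Nodes are drained.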
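
A minimal PDB sketch for the my-app Deployment used in the earlier examples. The app: my-app label is an assumption about how its Pods are labeled, and minAvailable: 1 is only an illustrative value.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1        # keep at least one Pod running during voluntary disruptions
  selector:
    matchLabels:
      app: my-app        # assumed Pod label; match it to the Deployment's template labels
```

A budget that permits no disruptions at all (for example, minAvailable equal to the replica count) prevents the Node from being drained, so leave room for at least one eviction.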

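Practical strategy

Recommendations by traffic pattern:

API servers (load proportional to CPU):
→ HPA (60% CPU target) + Cluster Autoscaler

Batch processing (queue-driven):
→ KEDA (Kafka/SQS lag) + Cluster Autoscaler

Databases and caches (memory-intensive):
→ VPA in Off mode, reviewing the recommendations and applying them manually

Predictable peak times:
→ KEDA cron trigger to scale up ahead of the peak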

Verifying scaling behavior

# HPA status
kubectl get hpa my-app-hpa
# NAME          REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS
# my-app-hpa    Deployment/my-app    45%/60%   2         20        5

# Watch in real time
kubectl get hpa my-app-hpa -w

# Scaling events
kubectl describe hpa my-app-hpa | grep -A20 Events
