Kubernetes Development and Operations in Practice, Part 3: A Guide to Choosing Workload Patterns

Why Workload Choice Matters

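Choosing the wrong workload type causes concrete failures. If a database that belongs in a StatefulSet is run as a Deployment, a restarted Pod can come back attached to a different volume than the one holding its data, or two instances on the same node can write to the same data and end up in a split-brain state. Conversely, running a stateless app as a StatefulSet adds complexity with no benefit.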

1. Workload Types at a Glance

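Workload	Use case	Key characteristics
Deployment	Stateless services	Rolling updates, scale-out
StatefulSet	Stateful services	Stable network identity, ordering guarantees
DaemonSet	Per-node agents	One Pod on every (or selected) node
Job	One-off tasks	Runs to completion, configurable retries
CronJob	Recurring tasks	Creates Jobs on a cron schedule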

2. Deployment: The Default for Stateless Services

Use it for most web servers, API servers, and workers.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # extra Pods allowed above replicas during a rollout
      maxUnavailable: 0  # Pods allowed to be unavailable during a rollout (0 = no capacity loss)
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
        - name: api
          image: my-api:1.2.3
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: "200m"
              memory: 256Mi
            limits:
              cpu: "500m"
              memory: 512Mi
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
      terminationGracePeriodSeconds: 30

Key Configuration Points

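maxUnavailable: 0 + maxSurge: 1: a zero-downtime rolling update that adds one new Pod at a time and removes an old Pod only after its replacement is ready
readinessProbe: traffic is routed only to Pods that report ready
terminationGracePeriodSeconds: the time a Pod has after SIGTERM to finish in-flight requests before it is force-killed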
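
The grace period only helps if the container actually uses it. As a minimal sketch, assuming the image ships a sleep binary and that a 5-second pause is long enough for the Pod to be removed from Service endpoints (both are assumptions, not properties of the manifest above), a preStop hook can delay SIGTERM:

```yaml
# Fragment of the Deployment's Pod template, not a complete manifest.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 30   # must cover the preStop time plus the app's own shutdown
      containers:
        - name: api
          image: my-api:1.2.3
          lifecycle:
            preStop:
              exec:
                # Assumed value: pause so endpoint removal can propagate before SIGTERM is sent.
                command: ["sleep", "5"]
```

The preStop time counts against terminationGracePeriodSeconds, so the two values should be sized together.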

3. StatefulSet: Stateful Services

Use it for stateful services such as Redis, Kafka, Elasticsearch, and PostgreSQL.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
  namespace: production
spec:
  serviceName: redis-headless  # must name a Headless Service
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7.2
          ports:
            - containerPort: 6379
          volumeMounts:
            - name: data
              mountPath: /data
          command: ["redis-server", "--appendonly", "yes"]
  volumeClaimTemplates:  # creates a dedicated PVC for each Pod
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 10Gi

Headless Service: The Required Companion to a StatefulSet

apiVersion: v1
kind: Service
metadata:
  name: redis-headless
  namespace: production
spec:
  clusterIP: None  # Headless: DNS resolves directly to Pod IPs
  selector:
    app: redis
  ports:
    - port: 6379

With a Headless Service, each Pod is directly reachable at a name of the form redis-0.redis-headless.production.svc.cluster.local. Clustered software needs this to address its individual members.
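
To see the per-Pod DNS name in use, here is a small sketch of a one-off Pod that pings the first replica. The Pod name redis-ping is hypothetical, and the example assumes the StatefulSet and Headless Service above are already running:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: redis-ping        # hypothetical name, used only for this illustration
  namespace: production
spec:
  restartPolicy: Never
  containers:
    - name: ping
      image: redis:7.2
      # Address replica 0 by its stable per-Pod DNS name instead of a load-balanced Service IP.
      command: ["redis-cli", "-h", "redis-0.redis-headless.production.svc.cluster.local", "-p", "6379", "ping"]
```

Swapping redis-0 for redis-1 or redis-2 targets a different replica, which a regular ClusterIP Service cannot do.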

Deployment vs StatefulSet: Selection Criteria

Criterion	Deployment	StatefulSet
Pod needs the same volume after a restart	❌	✅
Pods need a fixed order or identity (primary/replica)	❌	✅
Stable DNS names needed	❌	✅
Scales out frequently	✅	△ (use with care)
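
One reason scaling a StatefulSet needs care is that the PVCs created from volumeClaimTemplates are kept by default when replicas are removed. As a hedged sketch, assuming the cluster version supports the persistentVolumeClaimRetentionPolicy field, the behavior can be made explicit:

```yaml
# Fragment of the StatefulSet spec; requires a cluster version that supports this field.
spec:
  persistentVolumeClaimRetentionPolicy:
    whenScaled: Retain    # keep the PVCs of removed replicas on scale-in (same as the default)
    whenDeleted: Retain   # keep the PVCs when the StatefulSet itself is deleted (same as the default)
```

Setting either value to Delete removes the PVCs instead, which is only appropriate when the data can be rebuilt.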

4. DaemonSet: One per Node

Use it for components that must run exactly once on every node, such as log collectors (Fluent Bit), node monitoring agents (node-exporter), and network plugins.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: platform
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule  # also schedule onto control plane nodes
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:3.0
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers  # relevant only on Docker-based nodes; containerd and CRI-O keep logs under /var/log/pods
              mountPath: /var/lib/docker/containers
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers

Deploying Only to Specific Nodes: nodeSelector

spec:
  template:
    spec:
      nodeSelector:
        role: gpu-node  # run only on GPU nodes (GPU driver daemons, for example)

5. Job: One-off Batch Work

Use it for DB migrations, data cleanup, loading initial seed data, and similar tasks.

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  namespace: production
spec:
  backoffLimit: 3        # maximum number of retries on failure
  activeDeadlineSeconds: 600  # terminate the Job if it has not finished within 10 minutes
  template:
    spec:
      restartPolicy: Never  # a Job allows only Never or OnFailure
      containers:
        - name: migration
          image: my-app:1.2.3
          command: ["python", "manage.py", "migrate"]
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-secret
                  key: url
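
A finished Job and its Pods stay in the cluster until something deletes them. One option, sketched here with an assumed retention time of one hour, is to let the TTL controller clean them up:

```yaml
# Fragment of the Job spec above with one added field.
spec:
  ttlSecondsAfterFinished: 3600   # assumed value: delete the Job and its Pods one hour after it finishes
  backoffLimit: 3
  activeDeadlineSeconds: 600
```

Keep the value long enough to inspect logs after a failed migration.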

Parallel Jobs: Processing Large Datasets

spec:
  completions: 10      # 10 successful completions required in total
  parallelism: 3       # run 3 Pods at a time
  backoffLimit: 5
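
With plain completions, the Pods have no built-in way to know which slice of the data each one should process. A minimal sketch using Indexed completion mode follows. The Job name shard-processor and the echo command are placeholders for illustration, not part of the original setup:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: shard-processor      # hypothetical name
  namespace: production
spec:
  completionMode: Indexed    # each Pod gets a completion index from 0 to completions - 1
  completions: 10
  parallelism: 3
  backoffLimit: 5
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: busybox
          # JOB_COMPLETION_INDEX is set by Kubernetes for Pods of an Indexed Job.
          command: ["sh", "-c", "echo processing shard $JOB_COMPLETION_INDEX"]
```

A real worker would map the index to an input file or key range, so the ten Pods cover the dataset without overlapping.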

6. CronJob: Recurring Batch Work

apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-report
  namespace: production
spec:
  schedule: "0 9 * * 1-5"  # weekdays at 9:00 AM
  timeZone: "Asia/Seoul"
  concurrencyPolicy: Forbid  # skip this run if the previous one is still running
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      backoffLimit: 2
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: reporter
              image: my-reporter:latest
              command: ["python", "generate_report.py"]

Choosing a concurrencyPolicy

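Value	Behavior
Allow	Starts the new Job even if the previous one is still running (default)
Forbid	Skips the new Job if the previous one is still running
Replace	Cancels the Job that is still running and starts the new one in its place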

For long-running batch work, use Forbid to prevent overlapping runs.

7. Init Container: Preparation Before the Main Container Starts

spec:
  initContainers:
    - name: wait-for-db
      image: busybox
      command:
        - sh
        - -c
        - |
          until nc -z postgres-service 5432; do
            echo "Waiting for database..."; sleep 2
          done
    - name: run-migration
      image: my-app:1.2.3
      command: ["python", "manage.py", "migrate"]
  containers:
    - name: app
      image: my-app:1.2.3

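Init containers run one at a time in the listed order, and each must exit successfully before the next one starts. If any of them fails, the main containers do not start.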
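
Because a failing init container blocks the Pod, it is often more useful for a wait loop to fail visibly than to wait forever. A sketch of the wait-for-db step with a bounded retry count follows. The limit of 30 attempts is an assumed value chosen for illustration:

```yaml
# Fragment of a Pod spec; only the init container is shown.
spec:
  initContainers:
    - name: wait-for-db
      image: busybox
      command:
        - sh
        - -c
        - |
          # Give up after 30 attempts (about 60 seconds) so the failure shows up in the Pod status.
          i=0
          until nc -z postgres-service 5432; do
            i=$((i + 1))
            if [ "$i" -ge 30 ]; then
              echo "Database not reachable, giving up"; exit 1
            fi
            echo "Waiting for database..."; sleep 2
          done
```

When the loop exits with a non-zero code, the init container is reported as failed and retried according to the Pod's restartPolicy, which makes a missing database easier to spot than a Pod stuck silently in its init phase.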

Summary

Decision Tree for Service Types

Stateless service? → Deployment
Needs state or ordering? → StatefulSet
Must run on every node? → DaemonSet
One-off task? → Job
Recurring task? → CronJob

The next part covers network design in practice: how the Service types differ, how to configure Ingress, and how to control service-to-service traffic with NetworkPolicy.
