TestForge Blog

JVM Options Tuning for Production — Complete Guide from GC to Memory

Step-by-step JVM tuning for Spring Boot production servers. GC algorithm selection, heap sizing, GC logging, OOM response, and container environment pitfalls — all from real-world practice.

TestForge Team

Why JVM Tuning?

Java applications run on JVM defaults, but in production you’ll frequently encounter:

  • Full GC causing seconds-long STW (Stop-The-World) → user timeouts
  • OOM (OutOfMemoryError) → sudden Pod restarts
  • No GC logging → impossible to diagnose incidents
  • Container memory limit exceeded → OOMKilled

Tuning isn’t about blindly adding flags.
It’s a measure → analyze → adjust → re-measure cycle.


Understanding JVM Option Syntax

java [standard options] [non-standard options] -jar app.jar

-X  : Non-standard (may vary by JVM implementation)
-XX : Advanced runtime options (core of production tuning)
-XX:+OptionName  → enable
-XX:-OptionName  → disable
-XX:OptionName=value → set value
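
To confirm which of these forms a running JVM actually received, the arguments can be read back at runtime. The following is a minimal sketch rather than a definitive tool: the class name `JvmFlagInspector` and its `classify` helper are hypothetical, while `RuntimeMXBean.getInputArguments()` is the standard JDK API.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class JvmFlagInspector {

    /** Labels a JVM argument according to the syntax rules above. */
    static String classify(String arg) {
        if (arg.startsWith("-XX:+")) return "advanced: enable";
        if (arg.startsWith("-XX:-")) return "advanced: disable";
        if (arg.startsWith("-XX:") && arg.contains("=")) return "advanced: set value";
        if (arg.startsWith("-XX:")) return "advanced: other";
        if (arg.startsWith("-X")) return "non-standard";
        return "standard";
    }

    public static void main(String[] args) {
        // Arguments given to the JVM itself, not the application arguments after -jar.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String arg : jvmArgs) {
            System.out.println(classify(arg) + "  " + arg);
        }
    }
}
```

If a flag you expected is missing from the output, it was probably placed after `-jar app.jar` and handed to the application instead of the JVM.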

1. Heap Memory Configuration

Basic Settings

# -Xms: initial (minimum) heap size, -Xmx: maximum heap size
# JVM options go before -jar; anything after app.jar is passed to the application
java \
  -Xms512m \
  -Xmx2g \
  -jar app.jar

Why Set Xms = Xmx

-Xms2g -Xmx2g

When Xms < Xmx, the JVM grows and shrinks the heap at runtime.
Each resize costs OS memory allocation plus extra GC work, which can surface as latency spikes.

In production, fix Xms = Xmx for predictable, stable behavior.

Sizing Formula

Total JVM memory ≈ Heap + Metaspace + Thread Stacks + Off-Heap + JIT code cache

Heap = 70–75% of total is a common starting point; the remainder has to cover every other term in the formula.

Example: 8GB server RAM:
  -Xms6g -Xmx6g
  Remaining 2GB: OS + Metaspace + other
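
The sizing formula can be checked with simple arithmetic before choosing flags. The sketch below is an illustration under assumed numbers (200 threads, 1MB stacks, 256MB each for code cache and off-heap); `MemoryBudget` and `totalMb` are hypothetical names, and the inputs are estimates rather than measurements.

```java
final class MemoryBudget {

    /** Sums the main JVM memory consumers from the formula above, all in MB. */
    static long totalMb(long heapMb, long metaspaceMb, int threads, long stackKb,
                        long codeCacheMb, long offHeapMb) {
        long stacksMb = threads * stackKb / 1024;
        return heapMb + metaspaceMb + stacksMb + codeCacheMb + offHeapMb;
    }

    public static void main(String[] args) {
        long serverMb = 8 * 1024;
        // Assumed workload: 6GB heap, 512MB Metaspace, 200 threads with 1MB stacks.
        long total = totalMb(6 * 1024, 512, 200, 1024, 256, 256);
        System.out.printf("Estimated JVM footprint: %d MB of %d MB (%.0f%%)%n",
                total, serverMb, 100.0 * total / serverMb);
    }
}
```

If the estimate leaves too little room for the OS, reduce the heap rather than hoping the non-heap terms stay small.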

Metaspace Configuration

Java 8 replaced PermGen with Metaspace, which lives in native memory.
Its default maximum is unlimited → class leaks can exhaust all server memory.

-XX:MetaspaceSize=256m        # Usage threshold that triggers the first Metaspace GC (not a preallocation)
-XX:MaxMetaspaceSize=512m     # Maximum limit (always set this)
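
Whether a limit is in effect can be verified from inside the application. This is a minimal sketch, assuming a HotSpot JVM where the memory pool is named "Metaspace"; `MetaspaceCheck` and `percentUsed` are hypothetical names, and the MXBean calls are standard `java.lang.management` APIs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

final class MetaspaceCheck {

    /** Returns used/max as a percentage, or -1 when no maximum is configured. */
    static double percentUsed(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) return -1; // MemoryUsage reports an undefined max as -1
        return 100.0 * usedBytes / maxBytes;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // HotSpot names this pool "Metaspace"; other JVMs may use a different name.
            if (!pool.getName().equals("Metaspace")) continue;
            MemoryUsage usage = pool.getUsage();
            double pct = percentUsed(usage.getUsed(), usage.getMax());
            System.out.printf("Metaspace used=%d KB, limit: %s%n", usage.getUsed() / 1024,
                    pct < 0 ? "none (set -XX:MaxMetaspaceSize)" : String.format("%.1f%% used", pct));
        }
    }
}
```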

2. GC Algorithm Selection

GC Algorithm Comparison

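| GC | Introduced | Characteristics | Best For |
|---|---|---|---|
| Serial GC | Java 1 | Single thread, STW | Single-core, embedded |
| Parallel GC | Java 1.4 | Multi-thread, throughput-first | Batch, bulk processing |
| G1 GC | Java 7 (default since 9) | Predictable pause | Standard for API servers |
| ZGC | Java 11 (experimental; production-ready in 15) | < 1ms pause, large heap | Ultra-low latency |
| Shenandoah | Java 12 (experimental; production-ready in 15) | Similar to ZGC, from Red Hat | Ultra-low latency |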
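
G1 GC (Standard for API Servers)

# Options must precede -jar (later arguments go to the application); comments cannot follow a trailing "\".
#   MaxGCPauseMillis                target pause time (ms), set per SLA
#   G1HeapRegionSize                region size (1–32MB, Heap/2048 recommended)
#   G1NewSizePercent / G1MaxNewSizePercent   min / max Young generation ratio (experimental flags)
#   G1MixedGCCountTarget            mixed GC count (lower = more aggressive reclaim)
#   InitiatingHeapOccupancyPercent  Old GC trigger threshold (default 45%)
#   ParallelRefProcEnabled          parallelize reference processing
#   G1EagerReclaimHumongousObjects  early reclaim of large objects (experimental flag, already on by default)
java \
  -XX:+UnlockExperimentalVMOptions \
  -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=200 \
  -XX:G1HeapRegionSize=16m \
  -XX:G1NewSizePercent=20 \
  -XX:G1MaxNewSizePercent=40 \
  -XX:G1MixedGCCountTarget=8 \
  -XX:InitiatingHeapOccupancyPercent=45 \
  -XX:+ParallelRefProcEnabled \
  -XX:+G1EagerReclaimHumongousObjects \
  -jar app.jar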
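
The "Heap/2048" guidance for G1HeapRegionSize can be turned into a quick calculation. The sketch below assumes rounding down to a power of two and clamping to the 1–32MB range mentioned above; `G1RegionSize` and `suggestMb` are hypothetical names, and the JVM's own ergonomics may differ in detail.

```java
final class G1RegionSize {

    /** Heap / 2048, rounded down to a power of two and clamped to 1..32 MB. */
    static long suggestMb(long heapMb) {
        long raw = Math.max(1, heapMb / 2048);
        long powerOfTwo = Long.highestOneBit(raw);
        return Math.min(32, powerOfTwo);
    }

    public static void main(String[] args) {
        for (long heapMb : new long[] {2048, 6144, 16384, 65536}) {
            long region = suggestMb(heapMb);
            // Objects of at least half a region are allocated as "humongous" objects.
            System.out.printf("heap=%dMB -> region=%dMB, humongous threshold=%dKB%n",
                    heapMb, region, region * 1024 / 2);
        }
    }
}
```

The humongous threshold matters because objects above it bypass the Young generation; if the application allocates many large arrays, a bigger region size can reduce that pressure.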

ZGC (Ultra-Low Latency / 8GB+ Heap)

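# ZGenerational: Generational ZGC, Java 21+ (the default from Java 23, where the flag is deprecated)
# ConcGCThreads: concurrent GC threads
# MaxGCPauseMillis is omitted: it is a pause goal for G1/Parallel GC, and ZGC is not tuned through it
java \
  -XX:+UseZGC \
  -XX:+ZGenerational \
  -XX:ConcGCThreads=4 \
  -Xms16g -Xmx16g \
  -jar app.jar

ZGC trades some throughput and memory headroom for low pause times, so on small heaps it tends to cost more than it saves. As a rule of thumb, use it only with heaps of 8GB or more.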


3. GC Logging (Required)

Without GC logs, incident analysis is impossible. Always enable.

# Java 9+ unified logging
-Xlog:gc*:file=/var/log/app/gc.log:time,uptime,level,tags:filecount=10,filesize=50m

| Option | Description |
|---|---|
| gc* | All GC-related tags |
| file=/var/log/app/gc.log | Write to file |
| time,uptime,level,tags | Include timestamp, elapsed time, level, tags |
| filecount=10 | Roll over up to 10 files |
| filesize=50m | Max 50MB per file |

Reading GC Logs

[2.345s][info][gc] GC(42) Pause Young (Normal) (G1 Evacuation Pause) 512M->256M(2048M) 18.234ms

GC(42)           : GC event ID 42 (IDs start at 0)
Pause Young      : Young GC (fast)
512M->256M       : Heap usage before → after GC
(2048M)          : Total heap size
18.234ms         : STW time (target: < 200ms)

Warning signals:

Pause Full (G1 Compaction Pause)  → Full GC (critical)
To-space exhausted                → Evacuation failure: no free regions left to copy live objects into (often followed by Full GC)
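
Pause lines in this format are easy to extract programmatically for alerting. The following is a minimal sketch, assuming log lines shaped like the sample above; the class `GcPause` and its regular expression are illustrative and would need adjusting for other collectors or decorator settings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GcPause {
    // Matches e.g. "GC(42) Pause Young (Normal) (G1 Evacuation Pause) 512M->256M(2048M) 18.234ms"
    private static final Pattern PAUSE = Pattern.compile(
            "GC\\((\\d+)\\) (Pause \\w+).*? (\\d+)M->(\\d+)M\\((\\d+)M\\) ([\\d.]+)ms");

    final int id;
    final String kind;
    final long beforeMb;
    final long afterMb;
    final long totalMb;
    final double pauseMs;

    private GcPause(Matcher m) {
        id = Integer.parseInt(m.group(1));
        kind = m.group(2);
        beforeMb = Long.parseLong(m.group(3));
        afterMb = Long.parseLong(m.group(4));
        totalMb = Long.parseLong(m.group(5));
        pauseMs = Double.parseDouble(m.group(6));
    }

    /** Returns the parsed pause, or null when the line is not a pause record in this format. */
    static GcPause parse(String line) {
        Matcher m = PAUSE.matcher(line);
        return m.find() ? new GcPause(m) : null;
    }

    /** True for the critical "Pause Full" case listed under the warning signals. */
    boolean isFullGc() {
        return kind.equals("Pause Full");
    }

    public static void main(String[] args) {
        GcPause p = parse("[2.345s][info][gc] GC(42) Pause Young (Normal) "
                + "(G1 Evacuation Pause) 512M->256M(2048M) 18.234ms");
        System.out.printf("GC %d: %s, freed %dMB, pause %.3fms, full=%b%n",
                p.id, p.kind, p.beforeMb - p.afterMb, p.pauseMs, p.isFullGc());
    }
}
```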

4. OOM Response Configuration

-XX:+HeapDumpOnOutOfMemoryError        # Auto heap dump on OOM
-XX:HeapDumpPath=/var/log/app/heap/    # Dump save path (must exist and be writable)
-XX:+ExitOnOutOfMemoryError            # Exit immediately on OOM (prevent zombie process), Java 8u92+
-XX:OnOutOfMemoryError="kill -9 %p"    # Alternative for JVMs older than 8u92

Why -XX:+ExitOnOutOfMemoryError matters:
An OutOfMemoryError only terminates the thread that hit it. The JVM itself stays alive, which can leave a half-functioning zombie process.
In Kubernetes, exit → auto-restart is safer than maintaining a partially dead process.


5. Additional Performance Options

JIT Compilation

-XX:+TieredCompilation            # Tiered compilation (default on Java 8+)
-XX:ReservedCodeCacheSize=256m    # JIT code cache size (default 240MB)
-XX:+UseStringDeduplication       # Deduplicate String objects (G1 and Shenandoah; all collectors from Java 18)

Thread Stack Size

-Xss512k    # Stack size per thread (default is platform-dependent, typically 1MB on 64-bit Linux)
            # 1000 threads × 1MB = 1GB of stack memory

API servers without deep recursion can reduce to 256k–512k to free memory.

GC Thread Count

-XX:ParallelGCThreads=8       # STW GC threads (defaults to the CPU count on machines with up to 8 cores)
-XX:ConcGCThreads=2           # Concurrent GC threads (roughly ParallelGCThreads / 4)
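
Before overriding these values it helps to know what the JVM would pick on its own. The sketch below approximates HotSpot's default heuristics as we understand them (all CPUs up to 8, then 5/8 of each additional CPU; G1 concurrent threads at about a quarter of that). Treat the formulas as assumptions to verify against your JDK version; `GcThreadDefaults` is a hypothetical helper.

```java
final class GcThreadDefaults {

    /** Approximates the default ParallelGCThreads: all CPUs up to 8, then 5/8 of each extra CPU. */
    static int parallelGcThreads(int cpus) {
        return cpus <= 8 ? cpus : 8 + (cpus - 8) * 5 / 8;
    }

    /** Approximates G1's default ConcGCThreads: about a quarter of ParallelGCThreads, at least 1. */
    static int concGcThreads(int parallelGcThreads) {
        return Math.max(1, (parallelGcThreads + 2) / 4);
    }

    public static void main(String[] args) {
        // availableProcessors() respects container CPU limits on container-aware JDKs.
        int cpus = Runtime.getRuntime().availableProcessors();
        int parallel = parallelGcThreads(cpus);
        System.out.printf("cpus=%d -> -XX:ParallelGCThreads=%d -XX:ConcGCThreads=%d%n",
                cpus, parallel, concGcThreads(parallel));
    }
}
```

In containers with a low CPU limit, explicit values copied from a large host can oversubscribe the CPU quota, so the computed defaults are a useful sanity check.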

6. Container (Kubernetes) Pitfalls

Container Memory Awareness

Before Java 8u191, the JVM couldn’t detect container memory limits and used the host’s total RAM to size the heap.

# Java 8u191+ and Java 10+ auto-detect container limits
-XX:+UseContainerSupport          # Enable container memory limit awareness (on by default)
-XX:MaxRAMPercentage=75.0         # Use 75% of container memory as the maximum heap
-XX:InitialRAMPercentage=50.0     # Initial heap as a percentage of container memory
-XX:MinRAMPercentage=25.0         # Max heap percentage for very small containers (roughly 200MB or less); not a minimum heap size

Using MaxRAMPercentage instead of a fixed -Xmx adapts flexibly to Kubernetes memory limits.
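
A quick way to confirm the JVM really sees the container limit is to print what it derived at startup. This is a minimal sketch: `ContainerHeap` and `expectedMaxHeapMiB` are hypothetical names, the 2Gi/75% figures are example inputs, and the actual heap reported by the JVM can differ slightly from the simple percentage.

```java
final class ContainerHeap {

    /** Heap ceiling expected from a memory limit and -XX:MaxRAMPercentage. */
    static long expectedMaxHeapMiB(long limitMiB, double maxRamPercentage) {
        return (long) (limitMiB * maxRamPercentage / 100.0);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects -Xmx or the RAM-percentage ergonomics actually in effect.
        System.out.printf("max heap: %d MiB, CPUs seen: %d%n",
                rt.maxMemory() / (1024 * 1024), rt.availableProcessors());
        System.out.printf("2Gi limit at 75%% -> about %d MiB%n", expectedMaxHeapMiB(2048, 75.0));
    }
}
```

If the reported max heap tracks the host's RAM instead of the container limit, container support is not active for that JVM version or base image.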

Kubernetes Deployment Example

containers:
- name: app
  image: my-app:latest
  resources:
    requests:
      memory: "1Gi"
      cpu: "500m"
    limits:
      memory: "2Gi"      # 75% of this = 1.5GB heap
      cpu: "2000m"
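  env:
  # JAVA_OPTS is only a convention: the image's entrypoint must pass it to java (the JVM itself reads JAVA_TOOL_OPTIONS)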
  - name: JAVA_OPTS
    value: >-
      -XX:+UseContainerSupport
      -XX:MaxRAMPercentage=75.0
      -XX:+UseG1GC
      -XX:MaxGCPauseMillis=200
      -XX:+HeapDumpOnOutOfMemoryError
      -XX:HeapDumpPath=/var/log/heap/
      -XX:+ExitOnOutOfMemoryError
      -Xlog:gc*:file=/var/log/gc/gc.log:time,uptime:filecount=5,filesize=20m

OOMKilled vs OutOfMemoryError

OOMKilled        → Kubernetes killed the container (exceeded memory limit)
                   Heap dump usually NOT generated
                   Fix: increase limits.memory OR decrease MaxRAMPercentage

OutOfMemoryError → JVM-internal OOM
                   Heap dump IS generated (when -XX:+HeapDumpOnOutOfMemoryError is set)
                   Fix: analyze memory leak or increase heap size

7. Recommended Configurations by Environment

API Server (2–4 vCPU, 4–8GB RAM)

# JVM options go before -jar; anything after app.jar is passed to the application
java \
  -Xms2g -Xmx2g \
  -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=512m \
  -XX:+UseG1GC -XX:MaxGCPauseMillis=200 \
  -XX:G1HeapRegionSize=16m -XX:+ParallelRefProcEnabled \
  -XX:+UseStringDeduplication \
  -XX:ReservedCodeCacheSize=256m \
  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/heap/ \
  -XX:+ExitOnOutOfMemoryError \
  -Xlog:gc*:file=/var/log/gc/gc.log:time,uptime:filecount=10,filesize=50m \
  -Djava.security.egd=file:/dev/./urandom \
  -jar app.jar

High-Throughput Batch Server (8+ vCPU, 16GB+ RAM)

# JVM options go before -jar; anything after batch.jar is passed to the application
java \
  -Xms12g -Xmx12g \
  -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=512m \
  -XX:+UseParallelGC -XX:ParallelGCThreads=16 \
  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/heap/ \
  -XX:+ExitOnOutOfMemoryError \
  -Xlog:gc*:file=/var/log/gc/gc.log:time,uptime:filecount=5,filesize=100m \
  -jar batch.jar

Ultra-Low Latency (Java 21, 8+ vCPU, 32GB+ RAM)

# JVM options go before -jar; MaxGCPauseMillis is omitted because ZGC is not tuned through a pause goal
java \
  -Xms24g -Xmx24g \
  -XX:+UseZGC -XX:+ZGenerational \
  -XX:ConcGCThreads=4 \
  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/heap/ \
  -XX:+ExitOnOutOfMemoryError \
  -Xlog:gc*:file=/var/log/gc/gc.log:time,uptime:filecount=10,filesize=50m \
  -jar app.jar

Kubernetes / Container (Flexible Memory)

# JVM options go before -jar; anything after app.jar is passed to the application
java \
  -XX:+UseContainerSupport \
  -XX:MaxRAMPercentage=75.0 -XX:InitialRAMPercentage=50.0 \
  -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=512m \
  -XX:+UseG1GC -XX:MaxGCPauseMillis=200 \
  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/heap/ \
  -XX:+ExitOnOutOfMemoryError \
  -Xlog:gc*:file=/var/log/gc/gc.log:time,uptime:filecount=5,filesize=20m \
  -jar app.jar

8. Before/After Metrics

Measurement Tools

# GC summary for a running process
jstat -gcutil <PID> 1000 10
# S0   S1   E    O    M    CCS   YGC  YGCT  FGC  FGCT  CGC  CGCT   GCT
# 0.0  0.0  67.5 45.2 96.3  93.1  142  2.123   0  0.000   0  0.000  2.123

# Column guide:
# E  : Eden usage %
# O  : Old usage %
# YGC: Young GC count
# YGCT: Young GC cumulative time (seconds)
# FGC: Full GC count → closer to 0 is better

Healthy Ranges

| Metric | Target | Warning |
|---|---|---|
| Young GC frequency | < 1/sec | > 5/sec |
| Young GC pause | < 50ms | > 200ms |
| Full GC frequency | 0 | ≥ 1/hour |
| Full GC pause | N/A | > 1s |
| Heap usage (post-GC) | < 50% | > 80% |
| Metaspace usage | < 80% | > 95% |
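
The same counters that jstat prints are available in-process, which makes it straightforward to compare them against these ranges. The sketch below is one possible approach, not a monitoring product: `GcHealth` and `youngPauseStatus` are hypothetical names, the thresholds come from the Young GC pause row, and matching on "Young" in the collector name assumes G1-style names such as "G1 Young Generation".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcHealth {

    /** Classifies an average young-GC pause: target < 50 ms, warning > 200 ms. */
    static String youngPauseStatus(double avgPauseMs) {
        if (avgPauseMs < 50) return "ok";
        if (avgPauseMs > 200) return "warning";
        return "watch";
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();  // -1 if the collector does not report it
            long timeMs = gc.getCollectionTime();  // cumulative, approximate elapsed time
            double avg = count > 0 ? (double) timeMs / count : 0.0;
            // Only young collections are compared with the young-pause thresholds.
            String status = gc.getName().contains("Young") ? youngPauseStatus(avg) : "n/a";
            System.out.printf("%s: count=%d, total=%dms, avg=%.1fms, status=%s%n",
                    gc.getName(), count, timeMs, avg, status);
        }
    }
}
```

Averages hide outliers, so treat this as a coarse signal and rely on the GC log for individual pause times.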

Common Questions

Q: Do I need the -server flag?
A: No. 64-bit JDKs ship only the server VM, so the flag is accepted but has no effect.

Q: Should I always use -XX:+DisableExplicitGC?
A: It blocks System.gc() calls. Useful in most apps, but may prevent Off-Heap memory reclaim in apps that heavily use DirectByteBuffer (e.g., Netty). Judge by context.

Q: Is MaxGCPauseMillis=200 guaranteed?
A: It’s a soft target. The JVM tries its best to meet it but makes no guarantee. If heap is full, pauses will exceed the target.

Q: What if I set -Xmx equal to limits.memory in a container?
A: JVM needs additional memory beyond heap — Metaspace, thread stacks, native memory. Set -Xmx to ≤ 75–80% of limits.memory to avoid OOMKilled.