TestForge Blog

EKS Node Group Design Guide — On-Demand, Spot, and System Workloads

A practical guide to EKS node group design. Covers how to separate system nodes, application nodes, and Spot worker nodes using labels, taints, and workload boundaries for better cost and stability.

TestForge Team

Node Group Design Strongly Affects EKS Stability and Cost

A common first setup is one node group for everything.

Over time that causes:

  • contention between system and app workloads
  • wider impact from Spot interruptions
  • poor control over cost versus reliability

Node groups are not just infrastructure units. They are policy boundaries.

A Practical Split

Typical production split:

  • system node group
  • app node group
  • Spot worker node group

Example:

system-ng
- CoreDNS
- ingress controller
- cluster autoscaler

app-ng
- APIs
- web services
- standard backends

spot-ng
- batch jobs
- async workers
- interruption-tolerant workloads

Why System Nodes Should Be Separate

Critical cluster services need maximum stability.

If system components share nodes with application spikes, you get:

  • resource contention
  • unstable ingress behavior
  • reliability problems in cluster-critical add-ons, such as DNS timeouts when CoreDNS is starved of CPU

This is why an On-Demand system node group is often worth it.

Where Spot Nodes Fit

Spot works best for workloads that can survive interruption:

  • queue-driven workers
  • batch jobs
  • restart-tolerant background processes

Less suitable:

  • ingress controllers
  • critical stateful APIs
  • cluster-essential components

Spot is not just cheap compute. It is interruptible compute.
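What "can survive interruption" means in practice is easier to see in code. The following is a minimal sketch, assuming something in the cluster drains the node when a Spot interruption notice arrives, so the pod receives SIGTERM before the instance goes away. The in-memory queue and the names `run_worker` and `handle_sigterm` are hypothetical stand-ins, not part of any library.

```python
import signal
import threading
from collections import deque

# Set when the pod is asked to stop (for example, its Spot node is being drained).
stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Stop picking up new work; the item in flight is allowed to finish."""
    stop_requested.set()


def run_worker(queue, process):
    """Process items one at a time until the queue is empty or a stop is requested.

    `queue` is a hypothetical in-memory stand-in for a real message queue.
    Returns the list of items that were fully processed.
    """
    done = []
    while queue and not stop_requested.is_set():
        item = queue.popleft()
        process(item)
        done.append(item)
    return done


if __name__ == "__main__":
    signal.signal(signal.SIGTERM, handle_sigterm)
    jobs = deque(["job-1", "job-2", "job-3"])
    finished = run_worker(jobs, process=lambda item: print("processed", item))
    print("finished:", finished, "left in queue:", list(jobs))
```

Anything still in the queue when the worker stops is picked up by another replica. That property is what makes a workload a reasonable fit for Spot.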

Labels and Taints

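Labels alone are often not enough. A label with a node selector pulls the right pods onto a node group, but it does not keep other pods off it. A taint does.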

Useful pattern:

system-ng
- label: workload=system
- taint: dedicated=system:NoSchedule

spot-ng
- label: lifecycle=spot
- taint: spot=true:NoSchedule

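Then workloads opt in with a matching toleration. A toleration only permits scheduling on the tainted nodes, so pair it with a node selector on the group's label if the workload must land there. System add-ons such as CoreDNS need the toleration for the system taint too.

This keeps pods that never opted in from drifting onto system or Spot nodes.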
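The interaction between labels, taints, and tolerations can be sketched as a small simulation. This is a simplified model of the scheduler's matching rules, not the real scheduler: it only looks at `nodeSelector` and taints, and the helper names `tolerates` and `can_schedule` are made up for illustration. The label and taint values are the ones from the pattern above.

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False  # an empty effect on the toleration matches any effect
    key = toleration.get("key")
    if toleration.get("operator", "Equal") == "Exists":
        return key is None or key == taint["key"]
    return key == taint["key"] and toleration.get("value", "") == taint.get("value", "")


def can_schedule(pod_spec, node):
    """Simplified check: selector labels must match and hard taints must be tolerated."""
    labels = node.get("labels", {})
    for name, value in pod_spec.get("nodeSelector", {}).items():
        if labels.get(name) != value:
            return False
    for taint in node.get("taints", []):
        if taint["effect"] == "PreferNoSchedule":
            continue  # soft preference, does not block scheduling
        if not any(tolerates(t, taint) for t in pod_spec.get("tolerations", [])):
            return False
    return True


SPOT_TOLERATION = {"key": "spot", "operator": "Equal", "value": "true", "effect": "NoSchedule"}

SPOT_NODE = {"labels": {"lifecycle": "spot"},
             "taints": [{"key": "spot", "value": "true", "effect": "NoSchedule"}]}
APP_NODE = {"labels": {}, "taints": []}

plain_api = {}                                      # never opted in
toleration_only = {"tolerations": [SPOT_TOLERATION]}
spot_worker = {"nodeSelector": {"lifecycle": "spot"},
               "tolerations": [SPOT_TOLERATION]}

if __name__ == "__main__":
    print("plain API on spot node:     ", can_schedule(plain_api, SPOT_NODE))
    print("toleration-only on app node:", can_schedule(toleration_only, APP_NODE))
    print("spot worker on app node:    ", can_schedule(spot_worker, APP_NODE))
    print("spot worker on spot node:   ", can_schedule(spot_worker, SPOT_NODE))
```

The `toleration_only` case is the one to watch: the toleration lets the pod onto Spot nodes but does not stop it from landing on app nodes. Adding the node selector is what pins it to the Spot group.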

Instance Type Strategy

system-ng

  • stable On-Demand
  • modest but predictable sizing

app-ng

  • instance families matched to application profile

spot-ng

  • multiple instance families
  • capacity-optimized or price-capacity-optimized allocation strategy

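Diversity matters most in Spot groups: the more instance types and Availability Zones a group can draw from, the less likely it is that one capacity shortage interrupts the whole group.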
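As a rough illustration, the three groups could be described as parameter sets for EKS managed node groups. This sketch only builds plain dictionaries. The field names follow the `create_nodegroup` call in boto3's EKS client, while the helper names and the specific instance types are assumptions chosen for the example, not recommendations.

```python
def nodegroup_params(name, capacity_type, instance_types, labels=None, taints=None):
    """Build the node-group-specific part of a managed node group request."""
    params = {
        "nodegroupName": name,
        "capacityType": capacity_type,  # "ON_DEMAND" or "SPOT"
        "instanceTypes": list(instance_types),
    }
    if labels:
        params["labels"] = dict(labels)
    if taints:
        params["taints"] = list(taints)
    return params


def instance_families(instance_types):
    """Return the distinct families, e.g. 'm5.xlarge' -> 'm5'."""
    return sorted({t.split(".")[0] for t in instance_types})


NODE_GROUPS = [
    nodegroup_params(
        "system-ng", "ON_DEMAND", ["m6i.large"],
        labels={"workload": "system"},
        taints=[{"key": "dedicated", "value": "system", "effect": "NO_SCHEDULE"}],
    ),
    # Illustrative only: pick the family that matches the application profile.
    nodegroup_params("app-ng", "ON_DEMAND", ["m6i.xlarge"]),
    # Several families of the same size, so one pool running dry is survivable.
    nodegroup_params(
        "spot-ng", "SPOT",
        ["m5.xlarge", "m5a.xlarge", "m6i.xlarge", "m6a.xlarge"],
        labels={"lifecycle": "spot"},
        taints=[{"key": "spot", "value": "true", "effect": "NO_SCHEDULE"}],
    ),
]

if __name__ == "__main__":
    for group in NODE_GROUPS:
        families = instance_families(group["instanceTypes"])
        print(group["nodegroupName"], group["capacityType"], families)
```

Each dictionary would still need cluster-specific values such as `clusterName`, `nodeRole`, and `subnets` before it could be passed to `create_nodegroup`. The same split can be expressed in whatever provisioning tool the cluster already uses.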

Autoscaling Must Match the Design

Node groups should also align with:

  • Cluster Autoscaler or Karpenter strategy
  • scale-up speed
  • disruption tolerance
  • pod disruption budgets

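Node group design and autoscaling cannot be treated separately: the autoscaler can only add capacity that pending pods are allowed to schedule on, so labels, taints, and instance types directly shape scaling behavior.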
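Pod disruption budgets are where this becomes concrete, because they decide how many pods a node drain may evict at once. The sketch below builds a `policy/v1` PodDisruptionBudget as a plain dictionary and works through the arithmetic for an integer `minAvailable`. The workload name, the label, and the helper functions are hypothetical.

```python
def pdb_manifest(name, match_labels, min_available):
    """Build a PodDisruptionBudget manifest as a plain dict."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": dict(match_labels)},
        },
    }


def allowed_disruptions(healthy_pods, min_available):
    """How many pods may be evicted right now (integer minAvailable only)."""
    return max(0, healthy_pods - min_available)


if __name__ == "__main__":
    pdb = pdb_manifest("async-worker-pdb", {"app": "async-worker"}, min_available=2)
    print(pdb["kind"], pdb["metadata"]["name"])
    for healthy in (3, 2):
        print(healthy, "healthy pods ->", allowed_disruptions(healthy, 2), "evictions allowed")
```

With three healthy replicas and `minAvailable: 2`, a drain can evict one pod at a time. With only two healthy replicas, no eviction is allowed, and a scale-down or node replacement waits until another replica is ready elsewhere.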

Common Mistakes

  • putting all workloads in one node group
  • running ingress on Spot
  • using labels without taints

These usually show up later as operational instability.

Closing Thoughts

Good EKS node group design is really about separating workload classes:

  • system workloads
  • standard services
  • interruption-tolerant workers

That separation improves both reliability and cloud cost control.