TestForge Blog

AI DevOps Korea

A practical hub for operating and improving AI services

Aidevops.kr organizes LLMOps, RAG, agents, evaluation, observability, and cost-performance tuning for teams running AI in production.

#troubleshooting

9 articles

Kafka Consumer Lag Incident Analysis — Where to Look First When Backlog Grows

When Kafka consumer lag spikes, scaling out consumers alone is often not enough. This post walks through a practical incident analysis: distinguishing broker issues from consumer issues, checking for partition imbalance, spotting retry storms, and finding the downstream bottlenecks that actually caused the lag.
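
As a rough illustration of the partition imbalance check mentioned above, here is a minimal Python sketch. It assumes you have already collected the log end offset and the committed offset for each partition (for example from your monitoring stack). The function names, the sample offsets, and the 5x-median threshold are all hypothetical and are not taken from the article.

```python
from statistics import median


def partition_lag(end_offsets, committed_offsets):
    """Return lag per partition: log end offset minus committed offset.

    Assumption: a partition with no committed offset is treated as if the
    group had committed offset 0. Real behavior depends on the reset policy.
    """
    return {
        p: max(end_offsets[p] - committed_offsets.get(p, 0), 0)
        for p in end_offsets
    }


def skewed_partitions(lag, ratio=5.0):
    """Return partitions whose lag exceeds `ratio` times the median lag.

    `ratio` is a hypothetical threshold; tune it for your own workload.
    """
    baseline = max(median(lag.values()), 1)  # avoid a zero baseline
    return sorted(p for p, v in lag.items() if v > ratio * baseline)


# Made-up offsets, for illustration only.
end_offsets = {0: 1000, 1: 1000, 2: 1000, 3: 1000}
committed_offsets = {0: 990, 1: 985, 2: 200, 3: 995}

lag = partition_lag(end_offsets, committed_offsets)
hot = skewed_partitions(lag)
print(lag, hot)
```

If the lag is concentrated in one or two partitions, as in this made-up data, adding consumers is unlikely to help, because each partition is read by at most one consumer in a group. A skew like this may instead point to a hot key or a slow handler on that partition.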