Resource Exhaustion: The Incident Pattern That Looks Different Every Time

Written by John Jamie | Jun 20, 2026 2:00:57 AM

Connection pool hit zero. Memory leaked to the ceiling. Disk filled with logs overnight. GPU capacity ran out under inference load.

Resource exhaustion (FM-13) looks different on the surface every time (CPU, memory, disk, connections, tokens, GPU compute), but the shape is always the same: a finite resource consumed faster than it can be replenished. It accounts for 11% of classified unplanned incidents in 2025 and is one of the most automatable patterns in the taxonomy.

Full data: stackgen.com/state-of-reliability.

Two Sub-Patterns, Different Responses

Demand Saturation: traffic spike exceeds capacity. The resource is correctly sized; the load grew unexpectedly. Response: scale out, expand capacity, increase quota.

Resource Leak / Runaway Consumption: pathological growth uncorrelated with traffic, such as a memory leak, a runaway queue, or logs filling a disk. Response: restart and fix. Confusing the two leads to adding capacity to a leaking system and then watching it fill up again.
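
To make the cost of that confusion concrete, here is a minimal arithmetic sketch. It assumes a constant leak rate, and the function name and all figures are hypothetical, chosen only for illustration.

```python
def hours_until_exhaustion(capacity_mb: float, used_mb: float,
                           leak_mb_per_hour: float) -> float:
    """Hours until a steadily leaking resource reaches its ceiling.

    Assumes a constant leak rate; real leaks are rarely this regular.
    """
    if leak_mb_per_hour <= 0:
        return float("inf")  # not leaking, so no exhaustion from this cause
    headroom_mb = max(capacity_mb - used_mb, 0.0)
    return headroom_mb / leak_mb_per_hour


# Hypothetical service: 1 GiB in use, leaking 128 MiB per hour.
before = hours_until_exhaustion(capacity_mb=4096, used_mb=1024, leak_mb_per_hour=128)
after = hours_until_exhaustion(capacity_mb=8192, used_mb=1024, leak_mb_per_hour=128)

print(before)  # 24.0 hours with the original 4 GiB limit
print(after)   # 56.0 hours after doubling the limit
```

Under these assumptions, doubling capacity moves the failure from 24 hours out to 56 hours out. The incident is postponed, not prevented.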

Where FM-13 Appears in the Data

  • Communications / infrastructure services: high connection-count services (Twilio, Bandwidth, Sinch) show FM-13 frequently, particularly connection pool exhaustion under traffic spikes
  • AI model providers: GPU capacity exhaustion under inference load is an emerging FM-13 variant. When an AI service can't handle inference load and returns degraded quality, that crosses into FM-17. When it simply fails requests, it stays FM-13.
  • Crypto operators: queue exhaustion during high transaction volume periods (major market moves, new chain launches) appears repeatedly in Kraken, Coinbase, and Luno incident histories

Why FM-13 Is the Highest-Automation Pattern

  1. Detection: resource utilization chart hits ceiling (CPU 100%, available connections 0, disk 100%, OOM events)
  2. Diagnosis: demand saturation or leak? (traffic-correlated spike vs. monotonic growth)
  3. Remediation: demand → scale out. Leak → restart + fix.

All three steps are automatable from standard telemetry. No business-logic judgment required for the initial response. This is why FM-13 is Tier C (AI-Closed) in the autonomy framework.
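
The diagnosis and remediation steps can be sketched in a few lines of Python. This is an illustration rather than a production classifier: the function names, the 0.7 correlation threshold, the 0.9 rising-fraction threshold, and the "escalate to a human" fallback are all assumptions, not part of the taxonomy.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rising_fraction(xs):
    """Fraction of consecutive samples that did not decrease."""
    steps = list(zip(xs, xs[1:]))
    return sum(b >= a for a, b in steps) / len(steps)


def diagnose(usage, traffic, corr_threshold=0.7, rising_threshold=0.9):
    """Classify an exhaustion event from usage and traffic samples.

    Both thresholds are illustrative assumptions and need tuning per service.
    """
    if pearson(usage, traffic) >= corr_threshold:
        return "demand_saturation"  # usage tracks traffic
    if rising_fraction(usage) >= rising_threshold:
        return "leak"  # usage climbs regardless of traffic
    return "unknown"


# Remediation mapping from the steps above; the "unknown" fallback is an assumption.
REMEDIATION = {
    "demand_saturation": "scale out",
    "leak": "restart + fix",
    "unknown": "escalate to a human",
}

# Hypothetical samples: requests/sec and resource utilization in percent.
spike_traffic = [100, 110, 105, 300, 500, 480, 120]
spike_usage = [20, 22, 21, 60, 98, 95, 25]
flat_traffic = [100, 120, 90, 110, 100, 95, 105]
leak_usage = [20, 30, 40, 52, 63, 75, 88]

print(REMEDIATION[diagnose(spike_usage, spike_traffic)])  # scale out
print(REMEDIATION[diagnose(leak_usage, flat_traffic)])    # restart + fix
```

Checking correlation first means a resource that grows because traffic is growing is treated as demand saturation, which matches the distinction drawn earlier: a leak is growth that traffic does not explain.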

Prevention: Alert Early

  • Alert at 75%, not 95%: gives time to respond proactively before the incident
  • Track P99 resource consumption, not just mean: peak consumption is what causes exhaustion
  • Separate demand saturation and leak alerting: monotonic growth over 24 hours is a different signal from a traffic-correlated spike
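
As a rough illustration of the first two points, the sketch below evaluates a 75% threshold against P99 utilization instead of the mean. The nearest-rank percentile method, the function names, and the sample data are assumptions made for the example; most monitoring systems provide their own percentile functions.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def should_alert(utilization_pct, threshold=75.0):
    """Alert when P99 utilization reaches the early-warning threshold."""
    return percentile(utilization_pct, 99) >= threshold


# Hypothetical window of 100 utilization samples: mostly idle, two bursts.
samples = [40.0] * 98 + [90.0] * 2

mean = sum(samples) / len(samples)
p99 = percentile(samples, 99)

print(mean)                   # 41.0, well under the 75% threshold
print(p99)                    # 90.0, already past it
print(should_alert(samples))  # True
```

In this made-up window, a mean-based alert at 75% stays silent while the peaks are already at 90%, which is the gap the P99 recommendation is meant to close.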

Key Takeaways

  • 11% of 2025 incidents: high frequency, high automation potential
  • Two sub-patterns require different responses: confusing demand saturation with leaks makes incidents worse
  • AI inference is an emerging FM-13 vector: GPU capacity exhaustion is growing as AI features are embedded in products
  • FM-13 is Tier C (AI-Closed): the most automatable pattern in the taxonomy
