
Resource Exhaustion: The Incident Pattern That Looks Different Every Time

Author: John Jamie | Jun 20, 2026

Connection pool hit zero. Memory leaked to the ceiling. Disk filled with logs overnight. GPU capacity ran out under inference load.

Resource exhaustion (FM-13) looks different on the surface every time (CPU, memory, disk, connections, tokens, GPU compute), but the shape is always the same: a finite resource consumed faster than it can be replenished. It accounts for 11% of classified unplanned incidents in 2025 and is one of the most automatable patterns in the taxonomy.

Full data: stackgen.com/state-of-reliability.

Two Sub-Patterns, Different Responses

Demand Saturation: traffic spike exceeds capacity. The resource is correctly sized; the load grew unexpectedly. Response: scale out, expand capacity, increase quota.

Resource Leak / Runaway Consumption: pathological growth uncorrelated with traffic, such as a memory leak, a runaway queue, or logs filling a disk. Response: restart and fix. Confusing the two means adding capacity to a leaking system and then watching the new capacity fill up as well.
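One way to tell the two sub-patterns apart is to check how closely resource usage tracks traffic over the same window. The sketch below is illustrative only: the function names, the label strings, and the 0.7 correlation cutoff are assumptions, not values from the report.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def classify_exhaustion(usage, traffic, threshold=0.7):
    """Label an exhaustion event by how closely usage follows traffic.

    threshold is a hypothetical cutoff chosen for illustration.
    """
    if pearson(usage, traffic) >= threshold:
        return "demand_saturation"  # usage follows load: scale out
    return "leak"                   # usage grows regardless of load: restart and fix


# Usage rises and falls with a traffic spike.
spike = classify_exhaustion(
    usage=[20, 22, 60, 95, 92, 25],
    traffic=[100, 110, 300, 500, 480, 120],
)
# Usage climbs steadily while traffic stays flat.
leak = classify_exhaustion(
    usage=[30, 40, 50, 60, 70, 80],
    traffic=[100, 120, 90, 110, 95, 105],
)
```

A real system would likely need a longer window and some handling for lag between traffic and consumption; this only shows the shape of the comparison.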

Where FM-13 Appears in the Data

  • Communications / infrastructure services: high connection-count services (Twilio, Bandwidth, Sinch) show FM-13 frequently, particularly connection pool exhaustion under traffic spikes
  • AI model providers: GPU capacity exhaustion under inference load is an emerging FM-13 variant. When an AI service can't handle inference load and returns degraded quality, that crosses into FM-17. When it simply fails requests, it stays FM-13.
  • Crypto operators: queue exhaustion during high transaction volume periods (major market moves, new chain launches) appears repeatedly in Kraken, Coinbase, and Luno incident histories

Why FM-13 Is the Highest-Automation Pattern

  1. Detection: resource utilization chart hits its ceiling (CPU at 100%, available connections at 0, disk at 100%, OOM events)
  2. Diagnosis: demand saturation or leak? (monotonic growth vs. traffic-correlated spike)
  3. Remediation: demand → scale out. Leak → restart + fix.

All three steps are automatable from standard telemetry. No business-logic judgment is required for the initial response. This is why FM-13 is Tier C (AI-Closed) in the autonomy framework.
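The three steps above can be sketched as a small decision function over a window of utilization samples (fractions of capacity, oldest first). This is a minimal sketch under stated assumptions: the function names, the action strings, and the 0.9 tolerance are hypothetical, and the monotonic-growth check is a deliberately crude stand-in for real diagnosis.

```python
def is_exhausted(utilization, ceiling=1.0):
    """Step 1, detection: the latest sample has reached the ceiling."""
    return utilization[-1] >= ceiling


def grows_monotonically(utilization, tolerance=0.9):
    """Step 2, diagnosis: True when nearly every sample is >= the previous one."""
    steps = list(zip(utilization, utilization[1:]))
    if not steps:
        return False
    rising = sum(1 for prev, cur in steps if cur >= prev)
    return rising / len(steps) >= tolerance


def first_response(utilization):
    """Step 3, remediation: map the diagnosis to an initial action."""
    if not is_exhausted(utilization):
        return "none"
    if grows_monotonically(utilization):
        return "restart_and_fix"  # leak-like: steady growth up to the ceiling
    return "scale_out"            # demand-like: noisy series ending in a spike


leak_window = [0.2, 0.3, 0.45, 0.6, 0.8, 1.0]
spike_window = [0.4, 0.35, 0.5, 0.3, 0.45, 1.0]
healthy_window = [0.3, 0.4, 0.35]
```

A slow, sustained traffic ramp would also look monotonic here, so in practice the diagnosis step would probably be combined with a traffic signal before anything is restarted automatically.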

Prevention: Alert Early

  • Alert at 75%, not 95%: gives time to respond proactively before the incident
  • Track P99 resource consumption, not just mean: peak consumption is what causes exhaustion
  • Separate demand saturation and leak alerting: monotonic growth over 24 hours is a different signal from a traffic-correlated spike
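To illustrate the first two points, the sketch below alerts on P99 consumption at 75% of capacity and shows a case where the mean would have stayed quiet. The function names, the nearest-rank percentile method, and the sample data are assumptions made for the example.

```python
def p99(samples):
    """Nearest-rank 99th percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    # Integer ceiling of 0.99 * n, giving a 1-based rank.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def should_alert(samples, capacity, threshold=0.75):
    """Alert when P99 consumption reaches the threshold fraction of capacity."""
    return p99(samples) >= threshold * capacity


# 100 samples of connection-pool usage against a pool of 100 connections:
# mostly idle, with two brief peaks.
samples = [40] * 98 + [80, 85]
mean_usage = sum(samples) / len(samples)   # well below the 75% line
peak_alert = should_alert(samples, capacity=100)
```

Here the mean sits near 41 connections while P99 is 80, so a mean-based alert at 75% would not fire but the P99-based one would.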

Key Takeaways

  • 11% of classified unplanned incidents in 2025: high frequency, high automation potential
  • Two sub-patterns require different responses: confusing demand saturation with leaks makes incidents worse
  • AI inference is an emerging FM-13 vector: GPU capacity exhaustion is growing as AI features are embedded in products
  • FM-13 is Tier C (AI-Closed), the most automatable pattern in the taxonomy

stackgen.com/state-of-reliability | LinkedIn webinar

About StackGen:

StackGen is the pioneer in Autonomous Infrastructure Platform (AIP) technology, helping enterprises transition from manual Infrastructure-as-Code (IaC) management to fully autonomous operations. Founded by infrastructure automation experts and headquartered in the San Francisco Bay Area, StackGen serves leading companies across technology, financial services, manufacturing, and entertainment industries.
