Resource Exhaustion: The Incident Pattern That Looks Different Every Time

Jun 20, 2026

Connection pool hit zero. Memory leaked to the ceiling. Disk filled with logs overnight. GPU capacity ran out under inference load.

Resource exhaustion (FM-13) looks different on the surface every time (CPU, memory, disk, connections, tokens, GPU compute), but the shape is always the same: a finite resource consumed faster than it can be replenished. It accounts for 11% of classified unplanned incidents in 2025, and it is one of the most automatable patterns in the taxonomy.

Full data: stackgen.com/state-of-reliability.

Two Sub-Patterns, Different Responses

Demand Saturation: traffic spike exceeds capacity. The resource is correctly sized; the load grew unexpectedly. Response: scale out, expand capacity, increase quota.

Resource Leak / Runaway Consumption: pathological growth uncorrelated with traffic, such as a memory leak, a queue runaway, or a log disk fill. Response: restart and fix. Confusing the two sub-patterns leads to adding capacity to a leaking system, then watching it fill up again.

Where FM-13 Appears in the Data

Communications / infrastructure services: high connection-count services (Twilio, Bandwidth, Sinch) show FM-13 frequently, particularly connection pool exhaustion under traffic spikes
AI model providers: GPU capacity exhaustion under inference load is an emerging FM-13 variant. When an AI service can't handle inference load and returns degraded quality, that crosses into FM-17. When it simply fails requests, it stays FM-13.
Crypto operators: queue exhaustion during high transaction volume periods (major market moves, new chain launches) appears repeatedly in Kraken, Coinbase, and Luno incident histories

Why FM-13 Is the Highest-Automation Pattern

Detection: resource utilization chart hits its ceiling (CPU at 100%, available connections at 0, disk at 100%, OOM events)
Diagnosis: demand saturation or leak? (monotonic growth vs. traffic-correlated spike)
Remediation: demand → scale out. Leak → restart + fix.

All three steps are automatable from standard telemetry. No business-logic judgment is required for the initial response. This is why FM-13 is Tier C (AI-Closed) in the autonomy framework.
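To make the three steps concrete, here is a minimal sketch of detect, diagnose, and remediate in Python. Everything in it is an assumption for illustration: the `Snapshot` fields, the 0.7 correlation cutoff, and the action names are hypothetical and not part of the FM-13 taxonomy or any StackGen API. The diagnosis uses a plain Pearson correlation between resource usage and traffic as a stand-in for "traffic-correlated spike vs. growth uncorrelated with traffic".

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Snapshot:
    """Hypothetical telemetry snapshot; field names are illustrative only."""
    cpu_pct: float
    free_connections: int
    disk_pct: float
    oom_events: int


def detect(s: Snapshot) -> list[str]:
    """Step 1: list the resources that have hit their ceiling."""
    hits = []
    if s.cpu_pct >= 100:
        hits.append("cpu")
    if s.free_connections <= 0:
        hits.append("connections")
    if s.disk_pct >= 100:
        hits.append("disk")
    if s.oom_events > 0:
        hits.append("memory")
    return hits


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def diagnose(usage: list[float], traffic: list[float], min_corr: float = 0.7) -> str:
    """Step 2: usage that tracks traffic suggests demand; otherwise suspect a leak."""
    return "demand_saturation" if pearson(usage, traffic) >= min_corr else "leak"


# Step 3: the response named for each sub-pattern (action names are hypothetical).
REMEDIATION = {"demand_saturation": "scale_out", "leak": "restart_and_fix"}


def triage(s: Snapshot, usage: list[float], traffic: list[float]):
    """Run all three steps; returns None when nothing is exhausted."""
    exhausted = detect(s)
    if not exhausted:
        return None
    diagnosis = diagnose(usage, traffic)
    return {"exhausted": exhausted, "diagnosis": diagnosis, "action": REMEDIATION[diagnosis]}
```

One caveat on the heuristic: a leak that happens to coincide with steadily rising traffic would also correlate, so a real system would likely want a second signal before choosing to scale out.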

Prevention: Alert Early

Alert at 75%, not 95%: gives time to respond proactively before the incident
Track P99 resource consumption, not just mean: peak consumption is what causes exhaustion
Separate demand saturation and leak alerting: monotonic growth over 24 hours is a different signal from a traffic-correlated spike
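The three alerting rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the function names, the nearest-rank P99 method, and the 0.95 "rising fraction" cutoff are all hypothetical choices. Only the 75% threshold, the P99-over-mean advice, and the 24-hour window come from the text.

```python
def p99(samples: list[float]) -> float:
    """Nearest-rank 99th percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("p99 needs at least one sample")
    ordered = sorted(samples)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n)
    return ordered[rank - 1]


def saturation_alert(samples: list[float], capacity: float, warn_ratio: float = 0.75) -> bool:
    """Fire when P99 consumption reaches 75% of capacity (not 95%, not the mean)."""
    return p99(samples) >= warn_ratio * capacity


def leak_alert(samples_24h: list[float], min_rising_fraction: float = 0.95) -> bool:
    """Fire on near-monotonic growth across the window, regardless of level."""
    if len(samples_24h) < 2:
        return False
    steps = list(zip(samples_24h, samples_24h[1:]))
    rising = sum(1 for prev, cur in steps if cur >= prev)
    net_growth = samples_24h[-1] > samples_24h[0]
    return net_growth and rising / len(steps) >= min_rising_fraction
```

For example, a series of 98 samples at 50 plus two peaks at 80 and 90 has a mean near 51 but a P99 of 80, so `saturation_alert` fires against a capacity of 100 while a mean-based rule would stay silent. Keeping `leak_alert` as a separate rule means a slow climb from 40 to 63 over 24 hourly samples is flagged even though it is nowhere near the 75% line.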

Key Takeaways

11% of classified unplanned incidents in 2025: high frequency, high automation potential
Two sub-patterns require different responses: confusing demand saturation with leaks makes incidents worse
AI inference is an emerging FM-13 vector: GPU capacity exhaustion growing as AI features embed in products
FM-13 is Tier C (AI-Closed), the most automatable pattern in the taxonomy


