Config-Induced Failures: The Incident That Starts With "Nothing Changed"
“We didn't deploy anything.” It's one of the most common things an on-call engineer says at the start of an incident, and one of the most misleading. No code may have deployed, but something almost certainly changed.
Config-induced failure (FM-10) accounts for 9% of classified unplanned incidents in 2025. It's structurally similar to deploy-induced regression (FM-09) but harder to detect: the change management trail is weaker, and modern IaC tooling has a dramatically larger blast radius when it goes wrong.
Full data: stackgen.com/state-of-reliability.
What Is Config-Induced Failure?
Any incident triggered by a non-code change: a configuration value, feature flag, environment variable, IAM policy, ACL, network rule, DNS record, quota adjustment, or capacity policy update. It differs from FM-09 in one key way: the change often does not appear in the same audit trail as code deploys.
The Two High-Profile Cases
Cloudflare (November 18, 2025)
A ClickHouse permissions update caused a Bot Management feature file to double in size. When the main Cloudflare proxy received the updated config, it crashed, affecting 56 downstream companies in the SSOR dataset. The change looked innocuous: a database permissions update had an unexpected effect on file size, and the larger file had an unexpected effect on proxy behavior. Three hops, each fine in isolation.
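One common defence against this shape of failure is to validate a generated config against explicit limits before the consumer swaps it in, and to keep the last known good version when validation fails. The sketch below is illustrative only and is not Cloudflare's implementation: `FeatureConfig`, `MAX_FEATURES`, and `MAX_GROWTH_RATIO` are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class FeatureConfig:
    """A parsed feature file, reduced here to a list of feature names."""
    features: list[str]


class ConfigRejected(Exception):
    """Raised when a candidate config fails a sanity check."""


# Hypothetical limits; real values depend on what the consumer can handle.
MAX_FEATURES = 200
MAX_GROWTH_RATIO = 1.5


def validate(candidate: FeatureConfig, current: FeatureConfig) -> None:
    """Reject a candidate that is empty, too large, or grew suspiciously fast."""
    if not candidate.features:
        raise ConfigRejected("candidate is empty")
    if len(candidate.features) > MAX_FEATURES:
        raise ConfigRejected(
            f"{len(candidate.features)} features exceeds limit {MAX_FEATURES}"
        )
    if current.features and len(candidate.features) > MAX_GROWTH_RATIO * len(current.features):
        raise ConfigRejected("candidate grew by more than the allowed ratio")


def load_or_keep(candidate: FeatureConfig, current: FeatureConfig) -> FeatureConfig:
    """Return the candidate if it validates, otherwise keep the last known good."""
    try:
        validate(candidate, current)
    except ConfigRejected as err:
        print(f"rejected new config, keeping last known good: {err}")
        return current
    return candidate
```

The growth-ratio check matters as much as the absolute limit: a file that doubles between two consecutive versions is worth rejecting even when it is still under the hard cap.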
AWS us-east-1 (October 20, 2025)
An automated config write, not a human clicking in the console, triggered a race condition in DNS automation that produced an empty DNS record. This is the IaC-era shape of FM-10: lower frequency than manual config changes, but dramatically higher blast radius.
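Two generic guards apply to automated writers like this: refuse to publish an empty record set, and require each writer to state which version it read so that a stale writer cannot overwrite a newer one. The following is a minimal in-memory sketch of both guards. `RecordStore`, `StaleWrite`, and `EmptyRecordSet` are hypothetical names, and nothing here describes how AWS's DNS automation is actually built.

```python
import threading
from dataclasses import dataclass, field


class StaleWrite(Exception):
    """The writer's view of the record is older than the stored version."""


class EmptyRecordSet(Exception):
    """The writer tried to publish a record with no addresses."""


@dataclass
class RecordStore:
    """In-memory stand-in for a DNS record store (hypothetical)."""
    records: dict[str, tuple[int, list[str]]] = field(default_factory=dict)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def read(self, name: str) -> tuple[int, list[str]]:
        """Return (version, addresses); version 0 means the record does not exist."""
        with self._lock:
            return self.records.get(name, (0, []))

    def write(self, name: str, addresses: list[str], expected_version: int) -> int:
        """Write only if non-empty and the caller saw the latest version."""
        if not addresses:
            raise EmptyRecordSet(f"refusing to publish empty record for {name}")
        with self._lock:
            version, _ = self.records.get(name, (0, []))
            if version != expected_version:
                raise StaleWrite(
                    f"{name}: expected version {expected_version}, found {version}"
                )
            self.records[name] = (version + 1, list(addresses))
            return version + 1
```

A writer that receives `StaleWrite` should re-read the record and recompute its change rather than retry the same payload.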
The IaC Paradox
Before IaC: Config changes were frequent, manual, often undocumented. High frequency, low blast radius, weak audit trail.
With IaC: Config changes are less frequent, version-controlled, reviewable. Lower frequency, much higher blast radius: a single Terraform apply can modify dozens of security groups, IAM policies, DNS records, and routing rules in one run.
The answer isn't less IaC. It's more rigorous change review and blast-radius-aware deployment strategy for IaC changes.
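As one possible form of blast-radius-aware review, a CI step can read the machine-readable plan produced by `terraform show -json <planfile>` and flag plans that touch too many sensitive resources at once. The sketch below relies on the `resource_changes` list in that JSON output. `SENSITIVE_TYPES` and `MAX_SENSITIVE_CHANGES` are hypothetical policy values for illustration, not recommendations.

```python
import json
from collections import Counter

# Hypothetical policy: resource types treated as high blast radius,
# and how many of them one apply may change without a staged rollout.
SENSITIVE_TYPES = {
    "aws_security_group",
    "aws_iam_policy",
    "aws_route53_record",
    "aws_route",
}
MAX_SENSITIVE_CHANGES = 5


def summarize_plan(plan_json: str) -> Counter:
    """Count changed resources per type, ignoring no-op and read-only entries."""
    plan = json.loads(plan_json)
    counts: Counter = Counter()
    for rc in plan.get("resource_changes", []):
        actions = set(rc["change"]["actions"])
        if actions - {"no-op", "read"}:
            counts[rc["type"]] += 1
    return counts


def needs_staged_rollout(counts: Counter) -> bool:
    """True when the plan changes more sensitive resources than the budget allows."""
    sensitive = sum(n for rtype, n in counts.items() if rtype in SENSITIVE_TYPES)
    return sensitive > MAX_SENSITIVE_CHANGES
```

A pipeline could use the result to require an extra approval, or to split the change into smaller applies scoped to one environment or region at a time.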
Why “Nothing Changed” Is Almost Never True
- Automated config writes: infrastructure automation and self-healing systems write config values constantly
- Third-party vendor changes: your vendor updated their API behavior, changed a default, or deprecated an endpoint
- Certificate expirations: a certificate is config with a built-in validity window, so it can break a system without anyone touching it
- Quota / limit adjustments: cloud provider changes that don't show up in your deployment tooling
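One way to make these sources visible during an incident is to normalize them into a single change timeline and query it by alert time. The sketch below assumes each source can be reduced to a timestamped event. `ChangeEvent`, `changes_before`, and the two-hour default window are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    """One non-code change, whatever system it came from."""
    at: datetime
    source: str   # e.g. "automation", "vendor", "certificate", "quota"
    target: str   # the system or resource affected
    summary: str


def changes_before(
    events: list[ChangeEvent],
    alert_at: datetime,
    window: timedelta = timedelta(hours=2),
) -> list[ChangeEvent]:
    """Return events inside [alert_at - window, alert_at], newest first."""
    start = alert_at - window
    hits = [e for e in events if start <= e.at <= alert_at]
    return sorted(hits, key=lambda e: e.at, reverse=True)
```

Time-bound config fits the same model: a certificate's expiry time can be recorded as an event in advance, so it appears in the timeline even though nobody made a change that day.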
Key Takeaways
- 9% of classified unplanned incidents in 2025, and systematically harder to detect than deploy regressions because the change trail is fragmented
- Cloudflare Nov 2025 and AWS Oct 2025 are the clearest high-impact FM-10 examples: both trace to config changes with unexpected cascading consequences
- IaC expands blast radius: the same rigorous rollout discipline you apply to code deploys should apply to IaC changes
- The highest-leverage investment: change-data integration, which means surfacing all config change signals in the same telemetry stream as your alerts and metrics
About StackGen:
StackGen is the pioneer in Autonomous Infrastructure Platform (AIP) technology, helping enterprises transition from manual Infrastructure-as-Code (IaC) management to fully autonomous operations. Founded by infrastructure automation experts and headquartered in the San Francisco Bay Area, StackGen serves leading companies across technology, financial services, manufacturing, and entertainment industries.