Config-Induced Failures: The Incident That Starts With "Nothing Changed"

Jun 20, 2026

"We didn't deploy anything." It's one of the most common things an on-call engineer says at the start of an incident, and one of the most misleading. No code was deployed, but something almost certainly changed.

Config-induced failure (FM-10) accounted for 9% of classified unplanned incidents in 2025. It's structurally similar to deploy-induced regression (FM-09) but harder to detect: the change management trail is weaker, and modern IaC tooling has a dramatically larger blast radius when it goes wrong.

Full data: stackgen.com/state-of-reliability.

What Is Config-Induced Failure?

A config-induced failure is any incident triggered by a non-code change: a configuration value, feature flag, environment variable, IAM policy, ACL, network rule, DNS record, quota adjustment, or capacity policy update. It is distinct from FM-09 in one key way: the change is often not in the same audit trail as code deploys.

The Two High-Profile Cases

Cloudflare: November 18, 2025

A ClickHouse permissions update caused a Bot Management feature file to double in size. When the main Cloudflare proxy received the updated config, it crashed, affecting 56 downstream companies in the SSOR dataset. The change seemed innocuous: a database permissions update had an unexpected downstream effect on file size, which in turn had an unexpected downstream effect on proxy behavior. Three hops, each fine in isolation.
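One way to catch a chain like this is to sanity-check a generated config artifact before it is propagated to consumers. The sketch below is a hypothetical pre-publish guard, not Cloudflare's actual remediation: the function name `check_artifact_growth`, the 1.5x growth ratio, and the byte limit are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class GuardResult:
    ok: bool
    reason: str


def check_artifact_growth(
    previous_size: int,
    candidate_size: int,
    max_growth_ratio: float = 1.5,       # assumed threshold, tune per artifact
    hard_limit_bytes: int = 1_000_000,   # assumed consumer-side limit
) -> GuardResult:
    """Reject a generated config file that grew suspiciously or exceeds a hard cap."""
    if candidate_size > hard_limit_bytes:
        return GuardResult(
            False, f"size {candidate_size} exceeds hard limit {hard_limit_bytes}"
        )
    if previous_size > 0 and candidate_size / previous_size > max_growth_ratio:
        ratio = candidate_size / previous_size
        return GuardResult(
            False, f"size grew {ratio:.2f}x, above allowed {max_growth_ratio}x"
        )
    return GuardResult(True, "within expected bounds")


if __name__ == "__main__":
    # A file that doubles in size is held back instead of being shipped.
    print(check_artifact_growth(previous_size=40_000, candidate_size=80_000))
```

A check like this does not explain why the file grew. It only stops the third hop, so the upstream change can be investigated while the previous artifact stays in service.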

AWS us-east-1: October 20, 2025

A DNS automation race condition was triggered by an automated config write, not by a human clicking in the console. An infrastructure automation process writing a config value hit a race condition that produced an empty DNS record. This is the IaC-era shape of FM-10: lower frequency than manual config changes, but dramatically higher blast radius.
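Two generic defenses against this shape of failure are versioned (compare-and-set) writes and a refusal to publish an empty record set. The sketch below illustrates both with an in-memory stand-in. It is not a description of AWS's internal automation; `RecordStore`, `RejectedWrite`, and the record names are hypothetical.

```python
import threading


class RejectedWrite(Exception):
    """Raised when a write is stale or would leave a record empty."""


class RecordStore:
    """In-memory stand-in for a DNS record backend (hypothetical, for illustration)."""

    def __init__(self):
        self._records = {}              # name -> (version, [addresses])
        self._lock = threading.Lock()

    def read(self, name):
        with self._lock:
            return self._records.get(name, (0, []))

    def write(self, name, addresses, expected_version):
        """Compare-and-set: apply only if nobody wrote since the caller last read."""
        with self._lock:
            current_version, _ = self._records.get(name, (0, []))
            if expected_version != current_version:
                raise RejectedWrite(
                    f"stale write: expected v{expected_version}, "
                    f"store is at v{current_version}"
                )
            if not addresses:
                raise RejectedWrite("refusing to publish an empty record set")
            new_version = current_version + 1
            self._records[name] = (new_version, list(addresses))
            return new_version


if __name__ == "__main__":
    store = RecordStore()
    v1 = store.write("api.example.internal", ["10.0.0.1"], expected_version=0)
    # Two automation workers both read v1, then race to write.
    store.write("api.example.internal", ["10.0.0.2"], expected_version=v1)
    try:
        store.write("api.example.internal", [], expected_version=v1)
    except RejectedWrite as err:
        print(err)  # the slower worker's stale, empty write is rejected
```

The version check turns a silent last-writer-wins race into an explicit rejection that the automation can retry or escalate.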

The IaC Paradox

Before IaC: Config changes were frequent, manual, often undocumented. High frequency, low blast radius, weak audit trail.

With IaC: Config changes are less frequent, version-controlled, reviewable. Lower frequency, much higher blast radius: a single Terraform apply can modify dozens of security groups, IAM policies, DNS records, and routing rules in one run.

The answer isn't less IaC. It's more rigorous change review and a blast-radius-aware deployment strategy for IaC changes.
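Blast-radius awareness can start with something as simple as counting what a plan is about to touch. The sketch below reads the JSON form of a Terraform plan (the `resource_changes` list with `address`, `type`, and `change.actions` per entry) and flags changes to sensitive resource types. The prefix list and the threshold of five are assumptions for illustration, not a recommended policy.

```python
from collections import Counter

# Resource-type prefixes treated as high-risk here are an assumption;
# adjust them to match your own estate.
HIGH_RISK_PREFIXES = ("aws_security_group", "aws_iam_", "aws_route")


def summarize_plan(plan: dict) -> dict:
    """Count changed resources in a Terraform JSON plan, by type and risk."""
    by_type = Counter()
    high_risk = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if actions in (["no-op"], ["read"]):
            continue  # nothing is modified for these entries
        by_type[rc["type"]] += 1
        if rc["type"].startswith(HIGH_RISK_PREFIXES):
            high_risk.append(rc["address"])
    return {
        "total": sum(by_type.values()),
        "by_type": dict(by_type),
        "high_risk": high_risk,
    }


def needs_staged_rollout(summary: dict, max_high_risk: int = 5) -> bool:
    """Assumed policy: too many high-risk changes means split or stage the apply."""
    return len(summary["high_risk"]) > max_high_risk


if __name__ == "__main__":
    example_plan = {
        "resource_changes": [
            {"address": "aws_iam_policy.deploy", "type": "aws_iam_policy",
             "change": {"actions": ["update"]}},
            {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket",
             "change": {"actions": ["no-op"]}},
        ]
    }
    print(summarize_plan(example_plan))
```

In practice the input would come from `terraform plan -out=plan.tfplan` followed by `terraform show -json plan.tfplan`, with the resulting JSON loaded via `json.load` and passed to `summarize_plan` in a CI step.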

Why "Nothing Changed" Is Almost Never True

Automated config writes: infrastructure automation and self-healing systems write config values constantly
Third-party vendor changes: your vendor updated their API behavior, changed a default, or deprecated an endpoint
Certificate expirations: a certificate is time-bound config, and its validity can lapse without anyone touching it
Quota / limit adjustments: cloud provider changes that don't show up in your deployment tooling
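Taking the certificate case from the list above as one example: expiry is a change that happens on a schedule, so it can be checked ahead of time. The sketch below assumes you already have each certificate's `notAfter` string in the format Python's `ssl` module reports it (for instance from `getpeercert()`). The 30-day warning window and the hostnames are illustrative assumptions.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days left before a notAfter timestamp such as 'Jun 20 12:00:00 2026 GMT'."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def expiring_soon(certs: dict, now: datetime, warn_days: int = 30) -> list:
    """Names of certs with fewer than warn_days of validity left (assumed threshold)."""
    return sorted(
        name
        for name, not_after in certs.items()
        if days_until_expiry(not_after, now) < warn_days
    )


if __name__ == "__main__":
    inventory = {  # hypothetical hosts and dates
        "api.example.com": "Jun 20 12:00:00 2026 GMT",
        "static.example.com": "Dec 31 00:00:00 2026 GMT",
    }
    now = datetime(2026, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
    print(expiring_soon(inventory, now))
```

Feeding the result into the same alerting path as deploy events makes an upcoming expiry visible as a pending change rather than a surprise.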

Key Takeaways

FM-10 accounts for 9% of classified 2025 incidents, and it is systematically harder to detect than deploy regressions because the change trail is fragmented
Cloudflare Nov 2025 and AWS Oct 2025 are the clearest high-impact FM-10 examples: both trace to config changes with unexpected cascading consequences
IaC expands blast radius: the same rigorous rollout discipline you apply to code deploys should apply to IaC changes
The highest-leverage investment: change-data integration, meaning all config change signals surface in the same telemetry stream as your alerts and metrics
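As a rough illustration of change-data integration, the sketch below normalizes change events from different sources into one shape and lists the ones that landed shortly before an alert fired. The `ChangeEvent` fields, the source labels, and the 30-minute lookback are assumptions, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    source: str    # e.g. "terraform", "feature-flag", "dns-automation" (illustrative)
    target: str    # what was changed
    at: datetime   # when the change was applied


def changes_before_alert(changes, alert_at, lookback=timedelta(minutes=30)):
    """Return change events inside the lookback window before an alert, newest first."""
    window_start = alert_at - lookback
    hits = [c for c in changes if window_start <= c.at <= alert_at]
    return sorted(hits, key=lambda c: c.at, reverse=True)


if __name__ == "__main__":
    alert_at = datetime(2026, 6, 20, 14, 0)
    changes = [
        ChangeEvent("terraform", "sg-web", datetime(2026, 6, 20, 9, 15)),
        ChangeEvent("feature-flag", "checkout-v2", datetime(2026, 6, 20, 13, 40)),
        ChangeEvent("dns-automation", "api.example.internal",
                    datetime(2026, 6, 20, 13, 55)),
    ]
    for event in changes_before_alert(changes, alert_at):
        print(event.source, event.target, event.at.isoformat())
```

With every change source feeding one list like this, "nothing changed" becomes a query an on-call engineer can run instead of an assumption.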

About StackGen:

StackGen is the pioneer in Autonomous Infrastructure Platform (AIP) technology, helping enterprises transition from manual Infrastructure-as-Code (IaC) management to fully autonomous operations. Founded by infrastructure automation experts and headquartered in the San Francisco Bay Area, StackGen serves leading companies across technology, financial services, manufacturing, and entertainment industries.
