Quick Answer
You write Terraform, you apply it, you merge the PR — and everything matches. Until it doesn’t. A week later, an on-call engineer makes a quick fix in the AWS console at 2 AM. An autoscaler quietly tweaks a resource. A third-party security tool modifies a rule. Terraform’s state file has no idea any of this happened.
The result is infrastructure drift: the quiet gap between what your configuration says should exist and what actually runs in production. Left unchecked, drift causes surprise plan outputs, compliance violations, runaway cloud costs, and the kind of 3 AM incident where nobody can explain why Terraform wants to destroy a database that’s serving live traffic.
In this guide you’ll learn how to detect, diagnose, fix, and prevent Terraform state drift — with working commands, real code examples, CI automation patterns, a compliance angle your security team will appreciate, and a troubleshooting section for the edge cases that burn teams the most.
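Terraform tracks your infrastructure through a state file (terraform.tfstate), a JSON document mapping your .tf configuration to real cloud resources. Every plan and apply refreshes that state from what’s currently deployed and compares the result against your configuration. When they don’t match, Terraform proposes changes to bring the infrastructure back in line with the configuration, which may mean overwriting something you didn’t intend to change.
Drift is a three-way divergence between your configuration (what you declared), the state file (what Terraform last recorded), and the live infrastructure (what actually exists).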
A concrete example: your config sets an Auto Scaling Group with max_size = 5. During a traffic spike, an engineer bumps it to max_size = 10 in the AWS console. The state file still shows 5. Next terraform plan: Terraform wants to set it back to 5 — in production, during peak traffic.
Most drift isn’t malicious. It comes from the gap between how teams intend to manage infrastructure and how they actually do it day-to-day: emergency fixes in the cloud console, autoscalers and other automation adjusting resources, and third-party tools modifying rules.
Drift isn’t just operationally annoying. It creates compounding risk at the compliance, security, and cost layers that audit teams and CFOs care about.
Understanding which command to use — and when — is the difference between safe inspection and accidentally overwriting production state.
terraform plan queries your cloud provider, compares the result against your state file and config, and reports what would change. Any unexpected diff is a drift signal.
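A scripted version of this check might look like the following. It is a minimal sketch: `check_plan` is a hypothetical helper name, and it relies on the CLI's `-detailed-exitcode` flag, under which `terraform plan` exits 0 when there are no changes, 1 on error, and 2 when the plan contains changes.

```shell
# check_plan: hypothetical helper (not part of Terraform) that labels a plan run.
# With -detailed-exitcode, terraform plan exits 0 (no changes), 1 (error), or 2 (changes).
check_plan() {
  status=0
  # Send the plan text to stderr so stdout carries only the one-word label.
  terraform plan -detailed-exitcode -input=false -no-color >&2 || status=$?
  case "$status" in
    0) echo "no-changes" ;;
    2) echo "changes-detected" ;;
    *) echo "plan-error" ;;
  esac
  return "$status"
}
```

Calling `check_plan` in an initialized working directory prints one of the three labels. Exit code 2 covers any pending change, so a diff signals drift only when the configuration itself has not changed since the last apply.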
In plan output, ~ marks an update in-place, - a destroy, and -/+ a destroy followed by a recreate.
Running terraform plan -refresh-only is the recommended first step for any drift investigation. It queries cloud provider APIs for actual resource state and shows what the state file would need to update, without proposing any infrastructure changes. To write those updates to state, run terraform apply -refresh-only, which requires explicit confirmation first.
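The two refresh-only modes can be wrapped as small shell functions, as a sketch. The function names are hypothetical; the flags are standard Terraform CLI options, and any extra arguments are passed through unchanged.

```shell
# Hypothetical wrappers around the two refresh-only modes; extra arguments pass through.

# Read-only: shows how the state would change to match live resources.
inspect_drift() {
  terraform plan -refresh-only -input=false "$@"
}

# Writes the refreshed values into the state file after interactive approval.
# It does not modify any infrastructure.
record_drift() {
  terraform apply -refresh-only "$@"
}
```

For example, `inspect_drift -target=aws_autoscaling_group.web` would narrow the inspection to a single resource; that address is only an illustration.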
terraform state list and terraform state show allow surgical inspection of individual tracked resources. They are useful for comparing specific attribute values against what’s live in your cloud console.
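One way to combine the state subcommands is sketched below. `show_matching` is a hypothetical helper that prints the recorded attributes of every tracked resource whose address matches a pattern, using `terraform state list` and `terraform state show`.

```shell
# show_matching PATTERN: hypothetical helper that prints the state entry for
# every tracked resource whose address matches PATTERN (a grep expression).
show_matching() {
  pattern=$1
  terraform state list | grep -- "$pattern" | while IFS= read -r addr; do
    echo "=== $addr"
    # </dev/null keeps the inner command from consuming the address list on stdin.
    terraform state show "$addr" </dev/null
  done
}
```

`show_matching aws_autoscaling_group` would then print each Auto Scaling Group as Terraform last recorded it, ready to compare against the console.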
Reference: Terraform State — HashiCorp Developer
The fix depends on one judgment call: should the real-world change be kept, or reverted?
Revert the drift by re-applying your configuration when the out-of-band change was accidental, unauthorized, or temporary.
Accept the drift by updating state and configuration to match reality when the real-world change was intentional and correct. For example, a database was correctly resized during an incident and should stay that way.
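The revert-or-keep decision maps onto two different commands. The sketch below uses a hypothetical `remediate_drift` helper to make the difference explicit; both commands prompt for approval before acting.

```shell
# remediate_drift revert|accept: hypothetical helper, one branch per decision.
remediate_drift() {
  case "$1" in
    revert)
      # Configuration wins: a normal apply proposes undoing the out-of-band change.
      terraform apply
      ;;
    accept)
      # Reality wins: record the live values in state first.
      terraform apply -refresh-only || return 1
      # The configuration must then be edited to match; otherwise the next
      # plan will still propose reverting the change.
      echo "State updated. Edit the .tf files to match, then run: terraform plan" >&2
      ;;
    *)
      echo "usage: remediate_drift revert|accept" >&2
      return 2
      ;;
  esac
}
```

In the accept case the work is not finished until a follow-up `terraform plan` reports no changes.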
Import resources that exist in the cloud but are completely unknown to Terraform, that is, resources created manually and never managed by IaC. There are two approaches: the terraform import command and import blocks.
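Both approaches can be sketched from the shell. The resource address `aws_s3_bucket.legacy` and the ID `my-legacy-bucket` are placeholders, and `import_block` is a hypothetical helper that only prints text; import blocks require Terraform 1.5 or later.

```shell
# Approach 1: the CLI command, which writes straight to state. It expects a
# matching resource block to already exist in the configuration:
#   terraform import aws_s3_bucket.legacy my-legacy-bucket
#
# Approach 2: an import block (Terraform 1.5+), reviewed through plan/apply.
# import_block ADDRESS ID: hypothetical helper that prints the block.
import_block() {
  printf 'import {\n  to = %s\n  id = "%s"\n}\n' "$1" "$2"
}
```

A possible usage is `import_block aws_s3_bucket.legacy my-legacy-bucket >> imports.tf`, followed by `terraform plan -generate-config-out=generated.tf`, which asks Terraform to draft the matching resource block for review before anything is applied.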
Manual detection doesn’t scale. The most reliable approach is scheduling drift checks in CI pipelines so your team is notified before drift compounds into an incident or an audit finding.
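A scheduled CI job could run something like the script below. It is a sketch under stated assumptions: the function names are hypothetical, each directory passed in is an already-initialized Terraform root, and `SLACK_WEBHOOK_URL` is an environment variable that the CI system would have to provide.

```shell
# drift_scan DIR...: hypothetical helper that plans each Terraform root directory
# and prints the ones whose plan contains changes (exit code 2).
drift_scan() {
  drifted=""
  for dir in "$@"; do
    status=0
    terraform -chdir="$dir" plan -detailed-exitcode -input=false -no-color \
      >/dev/null 2>&1 || status=$?
    case "$status" in
      0) ;;                                  # config, state, and reality agree
      2) drifted="$drifted $dir" ;;          # changes pending: possible drift
      *) echo "plan failed in $dir (exit $status)" >&2 ;;
    esac
  done
  echo "${drifted# }"
}

# notify_slack TEXT: post to a Slack incoming webhook.
# SLACK_WEBHOOK_URL is an assumed environment variable, not a Terraform setting.
notify_slack() {
  [ -n "$1" ] || return 0   # nothing drifted, nothing to send
  curl -fsS -X POST -H 'Content-Type: application/json' \
    --data "{\"text\": \"Terraform drift detected in: $1\"}" \
    "$SLACK_WEBHOOK_URL"
}
```

The job body would then be a single line such as `notify_slack "$(drift_scan envs/dev envs/prod)"`, where the directory names are examples. Directory names containing quotes would break the hand-built JSON, so a real pipeline may prefer a JSON-aware tool.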
Some changes are expected by design (autoscaling-managed counts, provider-managed fields). Use ignore_changes to exclude them so real drift stands out from background noise:
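As an illustration, the function below prints an example resource that ignores autoscaler-driven changes to `desired_capacity`. The resource names, the `var.subnet_ids` variable, and the launch template reference are all hypothetical; only the `lifecycle` block is the point of the sketch.

```shell
# example_asg_tf: prints an illustrative HCL fragment (names are hypothetical).
example_asg_tf() {
  cat <<'EOF'
resource "aws_autoscaling_group" "web" {
  name                = "web"
  min_size            = 1
  max_size            = 5
  desired_capacity    = 2
  vpc_zone_identifier = var.subnet_ids # hypothetical variable

  launch_template {
    id      = aws_launch_template.web.id # hypothetical resource
    version = "$Latest"
  }

  lifecycle {
    # The autoscaler owns this value, so changes to it are not reported as drift.
    ignore_changes = [desired_capacity]
  }
}
EOF
}
```

Keeping the list narrow matters: `max_size` is deliberately not ignored here, so a manual bump in the console would still show up in the next plan.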
Detection and remediation are reactive. These practices make drift the exception rather than the default:
A terraform apply failed mid-run. Some resources were created/modified, but the state file now reflects a partial picture. Next terraform plan shows confusing changes that don’t match your intent.
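One cautious recovery sequence is sketched below. It is an assumption about a reasonable order of operations, not an official procedure: snapshot the current state, inspect how it differs from live resources, then review an ordinary plan before re-applying. The function name and backup file naming are hypothetical.

```shell
# recover_partial_apply: hypothetical helper for investigating a failed apply.
recover_partial_apply() {
  backup="state-backup-$(date +%Y%m%d%H%M%S).tfstate"
  # 1. Snapshot the state before touching anything.
  terraform state pull > "$backup" || return 1
  # 2. Read-only check of how state differs from the live resources.
  terraform plan -refresh-only -input=false || return 1
  # 3. Ordinary plan: shows what a re-run of apply would still create or change.
  terraform plan -input=false
}
```

Nothing here writes to state or infrastructure; the actual re-apply stays a separate, deliberate step.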
Classic symptom of a dependency drift: a resource that another resource depends on (e.g. an IAM role, VPC, or security group) was changed outside Terraform, causing Terraform to plan a recreate cascade.
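To see which resources are caught in such a cascade, a quick filter over the plan text can help. This sketch assumes the human-readable plan wording ("must be replaced", "forces replacement"), which is not a stable interface; `list_replacements` is a hypothetical helper, and the JSON form from `terraform show -json` is the sturdier option for automation.

```shell
# list_replacements: hypothetical helper that prints the plan lines naming
# resources to be replaced and the attributes that force the replacement.
list_replacements() {
  terraform plan -input=false -no-color "$@" |
    grep -E 'must be replaced|forces replacement' || true   # no match is not an error
}
```

The attribute flagged with "forces replacement" usually points at the dependency that was changed outside Terraform.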
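Some cloud providers update default attribute values (e.g. AWS changing the default instance metadata service settings). Terraform sees this as drift even though you didn’t change anything. Fix: set the affected attribute explicitly in your configuration, or exclude it with ignore_changes.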
Native Terraform commands require manual execution and provide no centralized dashboard across workspaces. These platforms extend drift detection into continuous governance with alerting, policy enforcement, and (optionally) automated remediation:
Terraform state drift occurs when your live cloud infrastructure diverges from what Terraform's state file and configuration expect. It happens when resources are modified outside the Terraform workflow — through cloud consoles, CLI scripts, external automation, or cloud provider auto-updates. Drift is invisible until you run terraform plan.
Run terraform plan -refresh-only. This queries your cloud provider for the current state of all managed resources and shows you what the state file would need to update — without proposing any infrastructure changes or modifying anything. It’s the safest drift inspection command available.
terraform plan both detects drift AND proposes changes to bring infrastructure in line with your config. terraform plan -refresh-only only detects drift (shows what changed in the real infrastructure) without proposing any remediation. Use -refresh-only first to understand the drift, then decide whether to apply normally or accept the real-world state.
Yes, terraform refresh is deprecated, as of Terraform 0.15.4. It overwrites your state file without showing you a diff or asking for confirmation, which makes it dangerous. Use terraform apply -refresh-only instead: it shows the proposed state changes and requires explicit confirmation before writing to disk.
Three safe options: (1) Run terraform apply -refresh-only to update the state file to match the live infrastructure, then update your .tf files to match; (2) Use terraform import or import blocks to bring untracked resources under Terraform management; (3) Use lifecycle.ignore_changes to declare that certain attributes should not be reconciled by Terraform.
Schedule a GitHub Actions (or GitLab CI) workflow to run terraform plan -detailed-exitcode on a nightly cron. Exit code 2 means the plan found changes, which on an otherwise unchanged configuration indicates drift. Pipe this to a Slack notification step, following the scheduled CI drift check pattern described earlier in this guide.
Native Terraform CLI requires you to run drift detection per workspace. At scale, platforms like HCP Terraform (Standard Edition), Spacelift, or env0 provide continuous drift monitoring across all workspaces with a centralized dashboard and policy enforcement.
Terraform state drift is structural — the state-file model means any change made outside of terraform apply creates a gap. The goal isn’t to eliminate all possibility of drift; it’s to detect it quickly, remediate it safely, and build systems that make drift the exception rather than the default.
Your action checklist:
Infrastructure drift caught early is a minor correction. Drift discovered during the next audit or incident is a crisis. Build the system that catches it first.