What Happens During a CloudFormation Stack Update - Behind Change Sets, Rollbacks, and Replacements
Learn about the internal processing flow when CloudFormation updates a stack, including how change sets detect differences, the logic behind resource update, replacement, and deletion decisions, and how rollbacks work.
Template Difference Detection - How Changes Are Identified
When you ask CloudFormation to update a stack, it first detects the differences between the stack's current template and the new template. It matches resources by logical ID (the name given to the resource in the template) and then compares their properties. The important point is that CloudFormation compares the properties written in the two templates, not the actual current state of the resource. A property changed manually through the AWS console or CLI is therefore not detected as a difference unless the change is also made in the template. This divergence between the template and the actual state is called drift, and it is one of the most troublesome issues in CloudFormation operations. The drift detection feature can identify the divergence, but it does not remediate it automatically. A change set previews which resources will change, and how, before the update is actually executed. When you create a change set, CloudFormation analyzes the differences and reports an action such as "Add," "Modify," or "Remove" for each affected resource, and for a modification it also indicates whether the resource will be replaced.
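
The comparison by logical ID can be sketched in a few lines of Python. This is a simplified model for illustration, not CloudFormation's actual implementation: the function name `diff_templates` and the sample templates are assumptions, and real change sets also account for parameters and conditions, among other things.

```python
def diff_templates(old_template, new_template):
    """Compare two template dicts by logical ID, like a much-simplified change set."""
    old_resources = old_template.get("Resources", {})
    new_resources = new_template.get("Resources", {})
    changes = {}
    for logical_id in sorted(set(old_resources) | set(new_resources)):
        if logical_id not in old_resources:
            changes[logical_id] = "Add"
        elif logical_id not in new_resources:
            changes[logical_id] = "Remove"
        elif old_resources[logical_id] != new_resources[logical_id]:
            changes[logical_id] = "Modify"
        # Identical definitions produce no entry, even if the live resource has drifted.
    return changes


# Hypothetical templates, already parsed into dicts.
old = {"Resources": {
    "Bucket": {"Type": "AWS::S3::Bucket", "Properties": {"BucketName": "logs"}},
    "Queue": {"Type": "AWS::SQS::Queue", "Properties": {}},
}}
new = {"Resources": {
    "Bucket": {"Type": "AWS::S3::Bucket", "Properties": {"BucketName": "logs-v2"}},
    "Topic": {"Type": "AWS::SNS::Topic", "Properties": {}},
}}

print(diff_templates(old, new))
# {'Bucket': 'Modify', 'Queue': 'Remove', 'Topic': 'Add'}
```

Because the logical ID is the key, renaming a resource in the template shows up as a "Remove" of the old name plus an "Add" of the new one, even if nothing else about the resource changed.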
Update, Replacement, and Deletion - Different Update Behaviors per Resource
A resource update follows one of three patterns. "Update with No Interruption" changes properties without stopping the resource; changing a Lambda function's environment variables or adding tags to an S3 bucket are examples. "Update with Some Interruption" makes the resource temporarily unavailable; changing the instance type of an EBS-backed EC2 instance, for example, stops and restarts the instance. "Replacement" creates a new resource with a new physical ID, points dependent resources at it, and then deletes the old resource. Changing an RDS instance's DBInstanceIdentifier or a DynamoDB table's partition key triggers replacement. Changing an RDS instance's engine version, by contrast, is normally applied in place with some interruption. Replacement is the most dangerous pattern: the new resource starts empty, so when a resource that holds data (RDS, DynamoDB) is replaced, the data in the old resource is lost once that resource is deleted unless it has been migrated or backed up. The CloudFormation documentation states the update behavior (no interruption, some interruption, or replacement) for each property of each resource type under "Update requires." Review it before any stack update.
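
One way to catch replacements before they happen is to scan a change set for them. The sketch below assumes the change set has already been fetched as a dict shaped like the DescribeChangeSet response (a `Changes` list whose entries carry a `ResourceChange` with `Action` and `Replacement` fields); the function name `find_risky_changes` and the sample data are made up for illustration.

```python
def find_risky_changes(change_set):
    """Return (logical_id, resource_type, reason) for changes that may destroy a resource."""
    risky = []
    for change in change_set.get("Changes", []):
        rc = change.get("ResourceChange", {})
        action = rc.get("Action")
        replacement = rc.get("Replacement")  # "True", "False", or "Conditional"
        if action == "Remove":
            reason = "removed"
        elif action == "Modify" and replacement == "True":
            reason = "replaced"
        elif action == "Modify" and replacement == "Conditional":
            reason = "possibly replaced"
        else:
            continue  # additions and in-place modifications
        risky.append((rc.get("LogicalResourceId"), rc.get("ResourceType"), reason))
    return risky


# Hand-written sample in the shape of a DescribeChangeSet response.
sample = {"Changes": [
    {"Type": "Resource", "ResourceChange": {
        "Action": "Modify", "LogicalResourceId": "ApiFunction",
        "ResourceType": "AWS::Lambda::Function", "Replacement": "False"}},
    {"Type": "Resource", "ResourceChange": {
        "Action": "Modify", "LogicalResourceId": "OrdersTable",
        "ResourceType": "AWS::DynamoDB::Table", "Replacement": "True"}},
    {"Type": "Resource", "ResourceChange": {
        "Action": "Remove", "LogicalResourceId": "LegacyQueue",
        "ResourceType": "AWS::SQS::Queue"}},
]}

for logical_id, resource_type, reason in find_risky_changes(sample):
    print(f"{logical_id} ({resource_type}) will be {reason}")
```

With boto3, the input would typically come from the CloudFormation client's `describe_change_set` call. A deployment script could refuse to execute the change set when this list is non-empty for data-bearing resource types.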
Dependency Resolution and Update Order
CloudFormation analyzes the dependencies between resources in the template and executes updates in an order that respects them. Dependencies are inferred from intrinsic function references such as Ref and Fn::GetAtt, and can be declared explicitly with the DependsOn attribute. For example, if a Lambda function references an IAM role, the IAM role is updated first, followed by the Lambda function. Resources with no dependency on one another are updated in parallel wherever possible, which shortens the update. Parallelism carries a risk, however, because CloudFormation cannot know about relationships that the template does not express. If a security group and an EC2 instance that are not linked in the template are updated simultaneously, for example, the instance may briefly run with rules that do not match its new configuration; adding DependsOn forces an order. Circular dependencies (Resource A references Resource B, and Resource B references Resource A) are detected as errors when CloudFormation validates the template. Replacing one of the references with DependsOn does not help, because DependsOn is itself a dependency. To break the cycle, move one of the references into a separate resource (for example, define a security group rule as its own AWS::EC2::SecurityGroupIngress resource instead of inline in the security group) or redesign the resources.
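
The ordering logic can be modeled as a topological sort. The following is a minimal sketch using Python's standard `graphlib` module (Python 3.9 or later), not a description of CloudFormation's real scheduler: it only follows Ref, Fn::GetAtt, and DependsOn (references inside Fn::Sub strings are ignored), and the helper names and sample template are assumptions.

```python
from graphlib import TopologicalSorter, CycleError


def find_references(value, resource_ids):
    """Collect logical IDs referenced through Ref or Fn::GetAtt inside a property tree."""
    found = set()
    if isinstance(value, dict):
        for key, inner in value.items():
            if key == "Ref" and isinstance(inner, str):
                if inner in resource_ids:  # a Ref may also point at a parameter
                    found.add(inner)
            elif key == "Fn::GetAtt":
                # Fn::GetAtt accepts ["LogicalId", "Attr"] or "LogicalId.Attr".
                target = inner[0] if isinstance(inner, list) else str(inner).split(".")[0]
                if target in resource_ids:
                    found.add(target)
            else:
                found |= find_references(inner, resource_ids)
    elif isinstance(value, list):
        for item in value:
            found |= find_references(item, resource_ids)
    return found


def update_waves(template):
    """Group resources into waves; every resource in a wave could be updated in parallel."""
    resources = template.get("Resources", {})
    ids = set(resources)
    graph = {}
    for logical_id, body in resources.items():
        deps = find_references(body.get("Properties", {}), ids)
        depends_on = body.get("DependsOn", [])
        deps.update([depends_on] if isinstance(depends_on, str) else depends_on)
        graph[logical_id] = deps
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError on a circular dependency
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical template, already parsed into a dict.
template = {"Resources": {
    "Role": {"Type": "AWS::IAM::Role", "Properties": {}},
    "Function": {"Type": "AWS::Lambda::Function",
                 "Properties": {"Role": {"Fn::GetAtt": ["Role", "Arn"]}}},
    "Bucket": {"Type": "AWS::S3::Bucket", "Properties": {}},
    "Permission": {"Type": "AWS::Lambda::Permission", "DependsOn": "Function",
                   "Properties": {}},
}}

print(update_waves(template))
# [['Bucket', 'Role'], ['Function'], ['Permission']]
```

If two resources reference each other, `sorter.prepare()` raises `CycleError`, which mirrors the circular dependency error. Turning one of those references into DependsOn would leave the same edge in the graph, which is why that change alone cannot resolve the cycle.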
How Rollbacks Work - How Far Back Does It Go on Failure
If creating or modifying a resource fails during a stack update, CloudFormation automatically initiates a rollback, which attempts to return every resource to its pre-update state. Because the old resource in a replacement is not deleted until the cleanup phase that follows a successful update, a rollback can delete the newly created resource and keep using the old one. However, rollbacks are not perfect. They restore the configuration described by the previous template, not data or out-of-band changes, and a resource that has already been deleted during cleanup cannot be restored. To keep the old resource even after a successful replacement, set UpdateReplacePolicy to Retain; DeletionPolicy covers stack deletion and removal of the resource from the template, not replacement. Rollbacks themselves can also fail, for example if resources were manually modified during the update or if the rollback target state violates current service limits. When a rollback fails, the stack enters the UPDATE_ROLLBACK_FAILED state and requires manual intervention. You can use the ContinueUpdateRollback API to skip the problematic resources and continue the rollback, but CloudFormation then marks the skipped resources as successfully rolled back without touching them, so their actual state may no longer match what the stack records and you must reconcile them yourself. This state is operationally very difficult, and recreating the stack is sometimes the best solution.
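
A recovery script for a stuck rollback might look like the sketch below. It assumes a boto3-style CloudFormation client that exposes `describe_stacks` and `continue_update_rollback`; the function name `recover_failed_rollback` is hypothetical, and `FakeCloudFormation` is a stand-in used here only so the example runs without AWS access.

```python
def recover_failed_rollback(cfn, stack_name, skip_logical_ids=()):
    """Resume a stuck rollback; return True if ContinueUpdateRollback was requested."""
    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    if stack["StackStatus"] != "UPDATE_ROLLBACK_FAILED":
        return False  # nothing to recover
    kwargs = {"StackName": stack_name}
    if skip_logical_ids:
        # Skipping is a last resort: skipped resources must be reconciled by hand afterward.
        kwargs["ResourcesToSkip"] = list(skip_logical_ids)
    cfn.continue_update_rollback(**kwargs)
    return True


class FakeCloudFormation:
    """Hypothetical stand-in for a boto3 CloudFormation client (offline demonstration only)."""

    def __init__(self, status):
        self.status = status
        self.calls = []

    def describe_stacks(self, StackName):
        return {"Stacks": [{"StackName": StackName, "StackStatus": self.status}]}

    def continue_update_rollback(self, **kwargs):
        self.calls.append(kwargs)
        return {}


stuck = FakeCloudFormation("UPDATE_ROLLBACK_FAILED")
print(recover_failed_rollback(stuck, "prod-stack", ["CacheCluster"]))
print(stuck.calls)
```

In real use, `cfn` would be something like `boto3.client("cloudformation")`. It is usually worth first fixing the cause of the failure (for example, undoing a manual change) and retrying without `ResourcesToSkip`, and skipping only if that retry fails again.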
Practical Measures for Safe Stack Updates
Four measures help prevent accidents during CloudFormation stack updates. First, always review change sets: instead of updating directly with UpdateStack, create a change set, preview it, and confirm there are no unintended replacements or deletions before executing it. Second, configure a stack policy, which prohibits or restricts updates to specific resources. A stack policy that denies replacement and deletion of production RDS instances and DynamoDB tables prevents data loss caused by template mistakes. Third, set DeletionPolicy and UpdateReplacePolicy on critical resources. DeletionPolicy: Retain preserves a resource when the stack is deleted or the resource is removed from the template, and UpdateReplacePolicy: Retain preserves the old resource when an update replaces it. Fourth, enable termination protection on production stacks to prevent accidental deletion of the stack itself. Combining these measures significantly reduces the risk of CloudFormation update accidents.
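
As an illustration of the second measure, a stack policy that denies replacement and deletion of selected resources can be generated like this. The function name `build_stack_policy` and the logical IDs `ProductionDatabase` and `OrdersTable` are assumptions; the statement fields follow the stack policy format (`Effect`, `Action`, `Principal`, `Resource`).

```python
import json


def build_stack_policy(protected_logical_ids):
    """Deny replacement and deletion of the given resources; allow all other updates."""
    statements = [
        {
            "Effect": "Deny",
            "Action": ["Update:Replace", "Update:Delete"],
            "Principal": "*",
            "Resource": f"LogicalResourceId/{logical_id}",
        }
        for logical_id in protected_logical_ids
    ]
    # An explicit Allow is needed; the Deny statements above take precedence over it.
    statements.append(
        {"Effect": "Allow", "Action": "Update:*", "Principal": "*", "Resource": "*"}
    )
    return {"Statement": statements}


policy = build_stack_policy(["ProductionDatabase", "OrdersTable"])
print(json.dumps(policy, indent=2))
```

With boto3, the resulting JSON string could be applied through the client's `set_stack_policy` call (as `StackPolicyBody`), and termination protection could be switched on with `update_termination_protection`. In-place modifications of the protected resources stay allowed under this policy; only replacement and deletion are blocked.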