AWS CloudFormation is a great tool to implement infrastructure management but it can present problems relating to speed, complexity and state. It is important to understand when to use CloudFormation and when to use alternative solutions to prevent issues in production.
It can be easy to pick up and start using AWS CloudFormation, especially since AWS SAM, AWS CDK and many other tools use it as their foundation. However, CloudFormation has trouble managing stateful resources, global resources and dependencies between stacks. We can manage these limitations by knowing when we should and shouldn't use it.
1. Stateful Resource Management
When AWS CloudFormation updates stateful resources such as DynamoDB tables, S3 buckets, RDS databases and Route 53 resources, changing certain attributes in the template requires replacement. CloudFormation then creates a new resource and deletes the old one, which can cause huge problems, including data loss, if it happens in production.
Accidental deletion can be avoided by setting the resource's DeletionPolicy attribute to Retain. However, a retained resource then has to be managed outside CloudFormation or imported back into a stack.
It is also possible to hit edge cases in AWS CloudFormation that break the stack, simply because managing infrastructure is complex. When this happens, the recommended solution from AWS is usually to delete the stack and create it again. Doing this in production can be complex and risky.
The risk of accidentally deleting databases, combined with the complexity of importing and exporting state, is enough for me to recommend using the APIs directly over CloudFormation for any stateful resource.
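One way to catch a replacement before it happens is to create a change set and inspect it before executing. The sketch below assumes a dictionary shaped like the response of the CloudFormation DescribeChangeSet API, trimmed to the fields it reads. The sample data and the function name are hypothetical.

```python
def resources_needing_replacement(change_set):
    """Return logical IDs that the change set would, or might, replace."""
    flagged = []
    for change in change_set.get("Changes", []):
        resource_change = change.get("ResourceChange", {})
        # "True" means the resource will be replaced. "Conditional" means the
        # outcome depends on values only known at deploy time, so it is
        # treated as risky here as well.
        if (
            resource_change.get("Action") == "Modify"
            and resource_change.get("Replacement") in ("True", "Conditional")
        ):
            flagged.append(resource_change["LogicalResourceId"])
    return flagged


# Hypothetical response, trimmed to the fields used above.
sample = {
    "Changes": [
        {
            "Type": "Resource",
            "ResourceChange": {
                "Action": "Modify",
                "LogicalResourceId": "OrdersTable",
                "ResourceType": "AWS::DynamoDB::Table",
                "Replacement": "True",
            },
        },
        {
            "Type": "Resource",
            "ResourceChange": {
                "Action": "Modify",
                "LogicalResourceId": "ApiFunction",
                "ResourceType": "AWS::Lambda::Function",
                "Replacement": "False",
            },
        },
    ]
}

print(resources_needing_replacement(sample))  # ['OrdersTable']
```

A deployment script could refuse to execute the change set whenever this list contains a stateful resource.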
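As a minimal sketch, a template with the retain policy could look like the following, written as a Python dictionary and printed as JSON. The table name and properties are hypothetical. UpdateReplacePolicy is an addition not mentioned above: it is the companion attribute that covers the old physical resource when an update replaces it, whereas DeletionPolicy covers removal of the resource or deletion of the stack. The list of stateful types is an assumption and far from complete.

```python
import json

# Assumed, incomplete list of resource types treated as stateful.
STATEFUL_TYPES = {"AWS::DynamoDB::Table", "AWS::S3::Bucket", "AWS::RDS::DBInstance"}

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "OrdersTable": {  # hypothetical logical ID
            "Type": "AWS::DynamoDB::Table",
            # Keep the table if it is removed from the template or the stack is deleted.
            "DeletionPolicy": "Retain",
            # Keep the old table if an update replaces it.
            "UpdateReplacePolicy": "Retain",
            "Properties": {
                "BillingMode": "PAY_PER_REQUEST",
                "AttributeDefinitions": [{"AttributeName": "pk", "AttributeType": "S"}],
                "KeySchema": [{"AttributeName": "pk", "KeyType": "HASH"}],
            },
        }
    },
}


def unprotected_stateful_resources(tpl):
    """Return logical IDs of stateful resources missing either Retain policy."""
    missing = []
    for logical_id, resource in tpl.get("Resources", {}).items():
        if resource.get("Type") not in STATEFUL_TYPES:
            continue
        if (
            resource.get("DeletionPolicy") != "Retain"
            or resource.get("UpdateReplacePolicy") != "Retain"
        ):
            missing.append(logical_id)
    return missing


print(json.dumps(template, indent=2))
print(unprotected_stateful_resources(template))  # []
```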
2. Region Bound Stacks
Firstly, AWS CloudFormation stacks are always deployed within a single region. This can make global resources confusing to manage, as it isn't always apparent which region to look in for the stack.
Secondly, cross-region connectivity can become complicated when using CloudFormation. CloudFormation only sequences changes within a single stack, so when resources in multiple regions (such as security groups and peering connections) need to connect, extra work is required outside CloudFormation. This can be resolved with code that manages the CloudFormation stacks, but in my experience it is easier to manage these resources directly with the APIs.
3. Nested Stacks
AWS CloudFormation can use nested stacks to allow for re-use of templates within a stack but it causes more problems than it solves.
AWS CloudFormation is generally slow to deploy resources and this gets much worse when using nested stacks. The more nesting is used the slower deployments get due to the overheads of CloudFormation.
Another issue that arises when using nested stacks is failure recovery. As stacks get bigger and more complicated the risk of failures increases, and when AWS CloudFormation fails, it fails hard. This usually requires a complete teardown and rebuild, which can lead to manual work with outages.
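How much manual work a failure needs depends on the status the stack ends up in. The sketch below maps a few documented CloudFormation stack statuses to a next step. The mapping is deliberately simplified and the action names are hypothetical labels, not API calls.

```python
def recovery_action(stack_status):
    """Suggest a next step for a CloudFormation stack status (simplified)."""
    if stack_status.endswith("_IN_PROGRESS"):
        return "wait"
    if stack_status == "ROLLBACK_COMPLETE":
        # Creation failed and was rolled back: the stack can only be deleted.
        return "delete-and-recreate"
    if stack_status == "UPDATE_ROLLBACK_FAILED":
        # The ContinueUpdateRollback API exists for this state.
        return "continue-update-rollback"
    if stack_status == "DELETE_FAILED":
        return "retry-delete"
    if stack_status.endswith("_FAILED"):
        return "investigate"
    return "none"


for status in ("UPDATE_COMPLETE", "ROLLBACK_COMPLETE", "UPDATE_ROLLBACK_FAILED"):
    print(status, "->", recovery_action(status))
```

With nested stacks, each child stack has its own status, so a check like this would need to run against every stack in the tree.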
4. Exported Output Values
AWS CloudFormation stacks can export output values from resources for other stacks to consume. This seems like a helpful tool. However, once another stack imports an exported value, you can no longer modify that value or delete the exporting stack until the import is removed. Depending on the value this protection could be helpful, but generally it drastically increases the complexity of managing stack dependencies.
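To make the coupling concrete, here is a sketch of an exporting and an importing template as Python dictionaries, plus a small helper that lists which export names a template depends on. The stack contents, the export name and the helper are all hypothetical.

```python
# Exporting stack: publishes the VPC ID under a region-wide export name.
network_stack = {
    "Resources": {
        "Vpc": {"Type": "AWS::EC2::VPC", "Properties": {"CidrBlock": "10.0.0.0/16"}}
    },
    "Outputs": {
        "VpcId": {"Value": {"Ref": "Vpc"}, "Export": {"Name": "network-VpcId"}}
    },
}

# Importing stack: while this reference exists, the export above is locked.
app_stack = {
    "Resources": {
        "AppSecurityGroup": {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {
                "GroupDescription": "App instances",
                "VpcId": {"Fn::ImportValue": "network-VpcId"},
            },
        }
    }
}


def imported_names(node):
    """Recursively collect literal export names used with Fn::ImportValue."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Fn::ImportValue" and isinstance(value, str):
                found.add(value)
            else:
                found |= imported_names(value)
    elif isinstance(node, list):
        for item in node:
            found |= imported_names(item)
    return found


print(imported_names(app_stack))  # {'network-VpcId'}
```

Running a helper like this across all templates gives a rough map of which stacks block changes to which others.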
5. Slow Deployments
AWS CloudFormation is known for being quite a slow solution for managing infrastructure. This is especially true for environments requiring multiple stacks sequentially deployed or for anything that includes IAM resources.
6. Managing Stacks Requires Code
Most environments are complex enough to require multiple CloudFormation stacks, whether to mitigate risk, re-use templates, or stay within stack limits. This usually means that code is needed to automate deploying those stacks in the right order. One of the main benefits of CloudFormation is its simplicity, but once we start managing it via code a lot of that benefit is lost.
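A minimal sketch of that orchestration code, assuming stack dependencies are declared by hand: the standard library's graphlib (Python 3.9+) works out a deployment order, and a caller-supplied `deploy` function does the actual work. The stack names and the `deploy` callable are hypothetical. In practice it would create or update the stack and wait for it to finish.

```python
from graphlib import TopologicalSorter

# Hypothetical stacks, each mapped to the stacks it depends on.
dependencies = {
    "network": set(),
    "database": {"network"},
    "app": {"network", "database"},
}


def deployment_order(deps):
    """Return stack names so that every stack follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def deploy_all(deps, deploy):
    """Deploy stacks one at a time, stopping at the first exception."""
    deployed = []
    for name in deployment_order(deps):
        deploy(name)  # assumed to block until the stack settles
        deployed.append(name)
    return deployed


print(deployment_order(dependencies))  # ['network', 'database', 'app']
print(deploy_all(dependencies, lambda name: print("deploying", name)))
```

Even this small amount of code has to handle ordering, waiting and failure, which is where the simplicity starts to erode.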
7. Secret Management
AWS CloudFormation has historically been unable to manage certain resources, such as EC2 SSH key pairs, because of the risk of exposing secret values within the resource state. It is also risky to include secrets anywhere within a stack, such as in AWS Lambda environment variables or EC2 instance user data. This is because CloudFormation stores the template and all of its parameters, and both are accessible through the AWS console and the CloudFormation API.
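One partial safeguard is the NoEcho parameter attribute, which masks a parameter's value in the console and in API responses. It does not stop the value from reaching wherever the template passes it, such as user data or Lambda environment variables. The sketch below flags parameters whose names look secret but lack NoEcho. The name pattern, function and sample template are assumptions for illustration.

```python
import re

# Assumed heuristic for "looks like a secret"; adjust to local naming habits.
SECRET_HINT = re.compile(r"(password|secret|token|apikey|api_key)", re.IGNORECASE)


def unmasked_secret_parameters(template):
    """Return secret-looking parameter names that do not set NoEcho."""
    flagged = []
    for name, spec in template.get("Parameters", {}).items():
        # NoEcho may appear as a boolean or as the string "true".
        no_echo = str(spec.get("NoEcho", "false")).lower() == "true"
        if SECRET_HINT.search(name) and not no_echo:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical template fragment.
sample_template = {
    "Parameters": {
        "DbPassword": {"Type": "String"},
        "ApiToken": {"Type": "String", "NoEcho": True},
        "InstanceType": {"Type": "String"},
    }
}

print(unmasked_secret_parameters(sample_template))  # ['DbPassword']
```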
Summary
AWS CloudFormation quickly enables the management of infrastructure as code but we should primarily use it for single stacks of stateless components and use other tooling for managing the rest of our AWS environment. This includes upstream and downstream dependencies as well as databases, DNS records, and cross region or account resources.
Follow me here for more content or contact me on:
- Twitter: @BenTorvo
- Email: ben@torvo.com.au
- Website: torvo.com.au