How to protect your AWS resources from deletion and replacement with CloudFormation

Introduction

Infrastructure as Code is amazing: it makes your infrastructure declarative and readable, and it lets DevOps teams apply version control tools like Git and GitHub to infrastructure. However, a small mistake in a template can delete the wrong instance or database. In this article we explore the deletion and replacement operations in AWS in depth and practice the features CloudFormation provides to protect your organization against them.

Deletion and Replacement

These are two different operations, and both need protective measures. Deletion, however, is often the only one that gets attention. So let's explore both with an example.

Here we have a simple CloudFormation template which creates a public EC2 instance with a security group that allows SSH access. Make sure to follow best practices if you open the SSH port to the world like this.

# template.yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
  Template to create an EC2 instance
  Used for the blog post "Cloudformation protection measures for infrastructure" by Pooria Atarzadeh (https://opshack.dev)

Parameters:
  KeyName:
    Description: Name of an existing EC2 KeyPair to enable SSH access to the instance
    Type: AWS::EC2::KeyPair::KeyName
    ConstraintDescription: must be the name of an existing EC2 KeyPair.

Resources:
  EC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t2.nano
      SecurityGroups: [!Ref 'InstanceSecurityGroup']
      KeyName: !Ref 'KeyName'
      # Amazon Linux 2 Kernel 5.10 AMI 2.0.20220912.1 x86_64 HVM gp2
      ImageId: "ami-06672d07f62285d1d"
  InstanceSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Enable SSH access via port 22
      SecurityGroupIngress:
      - IpProtocol: tcp
        FromPort: 22
        ToPort: 22
        CidrIp: 0.0.0.0/0

We can deploy this into a stack called ec2-stack with the AWS CLI like this:

aws cloudformation deploy --template-file ./template.yaml --stack-name ec2-stack --s3-bucket YOUR_S3_BUCKET --parameter-overrides "KeyName=main"

Before running this command, make sure you have an S3 bucket to upload the template to and an EC2 key pair (created and downloaded) called main. If your key pair has another name, pass that name as KeyName instead.
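
As one way of following the best-practice advice above, here is a minimal sketch that restricts the SSH rule to a caller-supplied CIDR range instead of the whole internet. The parameter name SSHLocation is an assumption made for this example and is not part of the original template.

```yaml
# Sketch only: SSHLocation is a hypothetical parameter added for illustration.
Parameters:
  SSHLocation:
    Description: CIDR range allowed to reach the instance over SSH (for example 203.0.113.10/32)
    Type: String
    MinLength: 9
    MaxLength: 18
    AllowedPattern: '(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/(\d{1,2})'
    ConstraintDescription: must be a valid IPv4 CIDR range of the form x.x.x.x/x.

Resources:
  InstanceSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Enable SSH access via port 22
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          # Only the supplied range can connect, instead of 0.0.0.0/0
          CidrIp: !Ref SSHLocation
```

With this variant, the deploy command would take a second override, for example --parameter-overrides "KeyName=main" "SSHLocation=203.0.113.10/32".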

Deletion

Let's say a messy PR mistakenly removes the EC2 instance from the template like this:

Resources:
-  EC2Instance:
-    Type: AWS::EC2::Instance
-    Properties:
-      InstanceType: t2.nano
-      SecurityGroups: [!Ref 'InstanceSecurityGroup']
-      KeyName: !Ref 'KeyName'
-      # Amazon Linux 2 Kernel 5.10 AMI 2.0.20220912.1 x86_64 HVM gp2
-      ImageId: "ami-06672d07f62285d1d"
  InstanceSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Enable SSH access via port 22
      SecurityGroupIngress:
      - IpProtocol: tcp
        FromPort: 22
        ToPort: 22
        CidrIp: 0.0.0.0/0

To review the damage this can cause, we can try to get the changeset for this update without executing it. The CLI flag --no-execute-changeset can help us achieve this.

aws cloudformation deploy --template-file ./template.yaml --stack-name ec2-stack --s3-bucket YOUR_S3_BUCKET --parameter-overrides "KeyName=main" --no-execute-changeset

After running this command, CloudFormation creates a changeset and prints to standard output a second command for viewing it.

Uploading to 825a446c68ba7189be997d8a0a340d77.template  4328 / 4328.0  (100.00%)
Waiting for changeset to be created..
Changeset created successfully. Run the following command to review changes:
aws cloudformation describe-change-set --change-set-name arn:aws:cloudformation:eu-west-2:[ACCOUNT_ID]:changeSet/awscli-cloudformation-package-deploy-1665902910/750c150d-cbb6-4bd4-8d78-cbed5196cc63

We can view the changeset in the CLI using the command above, but if you have Console access, you can review it there in an easier UI. Visit the CloudFormation page in your AWS Console and find the stack you just created. After clicking on the stack, you can find the Changesets tab on the right side.

[Screenshot: the Changesets tab]

Open the tab and click on the latest changeset to find this report:

[Screenshot: the changeset report]

As you can see, if our CI/CD pipeline deployed every change without manual confirmation, this simple mistake could have cost us the EC2 instance and all the data stored on it. Before going into how we can protect ourselves from this threat, let's take a quick look at the replacement operation as well.

Replacement

Updating some properties of a resource in CloudFormation triggers a replacement. This means that instead of modifying the existing resource in place, CloudFormation creates a new resource and deletes the previous one. The new resource gets a new physical ID as well. Changing an instance's availability zone or image are common reasons for replacing an instance, but many other resource properties require a replacement too. Fortunately, CloudFormation is sensible about the availability of your infrastructure: it first creates the new resource and updates references to point to it before deleting the old one. So this operation normally does not cause downtime, but anything stored only on the old resource is lost with it, so you need to study it on a case-by-case basis.

Let's change the instance's ImageId to Ubuntu (from Amazon Linux 2) and see what happens:

  EC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t2.nano
      SecurityGroups: [!Ref 'InstanceSecurityGroup']
      KeyName: !Ref 'KeyName'
-      # Amazon Linux 2 Kernel 5.10 AMI 2.0.20220912.1 x86_64 HVM gp2
-      ImageId: "ami-06672d07f62285d1d"
+      # Canonical, Ubuntu, 22.04 LTS, amd64 jammy image build on 2022-09-12
+      ImageId: ami-0f540e9f488cfa27d

After triggering another Changeset we can find this report:

[Screenshot: the changeset report]

As you can see, the resource will be replaced in this case, and you should prepare for the consequences. Here, for example, all data on the instance is lost because we are not using any persistent data storage.

To find out whether a modification triggers a replacement before creating a change set, you can check the AWS Resource Types Reference. In our case, it tells us that modifying ImageId requires a replacement. This document is often more reliable than the change set itself, since the Replacement column of a change set is sometimes set to Conditional, which means CloudFormation cannot tell in advance whether the resource will be replaced.

[Screenshot: the ImageId entry in the AWS Resource Types Reference]

Solution

CloudFormation provides the DeletionPolicy and UpdateReplacePolicy attributes, which you set at the root level of a resource, next to Type and Properties. Their value can be Delete or Retain, or Snapshot for resources that support snapshots.

Here is how we can protect our EC2 instance from a Deletion change set:

 EC2Instance:
+   DeletionPolicy: Retain
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t2.nano
      SecurityGroups: [!Ref 'InstanceSecurityGroup']
      KeyName: !Ref 'KeyName'
      # Amazon Linux 2 Kernel 5.10 AMI 2.0.20220912.1 x86_64 HVM gp2
      ImageId: "ami-06672d07f62285d1d"

With this policy in place, a deployment that drops the resource from the template does not fail; it just removes the resource from this CloudFormation stack. The retained resource continues to exist independently, without interruption. Using this attribute is a best practice for databases and stateful instances.

On the other hand, we can similarly protect our resources during a replacement with the UpdateReplacePolicy attribute:

 EC2Instance:
+   DeletionPolicy: Retain
+   UpdateReplacePolicy: Retain
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t2.nano
      SecurityGroups: [!Ref 'InstanceSecurityGroup']
      KeyName: !Ref 'KeyName'
      # Amazon Linux 2 Kernel 5.10 AMI 2.0.20220912.1 x86_64 HVM gp2
      ImageId: "ami-06672d07f62285d1d"

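The same values apply to this attribute. With Retain, CloudFormation still creates the replacement but leaves the old physical resource in place instead of deleting it. For resources that support it, the value Snapshot takes a snapshot before the old resource is removed, so you can use its data for auditing or recovery later on. Note that AWS::EC2::Instance itself is not in the list of snapshot-capable resources; to snapshot instance data, apply the policy to an AWS::EC2::Volume instead.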

Here is the list of resources that supported taking a snapshot at the time of writing:

  AWS::EC2::Volume
  AWS::ElastiCache::CacheCluster
  AWS::ElastiCache::ReplicationGroup
  AWS::Neptune::DBCluster
  AWS::RDS::DBCluster
  AWS::RDS::DBInstance
  AWS::Redshift::Cluster
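
To make the Snapshot value concrete, here is a minimal sketch that keeps data on a separate EBS volume and asks CloudFormation to snapshot it before deleting or replacing it. The logical names DataVolume and DataVolumeAttachment, the size, the volume type and the device name are all assumptions made for this example.

```yaml
# Sketch only: DataVolume and DataVolumeAttachment are hypothetical additions
# to the Resources section of the template used in this article.
Resources:
  DataVolume:
    Type: AWS::EC2::Volume
    # Take a snapshot before the volume is deleted or replaced
    DeletionPolicy: Snapshot
    UpdateReplacePolicy: Snapshot
    Properties:
      # The volume must live in the same availability zone as the instance
      AvailabilityZone: !GetAtt EC2Instance.AvailabilityZone
      Size: 8
      VolumeType: gp3
  DataVolumeAttachment:
    Type: AWS::EC2::VolumeAttachment
    Properties:
      InstanceId: !Ref EC2Instance
      VolumeId: !Ref DataVolume
      Device: /dev/sdf
```

The instance itself can keep DeletionPolicy: Retain while the volume uses Snapshot, since each resource carries its own policies.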

I'd also like to mention that some resources have native AWS protection settings that are separate from CloudFormation policies. For example, on an EC2 instance we can set DisableApiTermination: true in the template (or enable termination protection in the AWS Console), which makes any termination attempt fail like this:

[Screenshot: the failed termination attempt]
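
For reference, a sketch of how the native setting sits next to the two CloudFormation policies on the instance from this article. The policies are attributes of the resource, while DisableApiTermination is a regular property of AWS::EC2::Instance.

```yaml
Resources:
  EC2Instance:
    Type: AWS::EC2::Instance
    # CloudFormation-level protection
    DeletionPolicy: Retain
    UpdateReplacePolicy: Retain
    Properties:
      InstanceType: t2.nano
      SecurityGroups: [!Ref 'InstanceSecurityGroup']
      KeyName: !Ref 'KeyName'
      # Amazon Linux 2 Kernel 5.10 AMI 2.0.20220912.1 x86_64 HVM gp2
      ImageId: "ami-06672d07f62285d1d"
      # EC2-level protection, enforced by EC2 itself regardless of the stack
      DisableApiTermination: true
```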

How to delete protected resources?

There will be cases where you protected a resource and later actually decide to terminate it. In this situation, the solution is always to roll out two consecutive updates. The first update removes the policy from the template (or resource), and the second update actually deletes the resource.
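
The two updates could look roughly like this for the instance from this article. This is a sketch: the two YAML documents below stand for two successive revisions of the Resources section of template.yaml, each deployed on its own, and are not meant to be combined into one file.

```yaml
# Update 1: the instance stays in the template, but is no longer retained.
# Delete is the default for most resource types, so removing both
# attributes entirely has the same effect.
Resources:
  EC2Instance:
    Type: AWS::EC2::Instance
    DeletionPolicy: Delete
    UpdateReplacePolicy: Delete
    Properties:
      InstanceType: t2.nano
      SecurityGroups: [!Ref 'InstanceSecurityGroup']
      KeyName: !Ref 'KeyName'
      ImageId: "ami-06672d07f62285d1d"
  InstanceSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Enable SSH access via port 22
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0
---
# Update 2: the instance is removed from the template, and this time
# CloudFormation deletes it.
Resources:
  InstanceSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Enable SSH access via port 22
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0
```

If the instance also has DisableApiTermination: true, that property would need to be switched off in the first update as well, otherwise the termination in the second update fails.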

Seems straightforward huh?

Conclusion

AWS CloudFormation provides two attributes for all resources that give basic protection against accidental deletion or replacement. These features have other use cases as well, such as moving resources from one stack to another or taking snapshots for future audits. The replacement operation might cause interruption, downtime or data loss, depending on the resource and the property being changed. Often, however, CloudFormation performs this operation safely.