Master AWS CloudFormation and CDK for building reusable, maintainable infrastructure with advanced patterns and best practices.
CloudFormation templates start simple but quickly become unwieldy. After managing thousands of stacks across multiple AWS accounts, I've learned that success with Infrastructure as Code on AWS requires mastering both CloudFormation and CDK. Here's how to build infrastructure that scales with your organization.
CloudFormation Advanced Patterns
Nested Stack Architecture
# master-stack.yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Master stack for multi-tier application

Parameters:
  Environment:
    Type: String
    AllowedValues: [dev, staging, prod]
    Default: dev
  KeyPair:
    Type: AWS::EC2::KeyPair::KeyName
    Description: EC2 Key Pair for SSH access
  # Referenced by every TemplateURL below, so it must be declared
  TemplateBucket:
    Type: String
    Description: S3 bucket that holds the nested stack templates

Mappings:
  EnvironmentConfig:
    dev:
      VpcCidr: 10.0.0.0/16
      InstanceType: t3.micro
    staging:
      VpcCidr: 10.1.0.0/16
      InstanceType: t3.small
    prod:
      VpcCidr: 10.2.0.0/16
      InstanceType: t3.medium

Resources:
  NetworkStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: !Sub 'https://s3.amazonaws.com/${TemplateBucket}/network.yaml'
      Parameters:
        VpcCidr: !FindInMap [EnvironmentConfig, !Ref Environment, VpcCidr]
        Environment: !Ref Environment
      Tags:
        - Key: Environment
          Value: !Ref Environment

  SecurityStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: !Sub 'https://s3.amazonaws.com/${TemplateBucket}/security.yaml'
      Parameters:
        VpcId: !GetAtt NetworkStack.Outputs.VpcId
        Environment: !Ref Environment

  ComputeStack:
    Type: AWS::CloudFormation::Stack
    DependsOn: SecurityStack
    Properties:
      TemplateURL: !Sub 'https://s3.amazonaws.com/${TemplateBucket}/compute.yaml'
      Parameters:
        VpcId: !GetAtt NetworkStack.Outputs.VpcId
        SubnetIds: !GetAtt NetworkStack.Outputs.PrivateSubnetIds
        SecurityGroupId: !GetAtt SecurityStack.Outputs.AppSecurityGroupId
        InstanceType: !FindInMap [EnvironmentConfig, !Ref Environment, InstanceType]
        KeyPair: !Ref KeyPair

Outputs:
  LoadBalancerDNS:
    Value: !GetAtt ComputeStack.Outputs.LoadBalancerDNS
    Export:
      Name: !Sub '${AWS::StackName}-LoadBalancerDNS'
Custom Resources with Lambda
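# custom_resource.py
import boto3
import cfnresponse  # provided by CloudFormation only for inline (ZipFile) Lambda code

route53 = boto3.client('route53')


def lambda_handler(event, context):
    """Custom resource handler for CloudFormation"""
    # Update and Delete events carry the existing ID; reuse it on failure too
    physical_resource_id = event.get('PhysicalResourceId')
    try:
        request_type = event['RequestType']
        resource_properties = event['ResourceProperties']
        response_data = {}

        if request_type == 'Create':
            response_data = create_resource(resource_properties)
            physical_resource_id = response_data['ResourceId']
        elif request_type == 'Update':
            response_data = update_resource(physical_resource_id, resource_properties)
        elif request_type == 'Delete':
            delete_resource(physical_resource_id, resource_properties)

        cfnresponse.send(event, context, cfnresponse.SUCCESS, response_data, physical_resource_id)
    except Exception as e:
        print(f"Error: {e}")
        cfnresponse.send(event, context, cfnresponse.FAILED, {}, physical_resource_id)


def change_record(action, properties):
    """Apply one CNAME change (CREATE, UPSERT, or DELETE) to the hosted zone"""
    return route53.change_resource_record_sets(
        HostedZoneId=properties['HostedZoneId'],
        ChangeBatch={
            'Changes': [{
                'Action': action,
                'ResourceRecordSet': {
                    'Name': properties['RecordName'],
                    'Type': 'CNAME',
                    'TTL': 300,
                    'ResourceRecords': [{'Value': properties['RecordValue']}]
                }
            }]
        }
    )


def create_resource(properties):
    """Create custom resource"""
    # Example: Create a custom DNS record
    response = change_record('CREATE', properties)
    return {'ResourceId': response['ChangeInfo']['Id']}


# update_resource and delete_resource were called but not shown in the
# original listing; these are minimal versions for the same CNAME example.
def update_resource(physical_resource_id, properties):
    """Update the record in place and keep the same physical ID"""
    change_record('UPSERT', properties)
    return {'ResourceId': physical_resource_id}


def delete_resource(physical_resource_id, properties):
    """Delete the record described by the resource properties"""
    change_record('DELETE', properties)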
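The handler above uses the Route 53 change ID as the physical resource ID. CloudFormation compares the ID returned from an Update with the previous one, and if it differs, treats the update as a replacement and later sends a Delete for the old ID. One option, sketched here under the assumption that a record is identified by its hosted zone and name, is to derive the ID from those properties. `physical_id_for` is a hypothetical helper, not part of cfnresponse or boto3.

```python
def physical_id_for(properties):
    """Derive a stable physical resource ID from the record's identity.

    Assumes the resource properties carry HostedZoneId and RecordName,
    as in the handler's Route 53 example.
    """
    # DNS names are case-insensitive and may carry a trailing dot
    name = properties['RecordName'].rstrip('.').lower()
    return f"{properties['HostedZoneId']}/CNAME/{name}"


# The ID changes only when the zone or record name changes, not the target value
before = physical_id_for({'HostedZoneId': 'Z123', 'RecordName': 'App.Example.com.', 'RecordValue': 'a.example.net'})
after = physical_id_for({'HostedZoneId': 'Z123', 'RecordName': 'app.example.com', 'RecordValue': 'b.example.net'})
renamed = physical_id_for({'HostedZoneId': 'Z123', 'RecordName': 'api.example.com', 'RecordValue': 'b.example.net'})
```

With an ID like this, changing only `RecordValue` is an in-place update, while renaming the record yields a new ID and lets CloudFormation clean up the old one through its Delete call.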
AWS CDK Patterns
CDK Application Structure
// lib/application-stack.ts
import * as cdk from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as rds from 'aws-cdk-lib/aws-rds';
import * as elasticache from 'aws-cdk-lib/aws-elasticache';
import { Construct } from 'constructs';

export interface ApplicationStackProps extends cdk.StackProps {
  environment: 'dev' | 'staging' | 'prod';
  domainName: string;
}

export class ApplicationStack extends cdk.Stack {
  public readonly vpc: ec2.Vpc;
  public readonly cluster: ecs.Cluster;

  constructor(scope: Construct, id: string, props: ApplicationStackProps) {
    super(scope, id, props);

    // VPC with custom configuration
    this.vpc = new ec2.Vpc(this, 'ApplicationVpc', {
      maxAzs: 3,
      natGateways: props.environment === 'prod' ? 3 : 1,
      subnetConfiguration: [
        {
          name: 'Public',
          subnetType: ec2.SubnetType.PUBLIC,
          cidrMask: 24,
        },
        {
          name: 'Private',
          subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS,
          cidrMask: 24,
        },
        {
          name: 'Isolated',
          subnetType: ec2.SubnetType.PRIVATE_ISOLATED,
          cidrMask: 24,
        },
      ],
    });

    // ECS Cluster
    this.cluster = new ecs.Cluster(this, 'Cluster', {
      vpc: this.vpc,
      containerInsights: props.environment === 'prod',
      capacity: {
        instanceType: this.getInstanceType(props.environment),
        minCapacity: props.environment === 'prod' ? 3 : 1,
        maxCapacity: props.environment === 'prod' ? 10 : 3,
      },
    });

    // RDS Database
    const database = new rds.DatabaseCluster(this, 'Database', {
      engine: rds.DatabaseClusterEngine.auroraMysql({
        version: rds.AuroraMysqlEngineVersion.VER_3_02_0,
      }),
      instanceProps: {
        vpc: this.vpc,
        vpcSubnets: {
          subnetType: ec2.SubnetType.PRIVATE_ISOLATED,
        },
        instanceType: this.getDatabaseInstanceType(props.environment),
      },
      backup: {
        retention: cdk.Duration.days(props.environment === 'prod' ? 30 : 7),
      },
      removalPolicy: props.environment === 'prod'
        ? cdk.RemovalPolicy.RETAIN
        : cdk.RemovalPolicy.DESTROY,
    });

    // ElastiCache Redis
    const cacheSubnetGroup = new elasticache.CfnSubnetGroup(this, 'CacheSubnetGroup', {
      description: 'Subnet group for ElastiCache',
      subnetIds: this.vpc.selectSubnets({
        subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS,
      }).subnetIds,
    });

    const cacheCluster = new elasticache.CfnReplicationGroup(this, 'CacheCluster', {
      replicationGroupDescription: 'Application cache',
      engine: 'redis',
      cacheNodeType: this.getCacheNodeType(props.environment),
      numCacheClusters: props.environment === 'prod' ? 3 : 1,
      automaticFailoverEnabled: props.environment === 'prod',
      cacheSubnetGroupName: cacheSubnetGroup.ref,
      securityGroupIds: [this.createCacheSecurityGroup().securityGroupId],
    });

    // Output important values
    new cdk.CfnOutput(this, 'VpcId', {
      value: this.vpc.vpcId,
      exportName: `${this.stackName}-VpcId`,
    });
  }

  // The lookup tables are typed as Record<string, string> so that indexing
  // with a string key compiles under TypeScript's strict mode.
  private getInstanceType(environment: string): ec2.InstanceType {
    const instanceTypes: Record<string, string> = {
      dev: 't3.micro',
      staging: 't3.small',
      prod: 't3.medium',
    };
    return new ec2.InstanceType(instanceTypes[environment]);
  }

  private getDatabaseInstanceType(environment: string): ec2.InstanceType {
    const instanceTypes: Record<string, string> = {
      dev: 't3.small',
      staging: 't3.medium',
      prod: 'r5.large',
    };
    return new ec2.InstanceType(instanceTypes[environment]);
  }

  private getCacheNodeType(environment: string): string {
    const nodeTypes: Record<string, string> = {
      dev: 'cache.t3.micro',
      staging: 'cache.t3.small',
      prod: 'cache.r5.large',
    };
    return nodeTypes[environment];
  }

  private createCacheSecurityGroup(): ec2.SecurityGroup {
    return new ec2.SecurityGroup(this, 'CacheSecurityGroup', {
      vpc: this.vpc,
      description: 'Security group for ElastiCache',
      allowAllOutbound: false,
    });
  }
}
CDK Custom Constructs
// lib/constructs/auto-scaling-service.ts
import * as cdk from 'aws-cdk-lib';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as elbv2 from 'aws-cdk-lib/aws-elasticloadbalancingv2';
import * as autoscaling from 'aws-cdk-lib/aws-applicationautoscaling';
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import { Construct } from 'constructs';

export interface AutoScalingServiceProps {
  cluster: ecs.Cluster;
  image: ecs.ContainerImage;
  desiredCount?: number;
  cpu?: number;
  memory?: number;
  domainName: string;
  // Listener certificates use the IListenerCertificate interface,
  // e.g. elbv2.ListenerCertificate.fromArn(...)
  certificate: elbv2.IListenerCertificate;
}

export class AutoScalingService extends Construct {
  public readonly service: ecs.FargateService;
  public readonly loadBalancer: elbv2.ApplicationLoadBalancer;

  constructor(scope: Construct, id: string, props: AutoScalingServiceProps) {
    super(scope, id);

    // Task Definition
    const taskDefinition = new ecs.FargateTaskDefinition(this, 'TaskDef', {
      cpu: props.cpu || 256,
      memoryLimitMiB: props.memory || 512,
    });

    const container = taskDefinition.addContainer('app', {
      image: props.image,
      logging: ecs.LogDrivers.awsLogs({
        streamPrefix: 'app',
      }),
      environment: {
        NODE_ENV: 'production',
      },
      healthCheck: {
        command: ['CMD-SHELL', 'curl -f http://localhost:3000/health || exit 1'],
        interval: cdk.Duration.seconds(30),
        timeout: cdk.Duration.seconds(5),
        retries: 3,
      },
    });

    container.addPortMappings({
      containerPort: 3000,
      protocol: ecs.Protocol.TCP,
    });

    // Service
    this.service = new ecs.FargateService(this, 'Service', {
      cluster: props.cluster,
      taskDefinition,
      desiredCount: props.desiredCount || 2,
      assignPublicIp: false,
      circuitBreaker: {
        rollback: true,
      },
    });

    // Load Balancer
    this.loadBalancer = new elbv2.ApplicationLoadBalancer(this, 'LoadBalancer', {
      vpc: props.cluster.vpc,
      internetFacing: true,
    });

    const targetGroup = new elbv2.ApplicationTargetGroup(this, 'TargetGroup', {
      vpc: props.cluster.vpc,
      port: 3000,
      protocol: elbv2.ApplicationProtocol.HTTP,
      targets: [this.service],
      healthCheck: {
        path: '/health',
        interval: cdk.Duration.seconds(30),
        timeout: cdk.Duration.seconds(5),
        healthyThresholdCount: 2,
        unhealthyThresholdCount: 3,
      },
      deregistrationDelay: cdk.Duration.seconds(30),
    });

    // HTTPS Listener
    this.loadBalancer.addListener('HttpsListener', {
      port: 443,
      protocol: elbv2.ApplicationProtocol.HTTPS,
      certificates: [props.certificate],
      defaultTargetGroups: [targetGroup],
    });

    // HTTP to HTTPS redirect
    this.loadBalancer.addListener('HttpListener', {
      port: 80,
      protocol: elbv2.ApplicationProtocol.HTTP,
      defaultAction: elbv2.ListenerAction.redirect({
        protocol: 'HTTPS',
        port: '443',
        permanent: true,
      }),
    });

    // Auto Scaling
    const scaling = this.service.autoScaleTaskCount({
      minCapacity: 2,
      maxCapacity: 10,
    });

    scaling.scaleOnCpuUtilization('CpuScaling', {
      targetUtilizationPercent: 70,
      scaleInCooldown: cdk.Duration.seconds(60),
      scaleOutCooldown: cdk.Duration.seconds(60),
    });

    scaling.scaleOnMemoryUtilization('MemoryScaling', {
      targetUtilizationPercent: 80,
      scaleInCooldown: cdk.Duration.seconds(60),
      scaleOutCooldown: cdk.Duration.seconds(60),
    });

    // Custom metric scaling
    scaling.scaleOnMetric('RequestCountScaling', {
      metric: new cloudwatch.Metric({
        namespace: 'AWS/ApplicationELB',
        metricName: 'RequestCountPerTarget',
        dimensionsMap: {
          TargetGroup: targetGroup.targetGroupFullName,
        },
        // Sum is the only valid statistic for RequestCountPerTarget
        statistic: 'Sum',
      }),
      scalingSteps: [
        { upper: 100, change: -1 },
        { lower: 200, change: +1 },
        { lower: 500, change: +3 },
      ],
      adjustmentType: autoscaling.AdjustmentType.CHANGE_IN_CAPACITY,
    });
  }
}
Stack Sets for Multi-Account Deployment
StackSet Template
# stackset-template.yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: StackSet for multi-account security baseline

Parameters:
  SecurityAccountId:
    Type: String
    Description: Central security account ID
  LoggingBucketName:
    Type: String
    Description: Central logging bucket name

Resources:
  CloudTrail:
    Type: AWS::CloudTrail::Trail
    Properties:
      TrailName: !Sub '${AWS::AccountId}-security-trail'
      S3BucketName: !Ref LoggingBucketName
      IncludeGlobalServiceEvents: true
      IsLogging: true
      IsMultiRegionTrail: true
      EnableLogFileValidation: true
      EventSelectors:
        - IncludeManagementEvents: true
          ReadWriteType: All
          DataResources:
            - Type: AWS::S3::Object
              Values:
                - 'arn:aws:s3:::*/AWSLogs/*'

  # ConfigRole was referenced below but not defined in the original listing;
  # this is a minimal version that assumes the AWS managed Config policy is enough.
  ConfigRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: config.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - !Sub 'arn:${AWS::Partition}:iam::aws:policy/service-role/AWS_ConfigRole'

  ConfigRecorder:
    Type: AWS::Config::ConfigurationRecorder
    Properties:
      Name: default
      RoleArn: !GetAtt ConfigRole.Arn
      RecordingGroup:
        AllSupported: true
        IncludeGlobalResourceTypes: true

  ConfigDeliveryChannel:
    Type: AWS::Config::DeliveryChannel
    Properties:
      Name: default
      S3BucketName: !Ref LoggingBucketName
      ConfigSnapshotDeliveryProperties:
        DeliveryFrequency: TwentyFour_Hours

  GuardDutyDetector:
    Type: AWS::GuardDuty::Detector
    Properties:
      Enable: true
      FindingPublishingFrequency: FIFTEEN_MINUTES

  SecurityHub:
    Type: AWS::SecurityHub::Hub
    Properties:
      Tags:
        Environment: Production
StackSet Deployment Script
# deploy_stackset.py
import time

import boto3


class StackSetDeployer:
    def __init__(self):
        self.cfn = boto3.client('cloudformation')
        self.org = boto3.client('organizations')

    def deploy_stackset(self, template_path, stackset_name, ou_ids):
        """Deploy StackSet to organizational units"""
        with open(template_path, 'r') as f:
            template_body = f.read()

        # Create StackSet
        self.cfn.create_stack_set(
            StackSetName=stackset_name,
            TemplateBody=template_body,
            Capabilities=['CAPABILITY_IAM', 'CAPABILITY_NAMED_IAM'],
            PermissionModel='SERVICE_MANAGED',
            AutoDeployment={
                'Enabled': True,
                'RetainStacksOnAccountRemoval': False
            },
            CallAs='DELEGATED_ADMIN'
        )

        # Deploy to OUs. Every call on this StackSet must use the same CallAs
        # value, otherwise the delegated admin account cannot find it.
        operation_id = self.cfn.create_stack_instances(
            StackSetName=stackset_name,
            DeploymentTargets={
                'OrganizationalUnitIds': ou_ids
            },
            Regions=['us-east-1', 'eu-west-1'],
            OperationPreferences={
                'RegionConcurrencyType': 'PARALLEL',
                'MaxConcurrentPercentage': 100,
                'FailureTolerancePercentage': 10
            },
            CallAs='DELEGATED_ADMIN'
        )

        # Monitor deployment
        while True:
            status = self.cfn.describe_stack_set_operation(
                StackSetName=stackset_name,
                OperationId=operation_id['OperationId'],
                CallAs='DELEGATED_ADMIN'
            )
            if status['StackSetOperation']['Status'] in ['SUCCEEDED', 'FAILED', 'STOPPED']:
                break
            time.sleep(30)

        return status
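The monitoring loop above polls until the operation reaches a terminal state, with no upper bound on how long it waits. A minimal sketch of a bounded variant follows. `wait_for_terminal_status`, its parameters, and the injected `fetch_status` callable are hypothetical names introduced for illustration; they are not part of boto3.

```python
import time


def wait_for_terminal_status(fetch_status, terminal_states, timeout_seconds=3600,
                             poll_seconds=30, sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns a terminal state or the timeout expires.

    fetch_status is any zero-argument callable that returns the current
    status string. sleep and clock are injectable so the loop can be
    exercised without real waiting.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = fetch_status()
        if status in terminal_states:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Still {status} after {timeout_seconds} seconds")
        sleep(poll_seconds)


# Dry run with canned statuses instead of a real StackSet operation
canned = iter(['RUNNING', 'RUNNING', 'SUCCEEDED'])
final_status = wait_for_terminal_status(
    lambda: next(canned),
    {'SUCCEEDED', 'FAILED', 'STOPPED'},
    sleep=lambda seconds: None,
)
```

In the deployer, `fetch_status` could be a lambda that calls `describe_stack_set_operation` and returns `['StackSetOperation']['Status']`, which keeps the timeout logic separate from the AWS call.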
Change Sets and Safe Deployments
Change Set Validation
# changeset_validator.py
import time

import boto3
from botocore.exceptions import ClientError


class ChangeSetValidator:
    def __init__(self):
        self.cfn = boto3.client('cloudformation')

    def stack_exists(self, stack_name):
        """Return True if the stack already exists"""
        try:
            self.cfn.describe_stacks(StackName=stack_name)
            return True
        except ClientError as e:
            if 'does not exist' in e.response['Error']['Message']:
                return False
            raise

    def create_and_validate_changeset(self, stack_name, template_path, parameters):
        """Create and validate change set before deployment"""
        with open(template_path, 'r') as f:
            template_body = f.read()

        # Create change set
        changeset_name = f"{stack_name}-changeset-{int(time.time())}"
        self.cfn.create_change_set(
            StackName=stack_name,
            ChangeSetName=changeset_name,
            TemplateBody=template_body,
            Parameters=parameters,
            Capabilities=['CAPABILITY_IAM', 'CAPABILITY_NAMED_IAM'],
            ChangeSetType='UPDATE' if self.stack_exists(stack_name) else 'CREATE'
        )

        # Wait for creation (the waiter raises if the change set fails,
        # which includes a change set that contains no changes)
        waiter = self.cfn.get_waiter('change_set_create_complete')
        waiter.wait(
            StackName=stack_name,
            ChangeSetName=changeset_name
        )

        # Analyze changes
        changes = self.cfn.describe_change_set(
            StackName=stack_name,
            ChangeSetName=changeset_name
        )

        # Validate changes
        validation_results = self.validate_changes(changes['Changes'])

        if validation_results['safe_to_deploy']:
            # Execute change set
            self.cfn.execute_change_set(
                StackName=stack_name,
                ChangeSetName=changeset_name
            )
        else:
            # Delete change set
            self.cfn.delete_change_set(
                StackName=stack_name,
                ChangeSetName=changeset_name
            )

        return validation_results

    def validate_changes(self, changes):
        """Validate changes for safety"""
        dangerous_changes = []
        warnings = []

        for change in changes:
            resource_change = change['ResourceChange']

            # Check for deletions
            if resource_change['Action'] == 'Remove':
                dangerous_changes.append({
                    'resource': resource_change['LogicalResourceId'],
                    'type': resource_change['ResourceType'],
                    'action': 'DELETE'
                })
            # Check for replacements
            elif resource_change['Action'] == 'Modify':
                if resource_change.get('Replacement') == 'True':
                    dangerous_changes.append({
                        'resource': resource_change['LogicalResourceId'],
                        'type': resource_change['ResourceType'],
                        'action': 'REPLACE'
                    })

            # Check for specific resource types
            if resource_change['ResourceType'] in ['AWS::RDS::DBInstance', 'AWS::RDS::DBCluster']:
                warnings.append(f"Database change detected: {resource_change['LogicalResourceId']}")

        return {
            'safe_to_deploy': len(dangerous_changes) == 0,
            'dangerous_changes': dangerous_changes,
            'warnings': warnings
        }
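The `Replacement` field that CloudFormation reports for a modified resource can be `True`, `False`, or `Conditional`, and the validator above flags only `True`. The following sketch shows one way a stricter check might also surface conditional replacements. `classify_change` and the sample payload are hypothetical; the payload is hand-written in the shape of `describe_change_set()['Changes']`.

```python
def classify_change(resource_change):
    """Return 'DELETE', 'REPLACE', 'POSSIBLE_REPLACE', or 'SAFE' for one ResourceChange dict."""
    action = resource_change['Action']
    if action == 'Remove':
        return 'DELETE'
    if action == 'Modify':
        replacement = resource_change.get('Replacement')
        if replacement == 'True':
            return 'REPLACE'
        if replacement == 'Conditional':
            # CloudFormation decides at execution time whether to replace
            return 'POSSIBLE_REPLACE'
    return 'SAFE'


# Hand-written sample, not output captured from a real stack
sample_changes = [
    {'ResourceChange': {'Action': 'Add', 'LogicalResourceId': 'Queue',
                        'ResourceType': 'AWS::SQS::Queue'}},
    {'ResourceChange': {'Action': 'Modify', 'Replacement': 'Conditional',
                        'LogicalResourceId': 'Database',
                        'ResourceType': 'AWS::RDS::DBInstance'}},
    {'ResourceChange': {'Action': 'Remove', 'LogicalResourceId': 'OldBucket',
                        'ResourceType': 'AWS::S3::Bucket'}},
]

report = {
    change['ResourceChange']['LogicalResourceId']: classify_change(change['ResourceChange'])
    for change in sample_changes
}
```

Whether a `POSSIBLE_REPLACE` should block deployment or only raise a warning is a policy choice; for stateful resources such as databases, blocking is the more cautious default.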
Drift Detection and Remediation
Drift Detection Automation
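# drift_detection.py
import json
import time
from datetime import datetime, timezone

import boto3


class DriftDetector:
    def __init__(self, topic_arn):
        # topic_arn is an assumed argument: the SNS topic that receives drift reports
        self.cfn = boto3.client('cloudformation')
        self.sns = boto3.client('sns')
        self.topic_arn = topic_arn

    def detect_drift_all_stacks(self):
        """Detect drift across all stacks"""
        # List all stacks (list_stacks is paginated)
        paginator = self.cfn.get_paginator('list_stacks')
        pages = paginator.paginate(
            StackStatusFilter=['CREATE_COMPLETE', 'UPDATE_COMPLETE']
        )
        stack_names = [
            stack['StackName']
            for page in pages
            for stack in page['StackSummaries']
        ]

        drift_results = []
        for stack_name in stack_names:
            # Initiate drift detection
            detection_id = self.cfn.detect_stack_drift(
                StackName=stack_name
            )['StackDriftDetectionId']

            # Wait for detection to complete
            while True:
                status = self.cfn.describe_stack_drift_detection_status(
                    StackDriftDetectionId=detection_id
                )
                if status['DetectionStatus'] in ['DETECTION_COMPLETE', 'DETECTION_FAILED']:
                    break
                time.sleep(5)

            # Get drift results
            if status['DetectionStatus'] == 'DETECTION_COMPLETE':
                if status['StackDriftStatus'] != 'IN_SYNC':
                    drift_results.append({
                        'stack_name': stack_name,
                        'drift_status': status['StackDriftStatus'],
                        'drifted_resources': status.get('DriftedStackResourceCount', 0)
                    })

        # Send notifications
        if drift_results:
            self.send_drift_notification(drift_results)

        return drift_results

    def send_drift_notification(self, drift_results):
        """Publish the drift results to SNS (not shown in the original; a minimal version)"""
        self.sns.publish(
            TopicArn=self.topic_arn,
            Subject='CloudFormation drift detected',
            Message=json.dumps({
                'detected_at': datetime.now(timezone.utc).isoformat(),
                'stacks': drift_results
            }, indent=2)
        )

    def remediate_drift(self, stack_name):
        """Re-apply the stack's current template and parameters"""
        # Reuse the current parameter values
        stack_info = self.cfn.describe_stacks(StackName=stack_name)
        parameters = [
            {'ParameterKey': p['ParameterKey'], 'UsePreviousValue': True}
            for p in stack_info['Stacks'][0].get('Parameters', [])
        ]

        # UsePreviousTemplate avoids re-serializing the template, which
        # get_template returns as a dict for JSON and as a string for YAML.
        # CloudFormation rejects an update that changes nothing ("No updates
        # are to be performed"), so drifted resources usually also need a
        # template or parameter change, or a manual fix, to be brought back.
        self.cfn.update_stack(
            StackName=stack_name,
            UsePreviousTemplate=True,
            Parameters=parameters,
            Capabilities=['CAPABILITY_IAM', 'CAPABILITY_NAMED_IAM']
        )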
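The drift results collected above are a list of dictionaries with `stack_name`, `drift_status`, and `drifted_resources` keys. As a sketch of how they might be turned into a readable notification body, the function below builds a plain-text summary. `format_drift_report` is a hypothetical helper and the sample results are made up for illustration.

```python
def format_drift_report(drift_results):
    """Build a plain-text summary from a list of drift result dicts."""
    if not drift_results:
        return "No drift detected."

    lines = [f"Drift detected in {len(drift_results)} stack(s):"]
    # List the stacks with the most drifted resources first
    ordered = sorted(drift_results, key=lambda r: r['drifted_resources'], reverse=True)
    for result in ordered:
        lines.append(
            f"- {result['stack_name']}: {result['drift_status']} "
            f"({result['drifted_resources']} drifted resource(s))"
        )
    return "\n".join(lines)


# Made-up sample in the shape produced by detect_drift_all_stacks()
sample_results = [
    {'stack_name': 'network', 'drift_status': 'DRIFTED', 'drifted_resources': 1},
    {'stack_name': 'compute', 'drift_status': 'DRIFTED', 'drifted_resources': 4},
]
summary = format_drift_report(sample_results)
```

The resulting string can be passed as the `Message` argument of an SNS `publish` call, which tends to read better in email subscriptions than raw JSON.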
CDK Testing
Unit Tests for CDK
// test/application-stack.test.ts
import * as cdk from 'aws-cdk-lib';
import { Template, Match } from 'aws-cdk-lib/assertions';
import { ApplicationStack } from '../lib/application-stack';

// Without an explicit env the stack is environment-agnostic and CDK only
// assumes two availability zones, so the 3-AZ assertions below would fail.
// The account ID is a placeholder.
const env = { account: '123456789012', region: 'us-east-1' };

describe('ApplicationStack', () => {
  test('Creates VPC with correct configuration', () => {
    const app = new cdk.App();
    const stack = new ApplicationStack(app, 'TestStack', {
      env,
      environment: 'prod',
      domainName: 'example.com',
    });

    const template = Template.fromStack(stack);

    // Check VPC exists with DNS enabled
    template.hasResourceProperties('AWS::EC2::VPC', {
      EnableDnsHostnames: true,
      EnableDnsSupport: true,
    });

    // Check for 3 NAT gateways in production
    template.resourceCountIs('AWS::EC2::NatGateway', 3);

    // Check subnets
    template.resourceCountIs('AWS::EC2::Subnet', 9); // 3 AZs * 3 subnet types
  });

  test('Database has correct backup retention', () => {
    const app = new cdk.App();
    const stack = new ApplicationStack(app, 'TestStack', {
      env,
      environment: 'prod',
      domainName: 'example.com',
    });

    const template = Template.fromStack(stack);

    template.hasResourceProperties('AWS::RDS::DBCluster', {
      BackupRetentionPeriod: 30,
    });
  });

  test('Security groups are properly configured', () => {
    const app = new cdk.App();
    const stack = new ApplicationStack(app, 'TestStack', {
      env,
      environment: 'prod',
      domainName: 'example.com',
    });

    const template = Template.fromStack(stack);

    // Check that security groups exist
    template.hasResourceProperties('AWS::EC2::SecurityGroup', {
      GroupDescription: Match.anyValue(),
      VpcId: Match.anyValue(),
    });
  });
});
Best Practices
Template Organization
# template-structure.yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Best practice template structure'

# Metadata for documentation and UI
Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label:
          default: Network Configuration
        Parameters:
          - VpcId
          - SubnetIds
      - Label:
          default: Security Configuration
        Parameters:
          - KeyPair
          - AllowedCidr
    ParameterLabels:
      VpcId:
        default: VPC ID

# Parameters with constraints
# (the other parameters named in Metadata are omitted from this skeleton)
Parameters:
  Environment:
    Type: String
    AllowedValues: [dev, staging, prod]
    ConstraintDescription: Must be dev, staging, or prod

# Conditions for environment-specific resources
Conditions:
  IsProduction: !Equals [!Ref Environment, prod]
  CreateBackup: !Or
    - !Equals [!Ref Environment, prod]
    - !Equals [!Ref Environment, staging]

# Mappings for regional/environment configs
Mappings:
  RegionMap:
    us-east-1:
      AMI: ami-12345678
    eu-west-1:
      AMI: ami-87654321

# Transform for macros and includes. AWS::Include needs a Location parameter,
# so it is declared with Name/Parameters or Fn::Transform rather than listed by name.
Transform:
  - AWS::Serverless-2016-10-31

# Resources with proper dependencies
Resources:
  MyResource:
    Type: AWS::EC2::Instance
    Condition: IsProduction
    DependsOn: MyOtherResource  # defined elsewhere in the full template
    Properties:
      ImageId: !FindInMap [RegionMap, !Ref 'AWS::Region', AMI]

# Outputs for cross-stack references
Outputs:
  ResourceId:
    Description: Resource identifier
    # An output that references a conditional resource needs the same condition
    Condition: IsProduction
    Value: !Ref MyResource
    Export:
      Name: !Sub '${AWS::StackName}-ResourceId'
Conclusion
Success with AWS Infrastructure as Code requires mastering both CloudFormation's declarative approach and CDK's programmatic power. Use CloudFormation for simple, stable infrastructure and CDK for complex, dynamic applications. Always validate changes through change sets, test your stacks, and automate drift detection. Your infrastructure code is as critical as your application code, so treat it with the same rigor.