This AWS Cloud Development Kit (CDK) project deploys Squid proxy instances to implement a “transparent proxy” that can restrict both HTTP and HTTPS outbound traffic to a given set of Internet domains, while being fully transparent for instances in the private subnet.
This project builds the solution described in the AWS Security blog post "How to add DNS filtering to your NAT instance with Squid" and focuses on the AWS CDK implementation.
In summary, the CDK project deploys a VPC with 2 public and 2 private subnets across 2 Availability Zones (AZs). Squid proxy instances in the public subnets intercept HTTP and HTTPS traffic and then initiate a connection with the destination through the Internet gateway. A test EC2 instance is provisioned in one private subnet.
The following diagram describes the solution used to address availability in case a Squid instance fails and traffic must be routed via the other available instance.
A CloudWatch Alarm is used to monitor the status of each Squid instance. A change in the alarm status from OK to ALARM triggers a Lambda function that marks the instance as unhealthy and updates the route table attached to the private subnets in the affected AZ to redirect outbound traffic to a healthy Squid instance in the other AZ.
The Auto Scaling group replaces the unhealthy instance with a healthy Squid instance. Once the alarm status changes from ALARM back to OK, the Lambda function is triggered again to update the route table to point at this instance.
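The failover rule described above can be sketched in plain Python. This is only an illustration of the decision, not the project's Lambda code: the function name `pick_proxy_for_az` and the `healthy_by_az` mapping are hypothetical, and the real function works against the EC2 and Auto Scaling APIs.

```python
from typing import Dict, Optional


def pick_proxy_for_az(az: str, healthy_by_az: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the Squid instance ID the private route table in `az` should target.

    `healthy_by_az` (hypothetical structure) maps each AZ name to the ID of its
    healthy Squid instance, or to None when that AZ has no healthy instance.
    """
    # Prefer the Squid instance in the same AZ.
    local = healthy_by_az.get(az)
    if local is not None:
        return local
    # Otherwise fall back to a healthy instance in another AZ.
    for other_az in sorted(healthy_by_az):
        candidate = healthy_by_az[other_az]
        if other_az != az and candidate is not None:
            return candidate
    # No healthy proxy anywhere: leave the route unchanged.
    return None


if __name__ == "__main__":
    state = {"az-a": None, "az-b": "i-bbb"}
    print(pick_proxy_for_az("az-a", state))
```

Preferring the same-AZ instance keeps traffic local once the replacement instance is healthy, which matches the OK transition described above.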
A quick recap: the AWS CDK lets developers use one of the supported programming languages to define reusable cloud components called constructs, which are composed into stacks that together form a "CDK app".
This CDK app includes 3 stacks:
- VPC stack: A VPC across 2 AZs with 1 public and 1 private subnet in each AZ
- Squid stack: Squid instances in Auto Scaling groups with the components required to achieve high availability.
- Test instance stack: A test instance that can be accessed using AWS Systems Manager Session Manager
AWS CDK Context values are key-value pairs that can be associated with a stack or construct. In this project they are used for some basic information required to deploy the solution.
The context key in the cdk.json file is one of the ways that context values can be made available to the CDK app.
Here, it supplies the basic information required to deploy the resources in your AWS account. You can extend this to add multiple accounts, Regions, or separate VPC CIDRs, and use a different runtime context to deploy resources.
"region": "ap-southeast-1",
"account": "xxxxxxxxxxxx",
"vpc_cidr": "10.0.0.0/16"
The “main” for the CDK app is app.py. We get the context values defined in the cdk.json file:
# Get context variable values
account = app.node.try_get_context('account')
region = app.node.try_get_context('region')
vpc_cidr = app.node.try_get_context('vpc_cidr')
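Because `try_get_context` returns whatever is in cdk.json, it can help to sanity-check the values before building stacks. The following is a minimal standalone sketch using only the standard library; the `load_context` helper, the sample account number, and the `"app"` entry are assumptions for illustration and are not part of the project.

```python
import ipaddress
import json

# Hypothetical cdk.json content; the account number is a placeholder.
CDK_JSON = """
{
  "app": "python app.py",
  "context": {
    "region": "ap-southeast-1",
    "account": "123456789012",
    "vpc_cidr": "10.0.0.0/16"
  }
}
"""


def load_context(cdk_json_text: str) -> dict:
    """Parse the context block and validate the three values the app expects."""
    context = json.loads(cdk_json_text)["context"]
    account = context["account"]
    if not (account.isdigit() and len(account) == 12):
        raise ValueError(f"account must be a 12-digit string, got {account!r}")
    if not context["region"]:
        raise ValueError("region must not be empty")
    # ip_network raises ValueError for a malformed CIDR or one with host bits set.
    ipaddress.ip_network(context["vpc_cidr"])
    return context


if __name__ == "__main__":
    print(load_context(CDK_JSON))
```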
Prepare the "environment". An environment is the target AWS account and AWS Region into which the stack is intended to be deployed.
env = core.Environment(account=account, region=region)
We pass these to the VPC stack and keep a reference to it, because the other stacks need its VPC.
vpc_stack = VPCStack(app, "vpc", env=env, vpc_cidr=vpc_cidr)
To create the Squid and Test Instance stacks, the VPC is required. We pass the VPC construct created as part of the VPC stack to these 2 stacks. This will allow us to use this VPC within these stacks.
SquidStack(app, "squid", env=env, vpc=vpc_stack.vpc)
TestInstanceStack(app, "test-instance", env=env, vpc=vpc_stack.vpc)
The AWS CDK creates an implicit dependency between the VPC stack and these stacks.
Let's dive a little deeper into the stacks.
The VPC stack creates a VPC across 2 AZs, with 2 public and 2 isolated subnets, using a high-level CDK construct. We use the vpc_cidr context value to define the VPC CIDR.
ec2.Vpc(self, "vpc",
    max_azs=2,
    cidr=vpc_cidr,
    subnet_configuration=[
        ec2.SubnetConfiguration(
            subnet_type=ec2.SubnetType.PUBLIC,
            name="Public",
            cidr_mask=24
        ),
        ec2.SubnetConfiguration(
            subnet_type=ec2.SubnetType.ISOLATED,
            name="Isolated",
            cidr_mask=24
        )
    ]
)
Note that we use ISOLATED because we do not want NAT gateways to be provisioned as part of the VPC. The Squid instances will be used as NAT instances.
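To get a feel for what a /16 VPC with two /24 subnet groups across 2 AZs looks like, the address arithmetic can be sketched with the standard `ipaddress` module. This is only an approximation: the helper `plan_subnets` is hypothetical, and it assumes blocks are handed out sequentially in configuration order, which may not match the exact allocation the CDK chooses.

```python
import ipaddress
from itertools import islice
from typing import Dict, List


def plan_subnets(vpc_cidr: str, groups: List[str], az_count: int, mask: int) -> Dict[str, List[str]]:
    """Hand out one /mask block per AZ for each subnet group, in order."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=mask)
    # islice consumes the shared generator, so each group gets the next blocks.
    return {name: [str(block) for block in islice(blocks, az_count)] for name in groups}


if __name__ == "__main__":
    layout = plan_subnets("10.0.0.0/16", ["Public", "Isolated"], az_count=2, mask=24)
    for name, cidrs in layout.items():
        print(name, cidrs)
```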
The Squid stack consists of 3 Constructs:
- SquidAsgConstruct (squid_asg_construct.py): This construct creates the core Squid application. It creates 2 Auto Scaling groups (ASGs), one in each public subnet, each containing one Squid instance, plus an IAM role to attach to the instances and an S3 bucket to hold the Squid configuration files.
First, we create the IAM role using a combination of managed policies and a custom policy.
# create an IAM role to attach to the Squid instances
squid_iam_role = iam.Role(self,"squid-role",
assumed_by=iam.ServicePrincipal("ec2.amazonaws.com"),
managed_policies=[iam.ManagedPolicy.from_aws_managed_policy_name("CloudWatchAgentServerPolicy"),
iam.ManagedPolicy.from_aws_managed_policy_name("service-role/AmazonEC2RoleforSSM")]
)
# Add a policy that lets the instance modify EC2 instance attributes (used to disable the source/destination check)
squid_iam_role.add_to_policy(statement= iam.PolicyStatement(effect=iam.Effect.ALLOW,
actions=['ec2:ModifyInstanceAttribute',],
resources=['*']
)
)
Then, we create the S3 bucket
squid_config_bucket = s3.Bucket(self,"squid-config",
encryption = s3.BucketEncryption.KMS_MANAGED)
Upload the Squid config files to this bucket. The config files are located in ./squid_app/squid_config_files/config_files_s3/:
- allowed_domains.txt contains the domains allowed on the Squid proxy
- squid.conf is the configuration file used to configure Squid
s3_deployment.BucketDeployment(self,"config",
destination_bucket=squid_config_bucket,
sources=[s3_deployment.Source.asset(path='./squid_app/squid_config_files/config_files_s3')]
)
The instance role requires access to this S3 bucket.
squid_config_bucket.grant_read_write(identity=squid_iam_role)
Define the AMI to be used for the Squid instances. In this case we are using Amazon Linux 2.
amazon_linux_2_ami = ec2.MachineImage.latest_amazon_linux(
generation=ec2.AmazonLinuxGeneration.AMAZON_LINUX_2,
edition=ec2.AmazonLinuxEdition.STANDARD,
virtualization=ec2.AmazonLinuxVirt.HVM,
storage=ec2.AmazonLinuxStorage.GENERAL_PURPOSE
)
As a Squid instance is required in each AZ, we loop through the Availability Zones of this VPC and create an ASG in a public subnet of each AZ, with min, max and desired capacity set to 1. For this example, we are using a t3.nano instance.
Note the use of a resource signal, which lets CloudFormation know whether the resource was created successfully or failed.
for count, az in enumerate(vpc.availability_zones, start=1):
    asg = autoscaling.AutoScalingGroup(self, f"asg-{count}", vpc=vpc,
        instance_type=ec2.InstanceType("t3.nano"),
        desired_capacity=1,
        max_capacity=1,
        min_capacity=1,
        machine_image=amazon_linux_2_ami,
        role=squid_iam_role,
        vpc_subnets=ec2.SubnetSelection(
            availability_zones=[az],
            one_per_az=True,
            subnet_type=ec2.SubnetType.PUBLIC
        ),
        health_check=autoscaling.HealthCheck.ec2(grace=core.Duration.minutes(5)),
        resource_signal_count=1,
        resource_signal_timeout=core.Duration.minutes(10)
    )
The user data bash script is located at ./squid_app/squid_config_files/user_data/squid_user_data.sh. It disables the source/destination check on the instance so that it can be used as a NAT instance, installs and configures Squid, and installs and configures the CloudWatch Agent. The agent collects CPU usage metrics for the Squid process every 10 seconds and collects and stores the Squid access and cache logs in CloudWatch Logs.
A dictionary maps the placeholders in the user data script to the values required in the Launch Configuration of the ASG.
user_data_mappings = {"__S3BUCKET__": squid_config_bucket.bucket_name,
"__ASG__": asg_logical_id,
"__CW_ASG__": "${aws:AutoScalingGroupName}"
}
We can use core.Fn.sub() (the equivalent of CloudFormation Fn::Sub) to substitute values that are available at runtime, and use the result as the user data for the instances launched from this Launch Configuration.
# Replace parameters with values in the user data
with open("./squid_app/squid_config_files/user_data/squid_user_data.sh", 'r') as user_data_h:
    # Use a substitution
    user_data_sub = core.Fn.sub(user_data_h.read(), user_data_mappings)
# Add User data to Launch Config of the autoscaling group
asg.add_user_data(user_data_sub)
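The real substitution is performed by CloudFormation at deploy time, but its effect on the script can be approximated locally. The sketch below is a plain-Python stand-in, not `Fn::Sub` itself: `render_user_data` is a hypothetical helper, the two-line template is invented for illustration, and it assumes the script references the mapping keys as `${KEY}`.

```python
import re
from typing import Dict


def render_user_data(template: str, mappings: Dict[str, str]) -> str:
    """Replace each ${KEY} found in `mappings`; leave unknown variables untouched."""
    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        return mappings.get(name, match.group(0))

    # Single pass: substituted values are not scanned again.
    return re.sub(r"\$\{([^}]+)\}", replace, template)


if __name__ == "__main__":
    # Hypothetical template lines, not the content of squid_user_data.sh.
    template = (
        "aws s3 cp s3://${__S3BUCKET__}/squid.conf /etc/squid/\n"
        "echo ${__CW_ASG__}\n"
    )
    mappings = {
        "__S3BUCKET__": "example-config-bucket",
        "__CW_ASG__": "${aws:AutoScalingGroupName}",
    }
    print(render_user_data(template, mappings))
```

In this sketch the `__CW_ASG__` placeholder is replaced with the literal text `${aws:AutoScalingGroupName}`, which is then left as-is in the rendered output.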
The security group attached to the instances allows communication on ports 80 and 443 from the VPC CIDR.
asg.connections.allow_from(other=ec2.Peer.ipv4(vpc.vpc_cidr_block),
port_range=ec2.Port(
protocol=ec2.Protocol.TCP,
string_representation="HTTP from VPC",
from_port=80,
to_port=80
)
)
asg.connections.allow_from(other=ec2.Peer.ipv4(vpc.vpc_cidr_block),
port_range=ec2.Port(
protocol=ec2.Protocol.TCP,
string_representation="HTTPS from VPC",
from_port=443,
to_port=443
)
)
A Lifecycle Hook is used to allow for the completion of the Squid configuration before the instance is marked healthy.
autoscaling.LifecycleHook(self,f"asg-hook-{count}",
auto_scaling_group=asg,
lifecycle_transition=autoscaling.LifecycleTransition.INSTANCE_LAUNCHING,
notification_target=hooktargets.TopicHook(sns.Topic(self,f"squid-asg-{count}-lifecycle-hook-topic",
display_name=f"Squid ASG {count} Lifecycle Hook topic")
),
default_result=autoscaling.DefaultResult.ABANDON,
heartbeat_timeout=core.Duration.minutes(5)
)
We use a route table tag on the ASG to identify the private route table attached to the private subnet in an AZ.
# Loop through all non public subnets in AZ to identify route table and create a tag value string
for subnet in non_public_subnets_in_az:
    if route_table_ids:
        route_table_ids=f"{route_table_ids},{subnet.route_table.route_table_id}"
    else:
        route_table_ids=subnet.route_table.route_table_id
# Tag the ASG with route table ids
core.Tag.add(asg,
key='RouteTableIds',
value=route_table_ids,
apply_to_launched_instances=False
)
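The tag value is a comma-separated list of route table IDs, so whatever reads the tag has to split it again. A minimal sketch of both directions, using plain strings (the helper names and the sample IDs are hypothetical; in the CDK app the IDs are resolved at deploy time):

```python
from typing import List


def build_route_table_tag(route_table_ids: List[str]) -> str:
    """Join route table IDs into the comma-separated tag value."""
    return ",".join(route_table_ids)


def parse_route_table_tag(tag_value: str) -> List[str]:
    """Split the tag value back into individual route table IDs."""
    return [item for item in tag_value.split(",") if item]


if __name__ == "__main__":
    tag = build_route_table_tag(["rtb-0aaa", "rtb-0bbb"])
    print(tag)
    print(parse_route_table_tag(tag))
```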
- SquidMonitoringConstruct (squid_monitoring_construct.py): This construct creates the Squid health check alarms for the ASGs and an SNS topic where notifications are published when the alarm state changes.
Create the SNS topic.
self.squid_alarm_topic = sns.Topic(self,"squid-asg-alarm-topic", display_name='Squid ASG Alarm topic')
For each Squid ASG, create a metric that monitors the Squid process using the procstat plugin of the CloudWatch Agent.
squid_metric = cloudwatch.Metric(metric_name="procstat_cpu_usage",
namespace='CWAgent',
dimensions=dict(AutoScalingGroupName=asg.auto_scaling_group_name,
pidfile="/var/run/squid.pid",
process_name="squid")
)
For each Squid ASG, create a CloudWatch Alarm based on the metric above. The alarm checks every 10 seconds whether the metric breaches the threshold or data points are missing.
squid_alarm = cloudwatch.Alarm(self,f"squid-alarm-{count}",
alarm_description=f"Heart beat for Squid instance {count}",
alarm_name=f"squid-alarm_{asg.auto_scaling_group_name}",
comparison_operator=cloudwatch.ComparisonOperator.LESS_THAN_THRESHOLD,
metric=squid_metric,
period=core.Duration.seconds(10),
evaluation_periods=1,
threshold=0.0,
statistic='Average',
treat_missing_data=cloudwatch.TreatMissingData.BREACHING
)
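The alarm settings above (LESS_THAN_THRESHOLD, threshold 0.0, one evaluation period, missing data treated as breaching) can be read as a heartbeat check. The sketch below models that evaluation for a single period. It is a simplification of CloudWatch's behaviour, and `evaluate_alarm` is a hypothetical name.

```python
from typing import Optional


def evaluate_alarm(datapoint: Optional[float], threshold: float = 0.0) -> str:
    """Model one evaluation period of the Squid alarm.

    `datapoint` is the averaged procstat_cpu_usage value for the period,
    or None when the agent reported nothing.
    """
    if datapoint is None:
        # TreatMissingData.BREACHING: no data counts as a breach.
        return "ALARM"
    # ComparisonOperator.LESS_THAN_THRESHOLD
    return "ALARM" if datapoint < threshold else "OK"


if __name__ == "__main__":
    for value in (None, 0.0, 3.5):
        print(value, evaluate_alarm(value))
```

Since a CPU usage value should not be negative, the comparison itself would presumably never fire, so in practice the alarm reacts to missing data points, that is, to the Squid process no longer being reported.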
CloudWatch Alarm actions are configured to send notifications on state changes to the SNS topic.
squid_alarm.add_alarm_action(cw_actions.SnsAction(self.squid_alarm_topic))
squid_alarm.add_ok_action(cw_actions.SnsAction(self.squid_alarm_topic))
- SquidLambdaConstruct (squid_lambda_construct.py): This construct creates the Lambda function that is triggered when the alarm state changes and the IAM role assumed by Lambda to execute this function.
Similar to the IAM role created for the instances in the ASGs, we create an IAM role required for Lambda.
# Create IAM role for Lambda
lambda_iam_role = iam.Role(self,"lambda-role",
assumed_by=iam.ServicePrincipal("lambda.amazonaws.com"),
managed_policies=[iam.ManagedPolicy.from_aws_managed_policy_name("service-role/AWSLambdaBasicExecutionRole")]
)
# Add a policy that allows Lambda to update the VPC route tables to point to a healthy Squid instance ENI
lambda_iam_role.add_to_policy(statement= iam.PolicyStatement(effect=iam.Effect.ALLOW,
actions=['ec2:ModifyInstanceAttribute',
'autoscaling:Describe*',
'autoscaling:CompleteLifecycleAction',
'autoscaling:SetInstanceHealth',
'cloudwatch:Describe*',
'ec2:CreateRoute',
'ec2:CreateTags',
'ec2:ReplaceRoute',
'ec2:Describe*',
],
resources=['*']
)
)
Then, we create the Lambda function.
squid_alarm_lambda = _lambda.Function(self, "alarm-function",
runtime=_lambda.Runtime.PYTHON_3_8,
handler="lambda-handler.handler",
code=_lambda.Code.asset("./squid_app/squid_config_files/lambda"),
role=lambda_iam_role,
timeout=core.Duration.seconds(60)
)
A separate method, add_sns_subscription, is part of this construct. In this method, the ARN of the SNS topic created in the monitoring construct is added as an environment variable for the Lambda function, the SNS topic is granted permission to invoke the Lambda function, and the Lambda function is subscribed to the SNS topic.
lambda_function.add_environment(key="TOPIC_ARN", value = squid_alarm_topic.topic_arn)
lambda_function.add_permission("squid-lambda-permission",
principal=iam.ServicePrincipal("sns.amazonaws.com"),
action='lambda:InvokeFunction',
source_arn=squid_alarm_topic.topic_arn
)
squid_alarm_topic.add_subscription(sns_subscriptions.LambdaSubscription(lambda_function))
The Lambda function code is located at ./squid_app/squid_config_files/lambda/lambda-handler.py. It parses the SNS event to identify the ASG that the message refers to and whether the alarm state is ALARM or OK.
If the state is ALARM, the function marks the instance as unhealthy and updates the route table of the private subnets of the affected AZ to redirect the traffic to a healthy Squid instance.
If the state is OK, the function completes the Auto Scaling lifecycle hook action, which marks the instance as healthy, and then updates the route table of the private subnets to route the traffic via the Squid instance in the same AZ.
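The parsing step can be sketched as follows. This is not the project's handler: the function names are hypothetical, the sample event is trimmed to the fields used here, and it assumes the usual shape of a CloudWatch alarm notification delivered through SNS (a JSON string in `Records[0].Sns.Message` with `NewStateValue` and `Trigger.Dimensions`).

```python
import json
from typing import Optional, Tuple


def parse_alarm_event(event: dict) -> Tuple[Optional[str], str]:
    """Return (ASG name, new alarm state) from an SNS-wrapped alarm message."""
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    dimensions = {d["name"]: d["value"] for d in message["Trigger"]["Dimensions"]}
    return dimensions.get("AutoScalingGroupName"), message["NewStateValue"]


def choose_action(state: str) -> str:
    """Map the alarm state to the action described above."""
    if state == "ALARM":
        return "failover"   # mark unhealthy, route via the other AZ
    if state == "OK":
        return "restore"    # complete lifecycle hook, route via same AZ
    return "ignore"         # e.g. INSUFFICIENT_DATA


# Trimmed, hypothetical sample event for illustration.
SAMPLE_EVENT = {
    "Records": [{
        "Sns": {
            "Message": json.dumps({
                "AlarmName": "squid-alarm_example-asg",
                "NewStateValue": "ALARM",
                "Trigger": {
                    "Dimensions": [
                        {"name": "AutoScalingGroupName", "value": "example-asg"}
                    ]
                },
            })
        }
    }]
}

if __name__ == "__main__":
    asg_name, state = parse_alarm_event(SAMPLE_EVENT)
    print(asg_name, state, choose_action(state))
```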
The Test Instance stack creates a single EC2 instance in the isolated subnet and an IAM role attached to the instance.
The IAM role has a single AWS managed policy attached to it that allows access to this instance from AWS Systems Manager Session Manager.
instance_role = iam.Role(self, "test-instance-SSM-role",
assumed_by=iam.ServicePrincipal("ec2.amazonaws.com")
)
instance_role.add_managed_policy(iam.ManagedPolicy.from_aws_managed_policy_name("service-role/AmazonEC2RoleforSSM"))
The test instance also uses Amazon Linux 2. It is placed in the isolated subnet with all outbound access allowed.
ec2.Instance(self, "test-instance",
instance_type=ec2.InstanceType("t3.nano"),
machine_image=amazon_linux_2_ami,
vpc=vpc,
vpc_subnets=ec2.SubnetSelection(
subnet_type=ec2.SubnetType.ISOLATED
),
role=instance_role,
allow_all_outbound=True
)
Pre-requisites:
- An AWS account
- AWS CLI, authenticated and configured
- Python 3.6+
- AWS CDK
- Git
Ensure that the CDK CLI is upgraded to the latest version
npm -g install aws-cdk
git clone https://github.com/aws-samples/aws-cdk-transparent-squid-proxy
Navigate to the cloned directory
$ cd aws-cdk-transparent-squid-proxy
Open the cdk.json file in a text editor and update the following values required by the CDK app:
- account: The AWS account number to deploy the stacks to
- region: The AWS Region to deploy the stacks to
- vpc_cidr: The CIDR range to use for the VPC
Note that cdk.json also states which Python command to use. Depending on your setup, this may be either python3 or python. If necessary, update the app key in the cdk.json file accordingly.
"app": "python app.py"
Or "app": "python3 app.py"
For all following steps, python will be used. Replace it with python3 if necessary.
$ python -m venv .env
$ source .env/bin/activate
# Windows Command Prompt
.env\Scripts\activate.bat
# Windows PowerShell
.env\Scripts\Activate.ps1
$ pip install -r requirements.txt
When CDK apps are executed, they produce (or “synthesize”) an AWS CloudFormation template for each stack defined in the application.
$ cdk synth
After this command executes successfully, you can view the CloudFormation templates in the cdk.out folder.
The first time you deploy an AWS CDK app into an environment (account/region), you’ll need to install a “bootstrap stack”. This stack includes resources that are needed for the CDK toolkit’s operation. For example, the stack includes an S3 bucket that is used to store templates and assets during the deployment process.
$ cdk bootstrap
Deploy all stacks to your AWS account and Region.
When prompted, approve the request to allow CloudFormation to create IAM roles and security groups (Answer y to the question: Do you wish to deploy these changes?).
If you want to skip the approval prompts, add the --require-approval never option.
$ cdk deploy "*"
This will begin the process of deploying the 3 stacks. The * means that all stacks in the CDK app will be deployed.
- On the AWS Systems Manager console: Choose Session Manager
- Select the test instance and choose Start Session
- After the connection is made, you can test the solution with the following commands. Only the last 2 requests should return a valid response, because Squid allows traffic to *.amazonaws.com only.
curl http://www.amazon.com
curl https://www.amazon.com
curl http://calculator.s3.amazonaws.com/index.html
curl https://calculator.s3.amazonaws.com/index.html
To test with other domains, you can update the allowed_domains.txt file in S3 with the domains you want to allow and wait for the configuration to be updated (about 1-2 minutes).
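To reason about which hosts an allow list would let through, the matching rule can be approximated in Python. This sketch assumes the list uses Squid's dstdomain convention, where an entry with a leading dot (such as `.amazonaws.com`) matches the domain itself and all of its subdomains. The `is_allowed` helper is hypothetical and does not read the project's allowed_domains.txt.

```python
from typing import Iterable


def is_allowed(host: str, allowed_domains: Iterable[str]) -> bool:
    """Return True if `host` matches an entry in the allow list."""
    host = host.lower().rstrip(".")
    for entry in allowed_domains:
        entry = entry.strip().lower()
        if not entry:
            continue
        if entry.startswith("."):
            # Leading dot: match the bare domain and any subdomain.
            if host == entry[1:] or host.endswith(entry):
                return True
        elif host == entry:
            # No leading dot: exact host match only.
            return True
    return False


if __name__ == "__main__":
    allow_list = [".amazonaws.com"]
    for name in ("www.amazon.com", "calculator.s3.amazonaws.com"):
        print(name, is_allowed(name, allow_list))
```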
A CDK application can be destroyed by using the following command:
$ cdk destroy "*"
When asked to confirm the deletion of the 3 stacks, select “y”.
AWS Cloud Development Kit (CDK) Developer Guide
AWS CDK API Reference
AWS CDK Workshop
AWS CDK Examples
This library is licensed under the MIT-0 License. See the LICENSE file.