binxio/cfn-postgresql-user-provider

Demo timeouts in my VPC

Closed this issue · 4 comments

Hi,

Thanks for supplying this nice custom resource. Unfortunately, I can't get it to work in my AWS account. As the command in the Makefile for deploying the demo doesn't work, I suppose there is something wrong with my account or default VPC. Additionally, I can't see anything in the logs.

What I've done:

export VPC_ID=$(aws ec2  --output text --query 'Vpcs[?IsDefault].VpcId' describe-vpcs)
echo $VPC_ID
vpc-dffc27b5

export SUBNET_IDS=$(aws ec2 --output text --query "Subnets[*].SubnetId" \
                        describe-subnets --filters "Name=vpc-id,Values=${VPC_ID}" | tr '\t' ',')
echo $SUBNET_IDS
subnet-429a560e,subnet-330a8959,subnet-84bf46f8

export SG_ID=$(aws ec2 --output text --query "SecurityGroups[*].GroupId" \
                        describe-security-groups --group-names default  --filters Name=vpc-id,Values=$VPC_ID)
echo $SG_ID
sg-3c33e544

But

make demo
create demo in default VPC vpc-dffc27b5, subnets  using security group sg-3c33e544.
Either there is no default VPC in your account, no two subnets or no default security group available in the default VPC
make: *** [demo] Error 1

This is because the make script uses the following command (which I don't really understand, tbh):

aws ec2 --output text --query 'RouteTables[?Routes[?GatewayId == null]].Associations[].SubnetId' \
                                describe-route-tables --filters Name=vpc-id,Values=${VPC_ID} | tr '\t' ','

My subnets look like this:

aws ec2 --output json \
                        describe-subnets --filters "Name=vpc-id,Values=${VPC_ID}"
{
    "Subnets": [
        {
            "AvailabilityZone": "eu-central-1c",
            "AvailabilityZoneId": "euc1-az1",
            "AvailableIpAddressCount": 4090,
            "CidrBlock": "172.31.0.0/20",
            "DefaultForAz": true,
            "MapPublicIpOnLaunch": true,
            "State": "available",
            "SubnetId": "subnet-429a560e",
            "VpcId": "vpc-dffc27b5",
            "OwnerId": "332349559535",
            "AssignIpv6AddressOnCreation": false,
            "Ipv6CidrBlockAssociationSet": [],
            "SubnetArn": "arn:aws:ec2:eu-central-1:332349559535:subnet/subnet-429a560e"
        },
        {
            "AvailabilityZone": "eu-central-1a",
            "AvailabilityZoneId": "euc1-az2",
            "AvailableIpAddressCount": 4090,
            "CidrBlock": "172.31.16.0/20",
            "DefaultForAz": true,
            "MapPublicIpOnLaunch": true,
            "State": "available",
            "SubnetId": "subnet-330a8959",
            "VpcId": "vpc-dffc27b5",
            "OwnerId": "332349559535",
            "AssignIpv6AddressOnCreation": false,
            "Ipv6CidrBlockAssociationSet": [],
            "SubnetArn": "arn:aws:ec2:eu-central-1:332349559535:subnet/subnet-330a8959"
        },
        {
            "AvailabilityZone": "eu-central-1b",
            "AvailabilityZoneId": "euc1-az3",
            "AvailableIpAddressCount": 4089,
            "CidrBlock": "172.31.32.0/20",
            "DefaultForAz": true,
            "MapPublicIpOnLaunch": true,
            "State": "available",
            "SubnetId": "subnet-84bf46f8",
            "VpcId": "vpc-dffc27b5",
            "OwnerId": "332349559535",
            "AssignIpv6AddressOnCreation": false,
            "Ipv6CidrBlockAssociationSet": [],
            "SubnetArn": "arn:aws:ec2:eu-central-1:332349559535:subnet/subnet-84bf46f8"
        }
    ]
}

Did I delete something accidentally?

Creating the provider stacks works just fine (secret-provider and postgresql-user-provider), but the demo stack fails because the creation of the db users takes forever:

KongReaderUser | - | Custom::PostgreSQLUser | CREATE_IN_PROGRESS | -
KongUser | - | Custom::PostgreSQLUser | CREATE_IN_PROGRESS | -

In the logs of the lambdas I just see the following:

2020-07-04T11:55:14.670+02:00 | START RequestId: d660db1e-a9f4-4366-95d3-7da8d309c387 Version: $LATEST
2020-07-04T11:55:17.673+02:00 | END RequestId: d660db1e-a9f4-4366-95d3-7da8d309c387
2020-07-04T11:55:17.673+02:00 | REPORT RequestId: d660db1e-a9f4-4366-95d3-7da8d309c387 Duration: 3003.16 ms Billed Duration: 3000 ms Memory Size: 128 MB Max Memory Used: 84 MB Init Duration: 520.16 ms
2020-07-04T11:55:17.673+02:00 | 2020-07-04T09:55:17.673Z d660db1e-a9f4-4366-95d3-7da8d309c387 Task timed out after 3.00 seconds
2020-07-04T11:56:12.859+02:00 | START RequestId: d660db1e-a9f4-4366-95d3-7da8d309c387 Version: $LATEST
2020-07-04T11:56:15.864+02:00 | END RequestId: d660db1e-a9f4-4366-95d3-7da8d309c387
2020-07-04T11:56:15.864+02:00 | REPORT RequestId: d660db1e-a9f4-4366-95d3-7da8d309c387 Duration: 3003.14 ms Billed Duration: 3000 ms Memory Size: 128 MB Max Memory Used: 47 MB
2020-07-04T11:56:15.864+02:00 | 2020-07-04T09:56:15.864Z d660db1e-a9f4-4366-95d3-7da8d309c387 Task timed out after 3.00 seconds
2020-07-04T11:58:19.044+02:00 | START RequestId: d660db1e-a9f4-4366-95d3-7da8d309c387 Version: $LATEST
2020-07-04T11:58:22.049+02:00 | END RequestId: d660db1e-a9f4-4366-95d3-7da8d309c387
2020-07-04T11:58:22.049+02:00 | REPORT RequestId: d660db1e-a9f4-4366-95d3-7da8d309c387 Duration: 3003.33 ms Billed Duration: 3000 ms Memory Size: 128 MB Max Memory Used: 47 MB
2020-07-04T11:58:22.049+02:00 | 2020-07-04T09:58:22.049Z d660db1e-a9f4-4366-95d3-7da8d309c387 Task timed out after 3.00 seconds
2020-07-04T11:58:28.147+02:00 | START RequestId: a822235c-ee83-49c4-b913-3e86e5950769 Version: $LATEST
2020-07-04T11:58:31.152+02:00 | END RequestId: a822235c-ee83-49c4-b913-3e86e5950769
2020-07-04T11:58:31.152+02:00 | REPORT RequestId: a822235c-ee83-49c4-b913-3e86e5950769 Duration: 3003.17 ms Billed Duration: 3000 ms Memory Size: 128 MB Max Memory Used: 47 MB
2020-07-04T11:58:31.152+02:00 | 2020-07-04T09:58:31.152Z a822235c-ee83-49c4-b913-3e86e5950769 Task timed out after 3.00 seconds

How can I get more logs? The lambda seems to be triggered, but I assume it can't connect to the database or the parameter store?
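One way to get more signal than a bare "Task timed out" might be to log a quick TCP reachability check from inside the Lambda for each endpoint it depends on. This is only a minimal sketch, assuming Python; `can_connect` and its parameters are hypothetical names and not part of the provider:

```python
import socket


def can_connect(host, port, timeout_seconds=2.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError as error:
        # Printed output from a Lambda ends up in its CloudWatch log stream.
        print(f"cannot reach {host}:{port}: {error}")
        return False
```

Calling it once with the database endpoint on port 5432 and once with an AWS service endpoint on port 443 would show which of the two connections is the one that hangs, provided the timeout is kept below the Lambda timeout.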

Thanks in advance. I hope I've included all the needed information.

Hi @fvosberg, when there are timeouts, there are security group issues :-p

The RDS database should be created in a private subnet. It has public access set to false and the security group only provides access from private ip addresses.

This statement:

aws ec2 --output text --query 'RouteTables[?Routes[?GatewayId == null]].Associations[].SubnetId' \
                                describe-route-tables --filters Name=vpc-id,Values=${VPC_ID} | tr '\t' ','

tries to get the private subnets but fails to detect any in your VPC.

So specify the private subnet IDs and you should be fine. Alternatively, set public access to true on the database and modify the security group to allow access to port 5432 from 0.0.0.0/0.
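To make the `--query` expression less opaque, here is a rough Python equivalent of what it selects: route tables that have at least one route without a `GatewayId`, and from those only the subnets that are explicitly associated. The sample data is hypothetical (the actual route tables were not posted); it assumes a typical default VPC whose subnets rely on the main route table.

```python
def private_subnet_ids(route_tables):
    """Mimic RouteTables[?Routes[?GatewayId == null]].Associations[].SubnetId."""
    subnet_ids = []
    for table in route_tables:
        routes = table.get("Routes", [])
        # Keep the table only if some route has no GatewayId (e.g. a NAT route).
        if not any(route.get("GatewayId") is None for route in routes):
            continue
        for association in table.get("Associations", []):
            # Implicit (main) associations carry no SubnetId and are dropped.
            if association.get("SubnetId") is not None:
                subnet_ids.append(association["SubnetId"])
    return subnet_ids


# Hypothetical default VPC: one main route table, local route plus internet gateway.
default_vpc = [{
    "Routes": [
        {"DestinationCidrBlock": "172.31.0.0/16", "GatewayId": "local"},
        {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-example"},
    ],
    "Associations": [{"Main": True}],
}]

# Same VPC plus a hypothetical private route table that points at a NAT gateway.
with_private = default_vpc + [{
    "Routes": [
        {"DestinationCidrBlock": "172.31.0.0/16", "GatewayId": "local"},
        {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-example"},
    ],
    "Associations": [{"Main": False, "SubnetId": "subnet-private-a"}],
}]

print(private_subnet_ids(default_vpc))
print(private_subnet_ids(with_private))
```

Under these assumptions the first call yields an empty list, which would match the empty "subnets" value in the `make demo` output, while the second returns only the explicitly associated private subnet.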

Thanks for your reply.

I thought the same, that the security group was the cause of my problem. Independent of that, it might be a good idea for the code that opens the connection to use a shorter timeout than the Lambda timeout. I think this would cause CloudFormation to fail faster instead of waiting an hour to realize that the Lambda can't stabilize, right?
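A minimal sketch of that idea, assuming a Python handler: derive the connect timeout from the time the invocation has left, so the connection attempt gives up before the Lambda itself is killed. `connect_timeout_seconds` and `margin_seconds` are hypothetical names; `get_remaining_time_in_millis()` is the method the Lambda context object provides.

```python
def connect_timeout_seconds(context, margin_seconds=1.0):
    """Return a whole-second connect timeout that ends before the Lambda does."""
    remaining_seconds = context.get_remaining_time_in_millis() / 1000.0
    # Leave a margin so there is still time to report the failure afterwards.
    return max(1, int(remaining_seconds - margin_seconds))


class FakeContext:
    """Stand-in for the Lambda context object, for local experiments only."""

    def __init__(self, remaining_ms):
        self._remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self):
        return self._remaining_ms


# With the 3 second Lambda timeout seen in the logs, this leaves 2 seconds.
timeout = connect_timeout_seconds(FakeContext(3000))
# Assumption: if the provider connects through psycopg2, the value could be
# passed on as psycopg2.connect(..., connect_timeout=timeout).
```

With a timeout like this, a failed connection could be caught and reported to CloudFormation as a FAILED response rather than leaving the stack waiting. That only helps if the Lambda can still reach the response endpoint, which is a separate networking requirement.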

Regarding my security group:

As I've used my code retrieving the subnets and a security group and providing it to the demo stack you've provided in the repository, the database and the lambda should be in the same subnets and the security group associated with the lambda should be the SourceSecurityGroup in the DatabaseSecurityGroup, right?

To verify that:

The networking config of the database:

Networking
Availability zone: eu-central-1a
VPC: vpc-dffc27b5
Subnet group: cfn-database-user-provider-demo-2-dbsubnetgroup-1oe3m1lcvwims
Subnets: subnet-84bf46f8, subnet-330a8959, subnet-429a560e

Security
VPC security groups: cfn-database-user-provider-demo-2-DatabaseSecurityGroup-185A11WAXIFIA (sg-03e51c57fea527107) (active)
Public accessibility: No
Certificate authority: rds-ca-2019
Certificate authority date: Aug 22nd, 2024

Inbound rule of sg-03e51c57fea527107 - cfn-database-user-provider-demo-2-DatabaseSecurityGroup-185A11WAXIFIA

Type | Protocol | Port range | Source | Description - optional
-- | -- | -- | -- | --
PostgreSQL | TCP | 5432 | sg-3c33e544 (default) | -

The lambda config:

VPC: vpc-dffc27b5 (172.31.0.0/16) | Default
Subnets:
subnet-429a560e (172.31.0.0/20) | eu-central-1c
subnet-330a8959 (172.31.16.0/20) | eu-central-1a
subnet-84bf46f8 (172.31.32.0/20) | eu-central-1b
Security groups: sg-3c33e544 (default)

The Lambda and the database have to be placed in a private subnet that routes traffic to the public internet through a NAT gateway. Without a NAT gateway and without a public IP address, no network traffic to the internet is possible.

The database cannot be reached from the public internet, and the Lambda cannot reach the AWS endpoints it needs (CloudFormation, CloudWatch, etc.).

Please create three private subnets, a NAT gateway in a public subnet, and a route table associated with these three subnets, and add a route for 0.0.0.0/0 to the NAT gateway. Use these three private subnets as input for the demo.
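As a rough way to check the result, the sketch below classifies a subnet from data shaped like `describe-route-tables` output: it looks up the subnet's route table (falling back to the main table for implicit associations) and inspects the default route. The function names and sample data are hypothetical; only the subnet ID `subnet-429a560e` comes from this thread, and its routing is assumed, not confirmed.

```python
def route_table_for(subnet_id, route_tables):
    """Return the explicitly associated route table, else the main one."""
    main_table = None
    for table in route_tables:
        for association in table.get("Associations", []):
            if association.get("SubnetId") == subnet_id:
                return table
            if association.get("Main"):
                main_table = table
    return main_table


def classify_subnet(subnet_id, route_tables):
    """Classify by default route: 'public', 'private-nat', 'isolated', 'unknown'."""
    table = route_table_for(subnet_id, route_tables)
    if table is None:
        return "unknown"
    for route in table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if str(route.get("GatewayId", "")).startswith("igw-"):
            return "public"  # default route goes to an internet gateway
        if route.get("NatGatewayId"):
            return "private-nat"  # default route goes to a NAT gateway
    return "isolated"  # no default route at all


# Hypothetical route tables: a main table with an internet gateway and a
# private table with a NAT route, explicitly associated with one subnet.
sample_tables = [
    {
        "Routes": [{"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-example"}],
        "Associations": [{"Main": True}],
    },
    {
        "Routes": [{"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-example"}],
        "Associations": [{"Main": False, "SubnetId": "subnet-private-a"}],
    },
]

print(classify_subnet("subnet-429a560e", sample_tables))
print(classify_subnet("subnet-private-a", sample_tables))
```

Under these assumptions, only subnets classified as `private-nat` would be suitable input for the demo.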

Thanks, I found the solution on my own and came back to write it here. You are right, I had to allow outbound traffic to the internet or PrivateLink.