This is a documentation post for a sample Java AWS CDK setup that I created while preparing for a certification exam. The stack provisions a VPC, an Aurora Postgres cluster, a VpnClient endpoint, and the DNS records that tie them together.
It is for presentation purposes only and is hosted on GitHub:
https://github.com/stokilo/aws-cdk-rds-vpn
I want to document what I wanted to achieve and all my findings during development.
The original idea was to create a new, non-default VPC with three subnets: the first two private and the third one public. The CDK automatically configures a NAT gateway when a private subnet is defined. In my case, I didn't want to pay for that, and I don't need outgoing connections. With the isolated subnet type, no NAT gateway is created. This is what I needed for the RDS setup.
In the private subnets, I deployed an Aurora Postgres database. It was configured in single-master mode and deployed across 2 AZs for HA. I wanted to connect to the reader and writer endpoints using fixed names, so I decided to call them reader.rds.com and writer.rds.com. This lets me configure the database client once and save the connections in it, without reconfiguring it every time the stack is redeployed.
Additionally, I wanted to connect directly to the database hosted on AWS instances. Of course, it would be much cheaper to set it up on my local machine, e.g. in a Docker container, but that is not what I wanted. In the future, I plan to rewrite this stack and set up Aurora Serverless with the Data API. I decided to use a VPN client connection and connect to the VPC subnet directly. AWS has a product for this called Client VPN, which I refer to as VpnClient here. That is what I provision in the sample project because it is good for a solo dev setup.
I own the domain awss.ws, and I wanted to have a subdomain vpn.awss.ws and always connect to this address with the OpenVPN client. AWS VpnClient generates an endpoint URL, and every time the stack is deployed, a new name is assigned. Adding this to Route 53 manually is a cumbersome process. That is why my stack sets up CNAME entries in the public hosted zone, which allows me to hardcode vpn.awss.ws in my OpenVPN config file.
My slawomirstec.com website is provisioned with the CDK using Python. Here I decided to use Java. The CDK code and the lambda code required for some post-deployment steps live in separate Maven modules. Because I build both from the parent module, I had to use custom app launcher code in cdk.json.
The CDK code lives in the cdk module, together with the main class.
I could not find a BOM for the CDK; one was only available for the AWS SDK.
The CDK application is wrapped in a custom cdk.sh script. This script lets me define the target deployment account using an env.properties file, where I configure the account, region, public hosted zone, and domain name settings per environment. I don't like the AWS CLI and CDK profile discovery logic: I use a separate account per stage, and it is possible to deploy a dev stack to prod by accident. I added some code that cross-checks the account discovered from the AWS credentials against the account configured in env.properties.
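The cross-check can be sketched in a few lines of Java. This is a minimal illustration, not the code from the repository: the class name AccountGuard, the property key "account", and the account IDs are assumptions, and the account ID resolved from the credentials is simply passed in as a string (it could come, for example, from aws sts get-caller-identity).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: refuses to deploy when the active credentials
// belong to a different account than the one configured for the stage.
class AccountGuard {

    // Parses env.properties-style text. The key name "account" is an assumption.
    static Properties parse(String content) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Throws if the account resolved from the credentials differs from the configured one.
    static void requireSameAccount(Properties env, String resolvedAccountId) {
        String expected = env.getProperty("account");
        if (expected == null || !expected.trim().equals(resolvedAccountId)) {
            throw new IllegalStateException(
                "Credentials resolve to account " + resolvedAccountId
                + " but env.properties expects " + expected);
        }
    }

    public static void main(String[] args) {
        // Made-up account IDs, for illustration only.
        Properties env = parse("account=111111111111\nregion=me-south-1\n");
        requireSameAccount(env, "111111111111"); // matching account: no exception
        try {
            requireSameAccount(env, "222222222222");
        } catch (IllegalStateException e) {
            System.out.println("Deployment blocked: " + e.getMessage());
        }
    }
}
```

Failing hard before the CDK is invoked is the point of the design: a mismatch stops the deployment instead of producing a warning that is easy to miss.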
AWS CDK provides default values for services; I found out that you can get a NAT gateway configured automatically depending on the selected subnet type. The VPC has two subnets that are isolated from the public internet. These are for the RDS deployment and the VpnClient association. The security group only allows Postgres and DNS ingress traffic. DNS is set to the default reserved VPC DNS IP, not to the DNS of the host where OpenVPN is running. This is required to resolve private hosted zone names (*.rds.com).
*The VPC default DNS does not always work; inspecting the real cause is still on my to-do list. It is possible that I have a major configuration issue in the CDK code.
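For reference, the reserved VPC DNS address is the base of the VPC IPv4 range plus two. A minimal sketch of that arithmetic, assuming the CIDR is written with its network base address; the class and method names are made up for this example:

```java
// Illustration only: derives the Amazon-provided DNS resolver address
// (VPC network base + 2) from a VPC CIDR such as 10.0.0.0/16.
class VpcDns {

    static String resolverFor(String vpcCidr) {
        String base = vpcCidr.split("/")[0];
        String[] octets = base.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 CIDR: " + vpcCidr);
        }
        long value = 0;
        for (String octet : octets) {
            value = (value << 8) | Integer.parseInt(octet); // pack the octets into 32 bits
        }
        value += 2; // the resolver sits two addresses above the network base
        return ((value >> 24) & 0xFF) + "." + ((value >> 16) & 0xFF) + "."
             + ((value >> 8) & 0xFF) + "." + (value & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(resolverFor("10.0.0.0/16")); // prints 10.0.0.2
    }
}
```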
VpnClient configuration was easy; the sample project has one setup for PROD and one for DEV. The latest CDK version handles a lot of details for you in the background. There is no need to define VPN connection associations, security groups, or authorization. All of it is resolved by the library at synth time.
More work was required to update the CNAME record in the public DNS zone. I could not find a way to get back the autogenerated DNS name of the VPN endpoint. The AWS SDK has a method, 'describeClientVpnEndpoints', that returns a list of all endpoints. But here is the problem: the CDK provisions infrastructure, and there is no easy way to instruct it to do extra work outside the provided API. And it should not be done like that anyway, because the CDK code should be responsible for all infrastructure elements so that it can safely update or destroy them.
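Once the DNS name has been read with describeClientVpnEndpoints, turning it into a CNAME target is a small string operation. As far as I know, the API reports the name with a leading wildcard label ("*."), which has to be replaced with a concrete label before the name can be used. Treat that, the helper name, and the sample endpoint ID as assumptions of this sketch rather than the repository's code.

```java
// Hypothetical helper: builds the value for the vpn.awss.ws CNAME record
// from the DNS name reported for the VPN endpoint.
class VpnCname {

    // Replaces a leading "*." wildcard with a concrete label; other names pass through unchanged.
    static String cnameTarget(String endpointDnsName, String label) {
        if (endpointDnsName.startsWith("*.")) {
            return label + "." + endpointDnsName.substring(2);
        }
        return endpointDnsName;
    }

    public static void main(String[] args) {
        // Made-up endpoint ID, for illustration only.
        String dnsName = "*.cvpn-endpoint-0123456789abcdef0.prod.clientvpn.me-south-1.amazonaws.com";
        System.out.println(cnameTarget(dnsName, "vpn"));
    }
}
```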
Running custom logic after stack deployment can lead to an unmaintainable stack. But here I didn't have a choice: the API does not let me extract the DNS name from the VPN endpoint. Additionally, the record I update lives in an existing public hosted zone in Route 53, which I don't provision with the CDK. That is why I assume it is safe to add one record after the main stack is deployed. To execute the post-processing logic after deployment, I implemented a step function workflow that calls a lambda function every minute until the stack is up. The step function is triggered by an EventBridge rule 5 minutes after deployment. That is the theory; in practice, the part of the stack with the rule is deployed first, before the VPC and VpnClient. However, the lambda code handles the failure, and the step function keeps executing it until it returns success.
I made a big mistake and deployed an infinite loop in the step function :)
I received an email from AWS about my free tier for step functions running out. I was surprised, because I have one wait condition, one lambda step, and a choice element. Of course, I had connected the Choice not to the wait condition but directly to the lambda, which caused an infinite loop until the whole stack was deployed. I was lucky that the step function had a timeout of 1 hour and that the lambda returned success pretty quickly after deployment.
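The difference between the two wirings is easy to see in a toy simulation with a virtual clock. This is plain Java, not step function or CDK code. The one-second lambda duration and the moment the endpoint becomes available are made-up numbers; the one-minute wait and the one-hour timeout are the values from my setup.

```java
// Toy model of the Wait -> Lambda -> Choice loop. No real waiting happens,
// the clock is just a counter of simulated seconds.
class RetryLoopSimulation {

    static final long WAIT_SECONDS = 60;      // Wait state: one minute between attempts
    static final long LAMBDA_SECONDS = 1;     // assumed lambda duration (illustrative)
    static final long TIMEOUT_SECONDS = 3600; // state machine timeout: one hour

    // Returns how many times the lambda runs before it succeeds or the timeout hits.
    // choiceLoopsBackToWait = true models the intended wiring (Choice -> Wait),
    // false models the mistake (Choice -> Lambda).
    static int countInvocations(boolean choiceLoopsBackToWait, long readyAtSecond) {
        long clock = WAIT_SECONDS; // the initial Wait state runs once in both wirings
        int invocations = 0;
        while (clock < TIMEOUT_SECONDS) {
            clock += LAMBDA_SECONDS;
            invocations++;
            if (clock >= readyAtSecond) {
                return invocations; // Choice: lambda reported success, workflow ends
            }
            if (choiceLoopsBackToWait) {
                clock += WAIT_SECONDS; // Choice: retry after another wait
            }
            // In the broken wiring the next lambda call starts immediately.
        }
        return invocations; // timeout reached without success
    }

    public static void main(String[] args) {
        long readyAt = 600; // assume the endpoint exists ten minutes in
        System.out.println("Choice -> Wait:   " + countInvocations(true, readyAt));
        System.out.println("Choice -> Lambda: " + countInvocations(false, readyAt));
    }
}
```

With the endpoint appearing after ten simulated minutes, the intended wiring calls the lambda 10 times and the broken one 540 times.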
Another important lesson from working with the CDK: don't manually update or delete anything that was created by the CDK. This will cause CDK updates to fail in the future. Even small changes can prevent you from updating the stack or even destroying it.
I found that deploying an isolated stack works best. If anything fails, I destroy the stack and recreate it.
Additionally, I've noticed that DNS resolution sometimes does not work after connecting to the VPN. I'm using the VPC internal DNS for resolving IP addresses. I don't use my host DNS server because I want to resolve the private hosted zone RDS names. The error is random; something is wrong either with the stack I've created or with the region I operate in (Bahrain).
The GitHub repo contains a README with all the setup steps required to bring the services up, so there is no need to repeat them here. I plan to evolve this stack and replace RDS with Aurora Serverless and the Data API. For backend services, I will add integration with Fargate and a sample Spring Boot application.