Going Serverless: How and Why to Fetch Configuration Data from SSM
The Serverless Framework is a popular development tool that allows developers to quickly and easily create applications without worrying about infrastructure and network management.
Enterprise organisations typically develop multiple serverless applications in a "microservices" approach, combining them to create a unified product. For example, an online store might have a handful of applications that come together to form one system:
- A customer-facing, frontend website displaying various products for purchase
- A backend "products" serverless application for managing products, tags, variants, prices and similar information
- A backend "checkout" serverless application for managing shopping baskets, and payment gateway integrations
- A backend "shipping" serverless application for managing product shipments, delivery service integrations and notifications
Indeed, it is possible to create one extensive backend system that manages all of the products, checkout and shipping concerns in the above example. However, this approach would lose all of the benefits of iteratively developing smaller pieces of software independently in focussed and specialised teams.
In a scenario like the one above, it is typical for each of the serverless applications to share most of the serverless configuration, including:
- Security groups and subnets
- RBAC configurations (e.g. for API Gateways)
- IAM user names or shared policies
- Database connections (e.g. AWS Neptune connection strings)
- Various custom environment variables specific to your organisation
In particular, it is typical for "network infrastructure" like the VPC to be created outside of the serverless projects entirely (e.g. with Terraform) and used by the serverless projects.
So what is the best way to get externally-created, shared configuration data into our multiple serverless projects without tedious repetition and the possibility of human error?
Hardcoding Configuration
The first approach is to hardcode everything, everywhere. This has the benefit of being the easiest and fastest to implement, but the downside of not scaling well to multiple deployment environments or multiple projects.
For example, our `serverless.yml` file would have the following snippet for VPC configuration:

```yaml
...
provider:
  name: aws
  vpc:
    securityGroupIds:
      - sg-0123456789
    subnetIds:
      - subnet-0000000000
...
```
Of course, this would not work for a `dev` deployment, `test` deployment and `live` deployment, as the ID values would change between these environments.
Nonetheless, this would be entirely acceptable for a proof-of-concept project or the first iteration of a system.
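If you do take the hardcoding route, one way to stretch it slightly further is to keep a per-stage lookup table inside `serverless.yml` itself and select from it by stage name. This is only a sketch: the `vpcByStage` key and the IDs below are hypothetical placeholders, though `${self:...}` and `${opt:stage}` are standard Serverless Framework variables.

```yaml
...
custom:
  # Hypothetical per-stage lookup table; IDs are placeholders
  vpcByStage:
    dev:
      securityGroupIds:
        - sg-0123456789
      subnetIds:
        - subnet-0000000000
    live:
      securityGroupIds:
        - sg-9876543210
      subnetIds:
        - subnet-1111111111

provider:
  name: aws
  vpc: ${self:custom.vpcByStage.${opt:stage}}
...
```

This keeps each environment's values in one file, but still requires a code change whenever the infrastructure changes.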
CI/CD Variables
With CI/CD systems like GitLab, you would typically refactor the above code into:
```yaml
...
provider:
  name: aws
  vpc:
    securityGroupIds:
      - ${env:SECURITY_GROUP_ID_1}
    subnetIds:
      - ${env:SUBNET_ID_1}
...
```
This would allow the CI/CD provider to inject the correct values per environment during the deployment.
For example, with GitLab, you would have a `.gitlab-ci.yml` file with the following structure:
```yaml
image: node:latest

stages:
  - deploy

.deploy: &deploy
  - export SECURITY_GROUP_ID_1=$SECURITY_GROUP_ID_1 # from GitLab CI/CD variables
  - export SUBNET_ID_1=$SUBNET_ID_1 # from GitLab CI/CD variables
  - yarn install --frozen-lockfile
  - yarn global add serverless
  - serverless deploy --stage $CI_ENVIRONMENT_NAME --verbose

deploy dev:
  stage: deploy
  script:
    - *deploy
  only:
    - master
  environment:
    name: dev

deploy test:
  stage: deploy
  script:
    - *deploy
  only:
    - tags
  environment:
    name: test

deploy live:
  stage: deploy
  script:
    - *deploy
  when: manual
  only:
    - tags
  environment:
    name: live
```
With this setup, GitLab CI/CD injects the security group and subnet information, and the values change based on the `environment.name`.
However, you will more than likely have to manually copy and paste multiple configuration values into your CI/CD setup for this to work effectively across multiple projects. You can use group-scoped (shared) environment variables in GitLab, but support for this varies between CI/CD providers and plans. Depending on the configuration being injected, this approach is also error-prone and vulnerable to sudden change and failure.
Fetching Values from Parameter Store
My preferred approach is to avoid as many manual and non-code steps as possible. Ideally, my project should also be deployable no matter where the `serverless deploy` command is executed. For example, I like being able to do deployments locally from time to time, especially for debugging purposes, without needing to look up and set these variables on my machine.
This is why I opt to use AWS Systems Manager Parameter Store to store my configuration data, from which any values needed can be fetched at the time of each `serverless deploy`.
These two benefits of automation and the ability to run deployments from anywhere come from the following `serverless.yml` configuration:
```yaml
...
provider:
  name: aws
  vpc:
    securityGroupIds:
      - ${ssm(aws:region):/${opt:stage}-serverless-security-group}
    subnetIds:
      - ${ssm(aws:region):/${opt:stage}-vpc-subnet-1}
...
```
With this setup, the security group ID and subnet ID values are fetched from Parameter Store for each deployment and are always guaranteed to be the latest, correct values. For VPC configuration, it is feasible that an administrator would occasionally move workloads between subnets or amend security groups and update the Parameter Store values, so subsequent serverless deployments simply work without any other intervention.
Note: The `aws:region` variable is provided by the framework, and the stage is interpolated so that a different value can be fetched from AWS per environment.
Automation with Terraform
When Terraform creates networking, infrastructure and other resources, we should extend the Terraform project to add the relevant outputs into Parameter Store using the `aws_ssm_parameter` resource, as shown in the following example:
```hcl
module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  # various configuration options
}

resource "aws_security_group" "serverless-sg" {
  name   = "serverless-sg"
  vpc_id = module.vpc.vpc_id

  # various configuration options
}

resource "aws_ssm_parameter" "vpc-subnet-1" {
  name  = "dev-vpc-subnet-1"
  type  = "String"
  value = module.vpc.private_subnets[0]
}

resource "aws_ssm_parameter" "serverless-security-group" {
  depends_on = [aws_security_group.serverless-sg]

  name  = "dev-serverless-security-group"
  type  = "String"
  value = aws_security_group.serverless-sg.id
}
```
This structure ensures that any `terraform apply` saves the required information directly into Parameter Store, ready for our serverless projects to fetch and use.
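In a real project, the environment prefix would likely not be hardcoded to `dev`. A hedged sketch of one way to generalise this, assuming a hypothetical `var.environment` input variable and that every private subnet should be published:

```hcl
variable "environment" {
  type        = string
  description = "Deployment environment, e.g. dev, test or live (hypothetical)"
}

# Publish every private subnet ID as its own parameter, named
# "<environment>-vpc-subnet-<n>" to match the naming convention above
resource "aws_ssm_parameter" "vpc-subnets" {
  count = length(module.vpc.private_subnets)

  name  = "${var.environment}-vpc-subnet-${count.index + 1}"
  type  = "String"
  value = module.vpc.private_subnets[count.index]
}
```

Using `count` here means adding a subnet to the VPC automatically publishes a matching parameter on the next `terraform apply`.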
Considerations
This convention to store configuration in the Parameter Store is best applied when:
- Multiple serverless projects use the same information (e.g. VPC configuration, database connection strings, etc.)
- The configuration values could feasibly change in the future, and you want to avoid the manual and tedious effort to apply value updates to your deployment users and systems.
- You are not storing secrets this way. Instead, secrets should be fetched using AWS SSM or similar technologies at runtime.
Other noteworthy comments:
- The cost of fetching data from AWS per deployment should be negligible
- A good convention is to prefix configuration with the environment name
- The deployment user will need the appropriate permissions to fetch values from AWS Parameter Store. A good convention could be to allow certain users access to all `dev*` parameters but not to `live*` parameters, depending on your organisation
- It is possible to save a `StringList` to the Parameter Store rather than multiple, individual values for arrays
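As a rough illustration of the last two conventions, the environment-prefixed naming and `StringList` handling can be sketched in Python. The function names here are hypothetical helpers, not part of any AWS SDK; the only AWS-specific fact assumed is that a `StringList` parameter is stored as a single comma-separated string.

```python
def parameter_name(environment: str, suffix: str) -> str:
    """Build an environment-prefixed parameter name, e.g. 'dev-vpc-subnet-1'."""
    return f"{environment}-{suffix}"


def split_string_list(value: str) -> list[str]:
    """SSM StringList parameters hold one comma-separated string; split it."""
    return [item.strip() for item in value.split(",") if item.strip()]


# Example usage with placeholder values:
print(parameter_name("dev", "vpc-subnet-1"))       # dev-vpc-subnet-1
print(split_string_list("subnet-000,subnet-111"))  # ['subnet-000', 'subnet-111']
```

A naming helper like this, shared across projects, keeps the `dev*` / `live*` permission boundaries predictable.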
You will thank yourself for fetching configuration this way at any organisation with even a small number of serverless applications to manage.