Canary Deployments for Serverless APIs with API Gateway
Safely roll out new features with canary deployments. Learn how to configure API Gateway and Lambda to gradually shift traffic to a new version of your API, minimizing the impact of potential bugs.
Deploying new code to production is always risky. Even with extensive testing, bugs can slip through. A bad deployment can cause outages, impact users, and erode trust. To mitigate this risk, modern DevOps practices favor progressive deployment strategies like canary releases.
A canary release (or canary deployment) is a technique where you roll out a new version of your application to a small subset of users before making it available to everyone. This small group acts as the "canary in the coal mine." If something goes wrong, it only affects a small percentage of your users, and you can quickly roll back the change.
For serverless APIs built with Amazon API Gateway and AWS Lambda, you can implement canary deployments directly within the AWS ecosystem.
The Core Components for a Canary Release
Lambda Versions and Aliases: Instead of pointing API Gateway directly to a Lambda function, you point it to an alias. An alias is a pointer to a specific version of your Lambda function. This is the key to managing different versions of your code.
API Gateway Stages: A stage (e.g., dev, prod) is a named reference to a deployment, which is a snapshot of your API. When you deploy your API, you deploy it to a specific stage.
Deployment Canary Settings: API Gateway's canary release feature allows you to create a special deployment within a stage that receives a share of the stage's traffic, so requests are split between two versions of your backend.
Step-by-Step Canary Deployment Workflow
Let's walk through the process of releasing a new version of a Lambda function.
Step 1: Publish a New Lambda Version
First, make your code changes and then publish a new version of your Lambda function. Each published version is an immutable snapshot of your code and configuration.
- In the Lambda console, go to your function and click "Publish new version."
- This creates Version 2, Version 3, and so on. Your original, editable function is referred to as $LATEST.
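The console action above also has an SDK equivalent. The following is a minimal sketch, assuming the AWS SDK for Python (boto3) and a placeholder function name, my-function. The helper name publish_new_version is hypothetical, and the client is passed in as an argument so the helper can be exercised without AWS access.

```python
from typing import Any


def publish_new_version(lambda_client: Any, function_name: str, description: str = "") -> str:
    """Publish the current $LATEST code as an immutable version.

    `lambda_client` is expected to behave like boto3.client("lambda").
    Returns the new version number, which Lambda reports as a string.
    """
    response = lambda_client.publish_version(
        FunctionName=function_name,
        Description=description,
    )
    return response["Version"]
```

With real credentials, a call might look like publish_new_version(boto3.client("lambda"), "my-function", "canary candidate").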
Step 2: Update a Lambda Alias
Create a Lambda alias (e.g., live) that will represent the production version of your function. Initially, you can point this alias to Version 1.
When you're ready to start the canary release, you update the alias to use weighted routing. You can configure it to send, for example, 90% of its traffic to Version 1 and 10% to your new Version 2.
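The 90/10 split described above maps onto the alias routing configuration. Here is a rough sketch that only builds the request parameters, assuming boto3's Lambda update_alias call and the placeholder names my-function and live. The helper weighted_alias_update is hypothetical.

```python
from typing import Any, Dict


def weighted_alias_update(
    function_name: str,
    alias: str,
    stable_version: str,
    canary_version: str,
    canary_weight: float,
) -> Dict[str, Any]:
    """Build keyword arguments for the Lambda UpdateAlias call.

    The alias keeps `stable_version` as its primary version and sends
    `canary_weight` (a fraction between 0.0 and 1.0) of invocations to
    `canary_version`.
    """
    if not 0.0 <= canary_weight <= 1.0:
        raise ValueError("canary_weight must be between 0.0 and 1.0")
    return {
        "FunctionName": function_name,
        "Name": alias,
        "FunctionVersion": stable_version,
        "RoutingConfig": {"AdditionalVersionWeights": {canary_version: canary_weight}},
    }


# 90% of invocations stay on Version 1, 10% go to Version 2.
request = weighted_alias_update("my-function", "live", "1", "2", 0.10)
```

The resulting dictionary could then be passed along as boto3.client("lambda").update_alias(**request). Keeping the parameter construction separate from the call makes the weights easy to review before anything changes in production.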
Step 3: Create a Canary Deployment in API Gateway
Now, you need to tell API Gateway to use this traffic-splitting alias.
Point to the Alias: In your API Gateway integration settings, make sure you are invoking the Lambda function using its alias ARN (e.g., ...:function:my-function:live), not the unqualified function ARN.
Create a New Deployment: Deploy your API changes to your production stage (e.g., prod).
Enable the Canary: In the stage's settings, go to the "Canary" tab and create a new canary.
- Percentage of Traffic: Set the percentage of traffic you want to send to the canary. Let's start with 10%.
- This creates a special canary deployment that runs alongside your main production deployment.
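The same setup can be scripted. The sketch below assumes a REST API in API Gateway and boto3's create_deployment call. The region, account ID, and the helper names lambda_integration_uri and canary_deployment_request are placeholders chosen for illustration.

```python
from typing import Any, Dict


def lambda_integration_uri(region: str, account_id: str, function_name: str, alias: str) -> str:
    """Build the integration URI that makes API Gateway invoke a Lambda alias."""
    alias_arn = f"arn:aws:lambda:{region}:{account_id}:function:{function_name}:{alias}"
    return (
        f"arn:aws:apigateway:{region}:lambda:path/2015-03-31/functions/"
        f"{alias_arn}/invocations"
    )


def canary_deployment_request(rest_api_id: str, stage_name: str, percent_traffic: float) -> Dict[str, Any]:
    """Build keyword arguments for a deployment that is created as a canary on a stage."""
    if not 0.0 <= percent_traffic <= 100.0:
        raise ValueError("percent_traffic must be between 0 and 100")
    return {
        "restApiId": rest_api_id,
        "stageName": stage_name,
        "canarySettings": {
            "percentTraffic": percent_traffic,
            # Assumption: the canary should not share the stage's response cache.
            "useStageCache": False,
        },
    }


# Placeholder region and account ID; substitute your own values.
uri = lambda_integration_uri("us-east-1", "123456789012", "my-function", "live")
```

With a real API ID in place of the placeholder, the request could be sent as boto3.client("apigateway").create_deployment(**canary_deployment_request("abc123", "prod", 10.0)).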
How It Works
With this setup:
- 90% of requests to your prod stage are routed to the main deployment, which invokes the live alias and, through it, Version 1 of your Lambda.
- 10% of requests are routed to the canary deployment. For these requests to reach Version 2, the canary's integration has to resolve to it, for example through a canary stage-variable override that selects an alias pointing to Version 2.
This allows you to test the new version with a small amount of live production traffic.
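Because Lambda alias weights and the API Gateway canary percentage are separate mechanisms, it can help to work out how much traffic actually reaches the new version when both are in play. The following is a small worked calculation, not an AWS API. The function name new_version_share and its parameters are made up for illustration.

```python
def new_version_share(canary_fraction: float, main_weight: float, canary_weight: float) -> float:
    """Fraction of all requests that end up on the new Lambda version.

    canary_fraction: share of stage traffic API Gateway sends to the canary deployment.
    main_weight:     share of main-deployment invocations routed to the new version.
    canary_weight:   share of canary-deployment invocations routed to the new version.
    """
    for value in (canary_fraction, main_weight, canary_weight):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all arguments must be fractions between 0.0 and 1.0")
    return (1.0 - canary_fraction) * main_weight + canary_fraction * canary_weight


# Canary deployment resolves to Version 2; main deployment stays entirely on Version 1.
clean_split = new_version_share(0.10, main_weight=0.0, canary_weight=1.0)

# Same canary, but the alias behind the main deployment still carries a 10% weight.
compounded = new_version_share(0.10, main_weight=0.10, canary_weight=1.0)
```

The first case gives a 10% share for Version 2. The second gives roughly 19%, which shows why it is usually simpler to drive the split with one mechanism at a time.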
Step 4: Monitor and Promote
During the canary release, it's critical to monitor your application's health.
- CloudWatch Metrics: Watch the CloudWatch metrics for your new Lambda version (Version 2). Look for an increase in the Errors metric or unusual Duration spikes.
- CloudWatch Alarms: Set up CloudWatch alarms that trigger if the error rate for Version 2 exceeds a certain threshold. An alarm like this can serve as the trigger for an automated rollback.
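An alarm on the new version's errors might be defined along these lines. This is a sketch that builds the parameters for boto3's CloudWatch put_metric_alarm call, assuming Version 2 is invoked through the live alias so that Lambda reports metrics under the ExecutedVersion dimension. The alarm name, threshold, and helper name version_error_alarm are illustrative choices.

```python
from typing import Any, Dict


def version_error_alarm(function_name: str, alias: str, version: str, threshold: float = 1.0) -> Dict[str, Any]:
    """Build keyword arguments for an alarm on one version's Errors metric."""
    return {
        "AlarmName": f"{function_name}-{alias}-v{version}-errors",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [
            {"Name": "FunctionName", "Value": function_name},
            {"Name": "Resource", "Value": f"{function_name}:{alias}"},
            {"Name": "ExecutedVersion", "Value": version},
        ],
        "Statistic": "Sum",
        "Period": 60,  # evaluate the error count per one-minute window
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Assumption: no invocations in a window should not count as a failure.
        "TreatMissingData": "notBreaching",
    }


alarm = version_error_alarm("my-function", "live", "2")
```

The dictionary could be passed as boto3.client("cloudwatch").put_metric_alarm(**alarm). The alarm only signals a problem; rolling back still requires an action attached to it or a process that reacts to its state.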
If the new version is performing well and no errors are detected, you can gradually increase the traffic to it. In the API Gateway canary settings, you can increase the percentage to 25%, then 50%, and so on.
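Raising the percentage can also be done programmatically. Below is a minimal sketch that builds the parameters for boto3's API Gateway update_stage call, assuming a canary already exists on the stage. The API ID abc123 and the helper name canary_percent_update are placeholders.

```python
from typing import Any, Dict


def canary_percent_update(rest_api_id: str, stage_name: str, percent: float) -> Dict[str, Any]:
    """Build keyword arguments that change the canary's share of stage traffic."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must be between 0 and 100")
    return {
        "restApiId": rest_api_id,
        "stageName": stage_name,
        "patchOperations": [
            {
                "op": "replace",
                "path": "/canarySettings/percentTraffic",
                # Patch operation values are sent as strings.
                "value": f"{percent:.1f}",
            }
        ],
    }


# Step the canary up from 10% to 25% of traffic.
step_up = canary_percent_update("abc123", "prod", 25.0)
```

This could be applied with boto3.client("apigateway").update_stage(**step_up), repeating with larger values as confidence grows.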
Once you are confident that the new version is stable, you can promote the canary. Promotion makes the canary deployment the main deployment of the production stage, so 100% of traffic goes to the new version. Your deployment is now complete.
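Outside the console, promotion is expressed as a set of patch operations on the stage. The sketch below follows the operations AWS documents for promoting a canary and assumes boto3's update_stage call. The API ID and the helper name promote_canary_request are placeholders.

```python
from typing import Any, Dict


def promote_canary_request(rest_api_id: str, stage_name: str) -> Dict[str, Any]:
    """Build keyword arguments that promote a stage's canary to its main deployment."""
    return {
        "restApiId": rest_api_id,
        "stageName": stage_name,
        "patchOperations": [
            # Stop sending a separate share of traffic to the canary.
            {"op": "replace", "path": "/canarySettings/percentTraffic", "value": "0.0"},
            # Carry any canary stage-variable overrides over to the stage itself.
            {
                "op": "copy",
                "from": "/canarySettings/stageVariableOverrides",
                "path": "/variables",
            },
            # Make the canary's deployment the stage's deployment.
            {"op": "copy", "from": "/canarySettings/deploymentId", "path": "/deploymentId"},
        ],
    }


promotion = promote_canary_request("abc123", "prod")
```

The request could be sent with boto3.client("apigateway").update_stage(**promotion). Building it as plain data first makes the promotion step easy to review or log before it runs.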
Conclusion
Canary deployments are a powerful technique for reducing the risk of production releases. By gradually rolling out changes and monitoring their impact, you can catch bugs before they affect all of your users.
While setting up a canary release for the first time involves a few extra steps, the added safety and confidence are worth the effort. For any business-critical serverless API, implementing a progressive deployment strategy like canary releases is not just a best practice; it is essential for maintaining a reliable and high-quality service.