Last year, we built a cloud-native app with an Angular frontend, a Lambda backend & an RDS database. When it was time to decide what language to write our Lambdas in, we immediately picked Java, as that was our dev team’s primary skillset. Little did we know what we were getting into.
Soon after our Angular app started making calls to the Lambdas, we knew something was very wrong. The APIs were too slow to respond, sometimes taking tens of seconds. Timeouts were all too frequent because API Gateway has a maximum timeout of 29 seconds, after which it sends a timeout response to the caller even though the Lambda it invoked is still running.
This began our month-long research into why we were facing such severe performance issues. The first thing we stumbled across was cold starts. And so we did what everyone else does: implement warmup for our Lambdas. This seemed to ease the issue a bit, but timeouts were still too common. Obviously we couldn’t just say to the user: “Hey, that timeout was just the Lambda warming up. It’s warm now. Go ahead & click that button again.” So we had to keep digging…
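For the curious, warmup boils down to very little code. Here’s a minimal sketch in Python (the `"warmup"` event key & the 5-minute schedule are conventions we’re assuming for illustration, not an AWS API): a CloudWatch Events schedule pings the function with a marker event, and the handler short-circuits on it so the container stays warm.

```python
# Warmup sketch: a CloudWatch Events schedule invokes the function every
# ~5 minutes with {"warmup": true}; the handler returns immediately so the
# container stays warm without running any real business logic.
# The "warmup" key is our own convention here, not an AWS-defined field.

def handler(event, context):
    if isinstance(event, dict) and event.get("warmup"):
        return {"warmed": True}  # short-circuit: just keep the container alive
    # ... real request handling goes below ...
    return {"statusCode": 200, "body": "real work"}
```

The catch, as we learned, is that warmup only keeps already-started containers alive; it does nothing for the first request after a scale-up.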
We soon enlisted the help of the AWS support team & they pointed out what we were doing wrong:
Lambda Chaining: Cold Starts Multiplied
Due to our lack of experience with serverless, we had followed traditional software best practices of modularizing code as much as possible, to ease & promote code reuse, maintainability, etc. With that mindset, we created literally hundreds of Lambda-based nanoservices. Every incoming API call would first invoke a Lambda, which in turn would call up to 7 other Lambdas, in sequence or in parallel, to get the job done. If each of those functions takes even a few seconds to cold start, the total response time degrades drastically. No wonder we were so used to the timeouts by now.
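A back-of-the-envelope sketch shows why chaining hurts so badly (the per-function timings below are illustrative assumptions, not measurements): with one entry Lambda calling 7 others in sequence, a few seconds of cold start each quickly approaches or blows past API Gateway’s 29-second cap.

```python
# Illustrative worst case: every function in a sequential chain cold-starts.
# The timings are assumptions for the sketch, not measured values.
def chain_latency(n_functions, cold_start_s, work_s=0.2):
    """Worst-case latency when all n functions in a sequential chain cold-start."""
    return n_functions * (cold_start_s + work_s)

# 1 entry Lambda + 7 downstream, assuming a Java runtime:
outside_vpc = chain_latency(8, cold_start_s=3.0)   # already near the 29 s cap
inside_vpc = chain_latency(8, cold_start_s=10.0)   # way past it

print(round(outside_vpc, 1))  # 25.6
print(inside_vpc > 29)        # True -- a guaranteed API Gateway timeout
```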
Lambdas Inside VPC: Bad Idea
With all our data in RDS, it was a no-brainer for us to move all Lambda functions inside a VPC so they could access the private database. It was only when AWS support explained it that we realized how severely moving Lambdas inside a VPC degrades cold start times.
Outside a VPC, a cold start is just a matter of spinning up a container & loading the language runtime into it. Even for Java, this would never be more than 2–3 seconds. But inside a VPC, a cold start means spinning up an EC2 instance in the VPC, starting a container on it & then loading the language runtime. The critical step is that once the instance is ready, an ENI needs to be allocated & attached to it before anything else can happen. This ENI step alone can take up to a whopping TEN seconds!
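One simple way we could have confirmed which invocations were paying this price (a sketch, not official AWS tooling): Lambda reuses the same process across warm invocations, so module-level state survives between calls. A module-level flag distinguishes cold from warm invocations, and a timestamp taken at import time approximates how long ago the container initialized.

```python
import time

# Module-level code runs exactly once per container, at cold start.
_INIT_STARTED = time.monotonic()
_COLD = True  # True only for the very first invocation in this container

def handler(event, context):
    global _COLD
    cold, _COLD = _COLD, False
    return {
        "cold_start": cold,
        # Seconds since this module was loaded; on a cold start in a VPC
        # the invocation would additionally have waited on the ENI attach
        # described above before this code even ran.
        "init_age_s": round(time.monotonic() - _INIT_STARTED, 3),
    }
```

Logging `cold_start` per request makes it easy to correlate slow responses with cold containers.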
The Way Out
Once we knew our mistakes, it was pretty straightforward to fix them:
- First & most important: if you can keep your Lambdas outside a VPC, DO IT! Otherwise, none of the steps below can get you out of the ENI jam.
- DON’T CHAIN LAMBDAS unless you absolutely have to. In most cases, it’s easy to encapsulate all functionality of an API in a single Lambda & still keep your codebase manageable.
- Pick Another Language: If you’re just starting out, you’re better off with Node.js, Python or Go (as recommended by AWS). These have a much lower init time than Java.
- In a VPC, the higher a Lambda function’s memory allocation, the worse the ENI problem gets: AWS sized ENI capacity from memory (the documented guidance at the time was roughly projected peak concurrency × memory / 3 GB), so high-memory functions need more ENIs & hit the ENI wait more often.
- Lambda layers are a packaging & code-sharing convenience; they won’t improve cold start or runtime performance.
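To make the “don’t chain Lambdas” point concrete, here’s a sketch of the shape we consolidated toward (handler, routes & payloads are hypothetical): one Lambda per API, dispatching internally by method & path, so each request pays at most one cold start instead of up to eight.

```python
# One Lambda per API: route internally instead of invoking other Lambdas.
# Each route costs a plain function call, not another cold start.

def get_user(event):
    return {"statusCode": 200, "body": "user"}

def list_orders(event):
    return {"statusCode": 200, "body": "orders"}

# Dispatch table keyed by (HTTP method, path) from the API Gateway event.
ROUTES = {
    ("GET", "/user"): get_user,
    ("GET", "/orders"): list_orders,
}

def handler(event, context):
    route = ROUTES.get((event["httpMethod"], event["path"]))
    if route is None:
        return {"statusCode": 404, "body": "not found"}
    return route(event)
```

The codebase can stay modular at the source level (separate modules per feature); it’s only the *deployment* unit that gets consolidated.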
Apart from these, if you’re still seeing poor performance, integrate X-Ray into your functions to see exactly where the time is being spent, & check out the official AWS Lambda best practices guide for a lot more tips & tricks to get the fastest Lambdas.
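On the X-Ray point: turning on tracing is a one-line change if you deploy with SAM or CloudFormation. A minimal fragment (the function name & the other properties here are placeholders for your own template):

```yaml
# Minimal SAM fragment: "Tracing: Active" enables X-Ray for the function.
Resources:
  ApiFunction:                      # placeholder logical name
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.handler          # placeholder module.function
      Runtime: python3.9
      Tracing: Active               # X-Ray traces each invocation
```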