Behind Lambda's 1,000 Concurrent Executions - How Firecracker Warm Pools and Worker Managers Work

This article explains how Lambda manages MicroVMs within the default concurrency limit of 1,000, the warm pool reuse strategy, worker manager placement decisions, and internal optimizations that reduce cold start rates.

The Lifecycle of Lambda Execution Environments

When a Lambda function invocation arrives, the Lambda service assigns an execution environment: a sandbox running on a Firecracker MicroVM that contains the function's code, runtime, and configured environment variables. The execution environment lifecycle consists of three phases. The INIT phase starts the runtime and executes code in the global scope, outside the handler. The INVOKE phase calls the handler function. The SHUTDOWN phase runs the runtime's termination processing.

The key point is that after the INVOKE phase completes, the execution environment is not destroyed immediately; it is kept in a "frozen" state for a period of time. When the next request arrives, the frozen execution environment is "thawed" and reused. This is a warm start. A frozen execution environment consumes no CPU time and occupies only memory. Lambda manages a pool of these frozen execution environments, which is the "warm pool."
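The freeze/thaw behavior can be observed from inside a function. The following is a minimal Python sketch (the handler name and the module-level timestamp are illustrative): code at module scope runs once per environment during INIT, while the handler runs once per INVOKE, so a warm start sees the original INIT-time value.

```python
import time

# Module scope runs once per execution environment, during the INIT
# phase of a cold start. Expensive setup (SDK clients, config parsing)
# belongs here so warm starts can skip it.
INIT_TIMESTAMP = time.time()  # set once per environment, not per request

def handler(event, context=None):
    # The INVOKE phase runs this function once per request. On a warm
    # start, INIT_TIMESTAMP still holds the value from the original
    # cold start, which is a simple way to detect environment reuse.
    env_age = time.time() - INIT_TIMESTAMP
    return {"environment_age_seconds": round(env_age, 3)}
```

Invoking the function twice in quick succession and comparing `environment_age_seconds` (or logging a random ID generated at module scope) is a common way to confirm whether two requests landed on the same execution environment.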

Warm Pool Management Strategy

Lambda's warm pool management is an optimization problem that balances minimizing cold start rates with maximizing resource efficiency. Keeping execution environments in the warm pool for extended periods reduces cold starts but continues to occupy memory, putting pressure on physical host capacity. Conversely, destroying them quickly frees resources but increases cold starts. In a 2019 paper, AWS suggested that Lambda uses machine learning-based prediction models for warm pool management. It analyzes function invocation patterns (frequency, time of day, burst characteristics) and preferentially retains execution environments for functions with a high probability of receiving the next request. Execution environments for frequently invoked functions are retained longer, while those for rarely invoked functions are destroyed earlier. The warm pool retention time is not officially documented, but measurements report approximately 5-15 minutes. However, this duration varies dynamically based on function invocation patterns and physical host capacity conditions.
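AWS has not published its actual retention algorithm, so the following is a purely illustrative toy model of the trade-off described above: busier functions earn longer retention, bounded by a cap. The `WarmEnvironment`, `retention_seconds`, and `should_evict` names, the linear heuristic, and the 300/900-second bounds (chosen to echo the reported 5-15 minute range) are all assumptions, not Lambda's real policy.

```python
from dataclasses import dataclass

@dataclass
class WarmEnvironment:
    function_name: str
    last_invoked: float            # epoch seconds of the last INVOKE
    invocations_per_minute: float  # observed request rate for the function

def retention_seconds(env, base=300.0, cap=900.0):
    # Toy policy: each observed invocation per minute buys an extra
    # minute of retention, capped at 15 minutes. Lambda reportedly uses
    # predictive models over invocation patterns; this linear heuristic
    # exists only to build intuition about the trade-off.
    return min(cap, base + 60.0 * env.invocations_per_minute)

def should_evict(env, now):
    # Evict (destroy the frozen environment) once its idle time exceeds
    # its earned retention window, freeing memory on the physical host.
    return (now - env.last_invoked) > retention_seconds(env)
```

Under this toy policy, a function that has been idle for 400 seconds is evicted if it rarely runs, but kept if it averages ten invocations per minute, matching the article's observation that frequently invoked functions are retained longer.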

Worker Manager - Deciding Which Physical Host to Use

Lambda's worker manager is the component that decides which physical host (worker) to place a new execution environment on. This placement decision must satisfy multiple constraints simultaneously. First, the physical host must have available capacity (CPU, memory). Second, execution environments for the same customer must be distributed across multiple physical hosts (fault isolation). Third, for functions requiring VPC connectivity, an ENI must be available in the target VPC's subnet. Before 2019, VPC-connected Lambda functions needed to create an ENI at startup, adding 10-30 seconds of delay to cold starts. A 2019 improvement changed Lambda to pre-provision VPC ENIs and use Hyperplane (AWS's internal network virtualization layer) to dynamically map execution environments to ENIs. This improvement brought VPC-connected Lambda function cold starts to nearly the same level as non-VPC functions.
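The placement constraints above can be sketched as a filter-then-rank procedure. This is a hypothetical illustration, not the actual worker manager algorithm: `Worker` and `place` are invented names, the ENI-availability check for VPC functions is omitted, and the "most free memory" ranking is just one plausible heuristic.

```python
from dataclasses import dataclass

@dataclass
class Worker:
    host_id: str
    free_memory_mb: int
    customers: set  # customer IDs already running environments here

def place(workers, customer_id, memory_mb):
    # 1. Capacity: the host must fit the new environment's memory.
    candidates = [w for w in workers if w.free_memory_mb >= memory_mb]
    # 2. Fault isolation: prefer hosts not already running this
    #    customer, spreading one customer's environments across hosts.
    spread = [w for w in candidates if customer_id not in w.customers]
    pool = spread or candidates
    if not pool:
        return None  # no capacity anywhere: invocation is throttled
    # 3. Rank: pick the host with the most free memory (a simple
    #    proxy for load balancing / bin packing).
    best = max(pool, key=lambda w: w.free_memory_mb)
    best.free_memory_mb -= memory_mb
    best.customers.add(customer_id)
    return best.host_id
```

With two hosts where only one already runs a given customer's environments, the fault-isolation step steers the next environment to the other host even if both have capacity.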

Concurrency Management and Burst Limits

Lambda's default concurrent execution limit of 1,000 per account is shared across all functions in a region. If Function A uses 800 concurrent executions, Function B can only use 200. The burst limit restricts how quickly concurrent executions can increase. In us-east-1, us-west-2, and eu-west-1, the initial burst is 3,000, followed by an increase of 500 per minute. Other regions have initial bursts of 500-1,000. This means concurrent executions can jump instantly from 0 to 3,000, but going from 3,000 to 10,000 takes about 14 minutes. This burst limit is designed to give Lambda's internal infrastructure (Firecracker MicroVM startup, ENI provisioning, physical host capacity allocation) time to handle rapid load increases. Using Provisioned Concurrency lets you keep a specified number of execution environments pre-warmed regardless of burst limits.
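The burst arithmetic above (an instant jump to the initial burst, then a fixed ramp per minute) can be checked with a small helper. The function below models only the behavior described in this article; the names and the per-minute granularity are illustrative.

```python
import math

def seconds_to_reach(target, initial_burst=3000, ramp_per_minute=500):
    # Model of the burst behavior described above: concurrency can jump
    # immediately to initial_burst, then grows by ramp_per_minute each
    # minute until the target (or the account limit) is reached.
    if target <= initial_burst:
        return 0.0
    return 60.0 * math.ceil((target - initial_burst) / ramp_per_minute)
```

With the us-east-1 figures, reaching 3,000 concurrent executions takes no ramp time at all, while reaching 10,000 takes ceil(7,000 / 500) = 14 minutes, matching the estimate in the text.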

Cautions When Reusing Execution Environments

Execution environment reuse (warm starts) is advantageous for performance, but developers should be aware of certain side effects. First, global variable state is preserved: values set in global variables during one invocation persist into the next. Using this to cache database connections (connection pooling) is a recommended pattern, but storing request-specific data in global variables creates bugs where one request's data leaks into the next. Second, files in the /tmp directory are preserved. Files written to /tmp during a previous invocation remain, potentially exhausting the disk (512 MB by default, configurable up to 10 GB). Third, background processes may persist. If you start an asynchronous task inside the handler and return a response without waiting for it to complete, that task is frozen along with the environment and may resume mid-execution when the next invocation thaws it. For a deeper understanding of Lambda's internal workings, specialized books on the subject are a great resource.
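The first two cautions can be made concrete in a short handler. This is a minimal sketch: the cached `object()` stands in for a real database client, and the `_last_user` global exists only to show the leak pattern.

```python
import os
import tempfile

# Recommended pattern: module-scope objects survive warm starts, so an
# expensive client created once here is reused across invocations.
_connection = None

def get_connection():
    global _connection
    if _connection is None:
        _connection = object()  # placeholder for a real DB client
    return _connection

# Anti-pattern: request-scoped state in a global variable.
_last_user = None

def handler(event, context=None):
    global _last_user
    conn = get_connection()  # same object on every warm start
    # BUG pattern: if any code path reads _last_user before this line,
    # it sees the PREVIOUS request's user on a warm start.
    _last_user = event.get("user")
    # /tmp also persists across warm starts: use unique names and clean
    # up, or repeated invocations can fill the disk over time.
    fd, path = tempfile.mkstemp(dir=tempfile.gettempdir())
    try:
        os.write(fd, b"scratch data")
    finally:
        os.close(fd)
        os.remove(path)  # avoid accumulating files across invocations
    return {"reused_connection": conn is get_connection()}
```

Treating every global as either an immutable cache (connections, parsed config) or something explicitly reset at the top of the handler is a simple discipline that avoids the leak class described above.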