Behind Lambda's 1,000 Concurrent Executions - How Firecracker Warm Pools and Worker Managers Work

This article explains how Lambda manages MicroVMs within the default concurrency limit of 1,000, the warm pool reuse strategy, worker manager placement decisions, and internal optimizations that reduce cold start rates.

The Lifecycle of Lambda Execution Environments

When a Lambda function invocation arrives, the Lambda service assigns an execution environment: a sandbox running on a Firecracker MicroVM that contains the function's code, runtime, and configured environment variables. The execution environment lifecycle consists of three phases. The INIT phase starts the runtime and executes code in the global scope, outside the handler. The INVOKE phase calls the handler function. The SHUTDOWN phase runs the runtime's termination processing.

The key point is that after the INVOKE phase completes, the execution environment is not destroyed immediately; it is kept in a "frozen" state for a period of time. When the next request arrives, the frozen execution environment is "thawed" and reused. This is a warm start. A frozen execution environment consumes no CPU time and occupies only memory. Lambda manages a pool of these frozen execution environments, which is the "warm pool."
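The freeze/thaw behavior can be observed from inside a function. The following is a minimal Python sketch (the handler name and the module-level timestamp are illustrative): code at module scope runs once per environment during INIT, while the handler runs once per INVOKE, so a warm start sees the original INIT-time value.

```python
import time

# Module scope runs once per execution environment, during the INIT
# phase of a cold start. Expensive setup (SDK clients, config parsing)
# belongs here so warm starts can skip it.
INIT_TIMESTAMP = time.time()  # set once per environment, not per request

def handler(event, context=None):
    # The INVOKE phase runs this function once per request. On a warm
    # start, INIT_TIMESTAMP still holds the value from the original
    # cold start, which is a simple way to detect environment reuse.
    env_age = time.time() - INIT_TIMESTAMP
    return {"environment_age_seconds": round(env_age, 3)}
```

Invoking the function twice in quick succession and comparing `environment_age_seconds` (or logging a random ID generated at module scope) is a common way to confirm whether two requests landed on the same execution environment.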

Warm Pool Management Strategy

Lambda's warm pool management is an optimization problem that balances minimizing cold start rates with maximizing resource efficiency. Keeping execution environments in the warm pool for extended periods reduces cold starts but continues to occupy memory, putting pressure on physical host capacity. Conversely, destroying them quickly frees resources but increases cold starts. In a 2019 paper, AWS suggested that Lambda uses machine learning-based prediction models for warm pool management. It analyzes function invocation patterns (frequency, time of day, burst characteristics) and preferentially retains execution environments for functions with a high probability of receiving the next request. Execution environments for frequently invoked functions are retained longer, while those for rarely invoked functions are destroyed earlier. The warm pool retention time is not officially documented, but measurements report approximately 5-15 minutes. However, this duration varies dynamically based on function invocation patterns and physical host capacity conditions.
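AWS has not published its actual retention algorithm, so the following is a purely illustrative toy model of the trade-off described above: busier functions earn longer retention, bounded by a cap. The `WarmEnvironment`, `retention_seconds`, and `should_evict` names, the linear heuristic, and the 300/900-second bounds (chosen to echo the reported 5-15 minute range) are all assumptions, not Lambda's real policy.

```python
from dataclasses import dataclass

@dataclass
class WarmEnvironment:
    function_name: str
    last_invoked: float            # epoch seconds of the last INVOKE
    invocations_per_minute: float  # observed request rate for the function

def retention_seconds(env, base=300.0, cap=900.0):
    # Toy policy: each observed invocation per minute buys an extra
    # minute of retention, capped at 15 minutes. Lambda reportedly uses
    # predictive models over invocation patterns; this linear heuristic
    # exists only to build intuition about the trade-off.
    return min(cap, base + 60.0 * env.invocations_per_minute)

def should_evict(env, now):
    # Evict (destroy the frozen environment) once its idle time exceeds
    # its earned retention window, freeing memory on the physical host.
    return (now - env.last_invoked) > retention_seconds(env)
```

Under this toy policy, a function that has been idle for 400 seconds is evicted if it rarely runs, but kept if it averages ten invocations per minute, matching the article's observation that frequently invoked functions are retained longer.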

Worker Manager - Deciding Which Physical Host to Use

Lambda's worker manager is the component that decides which physical host (worker) to place a new execution environment on. This placement decision must satisfy multiple constraints simultaneously. First, the physical host must have available capacity (CPU, memory). Second, execution environments for the same customer must be distributed across multiple physical hosts (fault isolation). Third, for functions requiring VPC connectivity, an ENI must be available in the target VPC's subnet. Before 2019, VPC-connected Lambda functions needed to create an ENI at startup, adding 10-30 seconds of delay to cold starts. A 2019 improvement changed Lambda to pre-provision VPC ENIs and use Hyperplane (AWS's internal network virtualization layer) to dynamically map execution environments to ENIs. This improvement brought VPC-connected Lambda function cold starts to nearly the same level as non-VPC functions.
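The placement constraints above can be sketched as a filter-then-rank procedure. This is a hypothetical illustration, not the actual worker manager algorithm: `Worker` and `place` are invented names, the ENI-availability check for VPC functions is omitted, and the "most free memory" ranking is just one plausible heuristic.

```python
from dataclasses import dataclass

@dataclass
class Worker:
    host_id: str
    free_memory_mb: int
    customers: set  # customer IDs already running environments here

def place(workers, customer_id, memory_mb):
    # 1. Capacity: the host must fit the new environment's memory.
    candidates = [w for w in workers if w.free_memory_mb >= memory_mb]
    # 2. Fault isolation: prefer hosts not already running this
    #    customer, spreading one customer's environments across hosts.
    spread = [w for w in candidates if customer_id not in w.customers]
    pool = spread or candidates
    if not pool:
        return None  # no capacity anywhere: invocation is throttled
    # 3. Rank: pick the host with the most free memory (a simple
    #    proxy for load balancing / bin packing).
    best = max(pool, key=lambda w: w.free_memory_mb)
    best.free_memory_mb -= memory_mb
    best.customers.add(customer_id)
    return best.host_id
```

With two hosts where only one already runs a given customer's environments, the fault-isolation step steers the next environment to the other host even if both have capacity.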

Concurrency Management and Burst Limits

Lambda's default concurrent execution limit of 1,000 per account is shared across all functions in a region. If Function A uses 800 concurrent executions, Function B can only use 200. The burst limit restricts how quickly concurrent executions can increase. In us-east-1, us-west-2, and eu-west-1, the initial burst is 3,000, followed by an increase of 500 per minute. Other regions have initial bursts of 500-1,000. This means concurrent executions can jump instantly from 0 to 3,000, but going from 3,000 to 10,000 takes about 14 minutes. This burst limit is designed to give Lambda's internal infrastructure (Firecracker MicroVM startup, ENI provisioning, physical host capacity allocation) time to handle rapid load increases. Using Provisioned Concurrency lets you keep a specified number of execution environments pre-warmed regardless of burst limits.
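The burst arithmetic above (an instant jump to the initial burst, then a fixed ramp per minute) can be checked with a small helper. The function below models only the behavior described in this article; the names and the per-minute granularity are illustrative.

```python
import math

def seconds_to_reach(target, initial_burst=3000, ramp_per_minute=500):
    # Model of the burst behavior described above: concurrency can jump
    # immediately to initial_burst, then grows by ramp_per_minute each
    # minute until the target (or the account limit) is reached.
    if target <= initial_burst:
        return 0.0
    return 60.0 * math.ceil((target - initial_burst) / ramp_per_minute)
```

With the us-east-1 figures, reaching 3,000 concurrent executions takes no ramp time at all, while reaching 10,000 takes ceil(7,000 / 500) = 14 minutes, matching the estimate in the text.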

Cautions When Reusing Execution Environments

Execution environment reuse (warm starts) is advantageous for performance, but developers should be aware of certain side effects. First, global variable state is preserved: values set in global variables during one invocation persist into the next. Using this to cache database connections (connection pooling) is a recommended pattern, but storing request-specific data in global variables creates bugs where one request's data leaks into the next. Second, files in the /tmp directory are preserved. Files written to /tmp during a previous invocation remain, potentially exhausting the disk (512 MB by default, configurable up to 10 GB). Third, background processes may persist. If you start an asynchronous task inside the handler and return a response without waiting for it to complete, that task is frozen along with the environment and may resume mid-execution when the next invocation thaws it. For a deeper understanding of Lambda's internal workings, specialized books on the subject are a great resource.
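The first two cautions can be made concrete in a short handler. This is a minimal sketch: the cached `object()` stands in for a real database client, and the `_last_user` global exists only to show the leak pattern.

```python
import os
import tempfile

# Recommended pattern: module-scope objects survive warm starts, so an
# expensive client created once here is reused across invocations.
_connection = None

def get_connection():
    global _connection
    if _connection is None:
        _connection = object()  # placeholder for a real DB client
    return _connection

# Anti-pattern: request-scoped state in a global variable.
_last_user = None

def handler(event, context=None):
    global _last_user
    conn = get_connection()  # same object on every warm start
    # BUG pattern: if any code path reads _last_user before this line,
    # it sees the PREVIOUS request's user on a warm start.
    _last_user = event.get("user")
    # /tmp also persists across warm starts: use unique names and clean
    # up, or repeated invocations can fill the disk over time.
    fd, path = tempfile.mkstemp(dir=tempfile.gettempdir())
    try:
        os.write(fd, b"scratch data")
    finally:
        os.close(fd)
        os.remove(path)  # avoid accumulating files across invocations
    return {"reused_connection": conn is get_connection()}
```

Treating every global as either an immutable cache (connections, parsed config) or something explicitly reset at the top of the handler is a simple discipline that avoids the leak class described above.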