How AWS API Throttling Works - The Token Bucket Algorithm and the Truth Behind 429 Errors
Learn how AWS API rate limiting is implemented using the token bucket algorithm, understand the concept of burst capacity, explore differences in throttling limits across services, and discover practical strategies to avoid throttling.
Why AWS Applies Rate Limits to Every API
Every AWS API has per-account, per-region rate limits (throttling). Requests that exceed these limits receive HTTP 429 (Too Many Requests) or 503 (Service Unavailable) errors. Rate limiting serves two purposes. First, it ensures fairness in a multi-tenant environment. If one account issues a massive volume of API calls, it can degrade performance for other accounts sharing the same infrastructure; rate limits are guardrails against this "noisy neighbor" problem. Second, it protects customers themselves. An application bug can cause an infinite loop that calls APIs tens of thousands of times per second, and without rate limits such runaway behavior would lead to enormous bills. Rate limits function as a safety net for catching unintended runaway behavior early. Each service's rate limits can be checked in Service Quotas, and many limits can be raised through quota increase requests.
How the Token Bucket Algorithm Works
AWS API throttling is implemented using the token bucket algorithm. The algorithm replenishes tokens (permits) into a bucket (container) at a constant rate, and each API request consumes one token. When the bucket is empty, requests are rejected. Here's a concrete example. Suppose the EC2 DescribeInstances API has a rate limit of 100 requests per second with a burst capacity of 200. The bucket is replenished with 100 tokens per second, and the maximum bucket capacity is 200 tokens. When the bucket is full (200 tokens), you can send up to 200 requests instantaneously (a burst); after that, requests are processed at the steady rate of 100 per second. Burst capacity is a buffer that absorbs short-term spikes, which matters most in patterns where many APIs are called simultaneously at application startup. Once the burst is exhausted, you are limited to the steady-state rate (100 requests per second).
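The mechanics above can be sketched in a few lines of Python. This is a minimal illustration of the algorithm, not AWS's actual implementation (which is not public); the rate and capacity values mirror the hypothetical DescribeInstances example.

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter: a sketch of the algorithm
    described above, not AWS's internal implementation."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate              # tokens replenished per second (e.g. 100)
        self.capacity = capacity      # burst capacity (e.g. 200)
        self.tokens = capacity        # bucket starts full
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Replenish at a constant rate, capped at the bucket's capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1          # each request consumes one token
            return True
        return False                  # empty bucket -> throttled (429)

bucket = TokenBucket(rate=100, capacity=200)
# A tight loop of 300 instantaneous requests: roughly the first 200
# (the burst capacity) succeed, and the rest are throttled.
accepted = sum(bucket.allow() for _ in range(300))
```

Running the loop over a full second instead would admit roughly 100 additional requests, which is exactly the steady-state rate.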
Throttling Granularity Varies by Service
Throttling granularity varies significantly across services. EC2 APIs have individual rate limits set per API action. DescribeInstances and RunInstances are managed in separate buckets, so throttling on DescribeInstances does not affect RunInstances. DynamoDB throttling, on the other hand, is applied at the table level. Requests exceeding a table's provisioned capacity (RCU/WCU) are throttled. This is a data access throughput limit, different from API-level throttling. Lambda's concurrent execution limit is also a form of throttling. Function invocations exceeding the account's default concurrent execution limit of 1,000 are throttled with 429 errors. API Gateway has an account-level rate limit of 10,000 requests per second (default), with additional throttling settings configurable per API, per stage, and per method. This multi-layered throttling ensures that concentrated access to a specific API endpoint does not affect other endpoints.
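The per-action isolation described for EC2 can be illustrated with two independent buckets. This is a toy sketch: the action names are real EC2 APIs, but the token counts are made up.

```python
# Each EC2 API action gets its own bucket, so exhausting one
# does not affect the other. Token counts here are illustrative.
buckets = {
    "DescribeInstances": {"tokens": 0},   # already exhausted by heavy polling
    "RunInstances":      {"tokens": 5},   # independent bucket, still has room
}

def call(action: str) -> int:
    """Return the HTTP status a request against this action would get."""
    b = buckets[action]
    if b["tokens"] >= 1:
        b["tokens"] -= 1
        return 200    # request accepted
    return 429        # throttled

print(call("DescribeInstances"))  # 429: this action's bucket is empty
print(call("RunInstances"))       # 200: separate bucket, unaffected
```

Contrast this with DynamoDB, where the "bucket" is effectively per table (RCU/WCU), or Lambda, where it is the account-wide concurrency pool.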
Exponential Backoff and Jitter - The Retry Strategy SDKs Handle Automatically
The correct response to a throttling error (429) is to retry with a combination of exponential backoff and jitter. Exponential backoff is a strategy that increases retry intervals exponentially: 1 second, 2 seconds, 4 seconds, 8 seconds, and so on. This gradually reduces request pressure on the throttled service. Jitter adds random variation to retry intervals. With exponential backoff alone, multiple clients throttled at the same moment would all retry at the same time, triggering throttling again in a "thundering herd" problem. Adding jitter spreads the retries out. AWS SDKs implement this retry strategy automatically. The AWS SDK for JavaScript v3 defaults to a maximum of 3 attempts with exponential backoff and full jitter; boto3 (Python) implements a similar strategy, configurable through its retry modes ("legacy", "standard", and "adaptive"). If you call APIs directly without an SDK, you need to implement this retry logic yourself.
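Full jitter can be expressed in a few lines: the backoff ceiling doubles each attempt, and the actual sleep is drawn uniformly from zero up to that ceiling. The base delay and cap below are illustrative defaults, not the SDKs' exact values.

```python
import random

def full_jitter_delay(attempt: int, base: float = 1.0, cap: float = 20.0) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)).

    Sketches the strategy AWS SDKs apply on 429s; `base` and `cap`
    are illustrative, not the SDKs' actual defaults.
    """
    return random.uniform(0, min(cap, base * 2 ** attempt))

# The ceiling grows 1s, 2s, 4s, 8s, ... while the actual sleep is
# randomized, so simultaneously-throttled clients don't retry in lockstep.
for attempt in range(4):
    ceiling = min(20.0, 2.0 ** attempt)
    print(f"attempt {attempt}: sleep somewhere in [0, {ceiling:.0f}s), "
          f"e.g. {full_jitter_delay(attempt):.2f}s")
```

With boto3, you normally don't write this yourself: `botocore.config.Config(retries={"max_attempts": 10, "mode": "standard"})` configures the built-in retry behavior instead.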
Design Patterns to Proactively Avoid Throttling
Rather than retrying after throttling occurs, the ideal approach is to design systems that prevent throttling in the first place. The first pattern is reducing API calls. Instead of calling EC2's DescribeInstances every second to monitor instance state, you can use EventBridge events (EC2 Instance State-change Notification) to receive notifications only when state changes occur. Shifting from polling to event-driven architecture dramatically reduces API call volume. The second pattern is leveraging caching. Information that doesn't change frequently (account settings, region lists, etc.) can be cached locally to reduce API calls. The third pattern is using batch APIs. DynamoDB's BatchGetItem can retrieve up to 100 items in a single API call; compared to calling GetItem 100 times individually, this reduces the API call count by 99%. S3's ListObjectsV2 can likewise return up to 1,000 objects per request via the MaxKeys parameter. To go deeper on API design and throttling strategies, specialized books on the topic can be helpful.
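The batching pattern reduces to chunking keys into groups that fit the API's per-request limit. The sketch below uses DynamoDB's 100-item BatchGetItem limit; the table name "MyTable" and the key shape are hypothetical placeholders.

```python
def chunks(items: list, size: int = 100):
    """Split a key list into batches of at most `size` items
    (DynamoDB's BatchGetItem accepts up to 100 items per request)."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# 250 hypothetical keys in DynamoDB's low-level attribute-value format.
keys = [{"pk": {"S": f"user#{n}"}} for n in range(250)]
batches = list(chunks(keys))
print(len(batches))  # 3 API calls instead of 250 individual GetItem calls

# Each batch would then be passed to boto3's real batch_get_item method:
#   dynamodb.batch_get_item(RequestItems={"MyTable": {"Keys": batch}})
# ("MyTable" is a placeholder; also check UnprocessedKeys in the response,
# since BatchGetItem may return a partial result under load.)
```

Note that BatchGetItem still consumes the same read capacity as the equivalent individual reads; what it saves is request overhead and API-level call count.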