Why AWS Service Quotas Exist - Multi-Tenant Design That Protects Shared Infrastructure

This article explains why AWS service quotas (formerly service limits) are not mere restrictions but a design that protects other customers in a multi-tenant environment. It covers the noisy neighbor problem, soft versus hard limits, and what happens behind a quota increase request.

The Essence of Service Quotas - Protecting Other Customers

AWS service quotas (formerly known as service limits) are upper bounds on the resources each account can use, in most cases scoped per Region. Nearly every resource has one: the number of EC2 instances that can be launched, Lambda concurrent executions, S3 bucket count, VPC count, and more.

These quotas don't exist to inconvenience users. AWS infrastructure is multi-tenant: physical servers, networks, and storage are shared among many customers. If a single account consumed resources without limit, performance would degrade for the other customers sharing the same physical infrastructure. This is the "noisy neighbor problem," and service quotas prevent it by capping each tenant's resource consumption.

The control plane (the APIs themselves) has quotas too. EC2, for example, throttles API requests per account and per Region with a token bucket. At the time of writing, the documented bucket for non-mutating actions such as DescribeInstances holds 100 tokens and refills at 20 tokens per second, so an account can burst to about 100 requests but sustain only about 20 per second. If a single account floods the API with calls, the control plane's processing capacity is strained and other accounts see delayed API calls.
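The API throttling described above can be pictured as a token bucket. The following is a minimal sketch, not AWS's implementation: the `TokenBucket` class and `simulate` helper are hypothetical names, and the capacity of 100 with a refill of 20 tokens per second is an assumed configuration chosen for illustration. Time is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Per-account request bucket: holds up to `capacity` tokens,
    refilled at `refill_rate` tokens per second (illustrative model)."""

    def __init__(self, capacity: float, refill_rate: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity  # the bucket starts full
        self.last = now

    def allow(self, now: float) -> bool:
        # Add tokens for the elapsed time, never exceeding capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # each accepted request spends one token
            return True
        return False  # a real caller would receive a throttling error here


def simulate():
    # Assumed numbers: capacity 100, refill 20 tokens/second.
    bucket = TokenBucket(capacity=100, refill_rate=20)
    burst = sum(bucket.allow(now=0.0) for _ in range(150))  # 150 calls at t=0
    later = sum(bucket.allow(now=1.0) for _ in range(150))  # 150 calls at t=1
    return burst, later
```

Under these assumptions `simulate()` accepts the first 100 calls of the burst and only 20 of the calls one second later. One noisy account exhausts its own bucket without consuming the control plane capacity that other accounts rely on.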

Soft Limits and Hard Limits

Quotas come in two types. Soft limits (adjustable quotas) can be raised through the Service Quotas console or through AWS Support. Examples include EC2 On-Demand instance vCPU counts (set per group of instance families in each Region), Lambda concurrent executions (default: 1,000 per Region), and S3 bucket count (default: 100 historically; AWS raised the default to 10,000 in late 2024).

Hard limits (non-adjustable quotas) are design constraints of the service and cannot be raised. Examples include the IAM managed policy size limit (6,144 characters, not counting whitespace), CloudFormation resources per stack (500), and the S3 maximum object size (5 TB). Hard limits stem from the service's internal architecture. The IAM policy size limit, for instance, protects policy evaluation performance: policies are evaluated on every API call, so oversized policies would add latency to every request.
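The Service Quotas API reports this distinction in an `Adjustable` field on each quota it returns. The sketch below splits a list of quota records on that field. The function name `split_by_adjustability` is hypothetical, and the sample records are hand-written placeholders using figures from this article, not real API output; in practice the records would come from something like `boto3.client("service-quotas").list_service_quotas(ServiceCode=...)`.

```python
def split_by_adjustability(quotas):
    """Return (adjustable_names, fixed_names) from Service Quotas style records."""
    adjustable, fixed = [], []
    for quota in quotas:
        target = adjustable if quota.get("Adjustable") else fixed
        target.append(quota["QuotaName"])
    return sorted(adjustable), sorted(fixed)


# Hand-written placeholder records (quota codes are NOT real codes).
SAMPLE_QUOTAS = [
    {"QuotaCode": "L-EXAMPLE1", "QuotaName": "Concurrent executions",
     "Value": 1000.0, "Adjustable": True},
    {"QuotaCode": "L-EXAMPLE2", "QuotaName": "Managed policy size (characters)",
     "Value": 6144.0, "Adjustable": False},
    {"QuotaCode": "L-EXAMPLE3", "QuotaName": "Resources per stack",
     "Value": 500.0, "Adjustable": False},
]

soft, hard = split_by_adjustability(SAMPLE_QUOTAS)
```

Running a check like this before an architecture review shows which ceilings can be negotiated and which ones the design has to work within.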

Behind the Scenes of Quota Increase Requests

What happens when you submit a quota increase request from the Service Quotas console? Some requests are approved automatically. Increasing the S3 bucket count from 100 to 1,000, for example, has typically been approved within minutes. These quotas are automated because raising them has minimal impact on other customers.

A large increase in EC2 vCPU quotas (for example, 1,000 to 10,000), on the other hand, requires manual review by AWS's capacity team. The review checks whether the requested Region has sufficient physical capacity and whether the account's usage history justifies the requested amount. A new account that suddenly requests massive resources may be rejected on suspicion of fraudulent use, such as cryptocurrency mining. Building up usage history and requesting increases gradually is the more reliable approach.

Processing time ranges from minutes for automatic approval to hours or days for manual review, so submit requests well in advance of production launches.
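The same request can be submitted programmatically. The sketch below assumes a boto3 Service Quotas client and uses its `get_service_quota` and `request_service_quota_increase` operations. The wrapper `request_increase`, its `NOT_ADJUSTABLE` and `ALREADY_SATISFIED` return labels, and the `FakeQuotasClient` stub are hypothetical and exist only so the example runs without AWS credentials.

```python
def request_increase(client, service_code, quota_code, desired_value):
    """Request a quota increase only when it is possible and necessary.

    Returns the request status reported by the API (for example "PENDING"),
    or one of two local labels that are NOT API statuses.
    """
    quota = client.get_service_quota(
        ServiceCode=service_code, QuotaCode=quota_code)["Quota"]
    if not quota["Adjustable"]:
        return "NOT_ADJUSTABLE"       # hard limit: nothing to request
    if desired_value <= quota["Value"]:
        return "ALREADY_SATISFIED"    # current quota already covers the need
    response = client.request_service_quota_increase(
        ServiceCode=service_code,
        QuotaCode=quota_code,
        DesiredValue=float(desired_value),
    )
    return response["RequestedQuota"]["Status"]


class FakeQuotasClient:
    """Offline stand-in for boto3.client("service-quotas") (illustration only)."""

    def __init__(self, value, adjustable):
        self._quota = {"Value": float(value), "Adjustable": adjustable}
        self.requests = []  # records every DesiredValue submitted

    def get_service_quota(self, ServiceCode, QuotaCode):
        return {"Quota": dict(self._quota)}

    def request_service_quota_increase(self, ServiceCode, QuotaCode, DesiredValue):
        self.requests.append(DesiredValue)
        return {"RequestedQuota": {"Status": "PENDING",
                                   "DesiredValue": DesiredValue}}
```

With a real client, the returned status shows whether the request is still pending, has been approved, or has been turned into a support case for manual review, which matches the automatic and manual paths described above.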

Why Default Quotas Are Low

Default quotas for new accounts are intentionally set low. EC2 On-Demand quotas for new accounts are typically around 5 to 32 vCPUs per instance family. There are three reasons for these low defaults.

First, fraud prevention. Accounts created with stolen credit cards that launch large numbers of EC2 instances for cryptocurrency mining are a serious problem for AWS. Low default quotas limit the damage from such abuse.

Second, preventing unintended high bills. A misconfigured Auto Scaling group can run away and launch thousands of instances. Without quotas, bills of tens of thousands of dollars could accumulate in hours.

Third, capacity planning. AWS manages the physical capacity of each Region, and quotas bound the demand that any single account can place on it. Lower defaults keep that worst case small and make capacity estimation easier. A quota is not a capacity reservation, however: a launch can still fail for lack of capacity even when the account is under its quota.
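The runaway-bill scenario is easy to quantify. The sketch below compares the cost of a runaway scaling event with and without a vCPU quota. All inputs are assumptions for illustration: 5,000 instances of 8 vCPUs each, a hypothetical price of 5 cents per vCPU-hour, a 12-hour window before anyone notices, and a 32 vCPU quota. Amounts are kept in integer cents to avoid floating-point rounding.

```python
def runaway_cost_cents(requested_vcpus, vcpu_quota, cents_per_vcpu_hour, hours):
    """Cost of a runaway launch; `vcpu_quota=None` models 'no quota at all'."""
    if vcpu_quota is None:
        running = requested_vcpus
    else:
        running = min(requested_vcpus, vcpu_quota)  # the quota caps what launches
    return running * cents_per_vcpu_hour * hours


# Assumed scenario: 5,000 instances x 8 vCPUs, 5 cents per vCPU-hour, 12 hours.
requested = 5000 * 8
uncapped = runaway_cost_cents(requested, None, 5, 12)  # 2,400,000 cents = $24,000
capped = runaway_cost_cents(requested, 32, 5, 12)      # 1,920 cents = $19.20
```

Under these assumed numbers, the same misconfiguration costs $24,000 without a quota and $19.20 with a 32 vCPU default.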

Quota Monitoring and Automation

Service Quotas integrates with CloudWatch, so usage against supported quotas can be retrieved as metrics. For example, you can configure a CloudWatch alarm to fire when EC2 vCPU usage exceeds 80% of the quota and send a notification via SNS. Once you hit the quota it is too late, so proactive monitoring and timely increase requests are essential.

Trusted Advisor also monitors quota utilization. Its service limit checks, which are available on all Support plans, flag services whose usage exceeds 80% of the quota and display them on the dashboard.

When using AWS Organizations, a Service Quotas quota request template can automatically submit increase requests whenever a new account is created in the organization. This gives new accounts a consistent baseline and removes the need for manual per-account requests. The template applies only to newly created accounts, so existing accounts still need individual requests.
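The 80% alarm mentioned above can be expressed with CloudWatch metric math, which offers a `SERVICE_QUOTA()` function for usage metrics in the `AWS/Usage` namespace. The sketch below only builds the parameter dictionary for CloudWatch's `put_metric_alarm`; it does not call AWS. The helper name `quota_alarm_params`, the alarm name, and the SNS topic ARN are hypothetical, and the dimension values for EC2 Standard On-Demand vCPUs are an assumption that should be checked against the metric shown in your own CloudWatch console.

```python
# Assumed dimensions for the EC2 Standard On-Demand vCPU usage metric.
EC2_STANDARD_VCPU_DIMENSIONS = {
    "Service": "EC2",
    "Type": "Resource",
    "Resource": "vCPU",
    "Class": "Standard/OnDemand",
}


def quota_alarm_params(alarm_name, dimensions, threshold_pct, sns_topic_arn):
    """Build put_metric_alarm arguments for 'usage above N% of quota'."""
    return {
        "AlarmName": alarm_name,
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": float(threshold_pct),
        "EvaluationPeriods": 1,
        "AlarmActions": [sns_topic_arn],  # SNS topic notified when the alarm fires
        "Metrics": [
            {   # raw usage metric, used as input only
                "Id": "usage",
                "ReturnData": False,
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/Usage",
                        "MetricName": "ResourceCount",
                        "Dimensions": [{"Name": k, "Value": v}
                                       for k, v in dimensions.items()],
                    },
                    "Period": 300,
                    "Stat": "Maximum",
                },
            },
            {   # utilization in percent; this is the series the alarm watches
                "Id": "pct",
                "Expression": "(usage / SERVICE_QUOTA(usage)) * 100",
                "Label": "Quota utilization (%)",
                "ReturnData": True,
            },
        ],
    }


params = quota_alarm_params(
    "ec2-vcpu-quota-80pct",                               # hypothetical name
    EC2_STANDARD_VCPU_DIMENSIONS,
    80,
    "arn:aws:sns:us-east-1:123456789012:quota-alerts",   # hypothetical ARN
)
# With credentials configured, this could be passed to
# boto3.client("cloudwatch").put_metric_alarm(**params).
```

Expressing the threshold as a percentage of `SERVICE_QUOTA(usage)` rather than a fixed number means the alarm keeps working after a quota increase is approved.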