Why AWS Service Quotas Exist - Multi-Tenant Design That Protects Shared Infrastructure

This article explains why AWS service quotas (formerly service limits) are not mere restrictions but a design that protects other customers in a multi-tenant environment. It covers the noisy neighbor problem, soft vs hard limits, and what happens behind quota increase requests.

About 6 min read. Last updated: 2025-10-22

The Essence of Service Quotas - Protecting Other Customers

AWS service quotas (formerly known as service limits) are upper bounds on the resources each account can use. Quotas apply to nearly every resource: the number of EC2 instances you can launch, Lambda concurrent executions, S3 bucket count, VPC count, and more. They don't exist to inconvenience users. AWS infrastructure is multi-tenant: physical servers, networks, and storage are shared among many customers. If a single account could consume resources without limit, performance would degrade for the other customers sharing the same physical infrastructure. This is the "noisy neighbor problem," and service quotas mitigate it by capping each tenant's resource consumption.

Quotas also apply to the control plane (the APIs). The EC2 API, for example, throttles calls such as DescribeInstances per account and Region, with the allowed rate depending on the type of action. If a single account floods the API with calls, it strains the control plane's processing capacity and delays API calls for other accounts.
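One common way to enforce a per-account API budget is a token bucket. The sketch below is a minimal illustration of that idea, not AWS's implementation: the class names, the capacity of 100, and the refill rate of 20 per second are all assumptions chosen for the example.

```python
class TokenBucket:
    """Holds up to `capacity` tokens and refills at `refill_per_sec` tokens per second."""

    def __init__(self, capacity: float, refill_per_sec: float) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity   # each account starts with a full bucket
        self.last_seen = None    # timestamp of the previous request, if any

    def allow(self, now: float) -> bool:
        """Spend one token if available; otherwise the request is throttled."""
        if self.last_seen is not None:
            elapsed = max(0.0, now - self.last_seen)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last_seen = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerAccountThrottler:
    """Gives every account its own bucket, so one account cannot drain another's budget."""

    def __init__(self, capacity: float, refill_per_sec: float) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, account_id: str, now: float) -> bool:
        if account_id not in self.buckets:
            self.buckets[account_id] = TokenBucket(self.capacity, self.refill_per_sec)
        return self.buckets[account_id].allow(now)


# Illustrative numbers only; real services publish their own limits.
throttler = PerAccountThrottler(capacity=100, refill_per_sec=20)
noisy = [throttler.allow("noisy-account", now=0.0) for _ in range(150)]
quiet = throttler.allow("quiet-account", now=0.0)
print(sum(noisy), quiet)
```

In this run the noisy account gets 100 of its 150 requests through and the rest are rejected, while the quiet account's request succeeds because its bucket is untouched.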

Soft Limits and Hard Limits

There are two types of quotas. Soft limits (adjustable quotas) can be raised through the Service Quotas console or by a request to AWS Support. Examples include EC2 on-demand instance vCPU counts (defaults vary by instance family and Region), Lambda concurrent executions (default: 1,000), and S3 bucket count (default: 100 for many years, raised to 10,000 in November 2024). Hard limits (non-adjustable quotas) are design constraints of the service and cannot be raised. Examples include the IAM managed policy size limit (6,144 characters), CloudFormation resources per stack (500), and S3 maximum object size (5 TB). Hard limits stem from the service's internal architecture. The IAM policy size limit, for instance, protects policy evaluation performance: policies are evaluated on every API call, so oversized policies would add latency to all of them.
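The Service Quotas API reports whether each quota is adjustable, which maps onto the soft/hard distinction. The following is a rough sketch assuming boto3 is installed and credentials are configured; `split_by_adjustability` and `fetch_quotas` are hypothetical helper names, and the sample records are invented for illustration.

```python
from typing import Iterable


def split_by_adjustability(quotas: Iterable[dict]) -> tuple[list[str], list[str]]:
    """Partition quota records into (adjustable, non-adjustable) name lists."""
    adjustable, fixed = [], []
    for quota in quotas:
        target = adjustable if quota.get("Adjustable") else fixed
        target.append(quota["QuotaName"])
    return sorted(adjustable), sorted(fixed)


def fetch_quotas(service_code: str) -> list[dict]:
    """Fetch all quotas for one service, e.g. 'lambda' or 'ec2'. Requires AWS access."""
    import boto3  # imported lazily so the pure helper above works without AWS access

    client = boto3.client("service-quotas")
    paginator = client.get_paginator("list_service_quotas")
    quotas: list[dict] = []
    for page in paginator.paginate(ServiceCode=service_code):
        quotas.extend(page["Quotas"])
    return quotas


# Invented sample records shaped like the API's quota entries.
sample = [
    {"QuotaName": "Concurrent executions", "Adjustable": True, "Value": 1000.0},
    {"QuotaName": "Some fixed design limit", "Adjustable": False, "Value": 500.0},
]
soft, hard = split_by_adjustability(sample)
print("adjustable:", soft)
print("non-adjustable:", hard)
```

With real access, `split_by_adjustability(fetch_quotas("lambda"))` would give the same breakdown for an actual service.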

Behind the Scenes of Quota Increase Requests

What happens when you submit a quota increase request from the Service Quotas console? Some requests are auto-approved. A modest increase to the S3 bucket quota, for example, is typically approved automatically within minutes. These requests are automated because raising the quota has minimal impact on other customers. A large increase in EC2 vCPU quotas (e.g., 1,000 to 10,000), on the other hand, typically requires manual review by AWS. The review checks whether the requested Region has sufficient physical capacity and whether the account's usage history justifies the requested amount. If a new account suddenly requests massive resources, the request may be rejected on suspicion of fraudulent use (such as cryptocurrency mining). Building up usage history and requesting increases gradually is the more reliable approach. Processing time ranges from minutes for auto-approval to hours or days for manual review, so submit requests well in advance of production launches.
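The "request gradually" advice can be turned into a simple plan of intermediate targets. This is only a sketch: the doubling factor is an assumption rather than an AWS rule, `stepped_targets` and `request_increase` are hypothetical helper names, and the submit function needs boto3 and valid credentials.

```python
def stepped_targets(current: float, goal: float, factor: float = 2.0) -> list[float]:
    """Return successive quota values, each at most `factor` times the previous one."""
    if current <= 0 or factor <= 1:
        raise ValueError("current must be positive and factor greater than 1")
    targets = []
    value = current
    while value < goal:
        value = min(goal, value * factor)
        targets.append(value)
    return targets


def request_increase(service_code: str, quota_code: str, desired_value: float) -> str:
    """Submit one increase request and return its initial status. Requires AWS access."""
    import boto3  # imported lazily so the planning helper runs without AWS access

    client = boto3.client("service-quotas")
    response = client.request_service_quota_increase(
        ServiceCode=service_code,
        QuotaCode=quota_code,
        DesiredValue=float(desired_value),
    )
    return response["RequestedQuota"]["Status"]


# Plan a path from 1,000 to 10,000 vCPUs instead of asking for 10x at once.
plan = stepped_targets(1000, 10000)
print(plan)
```

Each step would be submitted with `request_increase` only after usage has grown into the previous one, which gives the reviewer a usage history to look at.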

Why Default Quotas Are Low

Default quotas for new accounts are intentionally set low. EC2 on-demand vCPU quotas for new accounts are typically around 5-32 vCPUs per instance family. There are three reasons for these low defaults. First, fraud prevention. Accounts created with stolen credit cards that launch large numbers of EC2 instances for cryptocurrency mining are a serious problem for AWS, and low default quotas limit the damage from such abuse. Second, preventing unintended high bills. A misconfigured Auto Scaling group can run away and launch thousands of instances; without quotas, bills of tens of thousands of dollars could accumulate in hours. Third, capacity planning. AWS manages the physical capacity of each Region, and quotas put an upper bound on the demand any one account can place on it. Lower default quotas make that demand easier to estimate.
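The second reason is easy to check with rough arithmetic. The figures below (5,000 instances, 4 vCPUs each, 0.50 USD per hour, 8 hours, a 32 vCPU quota) are made up for illustration and are not AWS prices or defaults for any specific instance type.

```python
def runaway_cost(instances: int, hourly_price_usd: float, hours: float) -> float:
    """Total on-demand cost of `instances` running for `hours`."""
    return instances * hourly_price_usd * hours


def capped_instances(requested: int, vcpus_per_instance: int, vcpu_quota: int) -> int:
    """How many instances actually launch when a vCPU quota is in force."""
    return min(requested, vcpu_quota // vcpus_per_instance)


# Hypothetical runaway scaling event.
requested = 5000
price = 0.50   # USD per instance-hour (assumed)
hours = 8

without_quota = runaway_cost(requested, price, hours)
launched = capped_instances(requested, vcpus_per_instance=4, vcpu_quota=32)
with_quota = runaway_cost(launched, price, hours)

print(f"without quota: ${without_quota:,.2f}")
print(f"with 32 vCPU quota: {launched} instances, ${with_quota:,.2f}")
```

Under these assumptions the quota turns a 20,000 USD mistake into a 32 USD one.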

Quota Monitoring and Automation

Service Quotas integrates with CloudWatch, so quota utilization can be retrieved as metrics. For example, you can configure a CloudWatch alarm to fire when EC2 vCPU quota utilization exceeds 80% and send notifications via SNS. Once you hit the quota it is too late, so proactive monitoring and timely increase requests are essential. Trusted Advisor also monitors quota utilization: its service limits checks flag services where utilization exceeds 80% and display them on the dashboard. When using AWS Organizations, Service Quotas' quota request templates can automatically submit quota increase requests when new accounts are created. Applying the same quotas across all accounts in the organization removes the need for manual per-account requests.
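An 80% utilization alarm can be sketched as parameters for CloudWatch's `put_metric_alarm`, using the `SERVICE_QUOTA` metric math function. Treat this as a starting point under assumptions: the dimension values for EC2 on-demand vCPU usage should be verified against the metrics visible in your account, and the function name, alarm name, and SNS topic ARN are placeholders.

```python
def quota_utilization_alarm(alarm_name: str, sns_topic_arn: str,
                            threshold_pct: float = 80.0) -> dict:
    """Build put_metric_alarm parameters for EC2 on-demand vCPU quota utilization."""
    return {
        "AlarmName": alarm_name,
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": threshold_pct,
        "EvaluationPeriods": 1,
        "AlarmActions": [sns_topic_arn],
        "Metrics": [
            {
                "Id": "usage",
                "ReturnData": False,  # raw usage feeds the expression only
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/Usage",
                        "MetricName": "ResourceCount",
                        # Assumed dimensions for standard on-demand vCPU usage.
                        "Dimensions": [
                            {"Name": "Service", "Value": "EC2"},
                            {"Name": "Resource", "Value": "vCPU"},
                            {"Name": "Type", "Value": "Resource"},
                            {"Name": "Class", "Value": "Standard/OnDemand"},
                        ],
                    },
                    "Period": 300,
                    "Stat": "Maximum",
                },
            },
            {
                "Id": "pct",
                "Expression": "(usage / SERVICE_QUOTA(usage)) * 100",
                "Label": "vCPU quota utilization (%)",
                "ReturnData": True,  # the alarm evaluates this percentage
            },
        ],
    }


params = quota_utilization_alarm(
    "ec2-vcpu-quota-80pct",                                   # placeholder name
    "arn:aws:sns:us-east-1:123456789012:quota-alerts",        # placeholder ARN
)
print(params["Metrics"][1]["Expression"])
```

With credentials in place, the alarm could then be created with `boto3.client("cloudwatch").put_metric_alarm(**params)`.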
