Why Auto Scaling Scales Out Fast but Scales In Cautiously - The Design Intent Behind Asymmetric Decision Logic
This article explains why EC2 Auto Scaling executes scale-out immediately while applying a cooldown period for scale-in, the flapping prevention mechanism, and the internal logic of target tracking scaling.
The Asymmetry Between Scale-Out and Scale-In
In EC2 Auto Scaling's default settings, the scale-out (adding instances) cooldown period is 0 seconds (immediate execution), while the scale-in (removing instances) cooldown period is 300 seconds (5 minutes). This asymmetric configuration has a clear design intent. A delayed scale-out directly impacts users: if traffic surges but instances are not added, response times degrade, and in the worst case the service goes down. Scale-out should therefore execute as quickly as possible. Scaling in too quickly, on the other hand, causes flapping - the frequent repetition of scaling out and scaling in. Traffic dips temporarily, instances are removed, then traffic rises again and instances must be added back. Since instance startup takes several minutes, flapping degrades performance and increases cost at the same time.
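A toy simulation (hypothetical numbers, not AWS behavior) illustrates how a scale-in cooldown absorbs short traffic dips that would otherwise cause flapping:

```python
# Illustrative sketch: count scaling actions for a traffic trace that
# briefly dips, with and without a scale-in cooldown. Toy model only:
# desired capacity = load, one decision per minute.

def scaling_actions(load_per_minute, scale_in_cooldown_min):
    """Return how many scale-out/scale-in actions the toy group takes."""
    capacity = load_per_minute[0]
    last_scale_in = -scale_in_cooldown_min  # allow scale-in at t=0
    actions = 0
    for t, load in enumerate(load_per_minute):
        if load > capacity:                    # scale-out: always immediate
            capacity = load
            actions += 1
        elif load < capacity and t - last_scale_in >= scale_in_cooldown_min:
            capacity = load                    # scale-in: only after cooldown
            last_scale_in = t
            actions += 1
    return actions

# Traffic that dips for two minutes, twice, then recovers each time.
trace = [10, 10, 6, 6, 10, 10, 6, 6, 10, 10]

print(scaling_actions(trace, scale_in_cooldown_min=0))   # → 4 (flapping)
print(scaling_actions(trace, scale_in_cooldown_min=10))  # → 2 (dips absorbed)
```

With no cooldown, every dip triggers a remove/re-add pair; with a generous cooldown, the group scales in once and rides out the later dips.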
How Cooldown Periods Work
A cooldown period is the time after a scaling action during which subsequent scaling actions are suppressed. During a scale-out cooldown, additional scale-outs are suppressed, but scale-in can still execute; conversely, during a scale-in cooldown, additional scale-ins are suppressed, but scale-out can still execute. This design guarantees that the situation "scale-out is needed but blocked by a scale-in cooldown" never occurs. The optimal cooldown value depends on workload characteristics. If it takes 3 minutes from EC2 instance launch to passing the ELB health check, the scale-out cooldown should be set to at least 3 minutes: with a shorter cooldown, the system concludes capacity is "still not enough" before the new instances have begun serving traffic, and scales out excessively. When using target tracking scaling policies, cooldown periods are managed automatically, so manual configuration is unnecessary.
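The direction-specific suppression described above can be sketched as a small tracker (illustrative Python, not the AWS implementation):

```python
# Sketch of the cooldown rule: a cooldown only suppresses further actions
# in the SAME direction; the opposite direction remains available.

class CooldownTracker:
    def __init__(self, scale_out_cooldown=0, scale_in_cooldown=300):
        self.cooldowns = {"out": scale_out_cooldown, "in": scale_in_cooldown}
        self.last_action = {"out": None, "in": None}

    def allowed(self, direction, now):
        """Is a scaling action in this direction permitted at time `now`?"""
        last = self.last_action[direction]
        return last is None or now - last >= self.cooldowns[direction]

    def record(self, direction, now):
        self.last_action[direction] = now

tracker = CooldownTracker()        # defaults: out=0s, in=300s
tracker.record("in", now=0)        # a scale-in just happened
print(tracker.allowed("in", now=60))   # → False: still in scale-in cooldown
print(tracker.allowed("out", now=60))  # → True: scale-out is never blocked by it
```

This is exactly why "scale-out blocked by a scale-in cooldown" cannot happen: the two directions keep independent timers.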
Internal Logic of Target Tracking Scaling
Target tracking scaling is the most commonly recommended scaling policy; it automatically adjusts instance count to keep a specified metric at a target value. For example, if you set the CPU utilization target to 50%, Auto Scaling adds or removes instances to hold CPU utilization near 50%. Internally, target tracking operates with an algorithm similar to a PID controller (proportional-integral-derivative control): it calculates the required number of instances from the deviation between the current metric value and the target value, and the larger the deviation, the more instances are added or removed at once. A key characteristic of target tracking is that it internally creates separate CloudWatch alarms for scale-out and scale-in. The scale-out alarm fires when the metric exceeds the threshold for 3 consecutive 1-minute data points (a 3-minute evaluation period), while the scale-in alarm fires when the metric stays below the threshold for 15 consecutive 1-minute data points (a 15-minute evaluation period). This asymmetric evaluation period is what achieves "fast scale-out, cautious scale-in."
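The proportional step can be approximated as follows - a sketch of the commonly cited capacity formula, new capacity ≈ ceil(current capacity × current metric ÷ target); the real service additionally accounts for instance warm-up, cooldowns, and min/max capacity limits:

```python
import math

def desired_capacity(current_capacity, current_metric, target):
    """Proportional estimate used by target tracking (approximation):
    capacity scales with the ratio of the observed metric to the target."""
    return max(1, math.ceil(current_capacity * current_metric / target))

# 10 instances at 75% CPU with a 50% target → roughly 15 instances needed.
print(desired_capacity(10, 75.0, 50.0))  # → 15
# 10 instances at 30% CPU → 6 instances would hold roughly 50%.
print(desired_capacity(10, 30.0, 50.0))  # → 6
```

Note how a larger deviation (75% vs. 55%) yields a proportionally larger capacity change in a single step, matching the behavior described above.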
Instance Selection Logic During Scale-In
When scale-in executes, which instance gets terminated is determined by the default termination policy. The default logic has 3 stages. First, it selects the AZ with the most instances, maintaining instance count balance across AZs. Second, within that AZ, it selects the instance using the oldest launch configuration or launch template, prioritizing removal of instances with older configurations and facilitating migration to newer ones. Third, if multiple instances share the same launch configuration, it selects the instance closest to the next billing hour. Since EC2 introduced per-second billing, this criterion has little practical significance, but the logic remains; if multiple instances still tie after all three stages, one is chosen at random. Custom termination policies are also available: you can choose policies such as NewestInstance (terminate the newest instance), OldestInstance (terminate the oldest instance), and ClosestToNextInstanceHour (terminate the instance closest to the next billing hour).
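The three stages can be sketched as successive filters (hypothetical `Instance` fields for illustration; the real service works from launch template metadata and breaks remaining ties at random):

```python
from dataclasses import dataclass

@dataclass
class Instance:
    instance_id: str
    az: str
    template_version: int      # lower = older launch template/configuration
    seconds_to_next_hour: int  # proximity to the next billing hour

def pick_instance_to_terminate(instances):
    # Stage 1: target the AZ with the most instances (rebalance AZs).
    az_counts = {}
    for i in instances:
        az_counts[i.az] = az_counts.get(i.az, 0) + 1
    busiest_az = max(az_counts, key=az_counts.get)
    candidates = [i for i in instances if i.az == busiest_az]
    # Stage 2: prefer the oldest launch template/configuration.
    oldest = min(i.template_version for i in candidates)
    candidates = [i for i in candidates if i.template_version == oldest]
    # Stage 3: pick the instance closest to the next billing hour.
    return min(candidates, key=lambda i: i.seconds_to_next_hour)

fleet = [
    Instance("i-a", "ap-northeast-1a", 2, 1200),
    Instance("i-b", "ap-northeast-1a", 1, 3000),
    Instance("i-c", "ap-northeast-1a", 1, 600),
    Instance("i-d", "ap-northeast-1c", 1, 100),
]
print(pick_instance_to_terminate(fleet).instance_id)  # → "i-c"
```

Here `i-d` survives despite being closest to its billing hour because its AZ has fewer instances - stage 1 dominates the later stages.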
Predictive Scaling - Forecasting the Future from Past Patterns
Predictive scaling, introduced for EC2 Auto Scaling in 2021, uses machine learning to analyze up to the past 14 days of traffic patterns, forecast future traffic, and provision instances in advance. For example, if traffic surges every morning at 9 AM, predictive scaling begins adding instances around 8:50 AM to prepare for the spike. With purely reactive scaling (adding instances only after traffic increases), the several minutes required for instance startup and ELB registration cause performance degradation during the initial surge; predictive scaling bridges this gap. Predictive scaling is recommended for use alongside target tracking scaling, with a clear division of responsibilities: predictive scaling pre-provisions the expected baseline, while target tracking handles unexpected fluctuations. The accuracy of predictive scaling depends on the regularity of traffic patterns: workloads that repeat the same pattern daily achieve high accuracy, while irregular traffic may yield inaccurate forecasts. For a systematic treatment of scaling design patterns, specialized books on the subject are a helpful reference.
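The "provision ahead of the forecast" idea can be sketched as a simple schedule shift - a toy model, not the actual ML forecaster; the 10-minute buffer plays the role of the policy's configurable lead time:

```python
# Toy sketch: given a forecast of (minute_of_day, required_capacity) pairs,
# shift each capacity change earlier by a buffer so instances are already
# running and registered with the ELB before the predicted spike.

def scheduled_actions(forecast, buffer_minutes=10):
    """Return (start_minute, capacity) pairs shifted earlier by the buffer."""
    return [(max(0, minute - buffer_minutes), capacity)
            for minute, capacity in forecast]

# Forecast: 20 instances needed at 09:00 (minute 540 of the day).
print(scheduled_actions([(540, 20)]))  # → [(530, 20)], i.e. start at 08:50
```

This mirrors the 8:50 AM example above: capacity is requested early enough that instance startup and ELB registration finish before the 9 AM spike arrives.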