AWS Auto Scaling

Automatically adjusts the number of EC2 instances, ECS tasks, and other resources based on demand to balance availability and cost efficiency

Overview

AWS Auto Scaling automatically scales compute resources out (increase) and in (decrease) based on application load. While EC2 Auto Scaling is the most widely used, the service also supports ECS services, DynamoDB tables, Aurora replicas, and Lambda provisioned concurrency. By combining target tracking scaling (driven by metrics like CPU utilization), scheduled scaling, and predictive scaling, you can both absorb traffic spikes and avoid over-provisioning.

How to Choose Among Three Scaling Policies

The optimal Auto Scaling policy depends on your traffic pattern. Target tracking scaling is the simplest: you specify a target like 'maintain CPU at 60%' and Auto Scaling adjusts instance count automatically. It works well for most workloads but may lag behind sudden spikes. Step scaling lets you define different scaling actions at different metric thresholds, providing finer control for gradual load increases but with more complex configuration. Predictive scaling uses machine learning to analyze the past 14 days of metric patterns and proactively scales out before demand arrives, making it ideal for workloads with predictable daily traffic patterns. In practice, combining target tracking for baseline responsiveness with predictive scaling for anticipated peaks delivers the best balance of cost and availability.
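As a concrete illustration of the simplest option, the sketch below builds the request parameters for a target tracking policy that keeps average CPU at 60%. The group name "web-asg" is a hypothetical placeholder, and the actual boto3 call is commented out so the snippet runs without AWS credentials; the parameter names follow the EC2 Auto Scaling PutScalingPolicy API.

```python
def target_tracking_policy(asg_name: str, target_cpu: float) -> dict:
    """Build kwargs for autoscaling.put_scaling_policy using a
    TargetTrackingScaling policy on average CPU utilization."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"cpu-{int(target_cpu)}-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            # Predefined metric: average CPU across the group.
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            # Auto Scaling adds/removes instances to hold this value.
            "TargetValue": target_cpu,
        },
    }

policy = target_tracking_policy("web-asg", 60.0)
# To apply it (requires credentials):
# import boto3
# boto3.client("autoscaling").put_scaling_policy(**policy)
```

Note that target tracking creates and manages the underlying CloudWatch alarms for you, which is why it needs so little configuration compared with step scaling.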

Mixed Fleet Configuration with Spot Instances

One of Auto Scaling's most powerful cost optimization features is the mixed instances policy, which combines On-Demand and Spot Instances in a single group. You assign Reserved or On-Demand Instances for baseline capacity and let Auto Scaling add Spot Instances for variable demand, achieving up to 90% cost savings on the Spot portion. To mitigate Spot interruptions, specify multiple instance types (e.g., m5.large, m5a.large, m5d.large, m4.large) and spread across all available Availability Zones; Auto Scaling automatically selects the most available and cost-effective combination. The capacity-optimized allocation strategy prioritizes the Spot pools with the deepest spare capacity, reducing interruption frequency. Azure Virtual Machine Scale Sets (VMSS) also support Spot VMs, but EC2 Auto Scaling's native mixed fleet configuration with per-instance-type weighting offers more granular control over the On-Demand to Spot ratio.
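A mixed fleet like the one described above can be expressed as a MixedInstancesPolicy passed to CreateAutoScalingGroup. The sketch below is a minimal example under assumed names (the launch template "web-lt" and the 2-instance On-Demand base are placeholders); it only builds the request structure and does not call AWS.

```python
def mixed_instances_policy(launch_template: str, instance_types: list[str]) -> dict:
    """Build a MixedInstancesPolicy: a small On-Demand base,
    everything above it on Spot, capacity-optimized allocation."""
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template,
                "Version": "$Latest",
            },
            # Multiple instance types widen the Spot pool choices.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": 2,               # guaranteed baseline
            "OnDemandPercentageAboveBaseCapacity": 0,  # all extra capacity on Spot
            "SpotAllocationStrategy": "capacity-optimized",
        },
    }

mip = mixed_instances_policy(
    "web-lt", ["m5.large", "m5a.large", "m5d.large", "m4.large"]
)
# boto3.client("autoscaling").create_auto_scaling_group(
#     AutoScalingGroupName="web-asg", MixedInstancesPolicy=mip, ...)
```

Setting OnDemandPercentageAboveBaseCapacity to a value between 0 and 100 is how you tune the On-Demand to Spot ratio mentioned above.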

Health Check and Warm-Up Pitfalls

The most critical aspect of Auto Scaling in practice is properly configuring health checks and cooldown periods. By default, Auto Scaling uses EC2 status checks, which only detect hardware-level failures. Enabling ELB health checks adds application-level detection, automatically replacing instances that fail to respond on the health check endpoint. However, you must set a sufficient Health Check Grace Period (typically 300 seconds or more) to prevent newly launched instances from being terminated before they finish booting and warming up. Without this grace period, instances running slow initialization scripts or loading large caches get flagged as unhealthy and replaced in an endless loop. The default cooldown period (300 seconds) prevents Auto Scaling from launching or terminating additional instances too quickly after a scaling activity, but for workloads with rapid traffic changes, you may need to shorten this value. Instance warm-up time, configured at the scaling policy level, tells Auto Scaling to exclude recently launched instances from metric calculations until they are fully ready, preventing premature scale-in decisions.
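The grace period, cooldown, and warm-up settings above map directly onto group-level parameters. The sketch below collects them into one place as a hypothetical configuration helper (the 300-second values mirror the defaults discussed above; the group name is a placeholder, and no AWS call is made).

```python
def health_and_warmup_config(asg_name: str, grace_seconds: int = 300) -> dict:
    """Build the health-check and warm-up related kwargs for
    create_auto_scaling_group / update_auto_scaling_group."""
    return {
        "AutoScalingGroupName": asg_name,
        # "ELB" adds application-level checks on top of EC2 status checks.
        "HealthCheckType": "ELB",
        # Don't judge a new instance unhealthy until it has had
        # time to boot, run init scripts, and warm its caches.
        "HealthCheckGracePeriod": grace_seconds,
        # Pause between scaling activities (shorten for spiky traffic).
        "DefaultCooldown": 300,
        # Exclude freshly launched instances from metric aggregation
        # until they are serving normally, avoiding premature scale-in.
        "DefaultInstanceWarmup": grace_seconds,
    }

cfg = health_and_warmup_config("web-asg")
# boto3.client("autoscaling").update_auto_scaling_group(**cfg)
```

If the grace period is too short for your slowest-booting AMI, you will see exactly the replace-loop failure mode described above, so measure real boot time before settling on a value.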
