Demand-Driven Infrastructure with AWS Auto Scaling - Designing and Optimizing Scaling Policies
Learn how to use target tracking, predictive, and scheduled scaling policies effectively, and optimize costs with mixed instances policies that leverage Spot Instances.
Overview of Auto Scaling
Auto Scaling is a service that automatically adjusts the number of running instances based on demand. It adds instances when traffic increases and removes them when traffic decreases, which prevents both the cost waste of over-provisioning and the performance degradation of under-provisioning. It offers several kinds of scaling policies, including target tracking, step, and predictive scaling, along with scheduled actions, which you can choose among according to your workload characteristics.
Designing Scaling Policies
Target tracking scaling is the recommended starting point for most workloads. You set a target value for a metric, such as 70% average CPU utilization or 1,000 ALB requests per target, and Auto Scaling adjusts capacity to keep the metric near that target. Predictive scaling uses machine learning to analyze up to the past 14 days of traffic patterns and provisions capacity ahead of forecast demand. For a pattern where traffic surges every morning at 9 AM, it can begin scaling out ahead of time, for example at 8:50 AM when a 10-minute scheduling buffer is configured. Warm pools keep instances that have been launched from the AMI and have already completed application startup, so they can be placed into service quickly when a scale-out occurs.
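As a minimal sketch of the target tracking setup described above, the following Python builds request parameters for the `PutScalingPolicy` API. The group name `web-asg`, the policy names, and the ALB resource label are hypothetical placeholders. The boto3 call is isolated in `apply_policy` so the parameters can be inspected without AWS credentials.

```python
def cpu_target_tracking_policy(group_name, target_percent=70.0):
    """Build PutScalingPolicy parameters that hold average CPU near a target."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": "cpu-target-tracking",  # hypothetical policy name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": float(target_percent),
        },
    }


def alb_request_target_tracking_policy(group_name, resource_label, requests_per_target=1000.0):
    """Build parameters that hold the ALB request count per target near a target."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": "alb-requests-target-tracking",  # hypothetical policy name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ALBRequestCountPerTarget",
                # Identifies the load balancer and target group being measured.
                "ResourceLabel": resource_label,
            },
            "TargetValue": float(requests_per_target),
        },
    }


def apply_policy(params):
    """Send the policy to AWS. Requires boto3 and valid credentials."""
    import boto3  # imported lazily so the builders above work without it

    client = boto3.client("autoscaling")
    return client.put_scaling_policy(**params)


# Placeholder values for illustration only.
cpu_policy = cpu_target_tracking_policy("web-asg")
alb_policy = alb_request_target_tracking_policy(
    "web-asg",
    "app/my-alb/0123456789abcdef/targetgroup/my-targets/0123456789abcdef",
)
```

Calling `apply_policy(cpu_policy)` would register the policy on a real group. Only one of the two policies is needed to get started, and the choice depends on whether CPU or request volume better reflects load for the workload.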
Predictive Scaling and Scheduled Scaling
Predictive scaling uses machine learning to analyze up to the past 14 days of metric history, forecast future demand, and launch capacity in advance. It compensates for the reaction delay of target tracking policies (several minutes from metric collection to instance startup completion), so capacity is already in place when a recurring traffic surge arrives. Because the forecast relies on historical patterns, it does not anticipate unprecedented spikes, which still depend on dynamic policies such as target tracking. Scheduled scaling provisions capacity ahead of demand changes that are known in advance, such as the start of daily business hours or a sale. Combining the two is effective: predictive scaling covers regular patterns, while scheduled scaling handles event-driven demand.
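The combination of predictive and scheduled scaling can be sketched as two sets of request parameters, one for `PutScalingPolicy` and one for `PutScheduledUpdateGroupAction`. This is an illustration under assumptions: the group name, policy and action names, capacity numbers, and time zone are hypothetical, and the 600-second buffer simply mirrors the idea of launching 10 minutes ahead of a 9 AM surge.

```python
def predictive_scaling_policy(group_name, target_percent=70.0,
                              buffer_seconds=600, forecast_only=True):
    """Build PutScalingPolicy parameters for a CPU-based predictive policy."""
    # The API accepts a scheduling buffer of at most 3600 seconds.
    if not 0 <= buffer_seconds <= 3600:
        raise ValueError("buffer_seconds must be between 0 and 3600")
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": "cpu-predictive",  # hypothetical policy name
        "PolicyType": "PredictiveScaling",
        "PredictiveScalingConfiguration": {
            "MetricSpecifications": [
                {
                    "TargetValue": float(target_percent),
                    "PredefinedMetricPairSpecification": {
                        "PredefinedMetricType": "ASGCPUUtilization",
                    },
                }
            ],
            # ForecastOnly lets you review forecasts before they change capacity.
            "Mode": "ForecastOnly" if forecast_only else "ForecastAndScale",
            # How many seconds before the forecast hour to launch instances.
            "SchedulingBufferTime": buffer_seconds,
        },
    }


def weekday_morning_action(group_name, minimum, maximum, desired):
    """Build PutScheduledUpdateGroupAction parameters for 08:50 on weekdays."""
    if not minimum <= desired <= maximum:
        raise ValueError("desired must lie between minimum and maximum")
    return {
        "AutoScalingGroupName": group_name,
        "ScheduledActionName": "weekday-morning-scale-out",  # hypothetical name
        "Recurrence": "50 8 * * 1-5",  # cron: 08:50, Monday through Friday
        "TimeZone": "Etc/UTC",  # assumption: replace with the business time zone
        "MinSize": minimum,
        "MaxSize": maximum,
        "DesiredCapacity": desired,
    }


# Placeholder values for illustration only.
predictive = predictive_scaling_policy("web-asg")
scheduled = weekday_morning_action("web-asg", minimum=4, maximum=20, desired=8)
```

With boto3 installed and credentials configured, these dictionaries could be passed as keyword arguments to the `put_scaling_policy` and `put_scheduled_update_group_action` methods of the `autoscaling` client. Starting in forecast-only mode is a cautious choice that allows the forecast to be compared against actual load before it drives scaling.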
Cost Optimization with Auto Scaling
Using Spot Instances through a mixed instances policy in an Auto Scaling group can save up to 90% compared to On-Demand pricing. Specify multiple instance types and use the capacity-optimized allocation strategy, which launches Spot Instances from the pools with the most available capacity, to reduce the risk of interruption. A configuration that secures baseline capacity with On-Demand Instances and covers demand above it with Spot offers a good balance of stability and cost. A warm pool keeps pre-initialized instances available, reducing startup time during scale-out. Using CloudWatch custom metrics (such as queue depth or active connections) in scaling policies allows more precise scaling that does not rely solely on CPU utilization.
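The On-Demand baseline plus Spot configuration described above might look like the following sketch of a `MixedInstancesPolicy` structure. The launch template name, instance types, and base capacity are hypothetical, and `split_capacity` is an illustrative helper that only covers the case where 0% of capacity above the base is On-Demand.

```python
def mixed_instances_policy(launch_template_name, instance_types, on_demand_base):
    """Build a MixedInstancesPolicy with an On-Demand base and Spot above it."""
    if not instance_types:
        raise ValueError("at least one instance type is required")
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # Several instance types give Spot more capacity pools to draw from.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            # This many instances are always On-Demand.
            "OnDemandBaseCapacity": on_demand_base,
            # 0 means everything above the base is requested as Spot.
            "OnDemandPercentageAboveBaseCapacity": 0,
            "SpotAllocationStrategy": "capacity-optimized",
        },
    }


def split_capacity(desired, on_demand_base):
    """Return (on_demand, spot) counts when 0% above the base is On-Demand."""
    on_demand = min(desired, on_demand_base)
    return on_demand, desired - on_demand


# Placeholder values for illustration only.
policy = mixed_instances_policy(
    "web-launch-template",
    ["m5.large", "m5a.large", "m6i.large"],
    on_demand_base=2,
)
on_demand, spot = split_capacity(desired=10, on_demand_base=2)
```

The resulting dictionary could be passed as the `MixedInstancesPolicy` argument of the boto3 `create_auto_scaling_group` or `update_auto_scaling_group` methods. With the values shown, a desired capacity of 10 corresponds to 2 On-Demand and 8 Spot Instances.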
Summary
Auto Scaling builds demand-driven infrastructure from target tracking, step, and predictive scaling policies, complemented by scheduled actions. Predictive scaling provisions capacity ahead of recurring traffic increases, while mixed instances policies leverage Spot Instances for cost optimization. Warm pools reduce startup time during scale-out, helping to balance availability and cost efficiency.