Demand-Driven Infrastructure with AWS Auto Scaling - Designing and Optimizing Scaling Policies

Learn how to use target tracking, predictive, and scheduled scaling policies effectively, and optimize costs with mixed instances policies that leverage Spot Instances.

About a 3-minute read. Last updated: 2026-01-21

Overview of Auto Scaling

Auto Scaling automatically adjusts resource capacity to match demand: it adds instances when traffic increases and removes them when traffic decreases. This prevents both the wasted cost of over-provisioning and the performance degradation of under-provisioning. Its scaling policy types include target tracking, step, and predictive, alongside scheduled actions for demand changes known in advance, and you choose among them according to your workload characteristics.

Designing Scaling Policies

Target tracking scaling is the recommended starting point for most workloads. You set a target value, such as 70% average CPU utilization or 1,000 ALB requests per target per minute, and Auto Scaling adjusts capacity to keep the metric near that target. Predictive scaling uses machine learning to analyze up to the past 14 days of traffic patterns and provisions capacity ahead of forecast demand. For example, if traffic surges every morning at 9 AM, it can begin scaling out at around 8:50 AM when configured with a 10-minute buffer. Warm pools keep instances that have already been launched from the AMI and have completed application startup, so they can be placed into service quickly when scale-out occurs.
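As a minimal sketch of the 70% CPU target described above, the following helper builds the parameters for the EC2 Auto Scaling `PutScalingPolicy` API. The function name, the group name, and the policy naming scheme are assumptions made for illustration; only the parameter keys and the `ASGAverageCPUUtilization` metric type come from the service API.

```python
from __future__ import annotations

from typing import Any


def target_tracking_policy(group_name: str, target_cpu_percent: float = 70.0) -> dict[str, Any]:
    """Build PutScalingPolicy parameters for a CPU-based target tracking policy."""
    if not 0 < target_cpu_percent <= 100:
        raise ValueError("target_cpu_percent must be in the range (0, 100]")
    return {
        "AutoScalingGroupName": group_name,
        # Hypothetical naming scheme, chosen only for this example.
        "PolicyName": f"{group_name}-cpu-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                # Average CPU utilization across the Auto Scaling group.
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": float(target_cpu_percent),
        },
    }
```

With boto3 installed and credentials configured, the result could be applied with `boto3.client("autoscaling").put_scaling_policy(**target_tracking_policy("web-asg"))`, where `web-asg` is a placeholder group name.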

Predictive Scaling and Scheduled Scaling

Predictive scaling uses machine learning to analyze up to the past 14 days of metric history, forecast demand, and execute scaling actions in advance. It compensates for the reaction delay of target tracking policies (several minutes from metric collection to the new instance finishing startup), so capacity is already in place when a recurring traffic surge arrives. Scheduled scaling pre-provisions capacity for demand changes known in advance, such as the start of daily business hours or the start of a sale. Combining the two is effective: predictive scaling covers regular patterns while scheduled scaling handles event-driven demand.
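The combination of predictive and scheduled scaling could be sketched as two parameter builders, one for `PutScalingPolicy` with a predictive configuration and one for `PutScheduledUpdateGroupAction`. The function names, policy and action names, cron expression, time zone, and capacity numbers are all illustrative assumptions. The 600-second buffer corresponds to launching about 10 minutes ahead of the forecast.

```python
from __future__ import annotations

from typing import Any


def predictive_policy(group_name: str, target_cpu_percent: float = 70.0,
                      buffer_seconds: int = 600, forecast_only: bool = True) -> dict[str, Any]:
    """Build PutScalingPolicy parameters for a predictive scaling policy."""
    if not 0 <= buffer_seconds <= 3600:
        raise ValueError("buffer_seconds must be between 0 and 3600")
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": f"{group_name}-predictive",  # hypothetical name
        "PolicyType": "PredictiveScaling",
        "PredictiveScalingConfiguration": {
            "MetricSpecifications": [{
                "TargetValue": float(target_cpu_percent),
                "PredefinedMetricPairSpecification": {
                    "PredefinedMetricType": "ASGCPUUtilization",
                },
            }],
            # ForecastOnly lets you review forecasts before they change capacity.
            "Mode": "ForecastOnly" if forecast_only else "ForecastAndScale",
            # How many seconds before the forecast time instances are launched.
            "SchedulingBufferTime": buffer_seconds,
        },
    }


def scheduled_action(group_name: str, name: str, cron: str, desired: int,
                     min_size: int, max_size: int,
                     time_zone: str = "America/New_York") -> dict[str, Any]:
    """Build PutScheduledUpdateGroupAction parameters for a recurring action."""
    if not min_size <= desired <= max_size:
        raise ValueError("expected min_size <= desired <= max_size")
    return {
        "AutoScalingGroupName": group_name,
        "ScheduledActionName": name,
        "Recurrence": cron,      # cron format: minute hour day-of-month month day-of-week
        "TimeZone": time_zone,   # assumed zone; use the one your users are in
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": desired,
    }
```

For example, `scheduled_action("web-asg", "business-hours-start", "30 8 * * 1-5", desired=6, min_size=4, max_size=12)` describes raising capacity at 8:30 on weekdays, ahead of a 9 AM start. Starting the predictive policy in `ForecastOnly` mode is a cautious default here, not a requirement.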

Cost Optimization with Auto Scaling

Using Spot Instances through a mixed instances policy in an Auto Scaling group can cut costs by up to 90% compared with On-Demand pricing. Specify multiple instance types and use the capacity-optimized allocation strategy, which launches from the Spot pools with the most available capacity, to reduce the risk of Spot interruptions. A configuration that secures baseline capacity with On-Demand Instances and covers demand above it with Spot balances stability and cost well. A warm pool keeps pre-initialized instances ready, reducing startup time during scale-out. Using CloudWatch custom metrics (such as queue depth or active connections) in scaling policies gives more precise scaling than relying on CPU utilization alone.
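The On-Demand baseline plus Spot configuration described above might look like the following `MixedInstancesPolicy` structure. This is a sketch: the function name, launch template name, instance types, and the baseline of two On-Demand Instances are assumptions chosen for illustration.

```python
from __future__ import annotations

from typing import Any


def mixed_instances_policy(launch_template_name: str, instance_types: list[str],
                           on_demand_base: int = 2,
                           on_demand_percent_above_base: int = 0) -> dict[str, Any]:
    """Build a MixedInstancesPolicy: On-Demand baseline, Spot for the rest."""
    if len(set(instance_types)) < 2:
        raise ValueError("specify at least two distinct instance types to diversify Spot pools")
    if not 0 <= on_demand_percent_above_base <= 100:
        raise ValueError("on_demand_percent_above_base must be between 0 and 100")
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # One override per instance type the group may launch.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            # Minimum capacity that is always On-Demand.
            "OnDemandBaseCapacity": on_demand_base,
            # 0 means all capacity above the baseline is Spot.
            "OnDemandPercentageAboveBaseCapacity": on_demand_percent_above_base,
            "SpotAllocationStrategy": "capacity-optimized",
        },
    }
```

The returned dictionary is meant to be passed as the `MixedInstancesPolicy` argument of `create_auto_scaling_group` or `update_auto_scaling_group` in boto3, for example `mixed_instances_policy("web-template", ["m5.large", "m5a.large", "m6i.large"])`. Listing several similarly sized instance types gives the allocation strategy more Spot pools to choose from.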

Summary

Auto Scaling builds demand-driven infrastructure from scaling policies such as target tracking, step, and predictive. Predictive scaling provisions capacity ahead of forecastable traffic increases, while mixed instances policies leverage Spot Instances for cost optimization. Warm pools shorten startup time during scale-out, so high availability and cost efficiency can be achieved together.

