
AWS Auto Scaling (Essential, since 2009)

A service that automatically scales EC2, ECS, DynamoDB, and other resources based on demand

What It Does

AWS Auto Scaling automatically adjusts resource capacity based on application demand: the number of EC2 instances, ECS tasks, and Aurora replicas, or the provisioned read and write capacity of DynamoDB tables. You set scaling policies based on metrics like CPU utilization or request count, and it adds capacity when load is high and removes it when load is low. This lets you maintain performance while optimizing costs.

Use Cases

Typical uses include handling traffic spikes during e-commerce sales events, running fewer servers at night than during business hours to reduce costs, and automatically adjusting resources to keep response times consistent for web applications. With predictive scaling, you can even forecast demand from historical traffic patterns and scale out in advance.

Everyday Analogy

Think of it like checkout lanes at a supermarket. During quiet hours, only 2 registers (servers) are open. When it gets busy, more registers open automatically, up to a maximum of 5. When traffic dies down, the store goes back to 2. The store manager (Auto Scaling) constantly monitors the checkout lines (load) and adjusts the number of open registers.

What Is Auto Scaling?

AWS Auto Scaling is a service that automatically increases or decreases the number of resources based on your application's load. One of the biggest benefits of the cloud is the ability to use only the resources you need, when you need them. Auto Scaling automates this benefit, eliminating the need to manually add or remove servers. It prevents both wasted costs from over-provisioning and performance degradation from under-provisioning.

Scaling Policies

Auto Scaling offers three main scaling approaches. Target tracking scaling is the simplest - you just set a target value, such as maintaining CPU utilization at 70%. Step scaling lets you configure different scaling amounts for different load levels. Scheduled scaling allows time-based settings, such as increasing instances every morning at 9 AM.
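To make the difference between target tracking and step scaling concrete, here is a minimal Python sketch. It is a simplified model, not the algorithm AWS actually runs: the proportional formula, the function names, and the step thresholds in `STEPS` are all assumptions chosen for illustration.

```python
import math


def clamp(value, min_size, max_size):
    """Keep a capacity value inside the group's minimum and maximum."""
    return max(min_size, min(max_size, value))


def target_tracking_capacity(current, metric, target, min_size, max_size):
    """Estimate the capacity that would bring `metric` back to `target`.

    Assumes load spreads evenly, so the metric scales inversely with the
    number of instances. This is a simplification for illustration.
    """
    if current <= 0 or target <= 0:
        raise ValueError("current and target must be positive")
    wanted = math.ceil(current * metric / target)
    return clamp(wanted, min_size, max_size)


# Hypothetical step adjustments: (CPU % lower bound, instances to add),
# ordered from the highest threshold to the lowest.
STEPS = [(90.0, 3), (80.0, 2), (70.0, 1)]


def step_scaling_capacity(current, metric, min_size, max_size, steps=STEPS):
    """Add a different number of instances depending on how high the load is."""
    for lower_bound, adjustment in steps:
        if metric >= lower_bound:
            return clamp(current + adjustment, min_size, max_size)
    return current  # below every threshold: no change


if __name__ == "__main__":
    # 4 instances at 90% CPU with a 70% target
    print(target_tracking_capacity(4, 90.0, 70.0, min_size=2, max_size=10))
    # 4 instances at 85% CPU falls into the ">= 80%" step
    print(step_scaling_capacity(4, 85.0, min_size=2, max_size=10))
```

Target tracking needs only one number (the target), while step scaling needs a table of thresholds that you tune yourself. That is why the former is usually the simpler starting point.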

Predictive Scaling

Predictive scaling uses machine learning to analyze historical traffic patterns and forecast future demand, scaling out proactively. For example, if traffic consistently spikes at 10 AM every day, it increases instances just before 10 AM. This helps with recurring load increases that arrive too quickly for reactive scaling to keep up with. For case studies and best practices on predictive scaling, technical books on Amazon are a helpful reference.
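The idea of scaling ahead of a recurring spike can be sketched with a deliberately naive forecast. The example below stands in a per-hour average for the machine learning model, so it only illustrates the principle. The sample history, the `per_instance_capacity` figure, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def hourly_forecast(history):
    """Average the observed load for each hour of the day.

    `history` is an iterable of (hour_of_day, requests_per_second) samples.
    """
    samples = defaultdict(list)
    for hour, load in history:
        samples[hour].append(load)
    return {hour: sum(loads) / len(loads) for hour, loads in samples.items()}


def planned_capacity(forecast, hour, per_instance_capacity, min_size, max_size):
    """Capacity to have ready before `hour` starts, based on the forecast."""
    expected_load = forecast.get(hour, 0.0)  # no data: fall back to the minimum
    needed = math.ceil(expected_load / per_instance_capacity)
    return max(min_size, min(max_size, needed))


# Hypothetical three days of samples: quiet at 9 AM, busy at 10 AM.
history = [(9, 200), (10, 900), (9, 220), (10, 1100), (9, 180), (10, 1000)]
forecast = hourly_forecast(history)

if __name__ == "__main__":
    # Shortly before 10 AM, provision for the load expected at 10 AM.
    print(planned_capacity(forecast, 10, per_instance_capacity=150,
                           min_size=2, max_size=10))
```

The key point is that capacity is chosen from the forecast for the upcoming hour rather than from the current metric, so instances are already running when the spike arrives.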

Getting Started

For EC2, start by defining your instance configuration in a launch template, then create an Auto Scaling group. Set the minimum, maximum, and desired instance counts, and add a scaling policy. The easiest way to begin is with a target tracking policy targeting 70% CPU utilization. You can monitor scaling activity through CloudWatch metrics.
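The steps above might look like the following with boto3, the AWS SDK for Python. This is a sketch under several assumptions: the launch template already exists, boto3 is installed with credentials and a region configured, and the group name, template name, and subnet IDs are placeholders. The two builder functions only assemble request parameters, so they can be inspected without touching AWS.

```python
def build_group_request(group_name, template_name, subnet_ids,
                        min_size=2, max_size=5, desired=2):
    """Parameters for creating an Auto Scaling group from a launch template."""
    if not min_size <= desired <= max_size:
        raise ValueError("desired capacity must lie between min and max")
    return {
        "AutoScalingGroupName": group_name,
        "LaunchTemplate": {
            "LaunchTemplateName": template_name,
            "Version": "$Latest",
        },
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": desired,
        # Subnets are passed as one comma-separated string.
        "VPCZoneIdentifier": ",".join(subnet_ids),
    }


def build_cpu_policy_request(group_name, target_cpu=70.0):
    """Parameters for a target tracking policy on average CPU utilization."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": f"{group_name}-cpu-target",  # hypothetical naming scheme
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": target_cpu,
        },
    }


def apply(group_request, policy_request):
    """Send both requests to AWS. Creates billable resources."""
    import boto3  # third-party; imported here so the builders work without it

    client = boto3.client("autoscaling")
    client.create_auto_scaling_group(**group_request)
    client.put_scaling_policy(**policy_request)


if __name__ == "__main__":
    # Placeholder names; replace with your own before calling apply().
    group = build_group_request("web-asg", "web-template",
                                ["subnet-aaaa1111", "subnet-bbbb2222"])
    policy = build_cpu_policy_request("web-asg")
    print(group)
    print(policy)
```

Keeping the parameter construction separate from the API calls makes it easy to review the configuration before anything is created.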

Things to Watch Out For

  • Auto Scaling itself is free, but you pay standard rates for resources added through scale-out (such as EC2 instances)
  • Since instances are terminated during scale-in, your application must be designed to be stateless. Externalize session data to ElastiCache or DynamoDB
  • If cooldown periods are not configured properly, frequent scale-out and scale-in cycles (known as flapping) may occur
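The effect of a cooldown on flapping can be seen in a small simulation. This is a toy model rather than the service's real behavior: the thresholds, the one-instance adjustments, and the oscillating CPU samples are all made up for illustration.

```python
class CooldownScaler:
    """Toy scaler that ignores new signals until the cooldown has elapsed."""

    def __init__(self, capacity, cooldown_seconds,
                 high=70.0, low=30.0, min_size=2, max_size=5):
        self.capacity = capacity
        self.cooldown = cooldown_seconds
        self.high, self.low = high, low
        self.min_size, self.max_size = min_size, max_size
        self.last_change = None  # time of the most recent scaling action

    def observe(self, now, cpu):
        """Process one CPU sample taken at time `now` (seconds)."""
        in_cooldown = (self.last_change is not None
                       and now - self.last_change < self.cooldown)
        if in_cooldown:
            return self.capacity
        if cpu > self.high and self.capacity < self.max_size:
            self.capacity += 1
            self.last_change = now
        elif cpu < self.low and self.capacity > self.min_size:
            self.capacity -= 1
            self.last_change = now
        return self.capacity


def count_changes(cooldown_seconds, samples):
    """Count how many times capacity changes over a series of samples."""
    scaler = CooldownScaler(capacity=3, cooldown_seconds=cooldown_seconds)
    changes, previous = 0, scaler.capacity
    for now, cpu in samples:
        current = scaler.observe(now, cpu)
        if current != previous:
            changes += 1
        previous = current
    return changes


# One sample per minute, CPU swinging between 80% and 20%.
samples = [(t * 60, 80.0 if t % 2 == 0 else 20.0) for t in range(10)]

if __name__ == "__main__":
    print("no cooldown:", count_changes(0, samples))
    print("300 s cooldown:", count_changes(300, samples))
```

With no cooldown the scaler reacts to every swing in the metric, while a 300-second cooldown lets each change settle before the next decision is made.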