Amazon EC2
A cloud-based virtual server service from AWS that lets you choose from hundreds of instance types and launch servers in minutes
Overview
Amazon Elastic Compute Cloud (EC2) is the core compute service of AWS. Without purchasing or installing physical servers, you can launch virtual machines (instances) within minutes from a browser or the CLI. Hundreds of instance types are available across five categories: general purpose, compute optimized, memory optimized, storage optimized, and accelerated computing. Together they cover workloads from web application hosting to large-scale data processing and machine learning model training. Combining three pricing models - On-Demand, Reserved Instances, and Spot Instances - allows you to optimize costs.
The Nitro System and How to Choose an Instance Family
EC2 runs on the Nitro System, AWS's proprietary virtualization platform. The Nitro System offloads networking, storage, and security processing to dedicated hardware, which minimizes hypervisor overhead on the host and delivers near-bare-metal performance. Each instance is placed within a VPC, where security groups and network ACLs control access. Attaching EBS volumes provides persistent block storage, while instance store offers high-speed ephemeral storage whose data does not outlive the instance. Instance families are organized into categories - general purpose (M/T), compute optimized (C), memory optimized (R/X), storage optimized (I/D), and accelerated computing (P/G) - and selecting the right family based on workload characteristics is the first step in cost-efficient design. According to AWS, Graviton (Arm-based) instances deliver up to 40% better price-performance than comparable x86-based instances.
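The family-to-category mapping above can be sketched as a small lookup. This is a minimal illustration, not an AWS API: the function name `categorize` and the table `FAMILY_CATEGORIES` are hypothetical, and the table covers only the family letters named in the text, so any other family is reported as "unknown".

```python
import re

# Family prefix -> category, limited to the families named in the text.
# This table is an illustrative assumption, not an exhaustive AWS list.
FAMILY_CATEGORIES = {
    "m": "general purpose",
    "t": "general purpose",
    "c": "compute optimized",
    "r": "memory optimized",
    "x": "memory optimized",
    "i": "storage optimized",
    "d": "storage optimized",
    "p": "accelerated computing",
    "g": "accelerated computing",
}


def categorize(instance_type: str) -> str:
    """Return the category for an instance type such as 'c7g.xlarge'."""
    # The family is the run of letters before the generation digit,
    # e.g. 'c' in 'c7g.xlarge' or 'x' in 'x2idn.16xlarge'.
    match = re.match(r"([a-z]+)\d", instance_type.lower())
    if match is None:
        raise ValueError(f"unrecognized instance type: {instance_type!r}")
    return FAMILY_CATEGORIES.get(match.group(1), "unknown")


if __name__ == "__main__":
    for name in ("m5.large", "c7g.xlarge", "r6i.2xlarge", "p5.48xlarge"):
        print(name, "->", categorize(name))
```

A lookup like this is only a starting point for shortlisting; the final choice still depends on measuring the workload's CPU, memory, and I/O profile.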
Pricing Design - On-Demand, Reserved, and Spot
EC2 offers three main pricing models: On-Demand (per-second billing with no commitment), Reserved Instances (up to 72% discount with 1- or 3-year commitments), and Spot Instances (up to 90% discount using surplus capacity). A common production strategy is to cover baseline capacity with Reserved Instances or Savings Plans, handle traffic spikes with On-Demand instances via Auto Scaling, and offload fault-tolerant batch processing to Spot Instances. EC2 Spot provides a 2-minute interruption warning, which gives applications time to checkpoint and shut down gracefully - notably longer than Azure Spot Virtual Machines' 30-second notice.
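To see how the three models combine, the arithmetic can be sketched as follows. This is a rough estimate under stated assumptions: the hourly rate of 0.10 USD is hypothetical, the function `blended_hourly_cost` is illustrative, and the default discounts are the best-case "up to" figures quoted above, so real savings will usually be smaller.

```python
def blended_hourly_cost(on_demand_rate: float, baseline: int, burst: int,
                        batch: int, reserved_discount: float = 0.72,
                        spot_discount: float = 0.90) -> float:
    """Estimate the hourly cost of a mixed fleet of identical instances.

    baseline instances are billed at the reserved rate, burst instances
    at the full On-Demand rate, and batch instances at the Spot rate.
    The default discounts are best-case figures, not guaranteed prices.
    """
    reserved = baseline * on_demand_rate * (1 - reserved_discount)
    on_demand = burst * on_demand_rate
    spot = batch * on_demand_rate * (1 - spot_discount)
    return reserved + on_demand + spot


if __name__ == "__main__":
    rate = 0.10  # hypothetical On-Demand price in USD per instance-hour
    mixed = blended_hourly_cost(rate, baseline=10, burst=4, batch=20)
    all_on_demand = (10 + 4 + 20) * rate
    print(f"mixed fleet:   {mixed:.2f} USD/hour")
    print(f"all On-Demand: {all_on_demand:.2f} USD/hour")
```

The sketch ignores Spot price fluctuation and interruptions, which is why only fault-tolerant work belongs in the `batch` term.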
Auto Scaling and Placement Strategies
Auto Scaling groups automatically adjust instance counts based on metrics such as CPU utilization, request volume, or custom CloudWatch metrics. Target tracking policies are the simplest to configure - for example, maintaining average CPU utilization at 60% - while step scaling policies offer finer control over scaling increments. Predictive scaling uses machine learning to forecast traffic patterns and pre-provision capacity before demand spikes arrive. Placement groups control how instances are physically distributed: cluster placement groups pack instances close together for low-latency inter-node communication (HPC, distributed databases), spread placement groups isolate instances across distinct hardware to maximize fault tolerance, and partition placement groups divide instances into logical partitions for large distributed workloads like HDFS and Cassandra. GPU instances (P5, G5, etc.) handle machine learning training and inference, while HPC-optimized Hpc6a instances tackle high-performance computing workloads.
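The target tracking example above (average CPU utilization held at 60%) can be sketched as the request parameters for the Auto Scaling `PutScalingPolicy` API. The helper `target_tracking_policy`, the group name `web-asg`, and the policy naming scheme are hypothetical; the block only builds the parameter dictionary, and the commented-out boto3 call assumes boto3 is installed and AWS credentials are configured.

```python
def target_tracking_policy(group_name: str, target_cpu: float = 60.0) -> dict:
    """Build keyword arguments for a CPU-based target tracking policy."""
    if not 0 < target_cpu <= 100:
        raise ValueError("target_cpu must be in the range (0, 100]")
    return {
        "AutoScalingGroupName": group_name,
        # The policy name is an arbitrary label chosen for this sketch.
        "PolicyName": f"{group_name}-cpu-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                # Predefined metric: average CPU across the group.
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": float(target_cpu),
        },
    }


if __name__ == "__main__":
    params = target_tracking_policy("web-asg")  # hypothetical group name
    print(params)
    # With boto3 installed and credentials configured, the policy could
    # be applied like this (not executed in this sketch):
    #   import boto3
    #   boto3.client("autoscaling").put_scaling_policy(**params)
```

Step scaling would use a different `PolicyType` and explicit step adjustments, which is where the finer control mentioned above comes from.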