Why CloudWatch Has 1-Minute and 5-Minute Metrics - The Trade-off Between Monitoring Granularity and Cost
This article explains the technical and economic reasons behind the split between CloudWatch basic monitoring (5 minutes) and detailed monitoring (1 minute), the step-down aggregation of metric retention periods, and the high-resolution mode for custom metrics.
Why 5-Minute Intervals Are the Default
EC2 basic monitoring collects metrics at 5-minute intervals. Why 5 minutes instead of 1 minute? There are two reasons. First, data volume and storage cost. AWS collects metrics from millions of EC2 instances, and switching to 1-minute intervals would generate five times the data points, with a corresponding increase in storage and processing costs. To offer basic monitoring for free, 5-minute intervals are the economically rational threshold. Second, 5-minute intervals are sufficient for most workloads. For tracking trends in CPU utilization and network traffic, 5-minute data is adequate, and Auto Scaling decisions also work properly with 5-minute metrics. However, for detecting spike loads or monitoring latency-sensitive workloads, 5-minute intervals are too coarse: a 30-second CPU spike gets buried in a 5-minute average and goes undetected. In such cases, you need to enable detailed monitoring (1-minute intervals).
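The averaging effect is easy to see with made-up numbers. The sketch below (hypothetical values: a 10% CPU baseline with a 30-second spike to 90%) shows how the same spike looks in a 5-minute average versus 1-minute averages:

```python
# Hypothetical workload: 5 minutes of 1-second CPU samples at a 10%
# baseline, with a 30-second spike to 90% during the second minute.
per_second = [10.0] * 300
per_second[60:90] = [90.0] * 30

# Basic monitoring sees one 5-minute average; detailed monitoring
# sees five 1-minute averages.
five_min_avg = sum(per_second) / len(per_second)
one_min_avgs = [sum(per_second[i:i + 60]) / 60 for i in range(0, 300, 60)]

print(five_min_avg)       # 18.0 -- the spike barely moves the needle
print(max(one_min_avgs))  # 50.0 -- the spike's minute clearly stands out
```

Even at 1-minute resolution the spike is halved (50% instead of 90%), which is why truly latency-sensitive monitoring reaches for the high-resolution custom metrics described later.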
Metric Retention Periods and Step-Down Aggregation
CloudWatch retains metric data for up to 15 months, with resolution decreasing progressively as the data ages: data points at 1-second intervals are retained for 3 hours; 1-minute data points for 15 days; 5-minute data points for 63 days; and 1-hour data points for 455 days (approximately 15 months). This step-down aggregation balances storage efficiency with long-term trend analysis: for the most recent 3 hours you can investigate issues at second-level detail, while for the past 15 months you can perform capacity planning with hourly data. Without knowing this aggregation rule, you might be puzzled by the phenomenon where "yesterday's data is available at 1-minute intervals, but last month's data is only available at 5-minute intervals." If you need to retain high-resolution data over a long period, you can export data to S3 using CloudWatch Metric Streams and analyze it with Athena.
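The retention schedule above can be condensed into a small lookup function. This is a sketch of the documented step-down rules, not an AWS API, and it answers the "why is last month's data coarser?" question directly:

```python
from datetime import timedelta

def finest_resolution(age: timedelta) -> str:
    """Finest CloudWatch resolution still available for a data point
    of the given age, per the step-down retention schedule."""
    if age <= timedelta(hours=3):
        return "1 second"
    if age <= timedelta(days=15):
        return "1 minute"
    if age <= timedelta(days=63):
        return "5 minutes"
    if age <= timedelta(days=455):
        return "1 hour"
    return "expired"

print(finest_resolution(timedelta(days=1)))   # 1 minute  (yesterday)
print(finest_resolution(timedelta(days=30)))  # 5 minutes (last month)
```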
High-Resolution Mode for Custom Metrics
CloudWatch custom metrics default to 1-minute intervals, but using high-resolution mode, you can send data points at 1-second intervals. Simply set the StorageResolution parameter to 1 in the PutMetricData API call. High-resolution metrics are powerful for use cases that demand real-time responsiveness. For example, monitoring API response times at 1-second intervals allows you to immediately detect latency spikes lasting just a few seconds. At 1-minute intervals, spikes get smoothed out in the 60-second average and become invisible. However, high-resolution metrics require attention to cost. Custom metric pricing is $0.30 per metric per month (for the first 10,000 metrics), with no price difference based on resolution. However, the increased number of PutMetricData API calls raises API charges ($0.01 per 1,000 requests). Sending metrics at 1-second intervals for one month generates approximately 2.6 million requests (about $26/month). At 1-minute intervals, it would be approximately 43,000 requests (about $0.43/month). Given this 60x cost difference, apply high-resolution metrics only where they are truly needed.
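A minimal sketch of what a high-resolution data point looks like. The namespace and metric name ("MyApp", "ApiLatency") are placeholders; the one detail confirmed by the API is that StorageResolution=1 marks the metric as high resolution (60, the default, means standard resolution):

```python
def high_res_datapoint(name: str, value_ms: float) -> dict:
    """Build one PutMetricData entry for a 1-second-resolution metric."""
    return {
        "MetricName": name,
        "Value": value_ms,
        "Unit": "Milliseconds",
        "StorageResolution": 1,  # 1 = high resolution; 60 = standard (default)
    }

datum = high_res_datapoint("ApiLatency", 42.5)
# With boto3 this would be sent roughly as:
#   boto3.client("cloudwatch").put_metric_data(
#       Namespace="MyApp", MetricData=[datum])
print(datum["StorageResolution"])  # 1
```

Batching several data points into one MetricData list per call is also one way to keep the PutMetricData request count (and the API charges discussed above) down.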
Alarm Evaluation Periods and M of N Configuration
CloudWatch alarms send notifications when a metric exceeds a threshold, but the evaluation logic has subtleties worth knowing. The alarm evaluation period (Period) is the time window over which metric data points are aggregated. Setting the evaluation period to 5 minutes means the 5-minute average (or maximum, minimum, sum) is compared against the threshold. The "M of N" setting (Datapoints to Alarm) triggers the alarm when the threshold is exceeded in M or more of the last N evaluation periods. For example, setting "3 of 5" means the alarm enters the ALARM state when the threshold is exceeded in 3 or more of the last 5 evaluation periods. This setting helps suppress false positives caused by temporary spikes. An often-overlooked aspect is the behavior when metric data points are missing. When an EC2 instance stops, CPU metrics are no longer sent, resulting in missing data points. By default, missing data points are treated as "missing" and do not affect alarm state transitions. The TreatMissingData parameter lets you change this behavior: missing data can instead be treated as "breaching" (threshold exceeded), "notBreaching" (within threshold), or "ignore" (maintain the current alarm state).
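Putting the pieces together, a "3 of 5" alarm on EC2 CPU utilization maps onto PutMetricAlarm parameters like this. The alarm name and threshold are illustrative placeholders; the parameter names (Period, EvaluationPeriods, DatapointsToAlarm, TreatMissingData) are the real API fields:

```python
# Sketch of a "3 of 5" CPU alarm as PutMetricAlarm parameters.
alarm = {
    "AlarmName": "high-cpu-3-of-5",       # placeholder name
    "Namespace": "AWS/EC2",
    "MetricName": "CPUUtilization",
    "Statistic": "Average",
    "Period": 300,                         # 5-minute evaluation period
    "Threshold": 80.0,                     # illustrative threshold (%)
    "ComparisonOperator": "GreaterThanThreshold",
    "EvaluationPeriods": 5,                # N: look at the last 5 periods
    "DatapointsToAlarm": 3,                # M: alarm when 3 of them breach
    "TreatMissingData": "notBreaching",    # or: breaching, ignore, missing
}
# With boto3, this would be created roughly as:
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```

Choosing "notBreaching" here means a stopped instance (which sends no CPU data) will not hold the alarm in the ALARM state; whether that is the right choice depends on what the alarm is protecting.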
Patterns Where CloudWatch Costs Become Unexpectedly High
The most commonly overlooked CloudWatch cost is the GetMetricData API call charge. Every time you open a CloudWatch dashboard, GetMetricData API calls are made for every metric displayed. A dashboard with 10 widgets showing 5 metrics each, left open with auto-refresh (1-minute intervals) for 8 hours, generates approximately 24,000 requests per day. At GetMetricData's rate of $0.01 per 1,000 metrics requested, this single dashboard costs about $7/month. With 10 dashboards, that becomes $70/month. Another high-cost pattern is CloudWatch Logs data ingestion. When Lambda functions or ECS tasks output large volumes of logs, ingestion charges ($0.50/GB) can surge. Leaving DEBUG-level logging enabled in a production environment can result in hundreds of dollars in monthly log charges. Countermeasures include setting appropriate log levels, shortening log retention periods (the default is indefinite), and using CloudWatch Logs subscription filters to export only necessary logs to S3. For a systematic approach to monitoring design and cost optimization, specialized books on Amazon are a helpful reference.
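The dashboard arithmetic above is worth making explicit, since it is just multiplication. This sketch reproduces the article's estimate (the usage pattern is the same hypothetical one: 10 widgets, 5 metrics each, 1-minute refresh, 8 hours a day, 30 days):

```python
widgets = 10
metrics_per_widget = 5
refreshes_per_day = 8 * 60  # 1-minute auto-refresh, 8 hours a day

# Each refresh fetches every displayed metric.
requests_per_day = widgets * metrics_per_widget * refreshes_per_day

# GetMetricData: $0.01 per 1,000 metrics requested.
monthly_cost = requests_per_day * 30 / 1000 * 0.01

print(requests_per_day)        # 24000
print(round(monthly_cost, 2))  # 7.2
```

The same back-of-the-envelope habit catches the other patterns too: multiply log volume per day by $0.50/GB before shipping DEBUG logs to production.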