Why CloudWatch Has 1-Minute and 5-Minute Metrics - The Trade-off Between Monitoring Granularity and Cost
This article explains the technical and economic reasons behind the split between CloudWatch basic monitoring (5 minutes) and detailed monitoring (1 minute), the step-down aggregation of metric retention periods, and the high-resolution mode for custom metrics.
Why 5-Minute Intervals Are the Default
EC2 basic monitoring collects metrics at 5-minute intervals. Why 5 minutes instead of 1 minute? There are two reasons. First, data volume and storage cost. AWS collects metrics from millions of EC2 instances, and switching to 1-minute intervals would generate five times the data points, with a corresponding increase in storage and processing costs. To offer basic monitoring for free, 5-minute intervals are the economically rational threshold. Second, 5-minute intervals are sufficient for most workloads. For tracking trends in CPU utilization and network traffic, 5-minute data is adequate, and Auto Scaling decisions also work properly with 5-minute metrics. However, for detecting spike loads or monitoring latency-sensitive workloads, 5-minute intervals are too coarse: a 30-second CPU spike gets buried in a 5-minute average and goes undetected. In such cases, you need to enable detailed monitoring (1-minute intervals).
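The averaging effect is easy to see with made-up numbers. The sketch below (hypothetical values: a 10% CPU baseline with a 30-second spike to 90%) shows how the same spike looks in a 5-minute average versus 1-minute averages:

```python
# Hypothetical workload: 5 minutes of 1-second CPU samples at a 10%
# baseline, with a 30-second spike to 90% during the second minute.
per_second = [10.0] * 300
per_second[60:90] = [90.0] * 30

# Basic monitoring sees one 5-minute average; detailed monitoring
# sees five 1-minute averages.
five_min_avg = sum(per_second) / len(per_second)
one_min_avgs = [sum(per_second[i:i + 60]) / 60 for i in range(0, 300, 60)]

print(five_min_avg)       # 18.0 -- the spike barely moves the needle
print(max(one_min_avgs))  # 50.0 -- the spike's minute clearly stands out
```

Even at 1-minute resolution the spike is halved (50% instead of 90%), which is why truly latency-sensitive monitoring reaches for the high-resolution custom metrics described later.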
Metric Retention Periods and Step-Down Aggregation
CloudWatch retains metric data for up to 15 months, with resolution decreasing progressively as the data ages: data points at 1-second intervals are retained for 3 hours; 1-minute data points for 15 days; 5-minute data points for 63 days; and 1-hour data points for 455 days (approximately 15 months). This step-down aggregation balances storage efficiency with long-term trend analysis: for the most recent 3 hours you can investigate issues at second-level detail, while for the past 15 months you can perform capacity planning with hourly data. Without knowing this aggregation rule, you might be puzzled by the phenomenon where "yesterday's data is available at 1-minute intervals, but last month's data is only available at 5-minute intervals." If you need to retain high-resolution data over a long period, you can export data to S3 using CloudWatch Metric Streams and analyze it with Athena.
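The retention schedule above can be condensed into a small lookup function. This is a sketch of the documented step-down rules, not an AWS API, and it answers the "why is last month's data coarser?" question directly:

```python
from datetime import timedelta

def finest_resolution(age: timedelta) -> str:
    """Finest CloudWatch resolution still available for a data point
    of the given age, per the step-down retention schedule."""
    if age <= timedelta(hours=3):
        return "1 second"
    if age <= timedelta(days=15):
        return "1 minute"
    if age <= timedelta(days=63):
        return "5 minutes"
    if age <= timedelta(days=455):
        return "1 hour"
    return "expired"

print(finest_resolution(timedelta(days=1)))   # 1 minute  (yesterday)
print(finest_resolution(timedelta(days=30)))  # 5 minutes (last month)
```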
High-Resolution Mode for Custom Metrics
CloudWatch custom metrics default to 1-minute intervals, but using high-resolution mode, you can send data points at 1-second intervals. Simply set the StorageResolution parameter to 1 in the PutMetricData API call. High-resolution metrics are powerful for use cases that demand real-time responsiveness. For example, monitoring API response times at 1-second intervals allows you to immediately detect latency spikes lasting just a few seconds. At 1-minute intervals, spikes get smoothed out in the 60-second average and become invisible. However, high-resolution metrics require attention to cost. Custom metric pricing is $0.30 per metric per month (for the first 10,000 metrics), with no price difference based on resolution. However, the increased number of PutMetricData API calls raises API charges ($0.01 per 1,000 requests). Sending metrics at 1-second intervals for one month generates approximately 2.6 million requests (about $26/month). At 1-minute intervals, it would be approximately 43,000 requests (about $0.43/month). Given this 60x cost difference, apply high-resolution metrics only where they are truly needed.
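A minimal sketch of what a high-resolution data point looks like. The namespace and metric name ("MyApp", "ApiLatency") are placeholders; the one detail confirmed by the API is that StorageResolution=1 marks the metric as high resolution (60, the default, means standard resolution):

```python
def high_res_datapoint(name: str, value_ms: float) -> dict:
    """Build one PutMetricData entry for a 1-second-resolution metric."""
    return {
        "MetricName": name,
        "Value": value_ms,
        "Unit": "Milliseconds",
        "StorageResolution": 1,  # 1 = high resolution; 60 = standard (default)
    }

datum = high_res_datapoint("ApiLatency", 42.5)
# With boto3 this would be sent roughly as:
#   boto3.client("cloudwatch").put_metric_data(
#       Namespace="MyApp", MetricData=[datum])
print(datum["StorageResolution"])  # 1
```

Batching several data points into one MetricData list per call is also one way to keep the PutMetricData request count (and the API charges discussed above) down.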
Alarm Evaluation Periods and M of N Configuration
CloudWatch alarms send notifications when a metric exceeds a threshold, but the evaluation logic has subtleties worth knowing. The alarm evaluation period (Period) is the time window over which metric data points are aggregated. Setting the evaluation period to 5 minutes means the 5-minute average (or maximum, minimum, sum) is compared against the threshold. The "M of N" setting (Datapoints to Alarm) triggers the alarm when the threshold is exceeded in M or more of the last N evaluation periods. For example, setting "3 of 5" means the alarm enters the ALARM state when the threshold is exceeded in 3 or more of the last 5 evaluation periods. This setting helps suppress false positives caused by temporary spikes. An often-overlooked aspect is the behavior when metric data points are missing. When an EC2 instance stops, CPU metrics are no longer sent, resulting in missing data points. By default, missing data points are treated as "missing" and do not affect alarm state transitions. The TreatMissingData parameter lets you change this behavior: missing data can instead be treated as "breaching" (threshold exceeded), "notBreaching" (within threshold), or "ignore" (maintain the current alarm state).
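Putting the pieces together, a "3 of 5" alarm on EC2 CPU utilization maps onto PutMetricAlarm parameters like this. The alarm name and threshold are illustrative placeholders; the parameter names (Period, EvaluationPeriods, DatapointsToAlarm, TreatMissingData) are the real API fields:

```python
# Sketch of a "3 of 5" CPU alarm as PutMetricAlarm parameters.
alarm = {
    "AlarmName": "high-cpu-3-of-5",       # placeholder name
    "Namespace": "AWS/EC2",
    "MetricName": "CPUUtilization",
    "Statistic": "Average",
    "Period": 300,                         # 5-minute evaluation period
    "Threshold": 80.0,                     # illustrative threshold (%)
    "ComparisonOperator": "GreaterThanThreshold",
    "EvaluationPeriods": 5,                # N: look at the last 5 periods
    "DatapointsToAlarm": 3,                # M: alarm when 3 of them breach
    "TreatMissingData": "notBreaching",    # or: breaching, ignore, missing
}
# With boto3, this would be created roughly as:
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```

Choosing "notBreaching" here means a stopped instance (which sends no CPU data) will not hold the alarm in the ALARM state; whether that is the right choice depends on what the alarm is protecting.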
Patterns Where CloudWatch Costs Become Unexpectedly High
The most commonly overlooked CloudWatch cost is the GetMetricData API call charge. Every time you open a CloudWatch dashboard, GetMetricData API calls are made for every metric displayed. A dashboard with 10 widgets showing 5 metrics each, left open with auto-refresh (1-minute intervals) for 8 hours, generates approximately 24,000 requests per day. At GetMetricData's rate of $0.01 per 1,000 metrics requested, this single dashboard costs about $7/month. With 10 dashboards, that becomes $70/month. Another high-cost pattern is CloudWatch Logs data ingestion. When Lambda functions or ECS tasks output large volumes of logs, ingestion charges ($0.50/GB) can surge. Leaving DEBUG-level logging enabled in a production environment can result in hundreds of dollars in monthly log charges. Countermeasures include setting appropriate log levels, shortening log retention periods (the default is indefinite), and using CloudWatch Logs subscription filters to export only necessary logs to S3. For a systematic approach to monitoring design and cost optimization, specialized books on Amazon are a helpful reference.
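The dashboard arithmetic above is worth making explicit, since it is just multiplication. This sketch reproduces the article's estimate (the usage pattern is the same hypothetical one: 10 widgets, 5 metrics each, 1-minute refresh, 8 hours a day, 30 days):

```python
widgets = 10
metrics_per_widget = 5
refreshes_per_day = 8 * 60  # 1-minute auto-refresh, 8 hours a day

# Each refresh fetches every displayed metric.
requests_per_day = widgets * metrics_per_widget * refreshes_per_day

# GetMetricData: $0.01 per 1,000 metrics requested.
monthly_cost = requests_per_day * 30 / 1000 * 0.01

print(requests_per_day)        # 24000
print(round(monthly_cost, 2))  # 7.2
```

The same back-of-the-envelope habit catches the other patterns too: multiply log volume per day by $0.50/GB before shipping DEBUG logs to production.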