Amazon CloudWatch Internet Monitor - Instantly Detect ISP Outages and Visualize User Impact

CloudWatch Internet Monitor continuously monitors the availability and performance that end users experience when accessing your application over the internet, broken down by city and ISP (ASN). Using AWS global network observability data, it provides an integrated workflow from detecting performance degradation to supporting DNS routing failover decisions.

About 8 min read. Last updated: 2026-04-22

Why End-User Perspective Monitoring Became Necessary

Traditional CloudWatch metrics have focused on AWS infrastructure-side indicators such as EC2 CPU utilization and ALB latency. However, no matter how well your application is running, if there are problems on the internet path between users and the AWS Region, end users will experience latency and timeouts. Network issues that occur outside AWS's control - ISP outages, submarine cable damage, BGP routing anomalies in specific regions - are far from rare. In fact, major ISP outages occur on the order of dozens per year, sometimes affecting millions of users. CloudWatch Internet Monitor was designed to fill this blind spot. It leverages network observability data collected from AWS's globally deployed CloudFront and Route 53 infrastructure to visualize the internet quality that end users actually experience. The core idea behind this service is a shift in perspective: observing problems from the point closest to the user, where server-side metrics alone reveal nothing.

How It Leverages AWS Global Network Observability Data

The defining feature of Internet Monitor is that it requires no proprietary probes or agents on the user side. AWS continuously collects internet path performance data worldwide through CloudFront's 600+ edge locations and Route 53's resolver network. Internet Monitor cross-references this vast observability data with the traffic patterns of AWS resources you specify (CloudFront distributions, VPCs, WorkSpaces directories, etc.) to estimate round-trip time (RTT) and availability fluctuations for access from specific ISPs and cities. For example, if RTT from NTT Docomo connections in Tokyo spikes from the usual 15 ms to 120 ms, Internet Monitor detects the anomaly within 5 minutes and raises a health event notification. This detection speed dramatically reduces the time to recognize an outage compared to the traditional approach of waiting for user reports. The fact that monitoring begins in near real-time simply by registering the resources to monitor - agentless and with minimal operational overhead - is a key advantage.

Health Event Detection Logic and Threshold Design

Internet Monitor calculates availability and performance scores on a 0-100 scale for monitored traffic and generates a health event when these scores fall below configured thresholds. The default thresholds are 95% for availability and 95% for performance, but they can be customized to match your application's SLA. Crucially, Internet Monitor calculates scores not only at the global level but also for each city-ISP (ASN) combination, and local thresholds can be configured separately from the overall ones. Even if overall availability is 99%, a localized drop to 80% for a specific ISP in Osaka can still be captured as a health event, provided the local threshold is set to catch it. Health events also include an estimate of the affected traffic volume, so you can distinguish a minor issue affecting 0.5% of all users from a major outage affecting 30% and prioritize your response accordingly. Integration with EventBridge allows you to route health events to Lambda or SNS and build automated response workflows. Since overly strict thresholds increase noise while overly lenient ones delay detection, a practical approach is to observe the baseline for 1-2 weeks during initial operation before making adjustments.
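The relationship between an overall score and a localized drop can be illustrated with a small sketch. This is not Internet Monitor's actual scoring algorithm: the traffic-weighted average, the CityNetworkScore type, and the function names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CityNetworkScore:
    """Hypothetical record for one city-ISP (ASN) combination."""
    city: str
    asn_name: str
    availability: float       # score on a 0-100 scale
    traffic_share_pct: float  # estimated share of total traffic, in percent


def overall_availability(scores):
    """Traffic-weighted average availability (an assumed aggregation)."""
    total_share = sum(s.traffic_share_pct for s in scores)
    if total_share == 0:
        raise ValueError("no traffic to aggregate")
    weighted = sum(s.availability * s.traffic_share_pct for s in scores)
    return weighted / total_share


def find_degraded(scores, local_threshold):
    """Return city-networks below the local threshold, largest traffic share first."""
    degraded = [s for s in scores if s.availability < local_threshold]
    return sorted(degraded, key=lambda s: s.traffic_share_pct, reverse=True)
```

With three city-networks at 100, 100, and 80 availability carrying 60%, 35%, and 5% of traffic, the weighted overall score is 99, which stays above a 95 overall threshold. The per-city-network check is what surfaces the 80.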

Visualization That Supports DNS Routing Failover Decisions

The traffic insights provided by Internet Monitor directly inform DNS routing failover decisions in multi-Region architectures. For example, if you run an active-active configuration across the Tokyo and Osaka Regions and performance degrades for a specific ISP on the path to Tokyo, you need data to decide whether to redirect affected users' traffic to Osaka. The Internet Monitor console visualizes the affected city-ISP combinations, estimated traffic volume, and RTT increase on a map. Furthermore, by combining it with Route 53 health checks, you can build a configuration that automatically triggers failover based on Internet Monitor health events. Azure Front Door offers a similar global traffic monitoring capability, but Internet Monitor's deep integration with CloudFront and Route 53 - completing the workflow entirely within the AWS ecosystem - is a design advantage. Even for manual failover, the ability to confirm the scope and severity of degradation on the Internet Monitor dashboard before making a decision helps prevent unnecessary failovers caused by overreaction.
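As a rough illustration of the manual decision described above, the sketch below turns the scope and severity of a degradation into a recommended action. The HealthEventSummary type, the field names, and the cut-off values are hypothetical and are not part of any Internet Monitor API; tune them to your own SLA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthEventSummary:
    """Hypothetical summary an operator might read off the dashboard."""
    region: str
    impacted_traffic_pct: float  # estimated share of users affected
    rtt_increase_ms: float       # increase over the usual round-trip time


def recommend_action(event, min_impact_pct=5.0, min_rtt_increase_ms=50.0):
    """Suggest 'failover', 'investigate', or 'monitor' (assumed policy)."""
    wide = event.impacted_traffic_pct >= min_impact_pct
    severe = event.rtt_increase_ms >= min_rtt_increase_ms
    if wide and severe:
        return "failover"     # broad and severe: redirecting traffic is justified
    if wide or severe:
        return "investigate"  # only one dimension is bad: look before acting
    return "monitor"          # small and mild: avoid an unnecessary failover
```

Requiring both a wide scope and a severe RTT increase before recommending failover reflects the goal of avoiding failovers caused by overreaction.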

Cost Structure and Monitored Resource Design

Internet Monitor pricing is usage-based. There is no fixed monthly fee per monitor; charges scale with the resources you register and the number of city-networks (city and ASN combinations) a monitor tracks, so check the CloudWatch pricing page for current rates. You can cap the number of city-networks per monitor, up to a maximum of 500,000, and you can also limit the percentage of traffic that is monitored. To control costs, it is more effective to create monitors only for business-critical applications rather than consolidating all resources into a single monitor. For example, manage an internal admin console and a customer-facing production service with separate monitors, and configure strict thresholds and alert integrations only for the production service monitor. Each monitor supports up to 50 registered resources, and for VPCs, you register one per Region. When you add a CloudFront distribution as a monitored resource, the geographic distribution and performance of traffic passing through that distribution are analyzed automatically, providing broad visibility with no additional configuration.
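A minimal sketch of how a per-monitor cap and thresholds might be expressed when creating a monitor with boto3. The helper build_monitor_request is hypothetical; the parameter names follow the internetmonitor client's create_monitor call as I understand it, so verify them against the current boto3 documentation before relying on them. The monitor name and ARN are placeholders.

```python
def build_monitor_request(name, resource_arns, max_city_networks,
                          traffic_pct=None,
                          availability_threshold=95.0,
                          performance_threshold=95.0):
    """Build keyword arguments for create_monitor (hypothetical helper)."""
    if not 1 <= max_city_networks <= 500_000:
        raise ValueError("max_city_networks must be between 1 and 500,000")
    request = {
        "MonitorName": name,
        "Resources": list(resource_arns),
        "MaxCityNetworksToMonitor": max_city_networks,
        "HealthEventsConfig": {
            "AvailabilityScoreThreshold": availability_threshold,
            "PerformanceScoreThreshold": performance_threshold,
        },
    }
    if traffic_pct is not None:
        if not 1 <= traffic_pct <= 100:
            raise ValueError("traffic_pct must be between 1 and 100")
        request["TrafficPercentageToMonitor"] = traffic_pct
    return request


# Assumed usage (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# client = boto3.client("internetmonitor")
# client.create_monitor(**build_monitor_request(
#     "prod-web",
#     ["arn:aws:cloudfront::111122223333:distribution/EXAMPLE"],
#     max_city_networks=500,
# ))
```

Keeping the request in a plain function makes it easy to define a strict production monitor and a looser internal one side by side, and to review the caps before anything is created.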

Operational Design Tips and Differentiating from Other Monitoring Services

To operate Internet Monitor effectively, it is important to clearly define its role relative to other CloudWatch monitoring capabilities. Synthetics actively monitors specific endpoint responses by running probes on a schedule, while RUM measures real user experience through JavaScript embedded in the browser. Internet Monitor differs from both by passively monitoring the health of the entire internet path using observability data from AWS network infrastructure. Combining all three gives you three complementary layers: internet path health (Internet Monitor), endpoint behavior (Synthetics), and real user experience (RUM). In practice, a common workflow is to route health events via EventBridge to Slack or PagerDuty, where an on-call engineer reviews the impact scope before deciding whether failover is needed. Health event history can be stored in CloudWatch Logs and analyzed in monthly reviews to identify ISP-specific outage trends and improve multi-Region design.
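The EventBridge routing step can be sketched as an event pattern plus a small formatter of the kind a Lambda function might use before posting to a chat or paging tool. The source value "aws.internetmonitor" is what I understand Internet Monitor events to carry; confirm it against the EventBridge documentation. The formatter deliberately does not assume any field names inside the event detail and simply pretty-prints whatever arrives.

```python
import json

# Assumed event source for Internet Monitor events on the default event bus.
INTERNET_MONITOR_SOURCE = "aws.internetmonitor"


def build_event_pattern():
    """Return an EventBridge event pattern (JSON string) matching the source."""
    return json.dumps({"source": [INTERNET_MONITOR_SOURCE]})


def format_notification(event):
    """Turn an EventBridge event into a plain-text message, or None if unrelated."""
    if event.get("source") != INTERNET_MONITOR_SOURCE:
        return None
    # "detail-type" and "time" are standard EventBridge envelope fields.
    header = f"[{event.get('detail-type', 'unknown event')}] {event.get('time', 'unknown time')}"
    detail = json.dumps(event.get("detail", {}), indent=2, sort_keys=True)
    return f"{header}\n{detail}"
```

The pattern string can be supplied as the EventPattern of an EventBridge rule, and the returned message can be handed to whichever notification client you use.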
