How CloudFront's 600+ PoPs Work - Anycast Routing and the Cache Hierarchy

Learn how CloudFront routes user requests to the nearest PoP using Anycast, the two-tier structure of edge locations and regional edge caches, and the design factors that determine cache hit rates.

What Is a PoP (Point of Presence)?

CloudFront PoPs are clusters of cache servers distributed around the world. As of 2024, more than 600 PoPs are deployed across over 90 cities. A PoP's role is to cache content from origin servers (such as S3 buckets or EC2 instances) and serve it from a location physically close to the user, reducing latency.

If a user in Tokyo accesses an S3 bucket in Northern Virginia directly, the round trip across the Pacific incurs 150-200 ms of latency. If the content is cached at a Tokyo PoP, latency drops to 5-10 ms.

PoP sizes are not uniform. Large PoPs in cities such as Tokyo, London, and Virginia consist of thousands of servers and can cache large volumes of content, while smaller city PoPs have tens to hundreds of servers and limited cache capacity. CloudFront adjusts PoP capacity dynamically based on traffic volume, temporarily adding capacity during major events (sports broadcasts, product launches, etc.).
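The latency figures above can be sanity-checked from first principles. The Python sketch below estimates the best-case round-trip time between Tokyo and Northern Virginia from the great-circle distance and the signal speed in optical fiber (roughly 200 km per millisecond); the coordinates are approximate city centers chosen for illustration, not actual PoP locations.

```python
import math

# Signal speed in optical fiber is ~2/3 the speed of light in vacuum,
# i.e. roughly 200,000 km/s, or 200 km per millisecond.
FIBER_SPEED_KM_PER_MS = 200.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points, in kilometers."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def min_rtt_ms(lat1, lon1, lat2, lon2):
    """Theoretical best-case round-trip time (propagation delay only)."""
    return 2 * great_circle_km(lat1, lon1, lat2, lon2) / FIBER_SPEED_KM_PER_MS

tokyo = (35.68, 139.69)      # approximate city-center coordinates (assumption)
virginia = (38.95, -77.45)   # near the us-east-1 region (assumption)

print(f"Tokyo -> Virginia best-case RTT: {min_rtt_ms(*tokyo, *virginia):.0f} ms")
```

The propagation-only bound comes out around 110 ms; real-world routing detours, queuing, and TCP handshakes push observed round trips toward the 150-200 ms range quoted above.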

Anycast Routing - How Users Are Directed to the Nearest PoP

CloudFront routes user requests to the nearest PoP through a combination of DNS-based routing and Anycast. When a user accesses a CloudFront distribution URL (e.g., d1234.cloudfront.net), DNS resolution happens first: CloudFront's DNS servers estimate the user's geographic location from the DNS resolver's IP address and return the IP address of the nearest PoP.

This is where Anycast comes in. With Anycast, multiple PoPs announce the same IP address simultaneously, and BGP (Border Gateway Protocol) routing automatically delivers packets to the topologically nearest PoP. DNS-based routing alone can select a suboptimal PoP when the resolver's location differs from the user's actual location (for example, a centralized corporate DNS server); Anycast compensates for this by always taking the shortest network path. Since 2022, CloudFront has also supported EDNS Client Subnet (ECS), in which the resolver passes the user's actual subnet information to CloudFront's DNS, enabling more accurate PoP selection.
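The BGP behavior described above can be illustrated with a toy model. This is not AWS's actual selection logic; the shared IP and the hop counts below are invented for the example. It only demonstrates the core idea: when several PoPs announce the same address, traffic lands on whichever PoP is fewest network hops away from the client.

```python
# Toy model of Anycast PoP selection (illustrative, not AWS internals).
ANYCAST_IP = "192.0.2.1"  # hypothetical address announced by every PoP below

# Hypothetical hop counts from one client's ISP to each announcing PoP.
pop_hops = {
    "tokyo": 4,
    "singapore": 9,
    "london": 14,
    "virginia": 16,
}

def route_anycast(pop_hops):
    """Return the PoP with the shortest network path (fewest hops),
    which is effectively what BGP path selection converges on."""
    return min(pop_hops, key=pop_hops.get)

print(route_anycast(pop_hops))  # a Tokyo client lands on the Tokyo PoP
```

A client in a different network would have different hop counts to the same announcing PoPs, so the same IP address resolves, at the routing layer, to a different physical destination.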

Two-Tier Cache - Edge Locations and Regional Edge Caches

CloudFront's cache has a two-tier structure. The first tier is the edge locations (PoPs), closest to users. The second tier is the regional edge caches (RECs), positioned between the edge locations and origin servers. There are 13 RECs worldwide, each with far larger cache capacity than an edge location.

The request flow works as follows: a request arrives at an edge location, and if the content is cached there, it is returned immediately (a cache hit). If not, the edge location queries the REC; if the REC has the content cached, it is returned from there, and only otherwise is it fetched from the origin server.

The advantage of this two-tier structure is a significant reduction in requests to the origin. Unpopular, long-tail content is easily evicted from an individual edge location's cache, but the REC's larger capacity raises the probability that it is retained. According to data published by AWS, introducing RECs has reduced origin requests by up to 60% in some cases.
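The edge-then-REC-then-origin flow can be sketched as a pair of LRU caches with different capacities. Everything here (class names, capacities, the 20-object catalog) is illustrative, not CloudFront internals; the point is to show how the larger second tier absorbs requests that the small first tier evicts.

```python
from collections import OrderedDict

class LruCache:
    """Minimal LRU cache standing in for one cache tier."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key in self.store:
            self.store.move_to_end(key)     # mark as recently used
            return self.store[key]
        return None

    def put(self, key, value):
        self.store[key] = value
        self.store.move_to_end(key)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used

def fetch(url, edge, rec, origin, stats):
    """Edge location first, then the regional edge cache, then the origin."""
    body = edge.get(url)
    if body is not None:
        stats["edge_hit"] += 1
        return body
    body = rec.get(url)
    if body is not None:
        stats["rec_hit"] += 1
    else:
        stats["origin_fetch"] += 1
        body = origin[url]       # origin fetch
        rec.put(url, body)       # populate the REC on the way back
    edge.put(url, body)          # and the edge location
    return body

# 20 long-tail objects, an edge that holds only 5, a REC that holds 50.
origin = {f"/img/{i}": f"bytes-{i}" for i in range(20)}
edge, rec = LruCache(capacity=5), LruCache(capacity=50)
stats = {"edge_hit": 0, "rec_hit": 0, "origin_fetch": 0}
for _ in range(2):               # the same 20 objects requested twice
    for url in origin:
        fetch(url, edge, rec, origin, stats)
print(stats)  # the second pass never touches the origin
```

In the second pass, every object has already been evicted from the tiny edge cache, yet all 20 requests are served by the REC: origin fetches stay at the 20 from the first pass.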

Design Factors That Determine Cache Hit Rates

CloudFront cache hit rates can range from 50% to 99% depending on design, and the single biggest factor is cache key design. By default, CloudFront uses the full URL path as the cache key, but if you configure query strings, headers, or cookies to be included, the same content ends up under different cache keys, fragmenting the cache and lowering the hit rate. For example, if query strings carry tracking parameters (utm_source, utm_medium, etc.), each parameter combination creates a separate cache entry for the same page. Using a CloudFront cache policy to whitelist exactly which query strings belong in the cache key, and excluding tracking parameters, improves hit rates dramatically.

TTL (Time to Live) settings also matter. If the TTL is too short, cached objects expire frequently and requests to the origin increase. The most effective strategy for static assets (images, CSS, JS) is to set a one-year TTL and include a content hash in the filename (app.a1b2c3.js), so that every content update automatically switches to a new URL and therefore a new cache entry.
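The query-string whitelisting described above amounts to normalizing the cache key before lookup. Here is a minimal Python sketch of that normalization; the allow-list of parameters that actually change the response is a hypothetical assumption for this example, and a real CloudFront cache policy does this for you at the edge.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

# Hypothetical allow-list: only these params actually change the response.
ALLOWED_PARAMS = {"page", "lang"}

def cache_key(url):
    """Drop non-whitelisted query strings (and sort the rest) so that
    equivalent URLs collapse onto a single cache entry."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in ALLOWED_PARAMS)
    return parts.path + ("?" + urlencode(kept) if kept else "")

a = cache_key("/articles?page=2&utm_source=twitter&utm_medium=social")
b = cache_key("/articles?utm_source=newsletter&page=2")
print(a, a == b)  # both URLs normalize to the same key: one entry, not three
```

Sorting the surviving parameters also makes the key order-independent, so ?page=2&lang=en and ?lang=en&page=2 share an entry as well.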

Origin Shield - A Third Cache Layer to Protect the Origin

Origin Shield, introduced in 2020, adds one more cache layer between the RECs and the origin. When Origin Shield is enabled, all origin fetches from the RECs are funneled through a single Origin Shield location; if Origin Shield has the content cached, no request reaches the origin at all.

Origin Shield's greatest benefit is mitigating the "thundering herd" problem: the moment a popular object's TTL expires, requests from PoPs worldwide flood the origin simultaneously. Without Origin Shield, origin fetches can occur from all 13 RECs at once; with it, they are consolidated into a single request.

Origin Shield pricing is request-based ($0.0090 per 10,000 requests), and weighed against the reduction in origin server load and scaling costs, it is a cost-effective investment in most cases. This is especially true when the origin is a pay-per-request service such as a Lambda function URL or API Gateway, where the request reduction translates directly into cost savings. For a systematic treatment of CDN design and optimization, dedicated books on the subject are a helpful next step.
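The pricing trade-off can be checked with back-of-the-envelope arithmetic, using the $0.0090 per 10,000 requests figure quoted above (actual rates vary by Origin Shield Region). The hit rates and traffic volume below are hypothetical, and the model is simplified: it bills only the requests that miss both cache tiers and therefore pass through Origin Shield.

```python
# Rate quoted above; Origin Shield pricing actually varies by Region.
SHIELD_RATE = 0.0090 / 10_000   # dollars per request routed through the shield

def origin_shield_cost(monthly_requests, edge_hit_rate, rec_hit_rate):
    """Simplified model: only requests that miss both the edge locations
    and the RECs travel through Origin Shield and incur the charge."""
    to_shield = monthly_requests * (1 - edge_hit_rate) * (1 - rec_hit_rate)
    return to_shield * SHIELD_RATE

# Hypothetical workload: 1B requests/month, 80% edge hits,
# and 50% of the remainder served by the RECs.
cost = origin_shield_cost(1_000_000_000, 0.80, 0.50)
print(f"${cost:,.2f}/month")  # $90.00/month
```

At these assumed hit rates, shielding 100 million origin-bound requests costs about $90 a month, which is easy to compare against the origin-side scaling (or per-request Lambda/API Gateway) costs those requests would otherwise incur.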