How DynamoDB Guarantees Single-Digit Millisecond Latency - Partition Splitting and Request Router Internals
Explore how DynamoDB maintains single-digit millisecond latency regardless of scale, through its three-layer architecture of partition splitting algorithms, request routers, and storage nodes.
What the Single-Digit Millisecond Promise Means
DynamoDB claims to deliver "single-digit millisecond latency for reads and writes regardless of table size or request volume." This promise holds whether the table is 1KB or 100TB, and whether it handles 10 requests per second or 10 million. In a relational database, as a table grows, index depth increases and disk I/O rises, degrading latency. DynamoDB avoids this problem because its data storage and access paths are fundamentally different: it splits data into small units called partitions and places each partition on independent storage nodes. Since a key-based read or write always targets a single partition, the overall table size has no impact on individual request latency. This is the core of "consistent latency regardless of scale."
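The idea can be sketched in a few lines: hashing the partition key selects the partition, so the lookup cost depends only on the hash, never on how many items the table holds. Everything here (the hash choice, the partition count, the key format) is illustrative, not DynamoDB's actual internals.

```python
import hashlib

NUM_PARTITIONS = 8  # illustrative; DynamoDB manages this count automatically

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key to a partition via a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# The same key always routes to the same partition, whether the
# table holds ten items or ten billion:
print(partition_for("user#1001"))
print(partition_for("user#1001"))  # identical result on every call
```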
Partition Splitting - Automatic Horizontal Data Distribution
DynamoDB's partition splitting is fully automated and requires no user intervention. Each partition stores up to 10GB of data and handles up to 3,000 RCU (Read Capacity Units) and 1,000 WCU (Write Capacity Units) of throughput. When a table's data volume or throughput reaches these limits, DynamoDB automatically splits the partition. Data is distributed across partitions based on the hash value of the partition key. The critical factor here is partition key design. If all requests concentrate on the same partition key, that partition becomes a hot spot, hitting the throughput limit and causing throttling. Adaptive capacity, introduced in 2017 and made instant in 2019, automatically shifts throughput toward hot partitions, but the fundamental solution is to design partition keys that distribute access evenly. For example, using a date as the partition key concentrates all of a day's access on "today's" partition. The standard remedy is the "write sharding" pattern, which appends a random suffix to the date to spread the load across multiple partitions.
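A minimal sketch of the write-sharding pattern described above, assuming a hypothetical 10-shard layout; the shard count and key format are illustrative choices, not an AWS prescription:

```python
import random
from collections import defaultdict

SHARD_COUNT = 10  # illustrative; size to the table's peak write rate

def sharded_key(date: str) -> str:
    """Spread writes for one date across SHARD_COUNT partition keys."""
    return f"{date}#{random.randrange(SHARD_COUNT)}"

def all_shard_keys(date: str) -> list[str]:
    """Reads for a date must fan out across every shard key."""
    return [f"{date}#{i}" for i in range(SHARD_COUNT)]

# Simulate 1,000 writes all landing on one calendar date
table = defaultdict(list)
for i in range(1000):
    table[sharded_key("2024-05-01")].append(i)

print(len(table))   # up to 10 distinct partition keys instead of one hot key
items = [x for k in all_shard_keys("2024-05-01") for x in table[k]]
print(len(items))   # 1000 — reads gather results from every shard
```

The trade-off: writes now spread across up to 10 partitions, but a read for a single date becomes a scatter-gather across all shard keys.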
Request Router - Instantly Identifying the Right Partition
The DynamoDB request router receives client requests, identifies the target partition from the partition key's hash value, and forwards the request to the storage node holding that partition. AWS published the internal structure of the request router in its 2022 paper "Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service." The request router caches a partition map (which partition key ranges are on which storage nodes) in memory, and routing decisions complete in microseconds. When partition splits or moves occur, the partition map is updated asynchronously. If a stale map is temporarily referenced, the storage node returns a redirect response, and the request router updates the map and retries. This mechanism ensures request processing is never interrupted during partition relocation. Multiple request router instances operate in parallel, eliminating any single point of failure.
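The redirect-and-retry flow can be modeled with a toy router holding a possibly stale partition map. All class names, the single-entry map, and the "authoritative" metadata lookup are hypothetical stand-ins, not DynamoDB's actual internals (which map key-hash ranges to replica groups):

```python
class StorageNode:
    def __init__(self, name):
        self.name = name
        self.partitions = set()

    def handle(self, partition_id):
        if partition_id in self.partitions:
            return ("OK", self.name)
        # Node no longer owns the partition: tell the router to refresh
        return ("REDIRECT", None)

class RequestRouter:
    def __init__(self, authoritative_map, nodes):
        self.nodes = nodes
        self.cache = dict(authoritative_map)    # local copy; may go stale
        self.authoritative = authoritative_map  # stands in for the metadata service

    def route(self, partition_id):
        node = self.nodes[self.cache[partition_id]]
        status, owner = node.handle(partition_id)
        if status == "REDIRECT":
            # Refresh the stale entry and retry once
            self.cache[partition_id] = self.authoritative[partition_id]
            status, owner = self.nodes[self.cache[partition_id]].handle(partition_id)
        return owner

nodes = {"node-a": StorageNode("node-a"), "node-b": StorageNode("node-b")}
auth_map = {"p1": "node-a"}
nodes["node-a"].partitions.add("p1")
router = RequestRouter(auth_map, nodes)
print(router.route("p1"))   # node-a

# Simulate a partition move: p1 migrates from node-a to node-b
nodes["node-a"].partitions.discard("p1")
nodes["node-b"].partitions.add("p1")
auth_map["p1"] = "node-b"
print(router.route("p1"))   # stale cache -> redirect -> retry -> node-b
```

The client never sees the move: the first attempt fails internally, the map entry is refreshed, and the retried request succeeds.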
Storage Nodes and Replication - The Three-Way Write Mechanism
Each partition's data is replicated across three storage nodes. A write request is first received by the leader node, which replicates it to two follower nodes. Once two of the three nodes (the leader plus one follower) have acknowledged the write, a success response is returned to the client. This is the "2 of 3 quorum write." There are two read modes. An eventually consistent read can be served by any one of the three nodes; it may not reflect the latest write, but it offers the lowest latency. A strongly consistent read is served by the leader node and always returns the latest data, but because every such read must hit the leader, it consumes twice the read capacity, effectively halving read throughput compared to eventually consistent reads. The 2022 paper also notes that storage nodes use a B-tree-based storage engine. The choice of a B-tree over an LSM-tree (Log-Structured Merge-tree) is a design decision that prioritizes predictable read latency.
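A toy model of the 2-of-3 quorum write and the two read modes; the replica structure and method names are illustrative, not DynamoDB's actual implementation:

```python
import random

class Replica:
    def __init__(self):
        self.data = {}

class Partition:
    def __init__(self):
        self.leader = Replica()
        self.followers = [Replica(), Replica()]

    def write(self, key, value):
        # Leader applies the write, then replicates to followers;
        # success returns once leader + 1 follower have it (2 of 3).
        self.leader.data[key] = value
        acks = 1
        for follower in self.followers:
            follower.data[key] = value
            acks += 1
            if acks >= 2:
                return True  # quorum reached; remaining replication is async
        return False

    def strongly_consistent_read(self, key):
        return self.leader.data.get(key)  # always served by the leader

    def eventually_consistent_read(self, key):
        node = random.choice([self.leader] + self.followers)
        return node.data.get(key)  # any replica; may lag behind

p = Partition()
assert p.write("user#1", {"name": "Ada"})
print(p.strongly_consistent_read("user#1"))  # {'name': 'Ada'}
```

In this model the second follower may still lag after the quorum acknowledgment, which is exactly why an eventually consistent read can miss the latest write while a strongly consistent read never does.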
Global Tables and DAX - Going Below Single-Digit Milliseconds
DynamoDB's single-digit millisecond latency applies within a single region. Globally distributed applications need to read and write data from the region closest to the user. Global Tables automatically creates table replicas in multiple regions, providing multi-master replication that allows reads and writes from any region. A write to the Tokyo region propagates to replicas in Virginia and Frankfurt within seconds. Consistency between replicas is eventual, but reads and writes within the same region are processed at the same single-digit millisecond latency as a standard DynamoDB table. For even lower latency, there is DAX (DynamoDB Accelerator), an in-memory cache placed in front of DynamoDB that reduces read latency to the microsecond range (typically hundreds of microseconds). Since DAX is API-compatible with DynamoDB, the only application code change needed is swapping the endpoint. However, because DAX is a cache, reads immediately after writes may return stale data, so design with cache TTL settings and post-write read patterns in mind. For a deeper understanding of NoSQL database design patterns, specialized books are a helpful reference.
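The read-after-write staleness described above can be illustrated with a minimal read-through cache with a TTL. This is purely a sketch of the caching behavior, not the DAX client or its actual invalidation strategy; all names are hypothetical.

```python
import time

class Backing:  # stands in for the DynamoDB table
    def __init__(self):
        self.items = {}

class TtlCache:
    def __init__(self, backing, ttl_seconds):
        self.backing = backing
        self.ttl = ttl_seconds
        self.entries = {}  # key -> (value, expires_at)

    def get(self, key):
        hit = self.entries.get(key)
        if hit and hit[1] > time.monotonic():
            return hit[0]                     # served from cache
        value = self.backing.items.get(key)   # cache miss: read through
        self.entries[key] = (value, time.monotonic() + self.ttl)
        return value

db = Backing()
cache = TtlCache(db, ttl_seconds=0.05)

db.items["k"] = "v1"
print(cache.get("k"))   # "v1" — read-through populates the cache

db.items["k"] = "v2"    # the write goes to the table, not the cache
print(cache.get("k"))   # still "v1": stale until the TTL expires

time.sleep(0.06)
print(cache.get("k"))   # "v2" after TTL expiry
```

Shortening the TTL narrows the staleness window at the cost of more cache misses; for read paths that must see their own writes, bypass the cache and read from the table directly.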