Building Disaster Recovery with AWS Elastic Disaster Recovery - Continuous Replication and Recovery Testing

Continuously replicate on-premises servers to AWS and validate recovery procedures with recovery drills. Learn the end-to-end workflow through failback.

About 7 min read. Last updated: 2026-05-27

Overview of Elastic Disaster Recovery

Elastic Disaster Recovery (DRS) is a service that continuously replicates servers running on-premises or in other clouds to AWS for rapid recovery during disasters. Once the AWS Replication Agent is installed on a source server, it continuously replicates block-level changes to a staging area in AWS over TCP port 1500. The initial sync performs a full disk transfer; after that, only changed blocks are transmitted, which keeps RPO in the range of seconds while minimizing network bandwidth consumption. The agent supports Windows Server 2012 R2 and later, as well as major Linux distributions (Amazon Linux, RHEL, CentOS, Ubuntu, SUSE, Debian). The staging area uses low-cost EBS volumes (gp3 or st1) to store the source disk data without compression.
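Because replication depends on the source server reaching the staging area over TCP port 1500, a quick reachability check can rule out firewall problems before the agent is installed. The following is a minimal sketch using only the Python standard library; the helper name `can_connect` and the IP address in the usage note are hypothetical and not part of DRS.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `can_connect("10.0.1.25", 1500)` would test the replication port against a hypothetical replication server address. A successful handshake only shows that the path is open; it does not confirm that replication itself is healthy.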

Recovery Drills and Failover

A recovery drill is a test that launches EC2 instances from replicated data and verifies application functionality. It runs without affecting production replication and gives you a measured RTO rather than an estimate. Drills can restore from point-in-time snapshots, so in a data corruption scenario you can choose a snapshot taken before the corruption occurred. DRS retains these snapshots for a configurable retention period measured in days, which makes it suitable for ransomware recovery scenarios. Failover launches EC2 instances in the same way as a drill and then switches DNS to direct production traffic to AWS. The Recovery Plan feature allows batch failover of multiple servers from the DRS console, defining launch order and wait times so that interdependent server groups recover in the correct sequence. Failback returns data from AWS to the original on-premises environment: a dedicated Failback Client is launched at the source site while DRS manages the reverse replication.
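Choosing a pre-corruption recovery point amounts to picking the newest snapshot older than the corruption time. The sketch below shows that selection on hypothetical sample data. The field names (`snapshotID`, `timestamp`, `isDrill`, `sourceServers`, `recoverySnapshotID`) reflect my understanding of the DRS API as exposed by boto3 and should be checked against the current documentation; the helper names and IDs are illustrative only.

```python
from datetime import datetime, timezone
from typing import Optional


def pick_snapshot_before(snapshots: list[dict], corruption_time: datetime) -> Optional[dict]:
    """Return the newest snapshot taken strictly before corruption_time, or None."""
    candidates = [
        s for s in snapshots
        if datetime.fromisoformat(s["timestamp"]) < corruption_time
    ]
    return max(
        candidates,
        key=lambda s: datetime.fromisoformat(s["timestamp"]),
        default=None,
    )


def build_drill_request(source_server_id: str, snapshot_id: str) -> dict:
    """Build keyword arguments for a drill launch (assumed request shape)."""
    return {
        "isDrill": True,  # False would request an actual recovery launch
        "sourceServers": [
            {"sourceServerID": source_server_id, "recoverySnapshotID": snapshot_id}
        ],
    }


# Hypothetical snapshot list; real entries would come from the DRS API.
SNAPSHOTS = [
    {"snapshotID": "pit-0001", "timestamp": "2026-05-20T08:00:00+00:00"},
    {"snapshotID": "pit-0002", "timestamp": "2026-05-20T09:00:00+00:00"},
    {"snapshotID": "pit-0003", "timestamp": "2026-05-20T10:00:00+00:00"},
]

corruption = datetime(2026, 5, 20, 9, 30, tzinfo=timezone.utc)
chosen = pick_snapshot_before(SNAPSHOTS, corruption)
if chosen is not None:
    print(build_drill_request("s-1234567890abcdef0", chosen["snapshotID"]))
```

With the sample data, the 09:00 snapshot is selected because it is the latest one before the 09:30 corruption time. Launching a drill from that snapshot first lets you confirm the data is clean before committing to a failover.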

Network Design and Recovery Automation

DRS replication servers are placed in a staging subnet to receive data from source servers. Instances launched during recovery are placed in a separate subnet (the recovery subnet), with production network settings defined in advance. Launch templates configure instance type, security groups, subnets, and IAM roles to minimize manual work during recovery. Post-launch actions automatically execute scripts after instance startup, automating DNS switching and application configuration changes. The staging subnet does not require outbound internet access as long as the replication servers can reach the AWS service endpoints they depend on privately (for example, through VPC endpoints), so replication can run entirely over private connections via VPN or Direct Connect. It is also possible to keep the source server's private IP on the recovery instance, preserving IP-dependent application configurations.
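As an illustration of the DNS switching that a post-launch script might perform, the sketch below builds a Route 53 change batch that points a record at the recovery instance. It only constructs the request body; the function name, record name, and IP address are hypothetical, and the assumption here is a simple A record rather than an alias or a failover routing policy.

```python
import ipaddress


def build_dns_failover_change(record_name: str, recovery_ip: str, ttl: int = 60) -> dict:
    """Build a Route 53 change batch that upserts an A record to recovery_ip."""
    # Fail early on a malformed address (raises ValueError).
    ipaddress.IPv4Address(recovery_ip)
    return {
        "Comment": "Point production traffic at the DRS recovery instance",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record or overwrite it if present
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "TTL": ttl,  # a short TTL speeds up client cutover
                    "ResourceRecords": [{"Value": recovery_ip}],
                },
            }
        ],
    }


# Hypothetical values for illustration.
change_batch = build_dns_failover_change("app.example.com.", "10.20.0.15")
print(change_batch["Changes"][0]["ResourceRecordSet"]["Name"])
```

The resulting dictionary can be passed as the `ChangeBatch` argument of the Route 53 `change_resource_record_sets` call together with the hosted zone ID. Keeping the builder separate from the API call makes the script easy to exercise during drills without touching production DNS.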

DRS Pricing and Limit Considerations

DRS costs consist of a per-source-server hourly service charge plus the EC2 instances and EBS volumes used by the replication servers. Replication servers run on small instances such as t3.small, keeping per-server monthly costs relatively low. Instances launched during recovery drills or failover are billed only for the time they run. EBS snapshot storage costs are incurred based on data volume. Important limits include a per-account cap on the number of source servers; large environments may require a service quota increase request. Replication bandwidth is limited to a maximum of 10 Gbps per server, so for database servers with extremely high write throughput, estimate the initial sync completion time in advance and confirm that the link can keep up with the ongoing change rate. Cross-Region replication also incurs inter-Region data transfer charges, so calculate monthly costs for high-capacity servers before enabling it.
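A rough initial sync estimate can be derived from disk size and usable bandwidth. The sketch below is back-of-the-envelope arithmetic only: the `efficiency` factor (the share of the link actually available to replication) is an assumption of this example, not a DRS parameter, and decimal units are used throughout (1 GB = 10^9 bytes).

```python
def initial_sync_hours(disk_gb: float, bandwidth_mbps: float, efficiency: float = 0.7) -> float:
    """Estimate hours needed to transfer disk_gb over a link of bandwidth_mbps."""
    if disk_gb < 0 or bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("disk_gb >= 0, bandwidth_mbps > 0, 0 < efficiency <= 1 required")
    total_bits = disk_gb * 8 * 1000**3
    usable_bits_per_second = bandwidth_mbps * 1_000_000 * efficiency
    return total_bits / usable_bits_per_second / 3600


def can_keep_up(change_rate_mbps: float, bandwidth_mbps: float, efficiency: float = 0.7) -> bool:
    """Return True if the sustained write rate fits within the usable bandwidth."""
    return change_rate_mbps < bandwidth_mbps * efficiency


# Example: a 2 TB database server on a 500 Mbps link.
print(f"Initial sync: about {initial_sync_hours(2000, 500):.1f} hours")
print(f"Keeps up with 100 Mbps of writes: {can_keep_up(100, 500)}")
```

If `can_keep_up` returns False for a server's sustained change rate, replication lag will grow over time and the seconds-level RPO will not hold, regardless of how long the initial sync takes.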

Comparison with Other DR Approaches

AWS offers several ways to achieve DR besides DRS. AWS Backup provides scheduled, snapshot-based backups; its RPO is approximately one hour at best, but setup is simpler and cost is lower. CloudEndure Disaster Recovery was the predecessor to DRS, and migration to DRS is recommended. The pilot light approach keeps a minimal infrastructure configuration always running and scales it up during a disaster, typically in combination with RDS replication and Route 53 failover. Warm standby keeps a reduced-size copy of production always running, offering a shorter RTO than DRS at a higher cost. The strength of DRS lies in its seconds-level RPO through continuous replication and in eliminating the need for an always-on, full-size standby environment. Conversely, if DR for the database layer alone is sufficient (for example, an RDS cross-Region read replica plus Route 53 failover), a simpler solution without DRS may be appropriate.

Design Best Practices and Pitfalls

When adopting DRS, schedule monthly recovery drills to validate recovery procedures and application functionality. Version-control the scripts used in post-launch actions and verify them with a drill after every change. A common pitfall is the replication agent stopping after OS patches are applied on the source server; to catch this, create a CloudWatch alarm that fires when replication lag exceeds a threshold. When source servers are joined to an Active Directory domain, verify DNS settings and connectivity to domain controllers in advance so that recovery instances can rejoin the domain. When using Recovery Plans, document inter-server dependencies (for example, database first, then application, then web) and define launch groups and wait times to match. Before initiating failback, first confirm that the source site's network has been restored, then verify that the Failback Client can connect to the AWS staging server.
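Documented dependencies can be turned into launch groups mechanically. The sketch below uses the standard library's `graphlib` (Python 3.9+) to group servers into waves in which every server starts only after the servers it depends on. The server names and the `launch_waves` helper are hypothetical, and this is a planning aid rather than part of DRS.

```python
from graphlib import TopologicalSorter


def launch_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group servers into ordered waves so dependencies always launch first."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        # Everything returned here has all of its dependencies already launched.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical three-tier application: each server maps to what it depends on.
DEPENDS_ON = {
    "db-01": set(),
    "app-01": {"db-01"},
    "app-02": {"db-01"},
    "web-01": {"app-01", "app-02"},
}

for number, wave in enumerate(launch_waves(DEPENDS_ON), start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

For the sample data this yields three waves: the database, then both application servers, then the web server. Wait times between waves still have to be chosen from measured startup times, which recovery drills provide.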

Summary

Elastic Disaster Recovery is a DR service that reduces RPO to seconds through continuous replication and enables recovery in minutes. Launch templates and Recovery Plans pre-define instance settings and launch order for recovery, and post-launch actions automate DNS switching. Regular recovery drills validate RTO/RPO targets, and point-in-time recovery provides ransomware protection. Compared to backup-based approaches, RPO is dramatically shorter, and compared to always-on standby, costs are significantly reduced, making DRS a well-balanced DR solution.
