Accelerating Data Transfer with AWS DataSync - Migration from On-Premises to S3 and EFS

Automate high-speed data transfer from on-premises storage to Amazon S3 and Amazon EFS. This guide covers agent deployment, task scheduling, and integrity verification of transferred data.

DataSync Overview

AWS DataSync is a managed service that automates and accelerates data transfer between on-premises storage and AWS. A single task can use up to 10 Gbps of network bandwidth, and AWS cites transfer speeds up to 10 times faster than open-source tools such as rsync. It is commonly used to migrate data from on-premises NFS and SMB file servers to Amazon S3, Amazon EFS, and Amazon FSx for Windows File Server.
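The 10 Gbps figure converts into migration time with simple arithmetic. The following is a minimal sketch. It assumes a task that saturates the link, scaled by a hypothetical `efficiency` factor, and it ignores per-file overhead and source-storage limits. The result is therefore a lower bound, not a prediction.

```python
# Back-of-the-envelope transfer time for a single DataSync task.
# Assumptions: the task runs at link_gbps scaled by a hypothetical `efficiency`
# factor; per-file overhead and source storage I/O limits are ignored.


def estimate_transfer_hours(data_bytes: float, link_gbps: float = 10.0,
                            efficiency: float = 1.0) -> float:
    """Return wall-clock hours to move data_bytes at the given effective rate."""
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    bits = data_bytes * 8
    effective_bps = link_gbps * 1e9 * efficiency
    return bits / effective_bps / 3600


if __name__ == "__main__":
    hundred_tb = 100e12  # 100 TB, decimal units
    print(f"{estimate_transfer_hours(hundred_tb):.1f} h at 10 Gbps")
    print(f"{estimate_transfer_hours(hundred_tb, efficiency=0.7):.1f} h at 70% of 10 Gbps")
```

Under these assumptions, 100 TB takes about 22 hours at a full 10 Gbps and about 32 hours at 70% utilization.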

Agent and Task Design

Deploy a DataSync agent on-premises as a virtual machine (on VMware ESXi, Microsoft Hyper-V, or KVM) and give it network access to the source storage. A task pairs a source location, such as an on-premises NFS export, with a destination location, such as an S3 bucket. The task also sets transfer options such as file metadata preservation and exclude filters. Scheduling the task automates daily differential transfers, which keeps data on-premises and in AWS continuously synchronized. DataSync checks data integrity during each transfer. Depending on the task's verification setting, it can also verify the data again after the transfer completes, and it reports any checksum mismatches as verification errors.
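The task design above can be expressed as request payloads for boto3's DataSync client (`create_location_nfs`, `create_location_s3`, `create_task`). This is a minimal sketch, not a definitive configuration. The hostnames, ARNs, paths, exclude pattern, and 02:00 UTC schedule are hypothetical placeholders. The option values are one reasonable choice for a metadata-preserving daily sync. The helpers only build dictionaries, so the sketch runs without AWS credentials. The commented lines show how the payloads would be passed to the API.

```python
# Request payloads for an NFS -> S3 DataSync task, shaped for boto3's datasync
# client. Hostnames, ARNs, paths, and the exclude pattern are hypothetical.


def nfs_location_params(server: str, export_path: str, agent_arn: str) -> dict:
    """Source: an on-premises NFS export reached through the DataSync agent."""
    return {
        "ServerHostname": server,
        "Subdirectory": export_path,
        "OnPremConfig": {"AgentArns": [agent_arn]},
        "MountOptions": {"Version": "AUTOMATIC"},
    }


def s3_location_params(bucket_arn: str, prefix: str, role_arn: str) -> dict:
    """Destination: an S3 prefix that DataSync writes to through an IAM role."""
    return {
        "S3BucketArn": bucket_arn,
        "Subdirectory": prefix,
        "S3StorageClass": "STANDARD",
        "S3Config": {"BucketAccessRoleArn": role_arn},
    }


def task_params(source_arn: str, dest_arn: str, name: str) -> dict:
    """Task with metadata preservation, an exclude filter, and a daily schedule."""
    return {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": dest_arn,
        "Name": name,
        "Options": {
            "TransferMode": "CHANGED",              # copy only new or changed files
            "VerifyMode": "ONLY_FILES_TRANSFERRED",  # verify the files that were copied
            "Mtime": "PRESERVE",                    # keep modification times
            "Atime": "BEST_EFFORT",                 # required when Mtime is PRESERVE
            "PosixPermissions": "PRESERVE",
            "Uid": "INT_VALUE",
            "Gid": "INT_VALUE",
            "PreserveDeletedFiles": "PRESERVE",     # do not delete at the destination
        },
        "Excludes": [{"FilterType": "SIMPLE_PATTERN", "Value": "*.tmp|/scratch"}],
        "Schedule": {"ScheduleExpression": "cron(0 2 * * ? *)"},  # daily at 02:00 UTC
    }


if __name__ == "__main__":
    # With AWS credentials configured, usage would look roughly like this:
    #   import boto3
    #   ds = boto3.client("datasync")
    #   src = ds.create_location_nfs(**nfs_location_params(...))["LocationArn"]
    #   dst = ds.create_location_s3(**s3_location_params(...))["LocationArn"]
    #   task_arn = ds.create_task(**task_params(src, dst, "nfs-to-s3-daily"))["TaskArn"]
    print(task_params("arn:src", "arn:dst", "nfs-to-s3-daily")["Schedule"])
```

Setting `PreserveDeletedFiles` to `PRESERVE` keeps objects in S3 even after the corresponding files are deleted on the source. This suits a migration or backup. `REMOVE` would instead mirror deletions.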

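To act on the integrity check, a caller can poll a task execution until it reaches a terminal state and then inspect the verification result. In this sketch, `describe` stands in for boto3's `describe_task_execution` and is injected so the code runs without AWS access. The helper names, polling interval, and fake responses are hypothetical. The sketch treats `SUCCESS` and `ERROR` as the terminal statuses and reads `VerifyStatus`, `ErrorCode`, and `ErrorDetail` from the response's `Result` field.

```python
# Poll a DataSync task execution until it finishes, then summarize the outcome.
# `describe` stands in for boto3's describe_task_execution (injected dependency);
# helper names and the fake below are hypothetical.
import time
from typing import Callable

TERMINAL = {"SUCCESS", "ERROR"}


def wait_for_execution(describe: Callable[..., dict], execution_arn: str,
                       poll_seconds: float = 30.0, max_polls: int = 2880) -> dict:
    """Return the final describe response, or raise TimeoutError."""
    for _ in range(max_polls):
        resp = describe(TaskExecutionArn=execution_arn)
        if resp["Status"] in TERMINAL:
            return resp
        time.sleep(poll_seconds)
    raise TimeoutError(f"{execution_arn} did not finish after {max_polls} polls")


def summarize(resp: dict) -> str:
    """One-line report: file count and verification status, or the error."""
    result = resp.get("Result", {})
    if resp["Status"] == "SUCCESS":
        return f"ok: {resp.get('FilesTransferred', 0)} files, verify={result.get('VerifyStatus')}"
    return f"failed: {result.get('ErrorCode')} {result.get('ErrorDetail')}"


if __name__ == "__main__":
    # Hypothetical fake: two in-progress states, then success.
    states = iter(["TRANSFERRING", "VERIFYING", "SUCCESS"])

    def fake_describe(TaskExecutionArn: str) -> dict:
        status = next(states)
        verify = "SUCCESS" if status == "SUCCESS" else "PENDING"
        return {"Status": status, "FilesTransferred": 42, "Result": {"VerifyStatus": verify}}

    final = wait_for_execution(fake_describe, "arn:exec", poll_seconds=0)
    print(summarize(final))  # ok: 42 files, verify=SUCCESS
```

With a real client, the execution ARN would come from `start_task_execution(TaskArn=...)`. A real call would also pass `ds.describe_task_execution` as `describe`.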
Inter-AWS Transfer and Scheduling

DataSync also transfers data between AWS storage services. Transfers between S3 buckets, between EFS file systems, between FSx file systems, and from S3 to EFS run without an agent. DataSync supports cross-account and cross-region transfers, which makes it useful for replicating data to a disaster recovery (DR) environment. Task schedules automate periodic transfers, such as daily or weekly incremental synchronization. Include and exclude filters restrict a transfer to files and folders whose paths match given patterns, so DataSync does not copy unnecessary data.
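The schedule and filter fields can be built with small helpers. This sketch assumes that DataSync accepts AWS-style six-field `cron()` expressions, evaluated in UTC. It also assumes `SIMPLE_PATTERN` filter rules whose patterns are joined with `|`. The helper names, location ARNs, and example patterns are hypothetical. The same helpers apply to an agentless S3-to-S3 task that replicates data to a DR bucket.

```python
# Builders for the Schedule, Includes, and Excludes fields of a DataSync task.
# Assumptions: AWS six-field cron() syntax evaluated in UTC, and SIMPLE_PATTERN
# rules whose patterns are joined with "|". Helper names are hypothetical.

WEEKDAYS = ("SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT")


def _check_time(hour: int, minute: int) -> None:
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")


def daily_schedule(hour: int, minute: int = 0) -> dict:
    """Run every day at hour:minute UTC."""
    _check_time(hour, minute)
    return {"ScheduleExpression": f"cron({minute} {hour} * * ? *)"}


def weekly_schedule(day: str, hour: int, minute: int = 0) -> dict:
    """Run once a week on `day` (for example "SUN") at hour:minute UTC."""
    day = day.upper()
    if day not in WEEKDAYS:
        raise ValueError(f"day must be one of {WEEKDAYS}")
    _check_time(hour, minute)
    return {"ScheduleExpression": f"cron({minute} {hour} ? * {day} *)"}


def pattern_filter(*patterns: str) -> list:
    """One SIMPLE_PATTERN rule; '|' separates patterns, so none may contain it."""
    if not patterns or any("|" in p for p in patterns):
        raise ValueError("pass at least one pattern, none containing '|'")
    return [{"FilterType": "SIMPLE_PATTERN", "Value": "|".join(patterns)}]


if __name__ == "__main__":
    # Hypothetical agentless S3 -> S3 task copying two prefixes to a DR bucket.
    dr_task = {
        "SourceLocationArn": "arn:aws:datasync:us-west-2:111122223333:location/loc-src",
        "DestinationLocationArn": "arn:aws:datasync:us-west-2:111122223333:location/loc-dst",
        "Name": "s3-dr-weekly",
        "Options": {"TransferMode": "CHANGED"},
        "Includes": pattern_filter("/projects", "/shared"),
        "Excludes": pattern_filter("*.tmp"),
        "Schedule": weekly_schedule("sun", 3),
    }
    print(dr_task["Schedule"]["ScheduleExpression"])  # cron(0 3 ? * SUN *)
```

Exactly one of the day-of-month and day-of-week fields is `?` in each expression, as AWS cron syntax requires.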

DataSync Pricing

DataSync is billed by the volume of data copied, at approximately $0.0125 per GB (the rate varies by region). Transfers from on-premises need a VM to run the agent, but the agent itself incurs no additional charge. Transfers between AWS services use the same per-GB pricing. Storage at the destination (S3, EFS, FSx) is billed separately, and cross-region copies also incur standard AWS inter-region data transfer charges. To reduce costs, use differential transfers so that DataSync copies only changed files. For very large initial transfers, consider seeding the data with AWS Snowball and then using DataSync for ongoing incremental transfers.
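Per-GB pricing makes the savings from differential transfer easy to estimate. This sketch assumes the approximate $0.0125/GB rate above and billing strictly per GB copied. It also assumes a hypothetical 10,000 GB dataset with a 2% daily change rate and one run per day. It ignores destination storage, request, and inter-region data transfer charges.

```python
# Rough DataSync cost comparison: full recopy vs. differential transfer.
# Assumptions: ~$0.0125 per GB copied (check current regional pricing), a
# hypothetical daily change rate, and no storage or data-transfer charges.

PRICE_PER_GB = 0.0125


def transfer_cost(gb_copied: float, price_per_gb: float = PRICE_PER_GB) -> float:
    """Cost in USD of copying gb_copied gigabytes."""
    return gb_copied * price_per_gb


def monthly_sync_cost(dataset_gb: float, daily_change_rate: float, days: int = 30,
                      differential: bool = True) -> float:
    """One run per day: copy only changed data, or the whole dataset each time."""
    per_run_gb = dataset_gb * daily_change_rate if differential else dataset_gb
    return transfer_cost(per_run_gb * days)


if __name__ == "__main__":
    dataset = 10_000  # GB, hypothetical
    print(f"initial copy:      ${transfer_cost(dataset):,.2f}")
    print(f"30 days, 2% diffs: ${monthly_sync_cost(dataset, 0.02):,.2f}")
    print(f"30 days, full:     ${monthly_sync_cost(dataset, 0.02, differential=False):,.2f}")
```

Under these assumptions, the initial copy costs about $125. A month of daily differential syncs costs about $75, compared with $3,750 to recopy the full dataset every day.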

Summary

DataSync accelerates and automates data transfer from on-premises storage to AWS and between AWS services. Differential transfers efficiently synchronize only changed files, and schedules automate periodic transfers. Path-based include and exclude filters limit what is copied, and both cross-region and cross-account transfers are supported.