AWS DataSync Specialized, since 2018
A service that automates and accelerates data transfers between on-premises and AWS
What It Does
AWS DataSync is a managed service that automates and accelerates data transfers between on-premises storage and AWS storage services (S3, EFS, FSx). Using a purpose-built network protocol, it can transfer data up to 10 times faster than open-source tools, according to AWS. It also supports data integrity checks during transfer and scheduled execution, so large-scale data migrations can be performed safely and efficiently.
Use Cases
It is used for migrating data from on-premises NAS or file servers to S3 or EFS, regularly transferring backup data to AWS, and synchronizing data in hybrid cloud environments. Even for data migration projects involving hundreds of terabytes, scheduling and automatic retries significantly reduce operational overhead.
Everyday Analogy
Think of it like hiring a moving company. Instead of carrying your belongings (data) one by one yourself, you hire professional movers (DataSync). They use specialized trucks (optimized protocols) to move your things efficiently and check that nothing is damaged (data integrity) along the way.
What Is DataSync?
AWS DataSync is a service for high-speed data transfers between on-premises and AWS. It supports various storage protocols including NFS, SMB, HDFS, and object storage. You can choose from destinations like S3, EFS, FSx for Windows File Server, and FSx for Lustre, enabling flexible data migration for different use cases.
How High-Speed Transfers Work
DataSync uses a proprietary transfer protocol designed to maximize network bandwidth utilization. Techniques such as parallel transfers, data compression, and incremental transfers (sending only changed data) make it significantly faster than open-source tools like rsync. DataSync also verifies data integrity automatically during transfer, which reduces the risk of undetected data loss or corruption.
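The two ideas in this section, incremental transfer and integrity verification, can be illustrated with a small sketch. It is not DataSync's actual protocol, which is proprietary. The `FileMeta` type and the function names are hypothetical, and comparing size and modification time is an assumed change-detection rule chosen only for illustration.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    """Metadata used to decide whether a file needs to be sent again."""
    size: int
    mtime: float


def files_to_transfer(source: dict[str, FileMeta],
                      destination: dict[str, FileMeta]) -> list[str]:
    """Return paths that are missing at the destination or whose metadata differs."""
    return sorted(
        path for path, meta in source.items()
        if destination.get(path) != meta
    )


def sha256_of(data: bytes) -> str:
    """Hex digest used as the integrity fingerprint of a payload."""
    return hashlib.sha256(data).hexdigest()


def verify(sent: bytes, received: bytes) -> bool:
    """Integrity check: both sides must produce the same digest."""
    return sha256_of(sent) == sha256_of(received)
```

Only files reported by `files_to_transfer` would be copied on a repeat run, which is why a second sync of a mostly unchanged file server finishes much faster than the first.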
Scheduling and Automation
DataSync supports scheduled task execution, allowing you to automate regular data synchronization on a daily or weekly basis. For example, you can easily set up nightly syncs of changes from an on-premises file server to S3. Integration with CloudWatch lets you monitor transfer progress and completion, and receive alerts when issues occur.
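As a rough sketch of the nightly sync example, the helper below builds the parameters for a scheduled task in the shape that boto3's DataSync `create_task` call accepts. The helper names, the task name, and the ARNs are hypothetical. The schedule uses the AWS cron format, and the hour is assumed to be interpreted in UTC.

```python
def nightly_schedule(hour_utc: int, minute: int = 0) -> dict:
    """Build a Schedule value that runs once per day at the given time."""
    if not (0 <= hour_utc <= 23 and 0 <= minute <= 59):
        raise ValueError("hour_utc must be 0-23 and minute must be 0-59")
    # AWS cron fields: minute hour day-of-month month day-of-week year
    return {"ScheduleExpression": f"cron({minute} {hour_utc} * * ? *)"}


def scheduled_task_params(source_arn: str, destination_arn: str,
                          name: str, hour_utc: int) -> dict:
    """Assemble keyword arguments for a scheduled DataSync task."""
    return {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": destination_arn,
        "Name": name,
        "Schedule": nightly_schedule(hour_utc),
    }


# Usage with real credentials and existing locations (not executed here):
#   import boto3
#   params = scheduled_task_params(src_arn, dst_arn, "nightly-nas-sync", 2)
#   boto3.client("datasync").create_task(**params)
```

Keeping the parameter construction separate from the API call makes the schedule easy to check before anything is created in the account.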
Getting Started
To get started with DataSync, first deploy a DataSync agent (a virtual machine) in your on-premises environment and activate it for your AWS account. Then configure source and destination locations in the DataSync console and create a task. When you run the task, the agent reads the data and securely transfers it to AWS.
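The same steps can also be scripted. The sketch below assumes an NFS source and an S3 destination, and takes a boto3 DataSync client as an argument so that it can be exercised without AWS access. The function name, agent name, and task name are hypothetical, and the activation key is assumed to have been obtained from the deployed agent beforehand.

```python
def migrate_nfs_to_s3(client, *, activation_key: str, nfs_host: str,
                      nfs_path: str, bucket_arn: str,
                      bucket_role_arn: str) -> str:
    """Register an agent, define both locations, create a task, and start it.

    `client` is expected to behave like boto3.client("datasync").
    Returns the ARN of the started task execution.
    """
    # 1. Register the on-premises agent that was deployed as a VM.
    agent_arn = client.create_agent(
        ActivationKey=activation_key,
        AgentName="onprem-agent",  # hypothetical name
    )["AgentArn"]

    # 2. Source location: the NFS export, read through the agent.
    source_arn = client.create_location_nfs(
        ServerHostname=nfs_host,
        Subdirectory=nfs_path,
        OnPremConfig={"AgentArns": [agent_arn]},
    )["LocationArn"]

    # 3. Destination location: the S3 bucket, accessed via an IAM role.
    destination_arn = client.create_location_s3(
        S3BucketArn=bucket_arn,
        S3Config={"BucketAccessRoleArn": bucket_role_arn},
    )["LocationArn"]

    # 4. Create the task and run it once.
    task_arn = client.create_task(
        SourceLocationArn=source_arn,
        DestinationLocationArn=destination_arn,
        Name="nas-to-s3",  # hypothetical name
    )["TaskArn"]
    return client.start_task_execution(TaskArn=task_arn)["TaskExecutionArn"]
```

With real credentials, passing `boto3.client("datasync")` as `client` would perform the calls against your account. Error handling and waiting for the execution to finish are left out to keep the sketch short.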
Things to Watch Out For
- Transfers from on-premises storage require a virtual machine for the DataSync agent (compatible with VMware, Hyper-V, and KVM). Check the agent resource requirements in advance.
- Pricing is based on the volume of data transferred, so estimate costs before performing a large initial migration.
- For transfers between AWS services (e.g., S3 to EFS), no agent is needed; you can create tasks directly from the console.
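Because pricing scales with the volume transferred, a quick estimate before the initial migration is simple arithmetic. The sketch below is a minimal example; the function name is hypothetical, and the per-GB rate is a parameter you must fill in from the current AWS pricing page for your Region, since no rate is assumed here.

```python
from decimal import Decimal


def estimate_transfer_cost(total_gb: int, price_per_gb: Decimal) -> Decimal:
    """Estimate the DataSync transfer fee for a given volume.

    Covers only the per-GB transfer fee; storage, request, and
    network charges are outside the scope of this sketch.
    """
    if total_gb < 0 or price_per_gb < 0:
        raise ValueError("total_gb and price_per_gb must be non-negative")
    return Decimal(total_gb) * price_per_gb


# Example call with a placeholder rate (replace with the published rate):
#   estimate_transfer_cost(100_000, Decimal("0.0125"))
```

Using `Decimal` rather than floats avoids rounding surprises when the result feeds into a budget.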