Amazon FSx

A fully managed service family offering four file system variants - Lustre, Windows File Server, NetApp ONTAP, and OpenZFS - so you can pick the best fit for each workload.

Overview

Amazon FSx is a family of fully managed services that provide industry-standard file systems. It offers four variants: FSx for Lustre for high-performance computing (HPC) and machine learning, FSx for Windows File Server for Windows-based applications, FSx for NetApp ONTAP for enterprise NAS capabilities, and FSx for OpenZFS as a general-purpose POSIX-compatible file system. In every variant, backups, patching, and hardware-failure handling are managed by AWS, freeing you from on-premises file server operations. While EFS (Elastic File System) provides a general-purpose NFS file system, FSx is positioned to offer choices optimized for specific protocols and workloads.

The Four Variants and a Selection Flowchart

The first thing to check when selecting FSx is your protocol requirements. If Windows applications require the SMB protocol or Active Directory integration, the choice narrows to FSx for Windows File Server. It supports Windows-native features such as NTFS access control lists (ACLs), DFS namespaces, and shadow copies, making it the ideal migration target from on-premises Windows file servers. If you need high throughput with S3 integration for HPC or machine learning training jobs, choose FSx for Lustre. Lustre can link an S3 bucket as a data repository and supports lazy loading, which automatically loads data from S3 on first file access. This is particularly powerful when loading large datasets at high speed for SageMaker training jobs. If you need multi-protocol support (NFS, SMB, iSCSI) or enterprise NAS features like snapshots, clones, and tiering, FSx for NetApp ONTAP is the right fit. If you have no specific protocol requirements and need a general-purpose POSIX-compatible file system, FSx for OpenZFS is a lightweight option.
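The selection flow above can be sketched as a small decision function. This is purely illustrative; the function and parameter names are hypothetical, not any AWS API:

```python
# Illustrative sketch of the FSx selection flowchart described above.
# Function name and requirement flags are hypothetical, not an AWS API.
def choose_fsx_variant(needs_smb=False, needs_hpc_s3=False, needs_multiprotocol=False):
    """Map workload requirements to an FSx variant, mirroring the flowchart."""
    if needs_smb:
        # SMB protocol / Active Directory / NTFS ACLs / DFS namespaces
        return "FSx for Windows File Server"
    if needs_hpc_s3:
        # HPC or ML training with high throughput and S3 lazy loading
        return "FSx for Lustre"
    if needs_multiprotocol:
        # NFS + SMB + iSCSI, plus snapshots, clones, and tiering
        return "FSx for NetApp ONTAP"
    # No specific protocol requirement: general-purpose POSIX file system
    return "FSx for OpenZFS"
```

Note that the checks are ordered: protocol constraints (SMB) are the strongest filter, so they come first, as the text suggests.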

Choosing Between EFS and FSx - Versatility vs. Specialization

EFS and FSx are frequently compared as AWS managed file systems. EFS is a general-purpose NFS v4 file system with automatic capacity scaling and easy mounting from Lambda and ECS. FSx, on the other hand, is optimized for specific workloads and has markedly different performance characteristics: FSx for Lustre can achieve throughput of hundreds of GB/s, covering performance ranges that EFS's general-purpose mode cannot reach. On the cost side, EFS optimizes automatically through access-frequency-based tiering (Standard / Infrequent Access), while FSx requires you to specify storage capacity and throughput at provisioning time. FSx is more cost-efficient when I/O patterns are predictable, while EFS's pay-as-you-go model is advantageous for irregular access patterns. The closest Azure services are Azure Files (SMB) and Azure NetApp Files (NFS/SMB); a managed Lustre offering (Azure Managed Lustre) arrived on Azure only much later, so FSx for Lustre has long been a differentiator for AWS.
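The cost trade-off can be made concrete with a back-of-the-envelope calculation. The per-GB prices below are illustrative assumptions for comparison only, not current AWS list prices:

```python
# Hypothetical monthly-cost comparison between pay-per-use (EFS-style)
# and provisioned (FSx-style) billing. Prices are illustrative
# assumptions, NOT current AWS list prices.
EFS_PER_GB_MONTH = 0.30   # assumed pay-per-use rate, billed on bytes stored
FSX_PER_GB_MONTH = 0.15   # assumed provisioned rate, billed on capacity

def monthly_cost_efs(used_gb):
    # EFS bills only for data actually stored
    return used_gb * EFS_PER_GB_MONTH

def monthly_cost_fsx(provisioned_gb):
    # FSx bills for the full provisioned capacity, used or not
    return provisioned_gb * FSX_PER_GB_MONTH

# Predictable, well-utilized workload: provisioning 1,200 GB and
# filling most of it beats paying per-GB for 1,000 GB on EFS.
print(monthly_cost_efs(1000))   # 300.0
print(monthly_cost_fsx(1200))   # 180.0
```

The picture flips when utilization is low or spiky: paying the provisioned rate for mostly idle capacity erases the per-GB discount, which is the "irregular access patterns" case the text attributes to EFS.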

Lustre and S3 Integration - Accelerating Data Pipelines

The defining feature of FSx for Lustre is its transparent integration with S3. When you link an S3 bucket as a data repository, S3 object metadata is automatically mapped onto the Lustre file system. Actual data is loaded from S3 into Lustre storage on first file access (lazy loading), so there is no need to copy the entire dataset in advance. To write processing results back to S3, you use the lfs hsm_archive command or the automatic export feature.

There are two deployment types: Scratch and Persistent. Scratch provides temporary high-speed storage without data redundancy, offering lower cost but risking data loss on hardware failure. It is suited for batch processing and training jobs where source data is stored in S3 and the job can simply be re-run. Persistent replicates data within an AZ and is suited for long-running workloads or for retaining intermediate data that is expensive to recompute. Throughput is selectable from 50 to 1,000 MB/s per TiB depending on the deployment generation, with higher per-capacity settings costing more. In practice, the rational approach is to start with a low throughput setting and raise it as needed while monitoring actual usage through CloudWatch metrics such as DataReadBytes and DataWriteBytes.
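A provisioning call combining these pieces might look like the following boto3 sketch. The bucket name and subnet ID are placeholders; the parameters follow the fsx create_file_system API, where the PERSISTENT_1 deployment type accepts an S3 import/export path and a per-unit throughput setting:

```python
# Sketch of provisioning an S3-linked FSx for Lustre file system with boto3.
# Bucket name and subnet ID below are placeholders, not real resources.
params = {
    "FileSystemType": "LUSTRE",
    "StorageCapacity": 1200,                    # GiB; the minimum Lustre size
    "SubnetIds": ["subnet-0123456789abcdef0"],  # placeholder subnet
    "LustreConfiguration": {
        "DeploymentType": "PERSISTENT_1",       # data redundancy within an AZ
        "PerUnitStorageThroughput": 50,         # MB/s/TiB; start low, raise later
        "ImportPath": "s3://example-training-data",          # lazy-load source
        "ExportPath": "s3://example-training-data/results",  # write-back target
    },
}

# Requires AWS credentials; uncomment to actually create the file system:
# import boto3
# fsx = boto3.client("fsx")
# response = fsx.create_file_system(**params)
```

Starting with PerUnitStorageThroughput at the low end matches the tuning approach above: provision modestly, watch the CloudWatch throughput metrics, and scale up only when the workload demands it.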
