Amazon MWAA (Specialized, since 2020)
A managed workflow orchestration service for running Apache Airflow
What It Does
Amazon MWAA (Managed Workflows for Apache Airflow) is a fully managed service for Apache Airflow. It handles scheduled execution of workflows defined as DAGs (Directed Acyclic Graphs), dependency management between tasks, and execution monitoring. Airflow's Web UI, CLI, and API are all available as-is.
Use Cases
MWAA is used for ETL pipeline orchestration, machine learning pipeline management, data ingestion into data lakes, and automating batch processing that coordinates multiple AWS services.
Everyday Analogy
Think of it like a factory production management system. It defines the order and dependencies of each process (task) and automatically executes them on schedule. If a process fails, it stops subsequent processes and handles retries and notifications.
What Is MWAA?
Amazon MWAA is a managed service for Apache Airflow. Airflow is an open-source workflow orchestrator that defines DAGs (workflows) in Python, widely used in data engineering. MWAA handles the building and operation of Airflow infrastructure (web server, scheduler, workers, metadata DB).
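To make the DAG idea concrete, the following is a minimal sketch in plain Python (it does not use Airflow) of what an orchestrator does with a DAG: run tasks in dependency order, retry a failing task, and skip everything downstream of a task that still fails. The pipeline, the task names, and the run_pipeline helper are hypothetical and exist only for illustration; Airflow's real scheduler is far more elaborate.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def run_pipeline(dependencies, actions, max_retries=1):
    """Run tasks in dependency order and return a status per task."""
    status = {}
    for task in TopologicalSorter(dependencies).static_order():
        # A task runs only if every upstream task succeeded.
        if any(status[up] != "success" for up in dependencies.get(task, ())):
            status[task] = "upstream_failed"
            continue
        status[task] = "failed"
        for _attempt in range(max_retries + 1):
            try:
                actions[task]()
            except Exception:
                continue  # try again until the attempts run out
            status[task] = "success"
            break
    return status


# Demo: the transform step always fails, so load and notify never run.
attempts = {"transform": 0}


def failing_transform():
    attempts["transform"] += 1
    raise RuntimeError("simulated transform failure")


actions = {name: (lambda: None) for name in PIPELINE}
actions["transform"] = failing_transform
result = run_pipeline(PIPELINE, actions)
```

In this run, extract succeeds, transform is attempted twice and ends as failed, and load and notify are marked upstream_failed without being executed.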
DAGs and AWS Service Integration
DAG files are uploaded to an S3 bucket. MWAA loads them automatically and runs them on schedule. Airflow's AWS provider package makes it easy to integrate with AWS services, for example running Glue jobs, launching EMR clusters, invoking Lambda functions, and executing ECS tasks. Additional Python packages can be installed by listing them in requirements.txt.
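As an illustration, a DAG file that runs a Glue job and then invokes a Lambda function might look like the sketch below. The DAG ID, Glue job name, and Lambda function name are hypothetical, and the code assumes Airflow 2.4 or later (for the schedule argument) with a recent Amazon provider package installed. The guard around the final line is there only so the file can also be loaded on a machine without Airflow; inside an MWAA environment it is always true.

```python
"""Example DAG file; on MWAA it would be uploaded to the DAG folder in S3."""
import importlib.util
import json
from datetime import datetime, timedelta

# Hypothetical names, used only for illustration.
DAG_ID = "daily_sales_etl"
GLUE_JOB_NAME = "sales-etl-job"
LAMBDA_FUNCTION_NAME = "notify-sales-team"

DEFAULT_ARGS = {
    "retries": 2,                         # retry a failed task up to twice
    "retry_delay": timedelta(minutes=5),  # wait between attempts
}


def build_dag():
    """Build a two-task DAG: run a Glue job, then invoke a Lambda function."""
    # Imported here so the module still loads where Airflow is not installed.
    from airflow import DAG
    from airflow.providers.amazon.aws.operators.glue import GlueJobOperator
    from airflow.providers.amazon.aws.operators.lambda_function import (
        LambdaInvokeFunctionOperator,
    )

    with DAG(
        dag_id=DAG_ID,
        start_date=datetime(2024, 1, 1),
        schedule="@daily",  # assumption: Airflow 2.4+; older versions use schedule_interval
        catchup=False,      # do not backfill runs between start_date and today
        default_args=DEFAULT_ARGS,
    ) as dag:
        run_glue = GlueJobOperator(
            task_id="run_glue_job",
            job_name=GLUE_JOB_NAME,
        )
        notify = LambdaInvokeFunctionOperator(
            task_id="notify",
            function_name=LAMBDA_FUNCTION_NAME,
            payload=json.dumps({"dag_id": DAG_ID}),
        )
        run_glue >> notify  # notify runs only after the Glue job succeeds
    return dag


# Airflow discovers DAG objects defined at module level.
if importlib.util.find_spec("airflow") is not None:
    dag = build_dag()
```

The operators call AWS through the environment's execution role, so that role would need permission to start the Glue job and invoke the Lambda function.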
Getting Started
Create an environment in the MWAA console and specify the S3 bucket that stores the DAG files. Select an environment class (for example mw1.small, mw1.medium, or mw1.large) and configure the VPC and subnets; MWAA requires two private subnets in different Availability Zones. Environment creation typically takes about 25 minutes. Once the environment is available, open the Airflow Web UI to manage DAGs and check execution status.
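The same setup can also be scripted instead of clicked through in the console. The sketch below assembles the request for the MWAA CreateEnvironment API and sends it with boto3. The helper names are hypothetical, every ARN and ID is supplied by the caller, and the "dags" folder name is an assumption about how the bucket is laid out.

```python
def build_environment_request(
    name,
    bucket_arn,
    execution_role_arn,
    subnet_ids,
    security_group_ids,
    environment_class="mw1.small",
    dag_s3_path="dags",  # assumption: DAG files live under dags/ in the bucket
):
    """Assemble keyword arguments for the MWAA CreateEnvironment call."""
    subnet_ids = list(subnet_ids)
    if len(subnet_ids) != 2:
        # MWAA expects exactly two private subnets in different Availability Zones.
        raise ValueError("MWAA requires exactly two subnet IDs")
    return {
        "Name": name,
        "SourceBucketArn": bucket_arn,
        "DagS3Path": dag_s3_path,
        "ExecutionRoleArn": execution_role_arn,
        "EnvironmentClass": environment_class,
        "NetworkConfiguration": {
            "SubnetIds": subnet_ids,
            "SecurityGroupIds": list(security_group_ids),
        },
    }


def create_environment(request):
    """Send the request to MWAA; needs boto3 and valid AWS credentials."""
    import boto3  # third-party; imported lazily so the builder works without it

    return boto3.client("mwaa").create_environment(**request)
```

Keeping the request builder separate from the API call makes it possible to review or unit-test the configuration before anything is created, which matters because the environment starts incurring hourly charges as soon as it exists.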
Things to Watch Out For
- Hourly charges accrue for as long as the environment exists, whether or not any DAGs are running (the minimum configuration is about $0.49/hour). For workflows that run infrequently, Step Functions is usually more cost-effective.
- Choose Step Functions for simple workflows, and MWAA when you need complex DAGs or the Airflow ecosystem.
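To see why an always-on environment is expensive for infrequent workloads, the hourly figure above can be turned into a monthly baseline and a cost per workflow run. This is a rough sketch: the 730-hour month is an assumption (365 × 24 / 12), and the calculation covers only the base environment charge, not additional workers or storage.

```python
HOURLY_RATE_USD = 0.49  # figure quoted above for the minimum configuration
HOURS_PER_MONTH = 730   # assumption: average month, 365 * 24 / 12


def monthly_base_cost(hourly_rate=HOURLY_RATE_USD, hours=HOURS_PER_MONTH):
    """Base environment charge for one month, in USD."""
    return round(hourly_rate * hours, 2)


def base_cost_per_run(runs_per_month, hourly_rate=HOURLY_RATE_USD):
    """Base environment charge spread across the month's workflow runs."""
    if runs_per_month <= 0:
        raise ValueError("runs_per_month must be positive")
    return round(monthly_base_cost(hourly_rate) / runs_per_month, 2)


baseline = monthly_base_cost()        # 357.7
daily_job = base_cost_per_run(30)     # 11.92 per run for one run a day
hourly_job = base_cost_per_run(730)   # 0.49 per run for one run an hour
```

At one run per day, each run carries roughly $12 of fixed environment cost, which is the situation where a pay-per-use service such as Step Functions tends to be cheaper.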