Running Apache Airflow as a Managed Service with Amazon MWAA - DAG Design and Workflow Automation

Learn how to set up an Airflow environment with MWAA, design DAGs, integrate with S3, and leverage AWS operators for workflow automation.

Overview of MWAA

Amazon Managed Workflows for Apache Airflow (MWAA) is a managed workflow orchestration service that runs Apache Airflow 2.x and can scale an environment up to 25 workers. Step Functions is well suited to event-driven state transitions, while Airflow is the better fit for complex, schedule-based data pipelines such as ETL, ML pipelines, and report generation.

DAGs and AWS Operators

DAGs are defined in Python, with task dependencies described using the >> operator. You can intuitively build pipelines like extract >> transform >> load. Uploading Python files to the dags/ folder in S3 automatically registers them with the scheduler. AWS operators integrate AWS services as tasks: EcsRunTaskOperator runs ECS tasks, LambdaInvokeFunctionOperator invokes Lambda functions, and GlueJobOperator starts Glue jobs.
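To make the >> chaining concrete, the following is a minimal sketch in plain Python of how a right-shift operator can record task dependencies. The Task class here is a hypothetical toy stand-in, not Airflow's actual operator implementation, and the ordering step uses the standard library's graphlib rather than the Airflow scheduler.

```python
from graphlib import TopologicalSorter


class Task:
    """Toy stand-in for an Airflow task (hypothetical, for illustration only)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()  # IDs of tasks that must finish first

    def __rshift__(self, other):
        # "a >> b" records a as upstream of b.
        other.upstream.add(self.task_id)
        # Returning the right-hand task lets "a >> b >> c" chain left to right.
        return other


extract = Task("extract")
transform = Task("transform")
load = Task("load")

extract >> transform >> load

# Map each task to its predecessors and derive a valid execution order.
graph = {t.task_id: t.upstream for t in (extract, transform, load)}
order = list(TopologicalSorter(graph).static_order())
print(order)
```

In a real DAG file the same extract >> transform >> load line is written between Airflow operator instances; the toy class only shows why the expression reads left to right.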

Environment Design and Plugins

An MWAA environment sets its worker resources through an environment class (mw1.small, mw1.medium, or mw1.large). You set minimum and maximum worker counts, and auto-scaling adjusts the number of workers within that range based on DAG parallelism. Python packages are added through requirements.txt, and custom operators and hooks are deployed through plugins.zip. DAG files uploaded to the S3 bucket are picked up by the environment automatically, whereas changes to requirements.txt and plugins.zip take effect only after the environment is updated. The Airflow Web UI can be exposed in private or public network access mode, with access controlled through IAM authentication.
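The settings above can be gathered into a single request. The sketch below builds and validates a parameter dictionary shaped like the input to the boto3 MWAA client's create_environment call. The helper name build_environment_request, the ARNs, the subnet and security group IDs, and the S3 paths are all placeholder assumptions, and the key names should be checked against the current boto3 documentation before use.

```python
ALLOWED_CLASSES = {"mw1.small", "mw1.medium", "mw1.large"}
MAX_WORKERS_LIMIT = 25  # upper bound quoted in this article


def build_environment_request(name, bucket_arn, role_arn, subnet_ids,
                              security_group_ids,
                              environment_class="mw1.small",
                              min_workers=1, max_workers=5,
                              public_ui=False):
    """Hypothetical helper: validate settings and return request parameters."""
    if environment_class not in ALLOWED_CLASSES:
        raise ValueError(f"unknown environment class: {environment_class}")
    if not 1 <= min_workers <= max_workers <= MAX_WORKERS_LIMIT:
        raise ValueError("need 1 <= min_workers <= max_workers <= 25")
    return {
        "Name": name,
        "SourceBucketArn": bucket_arn,
        "ExecutionRoleArn": role_arn,
        "DagS3Path": "dags",                       # assumed folder in the bucket
        "RequirementsS3Path": "requirements.txt",  # assumed object key
        "PluginsS3Path": "plugins.zip",            # assumed object key
        "EnvironmentClass": environment_class,
        "MinWorkers": min_workers,
        "MaxWorkers": max_workers,
        "WebserverAccessMode": "PUBLIC_ONLY" if public_ui else "PRIVATE_ONLY",
        "NetworkConfiguration": {
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        },
    }


# Placeholder identifiers; replace with real values from your account.
request = build_environment_request(
    name="example-env",
    bucket_arn="arn:aws:s3:::example-mwaa-bucket",
    role_arn="arn:aws:iam::123456789012:role/example-mwaa-role",
    subnet_ids=["subnet-aaaa", "subnet-bbbb"],
    security_group_ids=["sg-cccc"],
)
print(request["EnvironmentClass"], request["WebserverAccessMode"])
```

With boto3 installed and credentials configured, the dictionary would typically be passed as boto3.client("mwaa").create_environment(**request). Validating the values locally first catches an out-of-range worker count before any API call is made.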

MWAA Pricing

MWAA pricing has two main components: environment uptime and additional worker time. An mw1.small environment costs approximately $0.49 per hour, or about $353 over a 720-hour month, and each additional worker costs approximately $0.055 per hour. Step Functions, by comparison, costs approximately $0.025 per 1,000 state transitions and has no always-on charge. Because the MWAA environment is billed whether or not any DAG is running, Step Functions is more cost-efficient when workflows run infrequently. Choose MWAA when you need complex dependency management or need to migrate existing Airflow DAGs.
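The comparison can be worked through with the approximate prices quoted above. This is a rough sketch only: it assumes a 720-hour month, ignores free tiers and any other charges, and the function names are illustrative.

```python
HOURS_PER_MONTH = 720                 # assumption: 30-day month
ENV_HOURLY_USD = 0.49                 # mw1.small environment, approximate
WORKER_HOURLY_USD = 0.055             # each additional worker, approximate
SFN_USD_PER_1000_TRANSITIONS = 0.025  # Step Functions, approximate


def mwaa_monthly_cost(extra_worker_hours=0.0):
    """Always-on environment cost plus any additional worker hours."""
    return ENV_HOURLY_USD * HOURS_PER_MONTH + WORKER_HOURLY_USD * extra_worker_hours


def step_functions_monthly_cost(transitions):
    """Cost of a given number of state transitions in a month."""
    return transitions / 1000 * SFN_USD_PER_1000_TRANSITIONS


def break_even_transitions():
    """Monthly transitions at which Step Functions matches the base MWAA cost."""
    return mwaa_monthly_cost() / SFN_USD_PER_1000_TRANSITIONS * 1000


print(f"MWAA base cost per month: ${mwaa_monthly_cost():.2f}")
print(f"Step Functions, 1M transitions: ${step_functions_monthly_cost(1_000_000):.2f}")
print(f"Break-even transitions per month: {break_even_transitions():,.0f}")
```

Under these assumptions the base environment alone costs as much as roughly 14.1 million state transitions per month, which is why low-frequency workflows tend to favor Step Functions.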

Summary

MWAA is a managed workflow orchestration service that provides Apache Airflow. You define DAGs in Python and integrate with AWS services through AWS operators (Glue, EMR, ECS, Lambda). Custom packages and operators are added via requirements.txt and plugins.zip, and DAG uploads to S3 are automatically reflected.