Amazon Managed Service for Apache Flink
A fully managed service for running Apache Flink applications that processes and analyzes real-time streaming data from Kinesis Data Streams and MSK with low latency
Overview
Amazon Managed Service for Apache Flink (formerly Kinesis Data Analytics) is a fully managed stream processing service for running Apache Flink applications. Simply deploy Flink applications written in Java, Scala, or Python, and cluster provisioning, scaling, patching, and checkpoint management are all automated. It uses Kinesis Data Streams, Amazon MSK, and S3 as sources and sinks, processing events with millisecond-level latency.
Flink Application Deployment and Scaling
To deploy an application to Managed Flink, upload a Flink application JAR file or ZIP archive to S3 and specify the entry point class and runtime environment in the application configuration. Apache Flink Studio also allows interactive SQL query execution on Apache Zeppelin notebooks for real-time exploration and analysis of streaming data. Scaling is performed in KPU (Kinesis Processing Unit) increments, where 1 KPU equals 1 vCPU and 4 GB of memory. With Auto Scaling enabled, the KPU count automatically adjusts based on input data throughput. During scale-out, an application snapshot is taken and state is restored at the new parallelism level, ensuring scaling completes without data loss. The Parallelism and ParallelismPerKPU settings are key to performance tuning and should be aligned with the number of source shards or partitions.
Checkpointing and State Management
In Flink's stateful processing, intermediate state from window aggregations and session management is maintained within the application. Managed Flink periodically persists this state as checkpoints to S3, and upon failure, restores state from the checkpoint to resume processing. The default checkpoint interval is 60 seconds and can be shortened for stricter latency requirements, though shorter intervals increase checkpoint I/O overhead. The default state backend is RocksDB (disk-based), which efficiently manages large state sizes (multiple GB and above). For smaller state sizes, switching to HashMapStateBackend (memory-based) can improve latency. Savepoints are user-initiated snapshots used to carry over state during application upgrades or logic changes. In Managed Flink, you take a savepoint via the Snapshot API and start the new application version from that savepoint, achieving zero-downtime upgrades for stateful applications. For a deeper understanding of stream processing architectures, related books (Amazon) are a valuable resource.
Source and Sink Integration with Kinesis and MSK
The most common input sources for Managed Flink are Kinesis Data Streams and Amazon MSK (Managed Streaming for Apache Kafka). With a Kinesis source, Flink's Kinesis Consumer reads data in parallel per shard, and matching the shard count to Flink's parallelism achieves maximum throughput. With an MSK source, Flink's Kafka Consumer reads data from topic partitions. IAM authentication for MSK is supported, eliminating the need for access key management. Available sinks (output destinations) include S3 (batch writes in Parquet/JSON format), Kinesis Data Streams, MSK, OpenSearch, DynamoDB, and RDS. The S3 sink uses Flink's bucketing feature to partition files by time or key, arranging data so that downstream Athena queries benefit from partition pruning. Watermark strategy configuration directly impacts event-time processing accuracy - failing to properly set the allowed lateness for late-arriving data will produce inaccurate window computation results.