Amazon MSK

A fully managed streaming service that provides Apache Kafka cluster provisioning and operations on AWS, with full Kafka-compatible API support

Overview

Amazon MSK (Managed Streaming for Apache Kafka) lets you run fully managed Apache Kafka broker clusters on AWS. It is compatible with the open-source Kafka API, so existing Kafka producer and consumer applications can usually be migrated without code changes beyond connection and authentication settings. MSK Serverless automates broker provisioning and scaling, while MSK Connect runs Kafka Connect connectors as a managed service to simplify integration with databases and S3.

Key Considerations for Broker Design and Partition Strategy

With MSK provisioned clusters, you choose the broker instance type, broker count, and storage size yourself. The basic rule is to make the broker count a multiple of the number of Availability Zones (AZs): for a 3-AZ configuration, 3, 6, or 9 brokers are recommended. The partition count is the most critical parameter because it determines throughput parallelism. Within a consumer group each partition is read by at most one consumer, so the principle is to provision at least as many partitions as the maximum number of consumers you expect to run in a group. Partition counts can be increased later but cannot be decreased, so the initial design has to leave headroom for future traffic growth without over-provisioning. Keep the replication factor at the default of 3 and set min.insync.replicas to 2, so that producers using acks=all can continue writing without data loss even when one broker fails. Storage uses EBS gp3 volumes, expandable up to 16 TiB per broker.
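The sizing rules above can be sketched as a small calculation. This is a minimal illustration, not an official AWS sizing formula: the function name plan_cluster, the headroom factor, and the per-partition and per-broker throughput figures are all assumptions that you would replace with numbers measured on your own workload.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterPlan:
    brokers: int
    partitions: int
    replication_factor: int
    min_insync_replicas: int


def plan_cluster(
    target_mb_per_s: float,
    per_partition_mb_per_s: float,
    per_broker_mb_per_s: float,
    max_consumers: int,
    az_count: int = 3,
    headroom: float = 1.5,
) -> ClusterPlan:
    """Rough sizing sketch; every throughput figure is a caller-supplied assumption."""
    # Plan for growth, since partitions cannot be removed later.
    peak = target_mb_per_s * headroom

    # Enough partitions for the throughput, and at least one per consumer
    # so that no consumer in the group sits idle.
    partitions = max(math.ceil(peak / per_partition_mb_per_s), max_consumers)

    # Enough brokers for the throughput, at least one per AZ,
    # rounded up to a multiple of the AZ count.
    brokers_needed = max(math.ceil(peak / per_broker_mb_per_s), az_count)
    brokers = math.ceil(brokers_needed / az_count) * az_count

    # Replication settings recommended in the text.
    return ClusterPlan(brokers, partitions, replication_factor=3, min_insync_replicas=2)


if __name__ == "__main__":
    print(plan_cluster(100, 10, 40, max_consumers=12))
```

With these hypothetical inputs the throughput requirement calls for 4 brokers, which the function rounds up to 6 so that the brokers spread evenly across the 3 AZs.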

Reducing Operational Overhead with MSK Serverless and MSK Connect

MSK Serverless is a compelling option when you want to eliminate the effort of capacity planning and broker scaling. It scales automatically with topic read and write throughput, so you no longer need to think about broker instance types or storage capacity. However, it does not let you tune every Kafka configuration parameter, and it imposes limits on partition counts and consumer group counts, so provisioned clusters offer more flexibility for large-scale streaming platforms. MSK Connect runs Kafka Connect connectors in a managed environment: you can set up CDC (Change Data Capture) with Debezium, or data lake exports with the S3 Sink Connector, by uploading the connector plugin as a JAR or ZIP archive and supplying its configuration. Azure offers Azure Event Hubs with a Kafka protocol-compatible endpoint, but Event Hubs has its own partition management model, so replication factor behavior and consumer group rebalancing mechanics differ from Kafka's. Books on Apache Kafka cover partition design and connector operations in more practical detail.
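As an illustration of what an S3 export connector configuration might look like, the sketch below builds the key-value map that Kafka Connect expects. It assumes the Confluent S3 Sink Connector is the plugin in use (the text does not say which implementation is meant); the helper name s3_sink_config, the bucket name, and the default flush size are hypothetical.

```python
from typing import Dict, List


def s3_sink_config(
    topics: List[str],
    bucket: str,
    region: str,
    flush_size: int = 1000,
    tasks_max: int = 2,
) -> Dict[str, str]:
    """Build a connector configuration map (all values are strings)."""
    if not topics:
        raise ValueError("at least one topic is required")
    return {
        # Assumes the Confluent S3 Sink Connector plugin has been uploaded.
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "tasks.max": str(tasks_max),
        "topics": ",".join(topics),
        "s3.region": region,
        "s3.bucket.name": bucket,
        # Number of records written to each S3 object before it is committed.
        "flush.size": str(flush_size),
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
        "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    }


if __name__ == "__main__":
    # Hypothetical topic and bucket names.
    config = s3_sink_config(["orders", "payments"], "example-data-lake", "us-east-1")
    for key, value in sorted(config.items()):
        print(f"{key}={value}")
```

Keeping the configuration in a function like this makes it easy to validate and version before pasting it into the connector definition.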

Selection Criteria - MSK vs. Kinesis Data Streams

When building real-time streaming on AWS, choosing between MSK and Kinesis Data Streams is a common design decision. Kinesis is an AWS-native service that integrates seamlessly with Lambda and Firehose, and its per-shard pay-as-you-go pricing makes it easy to start small. MSK, on the other hand, lets you use the rich Kafka ecosystem of connectors and stream processing frameworks (Kafka Streams, ksqlDB) as-is, which means a shallower learning curve for teams with existing Kafka expertise. The decision comes down to three main factors. First, if you have existing Kafka clients or connector assets, MSK minimizes migration costs. Second, if you prioritize direct Lambda integration or a serverless architecture, Kinesis is the better fit. Third, if the message retention period matters, Kinesis defaults to 24 hours (extendable up to 365 days), while MSK can retain messages indefinitely within storage capacity limits, which gives MSK an advantage for event sourcing patterns. On cost, Kinesis on-demand mode is cheaper at low throughput, but at high throughput MSK's fixed broker costs tend to work out lower on a per-gigabyte basis.
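The cost argument can be made concrete with a simple break-even calculation. The sketch below is a deliberately simplified model: it treats the fixed-cost option as a flat monthly fee plus an optional per-gigabyte charge and the usage-priced option as a pure per-gigabyte charge. The function names and all figures are placeholders, not actual AWS prices, so substitute current numbers from the pricing pages.

```python
def cost_per_gb(fixed_monthly_cost: float, variable_cost_per_gb: float, monthly_gb: float) -> float:
    """Effective cost per gigabyte at a given monthly volume."""
    if monthly_gb <= 0:
        raise ValueError("monthly_gb must be positive")
    return fixed_monthly_cost / monthly_gb + variable_cost_per_gb


def break_even_gb(fixed_monthly_cost: float, low_per_gb: float, high_per_gb: float) -> float:
    """Monthly volume above which the fixed-cost option becomes cheaper.

    fixed_monthly_cost and low_per_gb describe the fixed-cost option;
    high_per_gb describes the purely usage-priced option.
    """
    delta = high_per_gb - low_per_gb
    if delta <= 0:
        raise ValueError("usage-priced option must cost more per GB")
    return fixed_monthly_cost / delta


if __name__ == "__main__":
    # Placeholder figures in arbitrary currency units, not real prices.
    fixed, low, high = 1000.0, 0.0, 0.25
    print("break-even volume (GB/month):", break_even_gb(fixed, low, high))
    for volume in (2000, 8000):
        print(volume, "GB/month ->", cost_per_gb(fixed, low, volume), "per GB")
```

Below the break-even volume the usage-priced option is cheaper per gigabyte; above it, the fixed cost is spread over enough data that the fixed-cost option wins.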
