
Amazon MSK (Popular, since 2018)

A fully managed Apache Kafka service for real-time streaming data processing

What It Does

Amazon MSK (Managed Streaming for Apache Kafka) is a fully managed service for building and running Apache Kafka clusters in the cloud. AWS handles cluster provisioning, configuration, patching, and failure recovery, so developers can focus on building streaming applications. Because it is compatible with open-source Apache Kafka, existing Kafka applications can usually migrate without code changes.

Use Cases

Amazon MSK is used for real-time log aggregation and analysis, clickstream data collection and processing, IoT sensor data streaming, event-driven communication between microservices, database change data capture (CDC), and real-time fraud detection. In short, it fits anywhere large volumes of streaming data need to be processed with low latency.

Everyday Analogy

Think of it like a highway toll plaza. As a massive flow of cars (data) streams through continuously, the toll plaza (Kafka broker) organizes them into the appropriate lanes (topics). Amazon MSK handles all the construction, maintenance, and expansion of the toll plaza, so you can focus solely on designing the traffic flow.

What Is Amazon MSK?

Amazon MSK is a managed service for Apache Kafka. Apache Kafka is an open-source platform widely used for building real-time streaming data pipelines, but self-managing it requires significant operational work including cluster setup, ZooKeeper management, broker monitoring, and patching. MSK delegates all this operational burden to AWS and provides a 99.9% availability SLA. Since you can use Kafka's native APIs as-is, migrating existing producer and consumer applications is straightforward.
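
Because MSK speaks the standard Kafka protocol, producer code written for a self-managed cluster should need little more than a new bootstrap address. The following is a minimal Python sketch under assumptions: it expects a client object with `send` and `flush` methods (the kafka-python library's `KafkaProducer` has both), and the `publish_events` helper, the topic name, and the event fields are illustrative only.

```python
import json


def publish_events(producer, topic, events):
    """Send each event as a JSON-encoded Kafka record and return the count sent.

    `producer` is any object exposing send(topic, key=..., value=...) and
    flush(), for example kafka-python's KafkaProducer (an assumption here).
    """
    sent = 0
    for event in events:
        # With the default partitioner, records sharing a key land in the
        # same partition, so one user's events stay in order.
        key = str(event["user_id"]).encode("utf-8")
        value = json.dumps(event).encode("utf-8")
        producer.send(topic, key=key, value=value)
        sent += 1
    # Block until all buffered records have been sent.
    producer.flush()
    return sent
```

With kafka-python, this could be called with a `KafkaProducer(bootstrap_servers=...)` pointed at the cluster's bootstrap broker string. The function body is the same whether the brokers are self-managed or run on MSK.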

MSK Serverless and MSK Provisioned

Amazon MSK offers two deployment options. MSK Provisioned is the traditional cluster configuration, in which you explicitly specify broker instance types and storage capacity. It suits workloads with predictable characteristics. MSK Serverless automates capacity management, scaling up and down with traffic. It is easy to set up and is a good fit for variable workloads or for starting small. Both options are compatible with open-source Apache Kafka.
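
The difference between the two options shows up in what you have to specify at creation time. The sketch below builds request dictionaries shaped like the `Provisioned` and `Serverless` parameters of the MSK `CreateClusterV2` API. The helper names, the default instance type, broker count, volume size, and Kafka version are placeholder assumptions, not recommendations.

```python
def provisioned_request(name, subnet_ids, security_group_ids,
                        instance_type="kafka.m5.large", brokers=3,
                        volume_gib=100, kafka_version="3.6.0"):
    """Build arguments for a provisioned cluster, where you size the brokers."""
    # MSK expects the broker count to be a multiple of the number of
    # Availability Zones (one client subnet per zone).
    if not subnet_ids or brokers % len(subnet_ids) != 0:
        raise ValueError("broker count must be a multiple of the subnet count")
    return {
        "ClusterName": name,
        "Provisioned": {
            "KafkaVersion": kafka_version,
            "NumberOfBrokerNodes": brokers,
            "BrokerNodeGroupInfo": {
                "InstanceType": instance_type,
                "ClientSubnets": list(subnet_ids),
                "SecurityGroups": list(security_group_ids),
                "StorageInfo": {"EbsStorageInfo": {"VolumeSize": volume_gib}},
            },
        },
    }


def serverless_request(name, subnet_ids, security_group_ids):
    """Build arguments for a serverless cluster: no broker or storage sizing."""
    return {
        "ClusterName": name,
        "Serverless": {
            "VpcConfigs": [{
                "SubnetIds": list(subnet_ids),
                "SecurityGroupIds": list(security_group_ids),
            }],
            # Serverless clusters authenticate clients with IAM.
            "ClientAuthentication": {"Sasl": {"Iam": {"Enabled": True}}},
        },
    }
```

Either dictionary could then be passed to boto3 as `boto3.client("kafka").create_cluster_v2(**request)`. That call is deliberately left out so the sketch runs without AWS credentials.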

Security and Integration

Amazon MSK provides multi-layered security features. It supports encryption in transit (TLS), encryption at rest (AWS KMS), IAM authentication, SASL/SCRAM authentication, and fine-grained access control via Apache Kafka ACLs. Placing the cluster within a VPC also provides network-level isolation. Integration with AWS Glue Schema Registry centralizes schema management, and MSK Connect makes it straightforward to build data pipelines to Amazon S3, Amazon Redshift, OpenSearch Service, and other destinations.
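
On the client side, the authentication modes listed above mostly differ in a handful of Kafka client properties. The sketch below returns the properties a Java-based Kafka client would typically use for each mode. The `client_properties` helper and its `auth` labels are hypothetical, and the IAM settings assume the aws-msk-iam-auth library is on the client's classpath.

```python
def client_properties(bootstrap_servers, auth, username=None, password=None):
    """Return Kafka client properties for one authentication mode.

    auth is one of "tls" (encryption only), "scram" (SASL/SCRAM), or "iam".
    """
    props = {"bootstrap.servers": bootstrap_servers}
    if auth == "tls":
        props["security.protocol"] = "SSL"
    elif auth == "scram":
        if not (username and password):
            raise ValueError("SASL/SCRAM needs a username and password")
        props.update({
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "SCRAM-SHA-512",
            "sasl.jaas.config": (
                "org.apache.kafka.common.security.scram.ScramLoginModule "
                f'required username="{username}" password="{password}";'
            ),
        })
    elif auth == "iam":
        # These class names come from the aws-msk-iam-auth library.
        props.update({
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "AWS_MSK_IAM",
            "sasl.jaas.config":
                "software.amazon.msk.auth.iam.IAMLoginModule required;",
            "sasl.client.callback.handler.class":
                "software.amazon.msk.auth.iam.IAMClientCallbackHandler",
        })
    else:
        raise ValueError(f"unknown auth mode: {auth!r}")
    return props
```

In practice the SCRAM credentials would come from a secrets store rather than being passed around as plain strings, and the bootstrap string is the one MSK reports for the chosen authentication mode.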

Things to Watch Out For

  • Compared to Kinesis Data Streams, MSK is better suited when you want to leverage the existing Kafka ecosystem (Connect, Streams, ksqlDB) as-is
  • MSK Serverless has partition count limits, so consider MSK Provisioned for large-scale workloads