Amazon Kinesis (Popular, since 2013)
A service for collecting, processing, and analyzing real-time streaming data
What It Does
Amazon Kinesis is a family of services for collecting, processing, and analyzing real-time streaming data at scale. Kinesis Data Streams handles stream data collection and processing, Kinesis Data Firehose (since renamed Amazon Data Firehose) delivers stream data to S3, Redshift, and other destinations, and Kinesis Data Analytics (since renamed Amazon Managed Service for Apache Flink) provides SQL/Flink-based analysis of stream data.
Use Cases
Used for real-time processing of sensor data from IoT devices, website clickstream analysis, real-time log aggregation and analysis, real-time financial transaction monitoring, and game player behavior analysis.
Everyday Analogy
Think of it like a conveyor belt. Products (data) flowing continuously from a factory (data source) are carried on a conveyor belt (stream), where they undergo inspection (processing), sorting (analysis), and packaging (storage) in an assembly-line fashion.
What Is Kinesis?
Amazon Kinesis is a family of services for real-time data processing. Whereas batch processing accumulates data and processes it all at once, stream processing with Kinesis handles each record shortly after it is generated. The services are designed to scale to millions of records per second.
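The difference between the two styles can be shown without any AWS service at all. The following is a minimal, purely local sketch; the function names and the sensor values are made up for illustration and are not part of Kinesis.

```python
from typing import Iterable, Iterator, List


def batch_average(readings: List[float]) -> float:
    """Batch style: wait until all data has been collected, then process once."""
    return sum(readings) / len(readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Stream style: emit an updated result as each record arrives."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


# Hypothetical sensor values, used only to compare the two styles.
sensor_readings = [20.0, 22.0, 24.0]
print(batch_average(sensor_readings))            # 22.0 (one result at the end)
print(list(streaming_average(sensor_readings)))  # [20.0, 21.0, 22.0] (one per record)
```

In the streaming version a result is available after every record, which is the property that makes monitoring and alerting use cases practical.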
Kinesis Data Streams and Firehose
Kinesis Data Streams is a service for processing stream data with custom applications. In provisioned mode you control throughput by adjusting the number of shards, and you process data with Lambda or custom consumers. Kinesis Data Firehose is a service that automatically delivers stream data to S3, Redshift, OpenSearch, Splunk, and more. You can configure data transformation and delivery without writing code.
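To make the relationship between shards and throughput concrete, here is a rough sizing sketch. It assumes the commonly documented per-shard limits for provisioned mode (about 1 MB/s or 1,000 records/s for writes, and about 2 MB/s for reads shared across consumers); verify these against the current AWS quotas before relying on them. The function name `estimate_shards` is hypothetical.

```python
import math

# Assumed per-shard limits for provisioned mode; check current AWS quotas.
WRITE_BYTES_PER_SHARD = 1_000_000    # about 1 MB/s ingest
WRITE_RECORDS_PER_SHARD = 1_000      # records per second ingest
READ_BYTES_PER_SHARD = 2_000_000     # about 2 MB/s, shared by standard consumers


def estimate_shards(records_per_sec: int, avg_record_bytes: int, consumers: int = 1) -> int:
    """Return the smallest shard count that satisfies all three assumed limits."""
    write_bytes = records_per_sec * avg_record_bytes
    by_write_bytes = math.ceil(write_bytes / WRITE_BYTES_PER_SHARD)
    by_write_records = math.ceil(records_per_sec / WRITE_RECORDS_PER_SHARD)
    # Every shared-throughput consumer reads the full stream.
    by_read_bytes = math.ceil(write_bytes * consumers / READ_BYTES_PER_SHARD)
    return max(1, by_write_bytes, by_write_records, by_read_bytes)


print(estimate_shards(5000, 500))               # 5 (record rate is the bottleneck)
print(estimate_shards(2000, 2000, consumers=3)) # 6 (read bandwidth is the bottleneck)
```

This is only an estimate: uneven partition keys can overload individual shards even when the total looks sufficient, and on-demand mode removes the need for this calculation.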
Getting Started
Select 'Create data stream' in the Kinesis console and specify a stream name and capacity mode (on-demand or provisioned). Send data using the AWS SDK's PutRecord API and configure Lambda as a consumer to complete your real-time processing pipeline. With Firehose, you can accumulate data in S3 without writing any code.
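The producer and consumer steps might look roughly like the sketch below. It assumes boto3 is installed and AWS credentials are configured for the real call; the stream name `clickstream-demo`, the event fields, and the helper names are hypothetical. The request is built in a separate function so the Lambda handler can be exercised locally without AWS.

```python
import base64
import json


def build_put_record_request(stream_name: str, event: dict, partition_key: str) -> dict:
    """Build the keyword arguments for the Kinesis PutRecord API."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event).encode("utf-8"),
        "PartitionKey": partition_key,  # determines which shard receives the record
    }


def send_event(stream_name: str, event: dict, partition_key: str) -> dict:
    """Send one record to the stream. Needs boto3 and credentials; not run below."""
    import boto3  # imported lazily so the rest of the sketch runs without AWS

    client = boto3.client("kinesis")
    return client.put_record(**build_put_record_request(stream_name, event, partition_key))


def lambda_handler(event, context):
    """Lambda consumer: Kinesis record payloads arrive base64-encoded."""
    items = []
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        items.append(json.loads(payload))
    return {"processed": len(items), "items": items}


# Local simulation: wrap a producer payload the way a Kinesis event would carry it.
request = build_put_record_request("clickstream-demo", {"page": "/home"}, "user-42")
fake_event = {
    "Records": [
        {"kinesis": {"data": base64.b64encode(request["Data"]).decode("ascii")}}
    ]
}
print(lambda_handler(fake_event, None))
```

For higher volumes, the batched PutRecords API is usually a better fit than one PutRecord call per event.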
Things to Watch Out For
- The Data Streams retention period defaults to 24 hours and can be extended up to 365 days, but longer retention costs more
- Choose Firehose for simple data delivery, and Data Streams when custom processing is needed
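As an illustration of the retention point above, the sketch below validates a requested retention period against the 24-hour default and 365-day maximum, then passes it to the IncreaseStreamRetentionPeriod API. The helper names and the stream name `clicks` are hypothetical, and the AWS call assumes boto3 plus configured credentials.

```python
DEFAULT_RETENTION_HOURS = 24
MAX_RETENTION_HOURS = 365 * 24  # 8760 hours


def retention_request(stream_name: str, days: int) -> dict:
    """Build arguments for IncreaseStreamRetentionPeriod, rejecting out-of-range values."""
    hours = days * 24
    if not DEFAULT_RETENTION_HOURS <= hours <= MAX_RETENTION_HOURS:
        raise ValueError(f"retention must be between 1 and 365 days, got {days}")
    return {"StreamName": stream_name, "RetentionPeriodHours": hours}


def extend_retention(stream_name: str, days: int) -> None:
    """Apply the new retention period. Needs boto3 and credentials; not run below."""
    import boto3  # imported lazily so the validation logic runs without AWS

    client = boto3.client("kinesis")
    client.increase_stream_retention_period(**retention_request(stream_name, days))


print(retention_request("clicks", 7))
# {'StreamName': 'clicks', 'RetentionPeriodHours': 168}
```

Because longer retention is billed, it is worth choosing the shortest period that still covers your replay and recovery needs.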