Amazon Athena (Popular, available since 2016)

A serverless query service that lets you analyze data in S3 using standard SQL

What It Does

Amazon Athena is a serverless analytics service that lets you run standard SQL queries directly against data stored in S3. There is no need to set up database servers or move your data; you can analyze it right where it sits in S3. It supports a wide range of data formats, including CSV, JSON, Parquet, and ORC, and you pay only for the amount of data scanned.

Use Cases

Athena is commonly used for analyzing log files stored in S3, investigating CloudTrail audit logs, identifying trends in access logs, aggregating business data in a data lake, and running ad-hoc data analysis. It is a good fit whenever you want to quickly analyze data that is already in S3.

Everyday Analogy

Think of it like a library catalog search system. You do not need to move the vast collection of books (S3 data) to another location. Just enter your search criteria into the terminal (Athena) and find the information you need instantly. You only pay for the searches you run, with no maintenance cost for the terminal itself.

What Is Athena?

Amazon Athena is a serverless interactive query service that uses S3 as its data source. Under the hood, it runs on an engine derived from the open-source Presto and Trino projects, which enables fast queries even against large datasets. There is no server provisioning or management required; you write SQL and start analyzing your data.

Key Features

Athena's greatest strength is the ability to analyze data already stored in S3 without moving it. There is no need for ETL (Extract, Transform, Load) processing to load data into a separate database. Pricing is based solely on the amount of data scanned, at roughly $5 per TB. By converting your data to columnar formats such as Parquet or ORC and partitioning it, you can significantly reduce the volume scanned, which improves both cost and speed.
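To make the pricing model concrete, here is a minimal sketch of the arithmetic, assuming the approximate $5 per TB rate quoted above and a binary terabyte. The function name, the 2 TB dataset, and the 5% scan ratio are hypothetical values chosen for illustration, not measured results.

```python
# Rough cost estimate for scan-based pricing.
# Assumptions: ~$5 per TB scanned (approximate rate from the text above),
# and 1 TB = 1024**4 bytes. Check current regional pricing before relying on this.
TB = 1024 ** 4
PRICE_PER_TB_USD = 5.0


def estimate_scan_cost(bytes_scanned: int, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Return the approximate query cost in USD for a given scan volume."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / TB * price_per_tb


# Hypothetical dataset: 2 TB of raw CSV scanned in full.
full_scan = estimate_scan_cost(2 * TB)

# Hypothetical: partitioning plus a columnar format reduces the scan to 5% of the data.
pruned_scan = estimate_scan_cost(int(2 * TB * 0.05))

print(f"full scan: ${full_scan:.2f}, pruned: ${pruned_scan:.2f}")
```

The point of the sketch is that cost scales linearly with bytes scanned, so any technique that lets a query skip data lowers the bill in direct proportion.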

Integration with Data Catalog

Athena works with the AWS Glue Data Catalog to manage the schema (structure) of your data in S3. Using Glue Crawlers, you can automatically scan S3 data and create table definitions. Once a table is defined, you can query the data with SELECT statements just as you would in a regular database. For more details on Data Catalog integration, reference books available on Amazon provide thorough coverage.

Getting Started

To start using Athena, open the query editor in the Athena console. First, configure an S3 bucket to store your query results. Next, define the location and schema of your S3 data using a CREATE TABLE statement. Once the table is created, you can query data with SELECT statements. You can run queries directly from the AWS Management Console and download results as CSV files.
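The same steps can also be scripted instead of run from the console. The following is a minimal sketch, assuming the boto3 SDK is installed and AWS credentials are configured. The bucket names, the `access_logs` table, its columns, and both helper functions are hypothetical and exist only for illustration.

```python
import time


def build_create_table(table: str, columns: dict[str, str], location: str) -> str:
    """Build a CREATE EXTERNAL TABLE statement for comma-delimited files in S3."""
    cols = ",\n  ".join(f"{name} {typ}" for name, typ in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{location}'"
    )


def run_query(sql: str, database: str, output_location: str, poll_seconds: float = 1.0) -> str:
    """Submit a query to Athena and wait until it reaches a final state."""
    import boto3  # imported lazily so the SQL builder works without the SDK

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Results are written to this S3 location (the results bucket).
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return state
        time.sleep(poll_seconds)


# Hypothetical table definition over CSV access logs.
ddl = build_create_table(
    "access_logs",
    {"request_time": "string", "status": "int", "path": "string"},
    "s3://example-data-bucket/access-logs/",
)
print(ddl)
```

With credentials in place, a call such as `run_query(ddl, "default", "s3://example-results-bucket/athena/")` would create the table, and a follow-up `run_query("SELECT status, COUNT(*) FROM access_logs GROUP BY status", ...)` would query it. Both bucket names are placeholders.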

Things to Watch Out For

  • Pricing is based on the amount of data scanned, so use partitioning and columnar formats to keep costs down
  • Complex queries on large datasets may take longer to execute. For recurring, heavy analytics workloads, Redshift may be a better fit
  • Query results are stored in S3, so keep in mind the storage costs for the results bucket as well
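As an illustration of the partitioning advice in the list above, here is a minimal sketch of a Hive-style `key=value` folder layout and a query that filters on the partition columns. It assumes the table is partitioned by `year`, `month`, and `day` stored as strings. The bucket, table name, and helper functions are hypothetical.

```python
from datetime import date


def partition_prefix(base: str, day: date) -> str:
    """Return the S3 prefix for one day's data in a Hive-style partition layout."""
    return f"{base.rstrip('/')}/year={day.year}/month={day.month:02d}/day={day.day:02d}/"


def pruned_query(table: str, day: date) -> str:
    """Build a query that filters on partition columns so only one day is scanned."""
    return (
        f"SELECT status, COUNT(*) AS requests FROM {table} "
        f"WHERE year = '{day.year}' AND month = '{day.month:02d}' AND day = '{day.day:02d}' "
        "GROUP BY status"
    )


target = date(2024, 1, 15)
print(partition_prefix("s3://example-data-bucket/access-logs/", target))
print(pruned_query("access_logs", target))
```

Because the WHERE clause names the partition columns, the engine only needs to read objects under the matching prefix rather than the whole table, which is what reduces the amount of data scanned.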