Amazon Security Lake
A service that automatically normalizes security data from CloudTrail, VPC Flow Logs, Route 53 Resolver logs, and other sources into OCSF format, centralizing it in an S3-based security data lake
Overview
Amazon Security Lake automatically collects security logs and event data from AWS environments and third-party sources, normalizes them to the Open Cybersecurity Schema Framework (OCSF), and stores them in S3. Alongside AWS-native sources such as CloudTrail, VPC Flow Logs, Route 53 Resolver query logs, Security Hub findings, and Lambda data events recorded by CloudTrail, it integrates data from third-party security products such as CrowdStrike and Palo Alto Networks. Data is stored as Parquet files managed as Apache Iceberg tables, which enables efficient querying from Athena and OpenSearch.
OCSF Schema and Data Normalization
At the core of Security Lake is its ability to automatically normalize security data from disparate sources into a common schema, OCSF. OCSF is an open schema for standardizing security events, with unified field definitions for event categories (authentication, network, file operations, etc.), severity, actors, and resources. Whether a record is a CloudTrail API call or a VPC Flow Logs network flow, it is stored with the same base attributes, which makes cross-source correlation straightforward. Data is compressed in Apache Parquet format and managed as Apache Iceberg tables. Because the tables are partitioned by time and region, queries that filter on a time range or region scan only the matching partitions. Retention is controlled through lifecycle settings, which can automatically keep hot data in S3 Standard and move cold data to S3 Glacier.
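To make the idea of a shared schema concrete, here is a minimal sketch in Python. It is not Security Lake's actual conversion logic: the function names, the sample records, and the choice of fields are assumptions for illustration, and real OCSF classes define many more attributes than the handful shown. The class identifiers used (6003 for API Activity, 4001 for Network Activity) follow OCSF 1.x.

```python
from datetime import datetime, timezone


def _epoch_ms(iso_timestamp: str) -> int:
    """Convert an ISO 8601 UTC timestamp (as CloudTrail emits) to epoch milliseconds."""
    dt = datetime.strptime(iso_timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return int(dt.replace(tzinfo=timezone.utc).timestamp() * 1000)


def cloudtrail_to_ocsf(record: dict) -> dict:
    """Map a CloudTrail record to a simplified OCSF-style API Activity event."""
    return {
        "class_uid": 6003,  # API Activity (OCSF 1.x)
        "category_uid": 6,  # Application Activity
        "time": _epoch_ms(record["eventTime"]),
        "severity_id": 1,  # Informational
        "status": "Failure" if "errorCode" in record else "Success",
        "src_endpoint": {"ip": record["sourceIPAddress"]},
        "api": {"operation": record["eventName"]},
        "cloud": {"region": record["awsRegion"]},
    }


def vpc_flow_to_ocsf(flow: dict) -> dict:
    """Map a VPC flow record to a simplified OCSF-style Network Activity event."""
    return {
        "class_uid": 4001,  # Network Activity
        "category_uid": 4,  # Network Activity category
        "time": int(flow["start"]) * 1000,  # flow logs use epoch seconds
        "severity_id": 1,
        # Assumed mapping of the flow log action onto a disposition label.
        "disposition": "Allowed" if flow["action"] == "ACCEPT" else "Blocked",
        "src_endpoint": {"ip": flow["srcaddr"], "port": int(flow["srcport"])},
        "dst_endpoint": {"ip": flow["dstaddr"], "port": int(flow["dstport"])},
        "cloud": {"region": flow["region"]},
    }


def events_from_ip(events: list, ip: str) -> list:
    """Correlate across sources using the shared src_endpoint.ip attribute."""
    return [e for e in events if e["src_endpoint"]["ip"] == ip]


# Hypothetical sample records, trimmed to the fields used above.
api_call = {
    "eventTime": "2024-01-01T00:00:00Z",
    "eventName": "ConsoleLogin",
    "sourceIPAddress": "203.0.113.10",
    "awsRegion": "us-east-1",
    "errorCode": "AccessDenied",
}
flow = {
    "start": "1704067200", "action": "REJECT", "region": "us-east-1",
    "srcaddr": "203.0.113.10", "srcport": "44321",
    "dstaddr": "10.0.0.5", "dstport": "22",
}
normalized = [cloudtrail_to_ocsf(api_call), vpc_flow_to_ocsf(flow)]
matches = events_from_ip(normalized, "203.0.113.10")
```

Because both events expose `src_endpoint.ip`, `time`, and `cloud.region` under the same names, the correlation helper does not need to know which source a record came from.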
Source Integration and Subscriber Management
Security Lake data sources fall into two categories: AWS-native sources and custom sources. AWS-native sources begin collecting data automatically once enabled, with OCSF conversion handled transparently. Custom sources bring in data from third-party security products or on-premises log sources: the producer writes OCSF-formatted data to the S3 location assigned to the source, and a Glue crawler registers it in the Data Catalog so it can be queried. Subscribers are the services or accounts that consume Security Lake data. Query access subscribers run queries directly from Athena or OpenSearch, while data access subscribers read objects from S3 and ingest them into their own SIEM platforms (Splunk, IBM QRadar, etc.). When new data arrives, data access subscribers are notified via SQS queues or EventBridge, enabling near-real-time analysis pipelines.
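A data access subscriber typically starts by working out which S3 object a notification refers to. The sketch below parses one message body in Python. The message shape (an EventBridge-style S3 "Object Created" event with `detail.bucket.name` and `detail.object.key`), the bucket name, and the object key are assumptions for illustration, as is the `region=/accountId=/eventDay=` partition layout in the key; check a real message from your own subscription before relying on any of these.

```python
import json
import re
from typing import NamedTuple


class NewObject(NamedTuple):
    """Location and partition values of a newly delivered object."""
    bucket: str
    key: str
    region: str
    account_id: str
    event_day: str


# Assumed partition layout inside the object key.
_PARTITION_RE = re.compile(
    r"region=(?P<region>[^/]+)/"
    r"accountId=(?P<account_id>\d{12})/"
    r"eventDay=(?P<event_day>\d{8})/"
)


def parse_notification(body: str) -> NewObject:
    """Extract the bucket, key, and partition values from one message body."""
    event = json.loads(body)
    bucket = event["detail"]["bucket"]["name"]
    key = event["detail"]["object"]["key"]
    match = _PARTITION_RE.search(key)
    if match is None:
        raise ValueError(f"unexpected object key layout: {key}")
    return NewObject(bucket, key, **match.groupdict())


# Hypothetical message body; all names and values are placeholders.
sample_body = json.dumps({
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {
        "bucket": {"name": "example-security-lake-bucket"},
        "object": {
            "key": "aws/CLOUD_TRAIL_MGMT/2.0/region=us-east-1/"
                   "accountId=123456789012/eventDay=20240101/example.gz.parquet",
        },
    },
})
new_object = parse_notification(sample_body)
```

In a real pipeline the body would come from polling the subscriber's SQS queue, and the returned bucket and key would be passed to an S3 download step before the message is deleted.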
Cross-Account Aggregation and Query Access
Security Lake integrates with AWS Organizations to aggregate security data from multiple accounts into a delegated administrator account. Any member account in the organization can be designated, though a dedicated security account is the usual choice. Data from each member account is automatically delivered to the delegated administrator's data lake, enabling centralized security analysis. By configuring rollup regions, data from multiple regions can be consolidated into a single region, so one query can cover the whole global footprint. In Athena, you write SQL against OCSF's standard fields for investigations such as "failed authentication events in the past 24 hours where the source IP is from outside the country." Pricing consists of data ingestion charges (approximately $0.75 USD per GB, depending on source and region) plus S3 storage costs, with queries billed at standard Athena rates.
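The investigation quoted above could be expressed roughly as follows. This is a sketch in Python that only builds the SQL text; the function name is hypothetical, and the column names (`time_dt`, `status`, `actor.user.uid`, `src_endpoint.ip`, `src_endpoint.location.country`) are assumptions based on OCSF attribute names, so compare them with the actual table schema in your Glue Data Catalog. Class 3002 is the OCSF Authentication class.

```python
import re

_IDENTIFIER_RE = re.compile(r"^[A-Za-z0-9_]+$")
_COUNTRY_RE = re.compile(r"^[A-Z]{2}$")


def failed_auth_query(database: str, table: str, home_country: str, hours: int = 24) -> str:
    """Build an Athena query for failed authentications from outside home_country."""
    # Validate every interpolated value, since the SQL is assembled as text.
    if not _IDENTIFIER_RE.match(database) or not _IDENTIFIER_RE.match(table):
        raise ValueError("database and table must be plain identifiers")
    if not _COUNTRY_RE.match(home_country):
        raise ValueError("home_country must be a two-letter uppercase code")
    if not isinstance(hours, int) or hours <= 0:
        raise ValueError("hours must be a positive integer")
    return (
        "SELECT time_dt,\n"
        "       actor.user.uid AS user_uid,\n"
        "       src_endpoint.ip AS source_ip,\n"
        "       src_endpoint.location.country AS country\n"
        f'FROM "{database}"."{table}"\n'
        "WHERE class_uid = 3002\n"  # OCSF Authentication class
        "  AND status = 'Failure'\n"
        f"  AND time_dt > current_timestamp - interval '{hours}' hour\n"
        f"  AND src_endpoint.location.country <> '{home_country}'\n"
        "ORDER BY time_dt DESC"
    )


# Placeholder database and table names; substitute the ones in your account.
sql = failed_auth_query("example_security_lake_db", "example_cloud_trail_table", "US")
```

The resulting string can be pasted into the Athena console or submitted through whichever Athena client you already use. Filtering on the time column first keeps the scan, and therefore the Athena charge, limited to recent partitions.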