Building a Security Data Lake with Amazon Security Lake - Unified Analysis in OCSF Format
Learn about Security Lake's automatic aggregation of CloudTrail, VPC Flow Logs, and Route 53 logs, OCSF normalization, and integration with subscribers.
Overview of Security Lake
Amazon Security Lake is a service that automatically aggregates and normalizes security data from AWS and third-party sources. Security analysis used to require collecting and transforming CloudTrail logs, VPC Flow Logs, and GuardDuty findings one by one. Security Lake instead converts these into OCSF format and centralizes them in an S3-based data lake in your own account. GuardDuty findings reach it through Security Hub. Data is stored as Apache Parquet files in Apache Iceberg table format and can be queried directly with SQL from Athena.
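To make the "query directly with SQL" point concrete, here is a minimal Python sketch that builds an Athena query string. It assumes the Glue naming convention used by Security Lake source version 2 (a database such as `amazon_security_lake_glue_db_us_east_1` and tables such as `amazon_security_lake_table_us_east_1_cloud_trail_mgmt_2_0`). It also assumes OCSF column names like `time_dt` and `api.operation`. The helper names are hypothetical, so confirm the actual database, table, and column names in your Glue Data Catalog before using them.

```python
# Assumption: Security Lake source version 2 naming. The region is embedded
# with dashes replaced by underscores; verify names in your Glue Data Catalog.
SOURCE_TABLE_SUFFIX = {
    "CLOUD_TRAIL_MGMT": "cloud_trail_mgmt_2_0",
    "VPC_FLOW": "vpc_flow_2_0",
    "ROUTE53": "route53_2_0",
    "SH_FINDINGS": "sh_findings_2_0",
}


def table_ref(region: str, source: str) -> str:
    """Return a quoted "database"."table" reference usable in Athena SQL."""
    region_id = region.replace("-", "_")
    database = f"amazon_security_lake_glue_db_{region_id}"
    table = f"amazon_security_lake_table_{region_id}_{SOURCE_TABLE_SUFFIX[source]}"
    return f'"{database}"."{table}"'


def recent_api_calls_query(region: str, operation: str, limit: int = 100) -> str:
    """Build SQL listing recent calls to one API operation.

    Column names (time_dt, api.operation, ...) are assumed OCSF fields.
    """
    # Escape single quotes for illustration only; prefer Athena query parameters.
    safe_op = operation.replace("'", "''")
    return (
        "SELECT time_dt, api.operation, actor.user.uid, src_endpoint.ip\n"
        f"FROM {table_ref(region, 'CLOUD_TRAIL_MGMT')}\n"
        f"WHERE api.operation = '{safe_op}'\n"
        "ORDER BY time_dt DESC\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(recent_api_calls_query("us-east-1", "StopLogging"))
```

You can run the resulting string in the Athena console or submit it through the Athena `StartQueryExecution` API.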
Data Sources and OCSF Normalization
Security Lake can natively collect eight AWS log and event sources:

- CloudTrail management events
- CloudTrail S3 data events
- CloudTrail Lambda data events (source name LAMBDA_EXECUTION)
- VPC Flow Logs
- Route 53 Resolver query logs
- Security Hub findings
- Amazon EKS audit logs
- AWS WAF v2 logs

Third-party data (CrowdStrike, Palo Alto Networks, etc.) can also be added as custom sources. These sources must deliver their data already converted to OCSF in Parquet format. OCSF (Open Cybersecurity Schema Framework) is an open-source standard that maps security events from different sources onto a unified schema. As a result, the same column names and data types can be queried regardless of where an event came from.
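A deliberately simplified Python sketch shows the idea behind OCSF normalization. It maps two raw formats, a CloudTrail JSON record and a default-format VPC Flow Log line, onto the same attribute names (`time`, `src_endpoint.ip`, and so on). The class IDs 6003 (API Activity) and 4001 (Network Activity) come from OCSF. The mapping functions are hypothetical, however, and omit most attributes a real OCSF event requires. Security Lake performs the actual conversion for you.

```python
from datetime import datetime

API_ACTIVITY = 6003      # OCSF class_uid: API Activity
NETWORK_ACTIVITY = 4001  # OCSF class_uid: Network Activity


def from_cloudtrail(record: dict) -> dict:
    """Map a CloudTrail JSON record to a simplified API Activity event."""
    # CloudTrail eventTime is ISO 8601 UTC, e.g. "2024-05-01T12:00:00Z".
    ts = datetime.fromisoformat(record["eventTime"].replace("Z", "+00:00"))
    return {
        "class_uid": API_ACTIVITY,
        "time": int(ts.timestamp() * 1000),  # OCSF time: epoch milliseconds
        "src_endpoint": {"ip": record.get("sourceIPAddress")},
        "api": {
            "operation": record["eventName"],
            "service": {"name": record["eventSource"]},
        },
        "cloud": {"region": record.get("awsRegion")},
    }


def from_vpc_flow(line: str) -> dict:
    """Map a default-format (version 2) VPC Flow Log line to Network Activity.

    Assumes log-status OK; NODATA/SKIPDATA lines contain "-" placeholders.
    """
    (_version, _account, _eni, srcaddr, dstaddr, srcport, dstport,
     _protocol, packets, nbytes, start, _end, action, _status) = line.split()
    return {
        "class_uid": NETWORK_ACTIVITY,
        "time": int(start) * 1000,  # flow log start time is epoch seconds
        "src_endpoint": {"ip": srcaddr, "port": int(srcport)},
        "dst_endpoint": {"ip": dstaddr, "port": int(dstport)},
        "traffic": {"bytes": int(nbytes), "packets": int(packets)},
        "action": "Allowed" if action == "ACCEPT" else "Denied",
    }
```

After mapping, code can filter both events on `src_endpoint.ip` in the same way. That shared schema is what lets a single Athena query span multiple sources.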
Subscribers and Analysis
Subscribers are consumers that access data in the data lake, and they come in two types. Query access subscribers are granted access to the Security Lake tables through AWS Lake Formation and query them with services such as Athena or Redshift Spectrum. Data access subscribers get direct access to the S3 objects. They are notified of new data through an SQS queue or an HTTPS endpoint, which makes near-real-time analysis pipelines possible. SIEM tools such as Splunk and Datadog can be configured as subscribers to bring Security Lake data into existing security operations workflows.
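For a data access subscriber, the pipeline typically starts by reading notifications and extracting the S3 objects to fetch. The following Python sketch assumes the SQS message body uses the standard S3 event notification layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`). Check a real message from your subscriber queue before relying on these field paths. The function name is hypothetical.

```python
from __future__ import annotations

import json
from urllib.parse import unquote_plus


def objects_from_notification(message_body: str) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for newly created objects in one message.

    Assumes the S3 event notification layout:
    Records[].eventName, Records[].s3.bucket.name, Records[].s3.object.key.
    """
    event = json.loads(message_body)
    pairs: list[tuple[str, str]] = []
    for record in event.get("Records", []):
        # Only newly created objects matter to a subscriber pipeline.
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        s3 = record["s3"]
        # Object keys in S3 event notifications are URL-encoded.
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs
```

In practice you would call this function on each message returned by the SQS `ReceiveMessage` API. Delete a message only after its referenced objects have been processed, so that failures are retried.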
Security Lake Pricing
Security Lake pricing is based on the volume of data ingested, and S3 storage is billed separately. Ingesting data from AWS-native sources costs approximately 0.75 USD per GB. Actual rates are tiered by monthly volume and vary by region and source. Athena bills queries by the amount of data scanned (approximately 5 USD per TB). The columnar Parquet layout and time-based filters reduce the data each query has to read, which directly lowers query costs. To reduce storage costs, set a data retention period per region and have older data transition to Glacier storage classes automatically. When enabling Security Lake across an entire AWS Organization, roll it out in phases: start with the accounts that produce the most logs and monitor costs at each step.
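These pricing components combine into a simple monthly estimate. The Python sketch below uses the approximate rates quoted above plus an assumed S3 Standard rate of 0.023 USD per GB-month. It applies flat rates, whereas actual ingestion pricing is tiered, so treat the result only as an order-of-magnitude check. The `Rates` class and `monthly_cost` function are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Assumed flat USD rates; real ingestion pricing is tiered by volume."""
    ingestion_per_gb: float = 0.75           # approximate AWS-native ingestion
    athena_per_tb_scanned: float = 5.0       # approximate Athena scan price
    s3_standard_per_gb_month: float = 0.023  # assumed S3 Standard rate


def monthly_cost(ingested_gb: float, stored_gb: float, scanned_tb: float,
                 rates: Rates = Rates()) -> dict[str, float]:
    """Estimate one month of Security Lake-related spend, in USD."""
    if min(ingested_gb, stored_gb, scanned_tb) < 0:
        raise ValueError("volumes must be non-negative")
    ingestion = ingested_gb * rates.ingestion_per_gb
    storage = stored_gb * rates.s3_standard_per_gb_month
    athena = scanned_tb * rates.athena_per_tb_scanned
    return {
        "ingestion": round(ingestion, 2),
        "storage": round(storage, 2),
        "athena": round(athena, 2),
        "total": round(ingestion + storage + athena, 2),
    }


if __name__ == "__main__":
    print(monthly_cost(ingested_gb=500, stored_gb=1500, scanned_tb=2))
```

For example, under these assumptions, 500 GB ingested, 1,500 GB stored, and 2 TB scanned by Athena come to roughly 375 + 34.50 + 10 = 419.50 USD per month.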
Summary
Security Lake is a data lake service that automatically aggregates AWS security data in OCSF format. Integration with AWS Organizations centralizes security data from every member account, enabling cross-account analysis with Athena and SIEM tools. It is particularly effective as a security operations foundation for large organizations.