CloudTrail Sees Everything - How API Calls Are Recorded and Practical Forensic Investigation
This article explains how CloudTrail records AWS API calls, the differences between management events and data events, SQL analysis with CloudTrail Lake, and forensic investigation techniques when a security incident occurs.
What Does CloudTrail Record?
CloudTrail is a service that records nearly every API call made within an AWS account. Calls are recorded whether they come from the AWS Management Console, the CLI, an SDK, or another AWS service, and they cover actions such as EC2 instance launches, S3 bucket creation, IAM policy changes, and Lambda function deployments. Each event records who made the call (userIdentity), when (eventTime), from where (sourceIPAddress), what action (eventName), against which resource (resources, together with requestParameters), and what the result was: a failed call carries an errorCode, while a successful call omits that field.

The events this article focuses on fall into two categories. Management Events are control plane operations such as creating, modifying, and deleting resources. They are recorded by default, and the past 90 days can be viewed for free in the console's Event History. Data Events are data plane operations such as S3 object reads and writes, Lambda function invocations, and DynamoDB item operations. Because data events are generated in large volumes, they are not recorded by default and must be explicitly enabled.
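
To make the who/when/where/what structure concrete, here is a minimal Python sketch that pulls those fields out of one event record. The field names follow the CloudTrail record format, but the sample record is illustrative: the account ID, ARN, access key ID, and IP address are made up, and `summarize` is a hypothetical helper, not part of any AWS SDK.

```python
import json

# Illustrative record. Field names follow the CloudTrail record format;
# every value (account ID, ARN, key ID, IP address) is made up.
SAMPLE_EVENT = json.loads("""
{
  "eventTime": "2024-05-01T10:05:40Z",
  "eventSource": "ec2.amazonaws.com",
  "eventName": "RunInstances",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "203.0.113.10",
  "userAgent": "aws-cli/2.15.0",
  "userIdentity": {
    "type": "IAMUser",
    "arn": "arn:aws:iam::111122223333:user/example-user",
    "accessKeyId": "AKIAEXAMPLEKEY00001"
  },
  "readOnly": false,
  "managementEvent": true,
  "eventCategory": "Management"
}
""")


def summarize(event: dict) -> dict:
    """Reduce one event record to the fields an investigator reads first."""
    identity = event.get("userIdentity", {})
    return {
        "who": identity.get("arn") or identity.get("type"),
        "access_key": identity.get("accessKeyId"),
        "when": event.get("eventTime"),
        "where": event.get("sourceIPAddress"),
        "what": f'{event.get("eventSource")}:{event.get("eventName")}',
        "category": event.get("eventCategory"),
        # errorCode is only present when the call failed.
        "succeeded": "errorCode" not in event,
    }


print(summarize(SAMPLE_EVENT))
```

Treating a missing errorCode as success mirrors how the field behaves in practice, so the helper never expects an explicit "success" value.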
How CloudTrail Event Delivery Works
CloudTrail events are typically delivered within about 5 to 15 minutes after an API call is made, and this timing is not guaranteed. In other words, CloudTrail is not real-time, which matters when you design alerting around it.

There are three places where events can be viewed or delivered. First, the CloudTrail console (Event History), where the past 90 days of management events can be viewed for free with filtering and search capabilities. Second, an S3 bucket. When you create a trail, events are delivered to an S3 bucket as gzip-compressed JSON log files. Logs stored in S3 can be analyzed by running SQL queries with Athena. Third, CloudTrail Lake. Introduced in 2022, CloudTrail Lake stores events in a dedicated event data store and allows direct SQL queries. Compared to the S3 + Athena combination, setup is simpler because there are no tables or partitions to define, and query performance can be better.

CloudTrail logs include a tamper detection feature. When Log File Integrity Validation is enabled, CloudTrail periodically (hourly) writes a digest file that records the SHA-256 hash of each log file delivered in that period. The digest file is itself signed, allowing you to cryptographically verify that logs have not been modified or deleted after delivery.
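
The hash comparison at the heart of integrity validation can be sketched in a few lines of Python. This is a simplified illustration, not the full validation procedure: it assumes the digest entry's `hashValue` is the hex-encoded SHA-256 of the uncompressed log file content, it builds a stand-in log file locally instead of reading from S3, and `load_records` and `matches_digest` are hypothetical helper names. It does not check the digest file's own signature or the chain between digest files; the `aws cloudtrail validate-logs` CLI command performs the complete check.

```python
import gzip
import hashlib
import json


def load_records(gz_bytes: bytes) -> list:
    """Decompress one delivered log file and return its event records."""
    return json.loads(gzip.decompress(gz_bytes))["Records"]


def matches_digest(gz_bytes: bytes, digest_entry: dict) -> bool:
    """Compare the log file's SHA-256 with the hash stored in a digest entry.

    Assumption: hashValue covers the uncompressed log file content.
    """
    actual = hashlib.sha256(gzip.decompress(gz_bytes)).hexdigest()
    return actual == digest_entry["hashValue"]


# Stand-in for a log file delivered to S3 (a real file has many records).
raw = json.dumps({"Records": [{"eventName": "RunInstances"}]}).encode()
log_file = gzip.compress(raw)

# Stand-in for the matching entry in the digest file's logFiles list.
digest_entry = {
    "hashAlgorithm": "SHA-256",
    "hashValue": hashlib.sha256(raw).hexdigest(),
}

# Simulate an attacker rewriting the log after delivery.
tampered = gzip.compress(raw.replace(b"RunInstances", b"DescribeInstances"))

print(matches_digest(log_file, digest_entry))
print(matches_digest(tampered, digest_entry))
```

The first comparison succeeds and the second fails, because any change to the log content changes its hash.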
Practical Forensic Investigation - Tracking Unauthorized Access
When a security incident occurs, CloudTrail is the most important source of evidence. As an example, consider the investigation procedure when you receive an alert that "an unknown EC2 instance has been launched."

First, search CloudTrail for RunInstances events to identify who launched the instance and when. The userIdentity field reveals the IAM user name or role name and the access key ID that was used.

Next, search CloudTrail using that access key ID to uncover the other API calls made with the same credentials. This is where the full picture of the attacker's actions emerges, including S3 bucket access, IAM policy changes, and security group modifications.

The sourceIPAddress field shows the IP address the calls came from. If the attacker is using a VPN or Tor, however, the IP address alone may not be sufficient to identify them. The userAgent field contains information about the tools used (AWS CLI, SDK, console, etc.), providing clues to infer the attack methodology.
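
The pivot described above (find the RunInstances event, then follow its access key ID) can be sketched in Python over a list of already-exported event records. Everything in `EVENTS` is made-up sample data, and `find_launches` and `activity_for_key` are hypothetical helpers; in a real investigation the records would come from Event History, the S3 logs, or a CloudTrail Lake query.

```python
# Made-up sample records, deliberately out of chronological order.
EVENTS = [
    {"eventTime": "2024-05-01T10:07:03Z", "eventName": "AuthorizeSecurityGroupIngress",
     "sourceIPAddress": "203.0.113.10", "userAgent": "aws-cli/2.15.0",
     "userIdentity": {"userName": "example-user", "accessKeyId": "AKIAEXAMPLEKEY00001"}},
    {"eventTime": "2024-05-01T09:58:00Z", "eventName": "DescribeInstances",
     "sourceIPAddress": "198.51.100.7", "userAgent": "console.amazonaws.com",
     "userIdentity": {"userName": "ops-user", "accessKeyId": "AKIAEXAMPLEKEY00002"}},
    {"eventTime": "2024-05-01T10:05:40Z", "eventName": "RunInstances",
     "sourceIPAddress": "203.0.113.10", "userAgent": "aws-cli/2.15.0",
     "userIdentity": {"userName": "example-user", "accessKeyId": "AKIAEXAMPLEKEY00001"}},
    {"eventTime": "2024-05-01T10:02:11Z", "eventName": "AttachUserPolicy",
     "sourceIPAddress": "203.0.113.10", "userAgent": "aws-cli/2.15.0",
     "userIdentity": {"userName": "example-user", "accessKeyId": "AKIAEXAMPLEKEY00001"}},
]


def find_launches(events):
    """Step 1: locate the RunInstances events."""
    return [e for e in events if e["eventName"] == "RunInstances"]


def activity_for_key(events, access_key_id):
    """Step 2: every call made with the same access key, oldest first.

    eventTime is an ISO 8601 UTC string, so string order is time order.
    """
    hits = [e for e in events
            if e.get("userIdentity", {}).get("accessKeyId") == access_key_id]
    return sorted(hits, key=lambda e: e["eventTime"])


launch = find_launches(EVENTS)[0]
suspect_key = launch["userIdentity"]["accessKeyId"]
timeline = activity_for_key(EVENTS, suspect_key)

for e in timeline:
    print(e["eventTime"], e["eventName"], e["sourceIPAddress"], e["userAgent"])
```

Sorting by eventTime turns the matching records into a timeline, which in this sample shows a policy change before the launch and a security group change after it.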
What CloudTrail Does Not Record
CloudTrail is powerful, but it does not record everything. Knowing what is not recorded, or not visible by default, is important for security design.

First, OS-level operations. Even if you SSH into an EC2 instance and manipulate files, this is not recorded in CloudTrail. OS-level auditing requires a separate mechanism, such as OS logs collected by CloudWatch Agent or AWS Systems Manager Session Manager session logs.

Second, S3 object operations when data events are not enabled. Even if files are downloaded from an S3 bucket, the downloads will not be recorded if data events are disabled. Be sure to enable data events for buckets containing sensitive data.

Third, some read-only APIs are easy to overlook. Read APIs like DescribeInstances are recorded as management events, but because they are called in large volumes, they may not be displayed in the CloudTrail console depending on the filter applied. They are fully recorded in logs delivered to S3, provided the trail is configured to log read events.

Fourth, some internal service-to-service communications within AWS. When AWS services internally call other services, some of those calls are not recorded in CloudTrail.
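
As one way to enable data events only for sensitive buckets, the following Python sketch builds advanced event selectors in the shape accepted by the CloudTrail PutEventSelectors API (the `AdvancedEventSelectors` parameter). The bucket name is a placeholder and `s3_data_event_selectors` is a hypothetical helper; the sketch only constructs and prints the configuration and does not call AWS.

```python
import json


def s3_data_event_selectors(bucket_names):
    """Build advanced event selectors that log S3 object-level events
    only for the named buckets."""
    # Object ARNs look like arn:aws:s3:::bucket/key, so matching on the
    # "bucket/" prefix covers every object in that bucket.
    prefixes = [f"arn:aws:s3:::{name}/" for name in bucket_names]
    return [{
        "Name": "S3 data events for sensitive buckets",
        "FieldSelectors": [
            {"Field": "eventCategory", "Equals": ["Data"]},
            {"Field": "resources.type", "Equals": ["AWS::S3::Object"]},
            {"Field": "resources.ARN", "StartsWith": prefixes},
        ],
    }]


# "example-sensitive-bucket" is a placeholder name.
selectors = s3_data_event_selectors(["example-sensitive-bucket"])
print(json.dumps(selectors, indent=2))
```

The resulting list could then be applied to a trail, for example through boto3's `put_event_selectors(TrailName=..., AdvancedEventSelectors=selectors)`.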
CloudTrail Cost Optimization
The first copy of management events delivered by a trail is free. Additional copies, from the second and subsequent trails, are charged at $2.00 per 100,000 events. Data events cost $0.10 per 100,000 events. In large-scale environments, S3 data events can be generated in enormous volumes, resulting in monthly costs of several thousand dollars. Prices change over time, so check the current CloudTrail pricing page before budgeting.

There are several ways to optimize costs. First, narrow the scope of data events. Instead of recording data events for all S3 buckets, limit them to buckets containing sensitive data. Second, set an appropriate retention period for CloudTrail Lake. A 7-year default is excessive in many cases, and the default that applies can depend on the pricing option chosen for the event data store. Shortening retention to 1 or 3 years based on compliance requirements can reduce storage costs. Third, manage the retention period of logs delivered to S3 using S3 lifecycle rules. Setting a rule to transition logs to Glacier after 90 days and delete them after 1 year can significantly reduce storage costs.

For a systematic approach to audit log design and operations, specialized books on Amazon are a helpful reference.
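
To see how data event volume turns into a bill, here is a small Python sketch of the arithmetic using the per-100,000-event rates quoted in this section. The rates are taken from the article rather than from a live price list, the event volumes are made-up examples, and `monthly_cost` is a hypothetical helper. Storage and query charges are not included.

```python
from decimal import Decimal

# Rates quoted in this article, in USD per 100,000 events.
DATA_EVENT_RATE = Decimal("0.10")
EXTRA_MGMT_RATE = Decimal("2.00")   # second and subsequent copies only
UNIT = 100_000


def monthly_cost(data_events: int, extra_mgmt_events: int = 0) -> Decimal:
    """Estimate the monthly event charges for the given event counts."""
    data = Decimal(data_events) / UNIT * DATA_EVENT_RATE
    mgmt = Decimal(extra_mgmt_events) / UNIT * EXTRA_MGMT_RATE
    return data + mgmt


# Example volumes (made up): logging every bucket versus only sensitive ones.
all_buckets = monthly_cost(data_events=5_000_000_000)
sensitive_only = monthly_cost(data_events=200_000_000)

print(f"all buckets:    ${all_buckets:,.2f}")
print(f"sensitive only: ${sensitive_only:,.2f}")
```

With these example volumes, 5 billion data events come to $5,000 per month, while limiting logging to 200 million events brings the charge down to $200.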