Automated Sensitive Data Discovery in S3 with Amazon Macie - PII Detection and Security Posture Management
Learn about Macie's sensitive data discovery for S3 buckets, custom data identifiers, and Security Hub integration.
Overview of Macie
Amazon Macie is a data security service that discovers and classifies sensitive data in Amazon S3 buckets. It also evaluates each bucket's security settings, so you can see which buckets contain sensitive data and whether they are adequately protected. More than 100 managed data identifiers detect common types such as PII and credit card numbers, and custom data identifiers let you detect organization-specific sensitive data as well.
Data Discovery and Security Posture
Sensitive data discovery jobs scan objects in S3 buckets, either in full or by sampling a percentage of objects, and use managed data identifiers to detect PII and other sensitive data. Managed identifiers also cover some Japan-specific data types, such as the My Number national ID. Check the managed data identifier reference to confirm which countries and languages each identifier supports. Custom data identifiers combine a regular expression (e.g., an employee number pattern) with optional keywords (e.g., "confidential") that must appear near the match. The S3 bucket inventory shows the encryption settings, Block Public Access settings, and sharing configuration of every bucket at a glance, helping you identify high-risk buckets.
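The regex-plus-keyword idea can be sketched locally in Python. This is not Macie's matching engine or API. The function name `match_custom_identifier`, the hypothetical `EMP-\d{6}` employee number pattern, and the rule that a keyword must appear within `max_match_distance` characters before the match are illustrative assumptions. Macie exposes a similar maximum match distance setting, but its exact semantics may differ.

```python
import re
from typing import List


def match_custom_identifier(
    text: str,
    regex: str,
    keywords: List[str],
    max_match_distance: int = 50,
) -> List[str]:
    """Return regex matches preceded by a keyword within max_match_distance chars.

    Simplified local approximation of a regex-plus-keyword custom data
    identifier (assumed semantics, not Macie's implementation).
    With no keywords, every regex match is returned.
    """
    pattern = re.compile(regex)
    lowered = text.lower()
    results = []
    for m in pattern.finditer(text):
        # Only inspect the window of text immediately before the match.
        window_start = max(0, m.start() - max_match_distance)
        window = lowered[window_start:m.start()]
        if not keywords or any(k.lower() in window for k in keywords):
            results.append(m.group())
    return results


if __name__ == "__main__":
    # Hypothetical sample: only the first ID has "confidential" nearby.
    sample = "Confidential: EMP-123456. " + "x" * 80 + " EMP-654321"
    print(match_custom_identifier(sample, r"EMP-\d{6}", ["confidential"]))
    # prints ['EMP-123456']
```

The keyword requirement is what keeps a broad pattern like six digits from flagging every number in a document. Requiring nearby context trades some recall for far fewer false positives.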
Automated Discovery and Classification
Macie's automated sensitive data discovery continuously samples and scans all S3 buckets in your account to estimate the presence of sensitive data. It costs less than full scan jobs and is well-suited for understanding the distribution of sensitive data across your organization. Managed data identifiers detect over 100 sensitive data patterns including credit card numbers, social security numbers, passport numbers, and API keys. Custom data identifiers let you define regular expressions and keywords to detect organization-specific sensitive data such as employee numbers and customer codes. Allow lists suppress false positives by excluding test data and public information.
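An allow list works as a post-filter on detections. The sketch below is a simplified local illustration rather than Macie's implementation. It assumes two kinds of entries: exact strings to ignore, such as a well-known test card number, and regular expressions that a detected value must fully match to be ignored. The function name `apply_allow_list` is hypothetical.

```python
import re
from typing import Iterable, List


def apply_allow_list(
    detections: Iterable[str],
    allowed_text: Iterable[str] = (),
    allowed_patterns: Iterable[str] = (),
) -> List[str]:
    """Drop detections that equal an allowed string or fully match an allowed regex."""
    exact = set(allowed_text)
    compiled = [re.compile(p) for p in allowed_patterns]
    kept = []
    for value in detections:
        if value in exact:
            continue  # e.g. a published test value
        if any(p.fullmatch(value) for p in compiled):
            continue  # e.g. a family of synthetic test numbers
        kept.append(value)
    return kept


if __name__ == "__main__":
    found = ["4111111111111111", "4242424242424242", "123-45-6789"]
    print(apply_allow_list(found, ["4111111111111111"], [r"(?:4242){4}"]))
    # prints ['123-45-6789']
```

Using `fullmatch` rather than `search` for the regex entries keeps an allow-list pattern from accidentally suppressing a longer real value that merely contains a test substring.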
Macie Pricing
Macie pricing consists of bucket evaluation (approximately $0.10 per bucket per month) and sensitive data discovery (based on the volume of data scanned, approximately $1 per GB). List prices vary by Region. Automated sensitive data discovery is sampling-based and significantly cheaper than full scans. A 30-day free trial lets you assess actual costs. Manage costs by limiting scan targets to buckets likely to contain sensitive data, for example by excluding log buckets and backup buckets. Findings can also be sent to Security Hub for centralized aggregation. Security Hub is billed separately under its own pricing.
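The cost arithmetic and the effect of excluding log and backup buckets can be checked with a rough estimator. The sketch uses the approximate rates quoted above as placeholders. It ignores Regional price differences, free-tier allowances, volume discounts, and any other line items, and it assumes every non-excluded bucket is fully scanned once per month. The `Bucket` type, bucket names, and prefixes are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Approximate rates from this article; placeholders, not a price list.
BUCKET_RATE_PER_MONTH = 0.10  # USD per bucket per month (bucket evaluation)
DISCOVERY_RATE_PER_GB = 1.00  # USD per GB scanned (sensitive data discovery)


@dataclass
class Bucket:
    name: str
    size_gb: float


def estimate_monthly_cost(
    buckets: Iterable[Bucket],
    excluded_prefixes: Tuple[str, ...] = (),
) -> float:
    """Evaluation fee for every bucket plus a full scan of non-excluded buckets."""
    buckets = list(buckets)
    evaluation = len(buckets) * BUCKET_RATE_PER_MONTH
    # str.startswith accepts a tuple; an empty tuple excludes nothing.
    scanned_gb = sum(
        b.size_gb for b in buckets if not b.name.startswith(excluded_prefixes)
    )
    return round(evaluation + scanned_gb * DISCOVERY_RATE_PER_GB, 2)


if __name__ == "__main__":
    inventory = [
        Bucket("customer-data", 40),
        Bucket("logs-alb", 500),
        Bucket("backup-db", 300),
        Bucket("hr-docs", 10),
    ]
    print(estimate_monthly_cost(inventory))                        # prints 850.4
    print(estimate_monthly_cost(inventory, ("logs-", "backup-")))  # prints 50.4
```

In this made-up inventory, the log and backup buckets dominate the scanned volume. That is why excluding them is usually the first cost lever.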
Summary
Start by enabling automated sensitive data discovery, which sample-scans all S3 buckets in your account, to understand how sensitive data is distributed. Then prioritize full scans of high-risk buckets (publicly accessible or unencrypted), and build automated responses to findings through the EventBridge integration, such as blocking public access or notifying the security team. Macie is especially valuable when organizations need to know where personal data resides to comply with the GDPR or other data protection laws.
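An EventBridge-driven response usually starts with a small triage function that decides what to do with each finding event. The sketch below is a hypothetical, side-effect-free version that only returns action labels. It assumes the event carries `source` set to `aws.macie` and that the finding sits under `detail`, with `severity.description`, `resourcesAffected.s3Bucket.name`, and `resourcesAffected.s3Bucket.publicAccess.effectivePermission` fields. Verify these paths against the Macie finding schema before relying on them. The function name and action strings are illustrative.

```python
from typing import Any, Dict, List


def triage_macie_finding(event: Dict[str, Any]) -> List[str]:
    """Map one (assumed-shape) Macie finding event to hypothetical action labels."""
    if event.get("source") != "aws.macie":
        return []  # not a Macie event; ignore it
    detail = event.get("detail", {})
    severity = detail.get("severity", {}).get("description", "")
    s3_bucket = detail.get("resourcesAffected", {}).get("s3Bucket", {})
    bucket = s3_bucket.get("name", "")
    permission = s3_bucket.get("publicAccess", {}).get("effectivePermission", "")

    actions = []
    if severity in ("Medium", "High"):
        actions.append(f"notify-security-team:{bucket}")
    if permission == "PUBLIC" and bucket:
        # Sensitive data in a publicly accessible bucket: lock it down.
        actions.append(f"block-public-access:{bucket}")
    return actions


if __name__ == "__main__":
    sample_event = {
        "source": "aws.macie",
        "detail-type": "Macie Finding",
        "detail": {
            "severity": {"description": "High"},
            "resourcesAffected": {
                "s3Bucket": {
                    "name": "customer-exports",
                    "publicAccess": {"effectivePermission": "PUBLIC"},
                }
            },
        },
    }
    print(triage_macie_finding(sample_event))
```

Keeping the decision logic pure makes it easy to unit-test. A real handler, for example a Lambda function targeted by an EventBridge rule, would then carry out each action. It might call the boto3 S3 client's `put_public_access_block` or publish a message to an SNS topic for the security team.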