Building a Data Lake with AWS Lake Formation - Fine-Grained Access Control and Data Catalog
Establish data lake governance with column-level and row-level fine-grained access control and tag-based management. This article covers integration with Glue Data Catalog and cross-account access.
Overview of Lake Formation
Lake Formation is a service that simplifies building, managing, and securing data lakes on S3. Traditionally, access control for S3 data lakes was managed through a combination of S3 bucket policies and IAM policies, making column-level and row-level control difficult. Lake Formation manages permissions at four levels - database, table, column, and row - and scales to data lakes with thousands of tables. It applies uniformly to queries from Athena, Redshift Spectrum, and EMR.
Access Control and Tag-Based Management
Lake Formation's permission model grants permissions such as SELECT, INSERT, and DELETE on databases, tables, and columns to principals (IAM users and roles). Row-level filters restrict access to rows matching specific conditions, enabling controls such as allowing each department to view only its own data. Lake Formation tag-based access control (LF-TBAC) lets you attach LF-tags to data (e.g., sensitivity=high) and grant permissions to principals on tag expressions rather than on individual resources. When a new table is added, principals whose grants match its tags gain access automatically.
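As a rough illustration of the two grant styles described above, the sketch below builds request payloads in the shape accepted by the grant_permissions call of boto3's lakeformation client: one column-level grant on a named table, and one grant on an LF-tag expression. The role ARN, database, table, column, and tag names are hypothetical placeholders. Nothing is sent to AWS unless apply_grant is called in an environment with boto3 and credentials configured.

```python
from typing import Any, Dict, List

# Hypothetical principal, used only for illustration.
ANALYST_ROLE = "arn:aws:iam::111122223333:role/AnalystRole"


def column_grant(principal_arn: str, database: str, table: str,
                 columns: List[str]) -> Dict[str, Any]:
    """Build a SELECT grant limited to the named columns of one table."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": columns,
            }
        },
        "Permissions": ["SELECT"],
    }


def tag_grant(principal_arn: str, tag_key: str,
              tag_values: List[str]) -> Dict[str, Any]:
    """Build a SELECT grant on every table whose LF-tags match the expression."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "LFTagPolicy": {
                "ResourceType": "TABLE",
                "Expression": [{"TagKey": tag_key, "TagValues": tag_values}],
            }
        },
        "Permissions": ["SELECT"],
    }


def apply_grant(request: Dict[str, Any]) -> None:
    """Send a payload to Lake Formation (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builders above run without AWS access

    boto3.client("lakeformation").grant_permissions(**request)
```

With the tag-based form, a table tagged later with a matching value is covered by the existing grant, whereas the column-level form has to be repeated for each table.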
Cross-Account Access and Auditing
Lake Formation's cross-account sharing grants permissions on databases and tables to other AWS accounts, including accounts and organizational units in your AWS Organizations organization. Shares are delivered through RAM (Resource Access Manager); in the consumer account you then create resource links so that the shared tables appear in that account's Glue catalog. Data filters define row-level and cell-level access controls, allowing you to expose different data ranges from the same table to different departments. Integration with CloudTrail records audit logs of who accessed which columns of which tables.
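The sketch below shows, under stated assumptions, what a data filter and a resource link might look like as request payloads. department_filter builds the TableData argument for the lakeformation client's create_data_cells_filter call, and resource_link builds the TableInput argument for the glue client's create_table call. The account ID, database, table, column names, and the department column are all hypothetical.

```python
from typing import Any, Dict, List

PRODUCER_ACCOUNT = "111122223333"  # hypothetical producer account ID


def department_filter(catalog_id: str, database: str, table: str,
                      department: str,
                      excluded_columns: List[str]) -> Dict[str, Any]:
    """Build TableData for a filter exposing one department's rows.

    Assumes the table has a `department` column and that `department`
    comes from a trusted source, since it is interpolated into the
    filter expression.
    """
    return {
        "TableCatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Name": f"{table}_{department}_only",
        "RowFilter": {"FilterExpression": f"department = '{department}'"},
        # Every column except the excluded ones stays visible.
        "ColumnWildcard": {"ExcludedColumnNames": excluded_columns},
    }


def resource_link(link_name: str, producer_account: str,
                  database: str, table: str) -> Dict[str, Any]:
    """Build TableInput for a Glue resource link to a shared table."""
    return {
        "Name": link_name,
        "TargetTable": {
            "CatalogId": producer_account,
            "DatabaseName": database,
            "Name": table,
        },
    }
```

In the producer account, the first payload would be passed as boto3.client("lakeformation").create_data_cells_filter(TableData=...). In the consumer account, the second would be passed as boto3.client("glue").create_table(DatabaseName=..., TableInput=...), where the database is a local one chosen to hold the links.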
Lake Formation Pricing and Operations
Lake Formation itself incurs no additional charges. Costs depend on the underlying S3 storage, Glue crawlers and jobs, and Athena query usage. When using Lake Formation's transaction feature (Governed Tables), additional charges apply for transaction API calls and storage. Adopting LF-TBAC significantly reduces the operational cost of permission management, because you no longer maintain grants table by table and column by column. We recommend registering data locations so that S3 paths come under Lake Formation management, and then migrating fully off hybrid mode, which otherwise requires managing IAM policies and Lake Formation permissions in parallel.
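Registering a data location can be sketched as follows. The helper builds arguments for the lakeformation client's register_resource call; the bucket and prefix are hypothetical, and the HybridAccessEnabled flag reflects my understanding of the current boto3 parameters, so verify it against the boto3 documentation before relying on it.

```python
from typing import Any, Dict


def register_location(bucket: str, prefix: str = "",
                      hybrid: bool = False) -> Dict[str, Any]:
    """Build register_resource arguments for an S3 bucket or prefix."""
    arn = f"arn:aws:s3:::{bucket}"
    if prefix:
        # Normalize so "raw/sales/" and "/raw/sales" give the same ARN.
        arn += "/" + prefix.strip("/")
    return {
        "ResourceArn": arn,
        # Let Lake Formation use its service-linked role to read the location.
        "UseServiceLinkedRole": True,
        # False means access is governed by Lake Formation permissions alone.
        "HybridAccessEnabled": hybrid,
    }
```

The result would be passed as boto3.client("lakeformation").register_resource(**register_location("example-data-lake", "raw/sales")). Leaving hybrid at False matches the recommendation above to avoid maintaining IAM policies and Lake Formation permissions side by side.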
Summary
Lake Formation provides fine-grained access control for S3 data lakes. Column-level and row-level permission management combined with LF-TBAC (tag-based access control) streamlines permission management in large-scale environments, while cross-account sharing enables secure data sharing within Organizations. CloudTrail integration records data access audit logs, building the foundation for data governance.