Amazon DataZone
A data management portal that integrates data cataloging, access governance, and self-service data sharing
Overview
Amazon DataZone is a data management service for discovering, sharing, and governing data assets within an organization. It provides a portal accessible to both business users and data engineers, enabling self-service data search, access requests, and approval workflows. In addition to AWS data sources such as S3, Redshift, Glue Data Catalog, and RDS, it integrates third-party data sources, providing a unified catalog of data assets across the entire organization. Through integration with Lake Formation, it automatically applies fine-grained access controls at the table and column level.
Data Governance Through Domains and Projects
DataZone's governance structure is designed in three tiers: domains, projects, and environments. A domain is the top-level governance boundary, typically created per enterprise or business unit. Within a domain, the business data catalog, business glossary, and metadata forms are stored, forming the foundation for classifying and contextualizing data assets. Projects are created per team of data producers and consumers, managing membership and role-based access control. Environments are collections of technical resources tied to a project, defining connections to Redshift clusters, Athena workgroups, Glue databases, and more. Data owners publish data assets within a project, and members of other projects acquire access through subscription requests. The approval workflow is customizable, with options for automatic approval, data owner approval, and administrator approval stages.
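The three tiers and the staged approval options described above can be pictured with a small in-memory model. This is only an illustrative sketch, not the DataZone API: the class names, the `ApprovalPolicy` values, and the stage labels `data_owner` and `administrator` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class ApprovalPolicy(Enum):
    # Hypothetical labels for the approval options mentioned in the text.
    AUTO = "auto"
    OWNER = "owner"
    OWNER_THEN_ADMIN = "owner_then_admin"


@dataclass
class Environment:
    # Technical resources tied to a project, e.g. "athena-workgroup:primary".
    name: str
    resources: list[str]


@dataclass
class Project:
    name: str
    members: dict[str, str] = field(default_factory=dict)  # user -> role
    environments: list[Environment] = field(default_factory=list)


@dataclass
class Domain:
    # Top-level governance boundary holding projects and the glossary.
    name: str
    projects: dict[str, Project] = field(default_factory=dict)
    glossary: dict[str, str] = field(default_factory=dict)

    def add_project(self, project: Project) -> Project:
        self.projects[project.name] = project
        return project


def required_approvers(policy: ApprovalPolicy) -> list[str]:
    """Return the ordered approval stages a subscription request must pass."""
    if policy is ApprovalPolicy.AUTO:
        return []
    if policy is ApprovalPolicy.OWNER:
        return ["data_owner"]
    return ["data_owner", "administrator"]


def is_approved(policy: ApprovalPolicy, approvals: list[str]) -> bool:
    """True once every required stage has signed off, in order."""
    needed = required_approvers(policy)
    return approvals[: len(needed)] == needed
```

With this model, a request under `ApprovalPolicy.AUTO` is approved with no sign-offs, while `OWNER_THEN_ADMIN` stays pending until both stages have been recorded.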
Publishing Data Assets and Subscriptions
To share data in DataZone, you first register data sources so that their metadata is collected automatically. When you connect Glue Data Catalog tables, Redshift schemas, or S3 datasets as data sources, table names, column definitions, data types, and statistics are automatically ingested into the catalog. Data owners enrich the ingested assets with business metadata (descriptions, tags, business terms, and data quality rules) and perform a Publish operation to make them searchable by other projects within the domain. Data consumers discover assets through the portal's search interface and submit subscription requests. Once a request is approved, Lake Formation access permissions are granted automatically, allowing consumers to query the data directly from their project environment (Athena or Redshift). This end-to-end flow removes much of the manual work that data engineers would otherwise spend configuring IAM policies and Lake Formation permissions for each consumer.
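The subscribe-and-approve step can also be driven programmatically. The sketch below assumes the boto3 `datazone` client and its `create_subscription_request` and `accept_subscription_request` operations; the parameter and response field names reflect my understanding of that API and should be checked against the current boto3 reference before use. The helper function names and all identifiers are hypothetical. The client is passed in as an argument so the functions can be exercised with a stub.

```python
from typing import Any


def request_subscription(client: Any, domain_id: str, listing_id: str,
                         consumer_project_id: str, reason: str) -> str:
    """Submit a subscription request on behalf of a consumer project.

    Returns the request id from the response (assumed to be under "id").
    """
    response = client.create_subscription_request(
        domainIdentifier=domain_id,
        requestReason=reason,
        subscribedListings=[{"identifier": listing_id}],
        subscribedPrincipals=[{"project": {"identifier": consumer_project_id}}],
    )
    return response["id"]


def approve_subscription(client: Any, domain_id: str, request_id: str,
                         comment: str) -> str:
    """Accept a pending request as the data owner and return its new status."""
    response = client.accept_subscription_request(
        domainIdentifier=domain_id,
        identifier=request_id,
        decisionComment=comment,
    )
    return response["status"]
```

In real use the client would come from `boto3.client("datazone")`, with the domain, listing, and project identifiers taken from your own environment.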
Business Data Catalog and Search Experience
DataZone's business data catalog is a searchable inventory of data assets that integrates technical metadata with business context. The catalog features natural language search, so when business users search for terms like "monthly sales data" or "customer segments," relevant tables and datasets are surfaced. The business glossary feature lets you register organization-specific terminology (KPI definitions, metric calculation methods, data classification criteria, etc.) and link them to data assets, unifying the meaning and context of data across the organization. Metadata forms define custom fields, allowing you to attach attributes such as data freshness, update frequency, data owner, and sensitivity level to assets. Through integration with data quality rules, each asset's quality score is displayed in the catalog, enabling consumers to evaluate data reliability before use. Activity logs from the catalog reveal which data assets are frequently used and which projects consume data, supporting data asset value assessment.
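To make concrete how linking glossary terms to assets helps business-oriented queries find technical tables, here is a deliberately naive sketch. It uses plain keyword matching, not DataZone's actual search, and the `Asset` fields, the 0.0 to 1.0 quality scale, and the ranking rule are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str                     # technical name, e.g. a table name
    description: str              # business description added by the owner
    glossary_terms: list[str] = field(default_factory=list)
    quality_score: float = 0.0    # hypothetical 0.0 to 1.0 scale


def search(assets: list[Asset], query: str) -> list[Asset]:
    """Rank assets by the number of query words found in their metadata.

    Ties are broken by quality score, highest first. Assets with no
    matching word are left out of the result.
    """
    words = query.lower().split()
    scored = []
    for asset in assets:
        text = " ".join([asset.name, asset.description, *asset.glossary_terms]).lower()
        hits = sum(1 for word in words if word in text)
        if hits:
            scored.append((hits, asset.quality_score, asset))
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [asset for _, _, asset in scored]
```

In this toy version, a table named `fact_orders` would only surface for the query "monthly sales data" because a glossary term such as "Monthly Sales" has been attached to it, which is the role the business glossary plays in the catalog.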