Data Governance with Amazon DataZone - Data Discovery, Sharing, and Access Control
Learn how to build a domain-based data catalog and implement data discovery, sharing, and access control through subscription workflows.
DataZone Overview
Amazon DataZone unifies data discovery, sharing, and governance within an organization, and is designed to scale to thousands of data assets and hundreds of users. Data producers publish data assets to the catalog; data consumers search the catalog for the data they need and submit subscription requests. Once a request is approved, consumers can query the data directly from Athena or Redshift.
Domains and Subscriptions
Domains are logical groups that correspond to business units or teams, making data ownership and management responsibility explicit. In the subscription workflow, a consumer requests access to a data asset, and the producer or an administrator approves the request. On approval, the required Lake Formation permissions are granted automatically, and the consumer can run queries from Athena.
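The request and approval steps can be sketched with the AWS SDK for Python. This is a minimal sketch, assuming a boto3 `datazone` client and its `create_subscription_request` and `accept_subscription_request` operations; the helper function names and all identifiers are hypothetical placeholders, and a real setup would also need error handling and the right IAM permissions.

```python
def request_subscription(client, domain_id, listing_id, project_id, reason):
    """Consumer side: ask for access to a published listing.

    `client` is assumed to be boto3.client("datazone"); it is passed in
    so the function can be exercised with a stub as well.
    """
    response = client.create_subscription_request(
        domainIdentifier=domain_id,
        requestReason=reason,
        subscribedListings=[{"identifier": listing_id}],
        subscribedPrincipals=[{"project": {"identifier": project_id}}],
    )
    # The response is assumed to carry the new request's id.
    return response["id"]


def approve_subscription(client, domain_id, request_id, comment):
    """Producer or administrator side: accept a pending request."""
    response = client.accept_subscription_request(
        domainIdentifier=domain_id,
        identifier=request_id,
        decisionComment=comment,
    )
    # The response is assumed to carry the updated status.
    return response["status"]


# Hypothetical usage (placeholder identifiers, not real resources):
#   import boto3
#   dz = boto3.client("datazone")
#   req_id = request_subscription(dz, "dzd_example", "listing-123",
#                                 "project-456", "Quarterly sales analysis")
#   approve_subscription(dz, "dzd_example", req_id, "Approved by data owner")
```

Passing the client in as a parameter keeps the consumer and approver steps separate, which mirrors the fact that they are normally performed by different people.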
Data Quality and Catalog Management
DataZone's data quality rules automatically validate the quality of published data assets. You define rules for completeness (how many values are NULL), uniqueness (duplicate records), and freshness (last update date), and the resulting quality scores are displayed in the catalog. The business glossary manages organization-wide term definitions; by tagging data assets with glossary terms, you can search for data by business meaning rather than only by technical table name. Metadata forms define custom attributes such as data owner, update frequency, and sensitivity level, attaching governance-relevant information to data assets. Integration with the Glue Data Catalog enables automatic import of existing table definitions into DataZone.
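To make the three rule types concrete, here is a small, self-contained sketch of how completeness, uniqueness, and freshness could be computed over a sample of rows. It illustrates the metrics only; it is not DataZone's own implementation, and the function names, sample columns, and 24-hour threshold are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def completeness(rows, column):
    """Fraction of rows whose value in `column` is not NULL (None)."""
    if not rows:
        return 0.0
    non_null = sum(1 for row in rows if row.get(column) is not None)
    return non_null / len(rows)


def uniqueness(rows, key):
    """Fraction of rows that carry a distinct value in `key`."""
    if not rows:
        return 0.0
    distinct = {row.get(key) for row in rows}
    return len(distinct) / len(rows)


def is_fresh(last_updated, max_age, now=None):
    """True if the asset was updated within `max_age` of `now`."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated <= max_age


# Hypothetical sample: one NULL email and one duplicated order_id.
rows = [
    {"order_id": 1, "email": "a@example.com"},
    {"order_id": 2, "email": None},
    {"order_id": 2, "email": "c@example.com"},
    {"order_id": 3, "email": "d@example.com"},
]

print(completeness(rows, "email"))    # 0.75
print(uniqueness(rows, "order_id"))   # 0.75
```

Each metric is a ratio between 0 and 1, which is convenient for showing a score next to an asset in a catalog.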
DataZone Pricing
DataZone pricing is based on the number of data assets registered in the catalog and the volume of metadata API calls. Data assets cost approximately $0.10 per asset per month, and metadata API calls cost approximately $4.25 per million requests. Subscription approval and management are available at no additional charge. In large organizations, where the catalog can grow to thousands of assets, keep costs in check by regularly reviewing the inventory and removing assets that are no longer needed. Reusing existing metadata through Glue Data Catalog integration avoids maintaining two catalogs and reduces operational costs.
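As a rough illustration of the arithmetic, the following sketch estimates a monthly bill from the approximate rates quoted above. The rates are taken from this article, not from a live price list, and the asset and request counts are made-up example figures.

```python
# Approximate rates quoted in this article (assumptions, not a price list).
ASSET_PRICE_PER_MONTH = 0.10   # USD per registered data asset
API_PRICE_PER_MILLION = 4.25   # USD per million metadata API requests


def estimate_monthly_cost(asset_count, api_requests):
    """Estimate the monthly cost in USD, rounded to cents."""
    asset_cost = asset_count * ASSET_PRICE_PER_MONTH
    api_cost = api_requests / 1_000_000 * API_PRICE_PER_MILLION
    return round(asset_cost + api_cost, 2)


# Hypothetical organization: 5,000 assets and 2 million API calls a month.
before = estimate_monthly_cost(5_000, 2_000_000)
# Same organization after retiring 1,000 unused assets.
after = estimate_monthly_cost(4_000, 2_000_000)

print(before)  # 508.5
print(after)   # 408.5
```

In this example the asset charge dominates the total, which is why a periodic cleanup of unused assets has a visible effect on the bill.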
Summary
DataZone is a service that unifies data discovery, sharing, and governance to maximize data value across the organization. Domain-based ownership management clarifies data responsibility, and subscription workflows enable approval-based data sharing. Data quality rules and business glossaries improve catalog reliability and searchability.