
Amazon DataZone (New, since 2023)

A data management service that unifies data discovery, sharing, and governance

What It Does

Amazon DataZone is a data governance service for discovering, sharing, and managing data across your organization. Its data catalog feature lets you search and browse data assets scattered across S3, Redshift, Glue, and other sources from a single place. It securely manages access permissions between data owners and consumers, promoting cross-departmental data utilization.

Use Cases

Typical users are large enterprises that want to share and use data held by different departments across the whole organization, or that need to catalog and control access to data accumulated in a data lake. DataZone enables self-service data access: data scientists search for the datasets they need, request them, and gain access through an approval workflow.
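
The self-service flow described above can be sketched in Python. This is a minimal sketch, not a definitive implementation: it assumes `client` behaves like `boto3.client("datazone")`, and the function name `find_and_request` and all identifiers passed to it are hypothetical. It searches the catalog's published listings and files a subscription request for the first match on behalf of a project.

```python
def find_and_request(client, domain_id, project_id, search_text, reason):
    """Search published listings and request access to the first match.

    `client` is assumed to behave like boto3.client("datazone").
    Returns the subscription request ID, or None if nothing matched.
    """
    found = client.search_listings(
        domainIdentifier=domain_id,
        searchText=search_text,
        maxResults=10,
    )
    items = found.get("items", [])
    if not items:
        return None  # nothing in the catalog matched the search text

    # Each search result wraps the listing details under "assetListing".
    listing_id = items[0]["assetListing"]["listingId"]

    # The request is made on behalf of a project, not an individual user.
    request = client.create_subscription_request(
        domainIdentifier=domain_id,
        requestReason=reason,
        subscribedListings=[{"identifier": listing_id}],
        subscribedPrincipals=[{"project": {"identifier": project_id}}],
    )
    return request["id"]
```

With real AWS credentials and an existing domain, you would pass `boto3.client("datazone")` as `client`. The request then stays pending until the data owner approves or rejects it.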

Everyday Analogy

Think of it like a corporate librarian. The librarian (DataZone) organizes the books (data) held by various departments into a catalog. Users search the catalog to find the data they need, submit a checkout request (access request), and use the data once the request is approved.

What Is DataZone?

Amazon DataZone is a service for unified management of data assets across your entire organization. It provides a centralized view of where data is stored, what it contains, and who can access it. It functions as a platform connecting data producers (owners) and consumers (users), supporting data-driven decision-making.

Data Catalog and Governance

DataZone's data catalog automatically collects metadata (names, types, descriptions, and so on) from registered data sources such as Glue Data Catalog entries, which can describe data stored in S3, and Redshift tables. You can define business glossaries to attach business meaning to data and set data quality rules, promoting a shared understanding of data across the organization.
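
As an illustration of the business glossary idea, the following is a minimal sketch that creates a glossary and one term in it. It assumes `client` behaves like `boto3.client("datazone")`. The function name `define_term` and the example values are hypothetical, and a real glossary would usually hold many terms.

```python
def define_term(client, domain_id, owning_project_id, glossary_name, term, meaning):
    """Create a business glossary and add a single term to it.

    `client` is assumed to behave like boto3.client("datazone").
    Returns a (glossary_id, term_id) tuple.
    """
    # A glossary belongs to a domain and is owned by one project.
    glossary = client.create_glossary(
        domainIdentifier=domain_id,
        owningProjectIdentifier=owning_project_id,
        name=glossary_name,
        status="ENABLED",
    )
    # The term carries the agreed business meaning as a short description.
    created = client.create_glossary_term(
        domainIdentifier=domain_id,
        glossaryIdentifier=glossary["id"],
        name=term,
        shortDescription=meaning,
        status="ENABLED",
    )
    return glossary["id"], created["id"]
```

For example, a term such as "Active customer" with the meaning "Customer with at least one order in the last 90 days" gives every team the same definition when they read a dataset's metadata.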

Access Management and Collaboration

In DataZone, data owners set access policies and build workflows to approve or reject access requests from consumers. Teams are organized into projects, and access to datasets is granted at the project level. DataZone integrates with Lake Formation and IAM for fine-grained access control. Specialized books are also a useful source of practical tips on access management and collaboration.
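
The owner side of the approval workflow might look like the sketch below. It assumes `client` behaves like `boto3.client("datazone")`. The function `review_pending` and the `should_approve` callback are hypothetical names standing in for whatever review policy your organization uses. For brevity it reads only the first page of results and does not follow `nextToken`.

```python
def review_pending(client, domain_id, approver_project_id, should_approve):
    """Accept or reject each pending subscription request for a project.

    `client` is assumed to behave like boto3.client("datazone").
    `should_approve` is a caller-supplied function that takes one request
    dict and returns True to accept it or False to reject it.
    Returns a dict mapping request ID to the resulting status.
    """
    pending = client.list_subscription_requests(
        domainIdentifier=domain_id,
        approverProjectId=approver_project_id,
        status="PENDING",
    )
    decisions = {}
    for req in pending.get("items", []):
        if should_approve(req):
            result = client.accept_subscription_request(
                domainIdentifier=domain_id,
                identifier=req["id"],
                decisionComment="Approved by data owner",
            )
        else:
            result = client.reject_subscription_request(
                domainIdentifier=domain_id,
                identifier=req["id"],
                decisionComment="Rejected by data owner",
            )
        decisions[req["id"]] = result["status"]
    return decisions
```

A simple policy could, for instance, approve only requests whose `requestReason` is non-empty and leave everything else to manual review in the console.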

Getting Started

To get started with DataZone, create a domain (the top-level management unit) in the DataZone console. Then register data sources and begin metadata collection. Create a project, invite team members, and start using data by searching for and subscribing to datasets in the data catalog.
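
The same first steps can also be scripted instead of clicked through in the console. The sketch below is a rough outline under stated assumptions: `client` behaves like `boto3.client("datazone")`, the function name `bootstrap_domain` and its parameters are hypothetical, and an IAM execution role for the domain already exists. Data source registration is left out because it additionally needs an environment inside the project.

```python
import time


def bootstrap_domain(client, domain_name, execution_role_arn, project_name,
                     member_user_id, poll_seconds=5, max_polls=60):
    """Create a domain, wait for it to become available, then add a project
    and one project member.

    `client` is assumed to behave like boto3.client("datazone").
    Returns a (domain_id, project_id) tuple.
    """
    domain = client.create_domain(
        name=domain_name,
        domainExecutionRole=execution_role_arn,
    )
    domain_id = domain["id"]

    # Domain creation is asynchronous, so poll until it is usable.
    status = domain.get("status")
    polls = 0
    while status != "AVAILABLE":
        if status == "CREATION_FAILED":
            raise RuntimeError(f"domain {domain_id} failed to create")
        if polls >= max_polls:
            raise TimeoutError(f"domain {domain_id} still {status}")
        time.sleep(poll_seconds)
        status = client.get_domain(identifier=domain_id)["status"]
        polls += 1

    project = client.create_project(
        domainIdentifier=domain_id,
        name=project_name,
    )
    # Invite a team member to the project as a contributor.
    client.create_project_membership(
        domainIdentifier=domain_id,
        projectIdentifier=project["id"],
        member={"userIdentifier": member_user_id},
        designation="PROJECT_CONTRIBUTOR",
    )
    return domain_id, project["id"]
```

Scripting these steps keeps the domain and project layout reproducible, which helps because that structure is hard to change later.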

Things to Watch Out For

  • DataZone manages metadata and access permissions without changing where data is stored. The data itself remains in S3 or Redshift.
  • Integration with Lake Formation is often a prerequisite, so completing the basic Lake Formation setup first will make things smoother.
  • Design your domains and projects carefully to match your organizational structure. Changes later can have a wide impact.