AWS Clean Rooms

A service that enables multiple organizations to perform joint analysis without exposing each other's raw data, balancing privacy protection with analytics

Overview

AWS Clean Rooms is a service that allows multiple companies or organizations to run collaborative data analysis without directly sharing their datasets. By creating a collaboration environment and having each participant define analysis rules on their data (such as which columns can be used as join keys and the minimum aggregation granularity), insights can be derived while preventing raw data leakage. It also offers advanced privacy controls including differential privacy and cryptographic computing.

How Data Clean Rooms Resolve the Privacy-Analytics Dilemma

Many insights can only be obtained by cross-referencing data from multiple organizations: advertising effectiveness measurement, medical research, and financial risk analysis are typical examples. However, as data protection regulations such as GDPR tighten, directly sharing raw data has become increasingly risky. Clean Rooms addresses this dilemma through pre-defined analysis rules. Data providers set analysis constraints (Analysis Rules) on their tables, such as "allow joins but prohibit individual record output" or "only return aggregated results for groups of 100 or more." The analysis executor can only run SQL queries within these constraints, and queries that violate the rules are automatically blocked by the service. The underlying data stays in each participant's AWS account throughout the analysis, so no copy of the raw data is handed to the other participants. This design makes it easier to obtain legal department approval for cross-organization analytics.
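The "groups of 100 or more" idea can be illustrated with a toy simulation in plain Python. This is a minimal sketch and not how Clean Rooms is implemented: the sample rows, the `MIN_GROUP_SIZE` constant, and the `overlap_by_segment` function are all hypothetical, and the real service enforces such constraints on SQL queries rather than on Python lists.

```python
from collections import defaultdict

# Hypothetical threshold. The text's example uses 100; a small value keeps the demo readable.
MIN_GROUP_SIZE = 3

# Hypothetical rows held by two different participants.
advertiser_rows = [
    {"user_id": "u1", "segment": "sports"},
    {"user_id": "u2", "segment": "sports"},
    {"user_id": "u3", "segment": "sports"},
    {"user_id": "u4", "segment": "sports"},
    {"user_id": "u5", "segment": "travel"},
    {"user_id": "u6", "segment": "travel"},
    {"user_id": "u7", "segment": "cooking"},
]
publisher_rows = [{"user_id": uid} for uid in ("u1", "u2", "u3", "u5", "u6", "u8")]


def overlap_by_segment(left, right, min_group_size):
    """Join on user_id, count distinct users per segment, and drop small groups."""
    right_ids = {row["user_id"] for row in right}
    users_per_segment = defaultdict(set)
    for row in left:
        if row["user_id"] in right_ids:
            users_per_segment[row["segment"]].add(row["user_id"])
    # Only aggregates that meet the threshold are released; no row-level output.
    return {
        segment: len(users)
        for segment, users in users_per_segment.items()
        if len(users) >= min_group_size
    }


result = overlap_by_segment(advertiser_rows, publisher_rows, MIN_GROUP_SIZE)
print(result)  # {'sports': 3}
```

The "travel" segment has only two overlapping users, so it is suppressed entirely instead of being reported with a small count that could point to individuals.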

Collaboration Design and Practical Configuration of Analysis Rules

A Clean Rooms collaboration consists of three elements: participants (members), configured tables, and analysis rules. Members act as data providers, analysis executors, or both; a single organization can hold both roles. Configured tables reference Glue Data Catalog tables and define which columns are exposed to the collaboration. The two analysis rule types covered here are Aggregation and List; the service also offers a Custom rule type for pre-approved custom SQL. Aggregation rules specify which columns can be used as join keys, which columns can be used in GROUP BY, which aggregate functions (such as COUNT, SUM, and AVG) are allowed, and a minimum aggregation threshold that each output row must meet (for example, at least 100 distinct users). List rules allow outputting only specific columns of records that match the join conditions. In practice, common use cases include advertisers and publishers performing audience overlap analysis in a cookieless environment, and pharmaceutical companies jointly conducting statistical analysis of patient cohorts. Books on data privacy are a good resource for the legal and technical background of clean room technology.
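To make the shape of an aggregation rule more concrete, here is a minimal sketch that models one as a plain Python data structure and checks a query description against it. The `AggregationRule` and `QueryRequest` classes, the `violations` function, and the column names are all hypothetical; they do not mirror the Clean Rooms API, and the output threshold (applied to results, not to the query text) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggregationRule:
    """Hypothetical, simplified model of an aggregation analysis rule."""
    join_columns: frozenset
    dimension_columns: frozenset      # columns allowed in GROUP BY
    aggregate_functions: frozenset    # e.g. COUNT, SUM, AVG


@dataclass(frozen=True)
class QueryRequest:
    """Hypothetical description of what a query wants to do with the table."""
    join_on: frozenset
    group_by: frozenset
    aggregates: frozenset
    selects_raw_rows: bool = False


def violations(rule, query):
    """Return human-readable reasons why the query breaks the rule (empty if none)."""
    problems = []
    if query.selects_raw_rows:
        problems.append("row-level output is not allowed")
    bad_joins = query.join_on - rule.join_columns
    if bad_joins:
        problems.append(f"join on non-approved columns: {sorted(bad_joins)}")
    bad_dims = query.group_by - rule.dimension_columns
    if bad_dims:
        problems.append(f"GROUP BY on non-approved columns: {sorted(bad_dims)}")
    bad_funcs = query.aggregates - rule.aggregate_functions
    if bad_funcs:
        problems.append(f"aggregate functions not allowed: {sorted(bad_funcs)}")
    return problems


rule = AggregationRule(
    join_columns=frozenset({"hashed_email"}),
    dimension_columns=frozenset({"segment", "region"}),
    aggregate_functions=frozenset({"COUNT", "SUM", "AVG"}),
)
ok_query = QueryRequest(
    join_on=frozenset({"hashed_email"}),
    group_by=frozenset({"segment"}),
    aggregates=frozenset({"COUNT"}),
)
bad_query = QueryRequest(
    join_on=frozenset({"hashed_email"}),
    group_by=frozenset({"postal_code"}),
    aggregates=frozenset({"MAX"}),
    selects_raw_rows=True,
)

print(violations(rule, ok_query))
for problem in violations(rule, bad_query):
    print("blocked:", problem)
```

Expressing the rule as data rather than as code is the key design point: the data provider declares what is permitted once, and every query is checked against that declaration before it runs.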

Advanced Protection Through Differential Privacy and Cryptographic Computing

Analysis rules alone leave a residual re-identification risk: an individual could still be singled out by combining the results of several carefully chosen queries. Clean Rooms' differential privacy feature adds mathematically controlled noise to query results, quantitatively limiting the impact that any individual record's presence has on the output. A privacy budget (expressed through the parameter epsilon) is set, and once the budget is exhausted, no further queries can be executed, preventing cumulative information leakage. For cases requiring even stronger protection, Cryptographic Computing is available. This technology uses the C3R (Cryptographic Computing for Clean Rooms) client to encrypt data before it is uploaded, and then runs joins and a limited set of aggregations on the encrypted data. Since even the Clean Rooms service itself cannot access the plaintext of the encrypted columns, this option is suited to the most stringent data protection requirements. Google's Ads Data Hub and Microsoft Azure's confidential computing offerings also provide privacy-preserving analytics, but Clean Rooms stands out for its tight integration with the AWS data analytics ecosystem (Athena, Glue, S3).
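The interaction between noise and the privacy budget can be sketched with the textbook Laplace mechanism. This is an illustration under stated assumptions, not a description of how Clean Rooms implements differential privacy: the `NoisyCounter` class, the `PrivacyBudgetExhausted` exception, and the sample records are hypothetical, and budget accounting is simple additive composition.

```python
import random


class PrivacyBudgetExhausted(Exception):
    """Raised when a query would spend more epsilon than remains."""


class NoisyCounter:
    """Hypothetical helper: answers COUNT queries with Laplace noise under a total budget."""

    def __init__(self, total_epsilon, seed=None):
        self.remaining_epsilon = total_epsilon
        self._rng = random.Random(seed)

    def _laplace(self, scale):
        # The difference of two i.i.d. exponential variables with mean `scale`
        # follows a Laplace(0, scale) distribution.
        return self._rng.expovariate(1.0 / scale) - self._rng.expovariate(1.0 / scale)

    def count(self, records, predicate, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining_epsilon:
            raise PrivacyBudgetExhausted("privacy budget exhausted")
        self.remaining_epsilon -= epsilon
        true_count = sum(1 for record in records if predicate(record))
        # A COUNT changes by at most 1 when one record is added or removed
        # (sensitivity 1), so the noise scale is 1 / epsilon.
        return true_count + self._laplace(scale=1.0 / epsilon)


# Hypothetical patient-cohort records.
records = [{"age": age} for age in (34, 41, 29, 58, 63, 47, 52, 38)]

counter = NoisyCounter(total_epsilon=1.0, seed=0)
print(counter.count(records, lambda r: r["age"] >= 40, epsilon=0.5))
print(counter.count(records, lambda r: r["age"] >= 50, epsilon=0.5))

try:
    counter.count(records, lambda r: r["age"] >= 60, epsilon=0.5)
except PrivacyBudgetExhausted as exc:
    print("blocked:", exc)
```

A smaller epsilon per query means more noise but leaves more budget for later queries, so the budget makes the trade-off between accuracy and cumulative leakage explicit.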
