Privacy-Preserving Data Collaboration with AWS Clean Rooms

Run joint analysis across multiple companies without sharing or copying data. Learn about aggregation rules for preventing individual identification and Cryptographic Computing for encrypted analysis.

Clean Rooms Overview and Use Cases

Clean Rooms is a service that enables multiple companies to jointly run analyses without sharing or copying data between each other. As GDPR and personal data protection laws become stricter, making data sharing between companies more difficult, Clean Rooms provides a means to extract value from data while preserving privacy. Representative use cases include advertising effectiveness measurement (matching advertiser and publisher data to analyze conversions), healthcare research (anonymizing patient data from multiple medical institutions for joint research), and financial risk analysis (aggregating transaction data from multiple financial institutions to build risk models).

Collaborations and Analysis Rules

A collaboration consists of participating members (companies) and their roles (data provider, analysis executor). Each member registers its data in S3 as a Glue Data Catalog table and associates that table with the collaboration. Analysis rules control which types of queries are allowed. Aggregation rules permit only aggregate functions such as COUNT, SUM, and AVG and prohibit output of individual records. Setting a minimum aggregation threshold (e.g., 100 or more records, configurable up to 500) suppresses results built from too few records, which reduces the risk of identifying individuals from small record sets. List rules allow output of record-level lists matching specific conditions, with the output columns limited to those the data owner permits.
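The effect of a minimum aggregation threshold can be illustrated with a small sketch. This is plain Python and not the Clean Rooms implementation. The function name, the column names, and the choice to count distinct IDs per group are assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_with_threshold(rows, group_key, id_key, value_key, minimum):
    """Return COUNT(DISTINCT id) and SUM(value) per group.

    Groups with fewer than `minimum` distinct IDs are dropped from the
    output, mimicking how a minimum aggregation threshold hides small groups.
    """
    ids = defaultdict(set)
    totals = defaultdict(float)
    for row in rows:
        group = row[group_key]
        ids[group].add(row[id_key])
        totals[group] += row[value_key]
    return {
        group: {"distinct_ids": len(ids[group]), "total": totals[group]}
        for group in ids
        if len(ids[group]) >= minimum
    }


# Hypothetical sample data: three users in "east", one user in "west".
rows = [
    {"region": "east", "user_id": "u1", "spend": 10.0},
    {"region": "east", "user_id": "u2", "spend": 20.0},
    {"region": "east", "user_id": "u3", "spend": 5.0},
    {"region": "west", "user_id": "u4", "spend": 7.0},
]

result = aggregate_with_threshold(rows, "region", "user_id", "spend", minimum=3)
print(result)  # only the "east" group meets the threshold
```

With a threshold of 3, the single-user "west" group never appears in the output, so the analysis executor cannot read that user's spend from the aggregate.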

Cryptographic Computing

Cryptographic Computing for Clean Rooms (C3R) is a feature that runs analysis on encrypted data. Data providers encrypt their data on the client side, using a secret key shared among the collaboration members, before registering it with Clean Rooms, and analysis executors run queries against the encrypted data. Query results are returned in encrypted form and decrypted only by a member holding the shared key, so the protected data is never exposed in cleartext during processing. This enables joint analysis while keeping the underlying data cryptographically protected. Cryptographic Computing currently supports exact matching on encrypted columns (identifying common records between two datasets) and can be used for advertising audience matching and customer list reconciliation.
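The idea of identifying common records without exchanging raw values can be sketched with a keyed hash (HMAC). This is a conceptual illustration using only the Python standard library. It does not reproduce the format or algorithms of the Clean Rooms encryption tooling, and the key, the helper names, and the email normalization step are assumptions.

```python
import hashlib
import hmac


def fingerprint(value: str, shared_key: bytes) -> str:
    """Return a keyed hash of a normalized value.

    Equal inputs hashed with the same key produce equal fingerprints,
    so two parties can match records without seeing each other's raw values.
    """
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(shared_key, normalized, hashlib.sha256).hexdigest()


def fingerprint_column(values, shared_key):
    """Fingerprint every value in a column and return the set of results."""
    return {fingerprint(v, shared_key) for v in values}


# Hypothetical secret agreed on by both parties outside the analysis service.
shared_key = b"example-shared-secret"

advertiser_emails = ["alice@example.com", "Bob@Example.com ", "carol@example.com"]
publisher_emails = ["bob@example.com", "dave@example.com", "alice@example.com"]

# Each party fingerprints its own column locally before uploading.
advertiser_fp = fingerprint_column(advertiser_emails, shared_key)
publisher_fp = fingerprint_column(publisher_emails, shared_key)

# The join runs on fingerprints only; raw emails never leave either party.
overlap = advertiser_fp & publisher_fp
match_count = len(overlap)
print(match_count)
```

A party that does not hold the shared key cannot compute or compare these fingerprints, which is why the key has to be exchanged between the collaborating members through a separate channel.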

Clean Rooms Pricing

Clean Rooms pricing is based on query processing. SQL queries are billed for the compute capacity they consume, measured in Clean Rooms Processing Unit (CRPU) hours, rather than at a flat rate per TB scanned. Cryptographic Computing incurs additional charges based on the volume of encrypted matching processing. There are no additional charges for creating collaborations or managing members. Restricting the types of queries allowed through analysis rules helps prevent unintended large-scale queries and keeps costs manageable. For joint analysis with partner companies, it is important to agree on cost-sharing arrangements in advance.

Summary

Clean Rooms is a privacy-preserving service that enables joint analysis without sharing data. It controls allowed queries through analysis rules and enables analysis of encrypted data through Cryptographic Computing. As privacy regulations continue to tighten, it is becoming an increasingly common approach to data collaboration between companies.