Privacy-Preserving ML with AWS Clean Rooms ML - Build Models Without Sharing Data
Learn how to build lookalike models with Clean Rooms ML, apply differential privacy, and leverage the results for ad targeting.
Overview of Clean Rooms ML
Clean Rooms ML is a capability of AWS Clean Rooms that lets you build ML models while preserving privacy, and it supports datasets with millions of records. Advertisers and publishers can jointly build lookalike models and generate similar-user segments without directly viewing each other's data. Differential privacy techniques provide mathematical guarantees that individual data is protected while preserving marketing effectiveness.
Lookalike Models and Differential Privacy
A lookalike model is an ML model that identifies new users who resemble existing high-value customers. The advertiser provides a list of converted users (seed data), and the model extracts similar users from the publisher's audience data. Differential privacy adds noise to the model's output, mathematically guaranteeing that individual-level data cannot be inferred. The resulting lookalike segments are used for ad campaign targeting to improve conversion rates.
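Clean Rooms ML applies differential privacy internally, so you never implement the noise injection yourself. Still, the epsilon trade-off is easier to reason about with a concrete sketch. The snippet below is an illustrative implementation of the classic Laplace mechanism (not the service's actual algorithm): a count query has sensitivity 1, so adding Laplace noise with scale 1/epsilon yields epsilon-differential privacy, and smaller epsilon means stronger privacy but noisier results.

```python
import math
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale) via inverse-CDF sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy.

    A count query has sensitivity 1 (adding or removing one user changes
    the result by at most 1), so Laplace noise with scale 1/epsilon
    suffices for epsilon-DP.
    """
    scale = 1.0 / epsilon
    return true_count + laplace_noise(scale, rng)

rng = random.Random(42)
# Smaller epsilon -> stronger privacy -> larger noise, lower accuracy.
for eps in (0.1, 1.0, 10.0):
    noisy = dp_count(1000, eps, rng)
    print(f"epsilon={eps:>4}: noisy count = {noisy:.1f}")
```

The same intuition carries over to lookalike modeling: a tighter epsilon makes it harder to infer whether any individual was in the seed data, at the cost of some segment accuracy.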
Designing a Collaboration
In a Clean Rooms collaboration, you define the scope of data each participant contributes and the analysis rules that govern queries. Analysis rules specify which query types are allowed (aggregation only, or whether list output is permitted) and the minimum aggregation unit (e.g., only aggregates over 100 or more records), preventing extraction of individual-level data. For ML model building, the advertiser supplies seed data (converted users), which is matched against the publisher's audience data to generate lookalike segments. The differential privacy epsilon value controls the trade-off between privacy protection strength and model accuracy. Results are returned only at an aggregated level that cannot identify individuals, in accordance with the collaboration's analysis rules.
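To make the minimum-aggregation constraint concrete, here is a sketch of an aggregation-style analysis rule as a Python dict, plus a small check that mimics how an output constraint suppresses low-count rows. The field names are modeled on the Clean Rooms aggregation analysis rule schema, but the column names (`hashed_user_id`, `purchase_amount`, `region`) are hypothetical, and the enforcement helper is purely illustrative: the service itself evaluates these rules server-side.

```python
# Illustrative sketch of a Clean Rooms aggregation analysis rule.
# Field names follow the aggregation-rule shape; the column names are
# hypothetical examples, not part of any real collaboration.
analysis_rule = {
    "aggregateColumns": [
        {"columnNames": ["purchase_amount"], "function": "SUM"},
    ],
    "joinColumns": ["hashed_user_id"],
    "dimensionColumns": ["region"],
    "outputConstraints": [
        # Suppress any output row that aggregates fewer than 100 distinct users.
        {"columnName": "hashed_user_id", "minimum": 100, "type": "COUNT_DISTINCT"},
    ],
}

def passes_output_constraints(row_user_count: int, rule: dict) -> bool:
    """Return True only if a result row meets every minimum-count constraint."""
    return all(row_user_count >= c["minimum"] for c in rule["outputConstraints"])

print(passes_output_constraints(250, analysis_rule))  # row over 100 users is returned
print(passes_output_constraints(37, analysis_rule))   # row under 100 users is suppressed
```

The key design point is that the rule travels with the table configuration, so every query any participant runs is checked against the same constraints.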
Clean Rooms ML Pricing
Clean Rooms pricing is based on the volume of data scanned per query. ML model building incurs additional charges, with separate fees for lookalike model training and segment generation. Enabling differential privacy adds computational costs for noise injection. It is common to split costs between collaboration participants, with the party executing queries bearing the processing charges. For large datasets, you can optimize costs by narrowing the analysis period and columns to reduce scan volume. Storing data in S3 using partitioned Parquet format improves query scan efficiency.
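Since charges scale with scanned volume, it helps to estimate how much partition pruning saves before running a query. The sketch below uses a placeholder per-terabyte rate (the actual Clean Rooms rates differ by region and feature; check the AWS pricing page) to compare a full scan against a partition-pruned one.

```python
# Back-of-the-envelope scan-cost estimate. The per-terabyte rate below is a
# placeholder, not an actual Clean Rooms price; consult the AWS pricing page.
PRICE_PER_TB_SCANNED = 5.00  # hypothetical USD rate

def query_cost(scanned_gb: float, price_per_tb: float = PRICE_PER_TB_SCANNED) -> float:
    """Estimated cost of one query, given the volume of data it scans (GB)."""
    return (scanned_gb / 1024) * price_per_tb

full_scan = query_cost(2048)         # scanning the full 2 TB dataset
pruned = query_cost(2048 * 0.10)     # partition pruning reads ~10% of the data
print(f"full scan: ${full_scan:.2f}, pruned: ${pruned:.2f}")
```

This is exactly why the partitioned Parquet layout mentioned above pays off: queries restricted by partition key and column selection scan a fraction of the data and cost proportionally less.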
Summary
Clean Rooms ML is a service that builds privacy-preserving ML models without sharing data. The differential privacy epsilon value controls the trade-off between protection strength and model accuracy, while analysis rules prevent individual-level data extraction. It enables safe generation of lookalike segments between advertisers and publishers without direct data sharing.