Customer Identity Unification - Resolving Scattered Customer Data with AWS Entity Resolution

Learn how to perform entity resolution (record matching) on customer data using AWS Entity Resolution. This article covers ML-based matching, rule-based matching, privacy protection, and integration with Clean Rooms.

The Challenge of Customer Data Matching

Enterprise customer data is scattered across multiple systems such as CRM, e-commerce sites, call centers, and marketing tools. The same customer may be registered with different representations across systems (e.g., "John Smith" vs. "J. Smith," or "123 Main Street" vs. "123 Main St."), and linking these records to the same individual - known as entity resolution - has been a longstanding challenge. Traditionally, organizations had to build their own exact-match or fuzzy-matching logic, but achieving both precision and recall was difficult given the wide variety of name variations. AWS Entity Resolution is a managed service that performs customer data matching using either ML-based or rule-based approaches.
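To see why homegrown matching is hard to tune, here is a minimal sketch of the kind of logic teams have traditionally written by hand, using only the Python standard library. It is not part of AWS Entity Resolution. The abbreviation table, the 0.8 similarity threshold, and the record fields are illustrative assumptions.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; real address data needs far more entries.
STREET_ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd"}


def tokens(text: str) -> list[str]:
    """Lowercase, drop punctuation, and split on whitespace."""
    return re.sub(r"[^\w\s]", "", text.lower()).split()


def normalize_address(address: str) -> str:
    """'123 Main Street' and '123 Main St.' both become '123 main st'."""
    return " ".join(STREET_ABBREVIATIONS.get(t, t) for t in tokens(address))


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] after normalization (difflib ratio)."""
    return SequenceMatcher(None, " ".join(tokens(a)), " ".join(tokens(b))).ratio()


def is_same_customer(a: dict, b: dict, threshold: float = 0.8) -> bool:
    """Hand-written rule: normalized addresses equal AND names similar enough."""
    return (
        normalize_address(a["address"]) == normalize_address(b["address"])
        and name_similarity(a["name"], b["name"]) >= threshold
    )


john = {"name": "John Smith", "address": "123 Main Street"}
j_smith = {"name": "J. Smith", "address": "123 Main St."}
joan = {"name": "Joan Smith", "address": "123 Main St."}

print(round(name_similarity(john["name"], j_smith["name"]), 2))  # 0.82
print(round(name_similarity(john["name"], joan["name"]), 2))     # 0.9
print(is_same_customer(john, j_smith), is_same_customer(john, joan))  # True True
```

"Joan Smith" scores higher than "J. Smith," so a threshold low enough to link the abbreviated name also merges what may be a different person at the same address. Trading off these false matches against missed matches by hand is exactly the precision/recall problem described above.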

Matching Methods and Configuration

Entity Resolution offers two matching methods. ML matching uses machine learning models provided by AWS to evaluate attributes such as name, address, phone number, and email address together and estimate how likely it is that records refer to the same entity. It handles spelling variations, abbreviations, and format differences automatically, so you do not need to define matching rules by hand.

Rule-based matching lets you define business rules for precise control. For example, you can combine conditions such as "email address exact match AND name similarity above 80%" or "phone number match AND matching state or prefecture in the address."

Input data is stored in Amazon S3 and read through tables registered in the AWS Glue Data Catalog. A schema mapping associates each input column with an Entity Resolution attribute type (name, address, phone number, email address, and so on).
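The two example rules can be simulated locally to show how AND-ed conditions inside a rule and OR-ed rules across a rule set combine into match groups. This is a hedged sketch of the logic only, not the service's API or its exact comparison semantics. The field names, the `region` field (standing in for state or prefecture), and the sample records are hypothetical.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def digits_only(phone: str) -> str:
    """Normalize a phone number to its digits, e.g. '(555) 0199' -> '5550199'."""
    return re.sub(r"\D", "", phone)


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive character similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical rules mirroring the two examples in the text.
# A pair of records matches if ANY rule holds; each rule ANDs its conditions.
RULES = {
    "email_exact_and_name_similar": lambda a, b: (
        a["email"].lower() == b["email"].lower()
        and name_similarity(a["name"], b["name"]) > 0.8
    ),
    "phone_and_region": lambda a, b: (
        digits_only(a["phone"]) == digits_only(b["phone"])
        and a["region"] == b["region"]
    ),
}


def assign_match_ids(records: list[dict]) -> list[int]:
    """Compare every pair, then group matching records transitively with union-find."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if any(rule(records[i], records[j]) for rule in RULES.values()):
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(records))]


records = [
    {"name": "John Smith", "email": "john@example.com", "phone": "555-0100", "region": "NY"},
    {"name": "Jon Smith", "email": "JOHN@example.com", "phone": "(555) 0199", "region": "NY"},
    {"name": "Smith J", "email": "jsmith@work.example", "phone": "555.0100", "region": "NY"},
    {"name": "Jane Doe", "email": "jane@example.org", "phone": "555-0123", "region": "CA"},
]
print(assign_match_ids(records))
```

The first three records end up with the same match ID: record 1 matches record 0 on email and name, record 2 matches record 0 on phone and region, and the union-find step links 1 and 2 transitively even though no single rule connects them directly. Transitive grouping is a design choice of this sketch. Check the Entity Resolution documentation for how the service itself forms match groups.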

Use Cases and Clean Rooms Integration

Key use cases include marketing customer unification (consolidating customer data from multiple channels to build a 360-degree customer view), fraud detection (identifying when different accounts belong to the same person), and data cleansing (detecting and merging duplicate records). Integration with Clean Rooms enables entity resolution across organizations without sharing raw data. For example, an advertiser and a publisher can identify common customers without exposing each other's customer data, enabling ad effectiveness measurement. Pricing is $0.25 per 1,000 records for ML matching and $0.025 per 1,000 records for rule-based matching. Processing 1 million customer records with ML matching costs approximately $250.
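AWS Clean Rooms runs this kind of overlap analysis inside a managed collaboration, and the sketch below is not how it works internally. It is only a toy illustration of the underlying idea that two parties can compare derived tokens instead of plain-text identifiers: each side normalizes its email addresses and keys them with HMAC-SHA256 under a secret both have agreed on. The shared key, the email-only matching, and the sample data are hypothetical.

```python
import hashlib
import hmac


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so trivially different spellings agree."""
    return email.strip().lower()


def tokenize(emails: list[str], shared_key: bytes) -> set[str]:
    """Replace each normalized email with an HMAC-SHA256 token (hex digest)."""
    return {
        hmac.new(shared_key, normalize_email(e).encode("utf-8"), hashlib.sha256).hexdigest()
        for e in emails
    }


# Hypothetical key and data: in practice the key would be agreed out of band
# and never stored alongside the data.
SHARED_KEY = b"hypothetical-shared-secret"
advertiser_emails = ["Alice@Example.com", "bob@example.com", "carol@example.com"]
publisher_emails = [" alice@example.com", "dave@example.com", "bob@example.com"]

overlap = tokenize(advertiser_emails, SHARED_KEY) & tokenize(publisher_emails, SHARED_KEY)
print(f"common customers: {len(overlap)}")  # common customers: 2
```

A keyed HMAC is used rather than a plain SHA-256 hash because common email addresses could otherwise be recovered by hashing guesses and comparing the results.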

Entity Resolution Pricing

Entity Resolution pricing is based on the number of records processed. ML matching costs approximately $0.25 per 1,000 records, rule-based matching approximately $0.025 per 1,000 records, and ID mapping approximately $0.25 per 1,000 records. The first run processes every record and is therefore the most expensive; after that, incrementally processing only new or updated records keeps ongoing costs low. When Entity Resolution is used through the Clean Rooms integration, Clean Rooms query charges apply separately.
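The cost arithmetic is simple enough to sketch. The example reproduces the roughly $250 figure for an initial ML matching run over 1 million records and compares a year of monthly incremental runs with re-processing the full dataset every month. The 50,000 changed records per month is a hypothetical volume, the rate is a parameter so you can substitute current pricing, and the sketch assumes incremental runs are billed only for the records they process, as described above. Clean Rooms query charges are not included.

```python
def matching_cost(records: int, rate_per_1000: float) -> float:
    """Cost in USD of processing `records` records at a per-1,000-record rate."""
    return records / 1000 * rate_per_1000


def first_year_cost(initial_records: int, changed_per_month: int,
                    rate_per_1000: float, months: int = 12) -> float:
    """One full initial run, then `months` incremental runs over changed records only."""
    initial = matching_cost(initial_records, rate_per_1000)
    incremental = matching_cost(changed_per_month, rate_per_1000) * months
    return initial + incremental


ML_RATE = 0.25  # USD per 1,000 records for ML matching, as quoted above

initial_run = matching_cost(1_000_000, ML_RATE)
with_incremental = first_year_cost(1_000_000, 50_000, ML_RATE)  # hypothetical change volume
full_reruns_year = matching_cost(1_000_000, ML_RATE) * 12       # re-process everything monthly
print(initial_run, with_incremental, full_reruns_year)  # 250.0 400.0 3000.0
```

Under these assumptions, the initial run dominates the first-year cost when running incrementally, while re-processing the full dataset every month costs several times more.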

Summary - Guidelines for Using Entity Resolution

AWS Entity Resolution provides managed entity resolution for scattered customer data. Its key strengths are automatic handling of name variations through ML matching, precise control through rule-based matching, and privacy protection through Clean Rooms integration. It is well suited for organizations where customer data is distributed across multiple systems and building a unified customer view is a challenge.