Record Matching with AWS Entity Resolution - Customer Data Deduplication and Integration

Learn about record matching across multiple data sources with Entity Resolution and how to design matching workflows.

Entity Resolution Overview

Entity Resolution is a service that matches and links records distributed across multiple data sources to build a unified entity view, processing up to 20 million records per workflow. For example, it can link records of the same customer scattered across CRM, e-commerce, and support systems. It provides two matching methods: rule-based matching, and ML-based matching, which tolerates name variations and address abbreviations.

Matching Methods

Rule-based matching uses explicit rules such as exact matches on email addresses or phone numbers. ML-based matching is more flexible and accounts for name variations, address abbreviations, and phone number format differences. A staged approach that combines both methods balances accuracy and cost: process high-confidence matches with rule-based matching first, then handle the remaining records with ML-based matching.
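The staged idea can be sketched locally in plain Python. This is only an illustration under stated assumptions, not how Entity Resolution works internally: the record layout, the helper names, and the 0.85 threshold are hypothetical, and difflib's string similarity stands in for ML-based matching.

```python
import re
from difflib import SequenceMatcher


def normalize_email(email):
    """Lowercase and trim so formatting differences do not block exact matches."""
    return email.strip().lower()


def normalize_phone(phone):
    """Keep digits only, e.g. '+1 (555) 010-0000' becomes '15550100000'."""
    return re.sub(r"\D", "", phone)


def name_similarity(a, b):
    """Similarity ratio in [0, 1]; a simple stand-in for ML-based matching."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def staged_match(records, fuzzy_threshold=0.85):
    """Return (id_a, id_b, stage) tuples; the threshold is an assumed value."""
    matches = []
    matched_ids = set()
    seen = {}  # (field, normalized value) -> id of the first record carrying it

    # Stage 1: rule-based exact match on normalized email or phone.
    for rec in records:
        keys = []
        if rec.get("email"):
            keys.append(("email", normalize_email(rec["email"])))
        if rec.get("phone"):
            keys.append(("phone", normalize_phone(rec["phone"])))
        for key in keys:
            if key in seen and seen[key] != rec["id"]:
                matches.append((seen[key], rec["id"], "rule"))
                matched_ids.update({seen[key], rec["id"]})
                break
        for key in keys:
            seen.setdefault(key, rec["id"])

    # Stage 2: fuzzy name comparison, only among records stage 1 left unmatched.
    remainder = [r for r in records if r["id"] not in matched_ids]
    for i, a in enumerate(remainder):
        for b in remainder[i + 1:]:
            if name_similarity(a["name"], b["name"]) >= fuzzy_threshold:
                matches.append((a["id"], b["id"], "fuzzy"))
    return matches


# Hypothetical sample records from three source systems.
records = [
    {"id": "crm-1", "name": "Jonathan Smith", "email": "J.Smith@example.com", "phone": "+1 (555) 010-0000"},
    {"id": "shop-7", "name": "Jon Smith", "email": "j.smith@example.com ", "phone": ""},
    {"id": "crm-2", "name": "Katherine Jones", "email": "", "phone": ""},
    {"id": "sup-3", "name": "Katherine Jonse", "email": "kj@example.com", "phone": ""},
    {"id": "sup-4", "name": "Alan Turing", "email": "alan@example.com", "phone": ""},
]

matches = staged_match(records)
for id_a, id_b, stage in matches:
    print(id_a, id_b, stage)
```

Running the cheap exact rules first shrinks the set of records that needs the more expensive pairwise comparison, which is the same reason the staged design reduces ML-based matching cost.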

Workflows and ID Mapping

Matching workflows take data sources (S3 or Glue tables) as input and output matching results to S3. Schema mapping maps input data columns to Entity Resolution standard fields (name, address, phone number, email address). ID mapping workflows integrate with third-party data providers (LiveRamp, TransUnion) to match your customer IDs against external ID graphs and generate unified IDs. Matching results include match IDs, confidence scores, and matched record pairs, which can be integrated into downstream analytics and marketing systems.
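As a rough illustration of schema mapping, the sketch below builds a request payload that assigns each source column a field type. The column names, the helper function, and the match key labels are hypothetical. The payload shape (schemaName, mappedInputFields with fieldName, type, and matchKey) follows my understanding of the boto3 entityresolution client's create_schema_mapping call and should be verified against the current API reference before use.

```python
# Hypothetical source columns mapped to Entity Resolution field types.
COLUMN_TYPES = {
    "customer_id": "UNIQUE_ID",
    "full_name": "NAME",
    "home_address": "ADDRESS",
    "phone": "PHONE",
    "email": "EMAIL_ADDRESS",
}


def build_schema_mapping_request(schema_name, column_types):
    """Build a request dict; assumes exactly one column identifies each record."""
    unique_ids = [c for c, t in column_types.items() if t == "UNIQUE_ID"]
    if len(unique_ids) != 1:
        raise ValueError("exactly one UNIQUE_ID column is expected")

    mapped_fields = []
    for column, field_type in column_types.items():
        field = {"fieldName": column, "type": field_type}
        if field_type != "UNIQUE_ID":
            # Match key label chosen for illustration; rules would refer to it.
            field["matchKey"] = field_type.lower()
        mapped_fields.append(field)

    return {"schemaName": schema_name, "mappedInputFields": mapped_fields}


request = build_schema_mapping_request("customer_schema", COLUMN_TYPES)
print(request)

# Assumed usage (not executed here, requires AWS credentials and boto3):
#   import boto3
#   boto3.client("entityresolution").create_schema_mapping(**request)
```

Building the payload in a separate function keeps the column-to-field decisions reviewable and testable without calling AWS.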

Entity Resolution Pricing

Entity Resolution pricing is based on the number of records processed for matching. Rule-based matching costs approximately 0.25 USD per 1,000 records, and ML-based matching costs approximately 0.75 USD per 1,000 records. ID mapping incurs additional per-provider charges. The initial run processes all records, but incremental matching (new and updated records only) reduces the cost of subsequent periodic runs. Performing data cleansing (normalizing notation, pre-eliminating obvious duplicates) before matching reduces the number of processed records and therefore the cost.
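The effect of incremental matching can be estimated with simple arithmetic. The sketch below uses the approximate per-1,000-record rates quoted above. The record counts and run schedule are made-up inputs, the function names are hypothetical, and the model ignores data growth between runs and ID mapping charges.

```python
# Approximate rates from the text, in USD per 1,000 records processed.
RULE_RATE = 0.25
ML_RATE = 0.75


def matching_cost(records, rate_per_thousand):
    """Cost of processing a given number of records once."""
    return records / 1000 * rate_per_thousand


def periodic_cost(total_records, changed_per_run, runs, rate, incremental):
    """Cost of an initial full run plus (runs - 1) follow-up runs."""
    first_run = matching_cost(total_records, rate)
    later_records = changed_per_run if incremental else total_records
    return first_run + matching_cost(later_records, rate) * (runs - 1)


# Hypothetical workload: 2,000,000 records, 50,000 changes per monthly run.
full = periodic_cost(2_000_000, 50_000, runs=12, rate=RULE_RATE, incremental=False)
incr = periodic_cost(2_000_000, 50_000, runs=12, rate=RULE_RATE, incremental=True)
print(f"full reprocessing: {full:.2f} USD, incremental: {incr:.2f} USD")
```

Under these assumed inputs the first run costs the same either way, and the savings come entirely from the eleven follow-up runs that process only changed records.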

Summary

Entity Resolution is a service that matches and integrates records from multiple data sources to build a unified customer view. A staged approach that processes high-confidence matches with rule-based matching and handles name variations and address abbreviations with ML-based matching is effective. ID mapping enables integration with external data providers, and incremental matching optimizes the cost of periodic runs.