Scalable Entity Resolution With Python and ML
Real-world data is far from perfect. It often contains multiple records belonging to the same entity (e.g., a customer or a property). These records can come from multiple systems and vary across attributes, which makes them hard to combine, especially as data volumes grow. Unfortunately, unharmonized data is not fit for use in customer analytics or in risk and compliance, so data engineers and scientists end up building some sort of rule-based or heuristic-based system to manage it.
This talk will cover Entity Resolution, which is also referred to as identity resolution, record linkage, deduplication, or fuzzy matching. Entity Resolution links and unifies records that refer to the same real-world entity, such as a customer or a supplier.
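To make the idea of fuzzy matching concrete, here is a minimal sketch that uses only the Python standard library. It is not Zingg's API or algorithm; the sample records, the `similarity` and `find_matches` helpers, and the 0.85 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample records; ids 1 and 2 describe the same person.
records = [
    {"id": 1, "name": "Jonathan Smith", "city": "New York"},
    {"id": 2, "name": "Jonathon Smith", "city": "New York"},
    {"id": 3, "name": "Maria Garcia", "city": "Austin"},
]


def similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_matches(rows, threshold=0.85):
    """Compare every pair of rows and keep pairs whose average
    name/city similarity reaches the (illustrative) threshold."""
    matches = []
    for left, right in combinations(rows, 2):
        score = (
            similarity(left["name"], right["name"])
            + similarity(left["city"], right["city"])
        ) / 2
        if score >= threshold:
            matches.append((left["id"], right["id"]))
    return matches


matched_pairs = find_matches(records)
print(matched_pairs)
```

Comparing every pair of records grows quadratically with the number of records, which is one reason a hand-rolled approach like this becomes impractical as data volumes grow.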
The talk will also cover the needs and challenges of entity resolution and introduce Zingg (https://github.com/zinggAI/zingg), an open source Python package that can be used to resolve entities at scale.
We will discuss Zingg algorithms and Python API usage.
Sonal Goyal
Founder at Zingg.ai