Entity Matching in NLP
This talk is about entity matching: the problem of deciding whether two records refer to the same real-world entity and marking them as duplicates or similar. For example, in the table below, entity 1 should be matched with entity 2 and the pair predicted to be the same.
ID Title Album Composer Song Writer
1 Me and Mrs. Jones Call Me Irresponsible Michael Buble
2 Me and Mrs. Jones [remix] Michael Buble
3 Blowin’ in the Wind The Freewheelin’ Bob Dylan. Bob Dylan
4 Blowing in the Wind Bob Dylan
The literature has explored many entity-matching techniques, ranging from pattern-based and ML-based to deep-learning-based approaches. They generally solve the problem in two steps. First, group the records into small clusters, or blocks, within which probable duplicates can be found. Second, perform rigorous matching within each block using NLP techniques. These two-step methods are effective to some extent, but when applied to a large number of real-world entities they take a lot of time, do not scale well, or both.
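The two-step idea above can be sketched in a few lines of Python. This is only an illustration and not the speaker's method: the blocking key (a normalized person name), the `normalize` helper, the 0.85 threshold, and the use of plain string similarity from the standard library in place of an NLP model are all assumptions. The records reuse the titles and names from the table, and the field name `name` is hypothetical because the table does not make the column clear.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Records taken from the example table; "name" is a hypothetical field label.
records = [
    {"id": 1, "title": "Me and Mrs. Jones", "name": "Michael Buble"},
    {"id": 2, "title": "Me and Mrs. Jones [remix]", "name": "Michael Buble"},
    {"id": 3, "title": "Blowin' in the Wind", "name": "Bob Dylan"},
    {"id": 4, "title": "Blowing in the Wind", "name": "Bob Dylan"},
]


def normalize(text):
    """Lowercase, drop bracketed tags such as [remix], strip punctuation."""
    text = re.sub(r"\[.*?\]", " ", text.lower())
    text = re.sub(r"[^a-z0-9 ]", "", text)
    return " ".join(text.split())


def build_blocks(rows):
    """Step 1 (blocking): group records that share a cheap key.

    Assumption: the normalized name is a good enough blocking key here.
    """
    blocks = defaultdict(list)
    for row in rows:
        blocks[normalize(row["name"])].append(row)
    return blocks


def match_pairs(blocks, threshold=0.85):
    """Step 2 (matching): compare every pair, but only inside each block."""
    matches = []
    for members in blocks.values():
        for a, b in combinations(members, 2):
            score = SequenceMatcher(
                None, normalize(a["title"]), normalize(b["title"])
            ).ratio()
            if score >= threshold:
                matches.append((a["id"], b["id"]))
    return matches


matches = match_pairs(build_blocks(records))
print(matches)
```

Blocking keeps the number of comparisons small, since pairs are only formed inside a block. The cost is that a poor blocking key can place true duplicates in different blocks, where they are never compared.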
In this talk, the speaker explores domain-specific, transformer-based entity matching.
Keywords – Transformers, BERT, Attention, Deep Learning, Seq2Seq models, PyTorch, TensorFlow
Avinash Pathak
Expert Data Scientist at TomTom
The speaker is an NLP enthusiast working as an Expert Data Scientist at TomTom.
He has nine years of experience in the software industry and has worked on projects of various scales. His current work involves deduplication using entity matching with the help of cutting-edge NLP techniques.