Improve matching API results
pudo opened this issue · 1 comments
pudo commented
Some users of yente have reported issues with the matching API. The problems they describe are:
- Entities that are true matches score much too low, especially when the names are very short.
- Entities that have the same DOB are ranked very highly, irrespective of other features (like names).
- The way in which `match` is set to false when there are two matches is unintuitive.
- Regulators want phonetic matching.
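To illustrate the first point about short names: the sketch below is not yente's actual scoring. It assumes a plain length-normalised edit-distance similarity (the `levenshtein` and `similarity` helpers are hypothetical) and shows how one differing character costs a short name far more than a long one.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalised by the longer name (an assumed scoring rule)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# One substituted character in each pair:
short_score = similarity("kim", "kin")                           # 1 - 1/3
long_score = similarity("alexander petrov", "alexander petrow")  # 1 - 1/16
```

Under this assumed rule the three-letter pair scores about 0.67 while the sixteen-character pair scores about 0.94, although both differ by a single character.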
Here are some of the steps we're going to explore:
- Remove some sparse entities from the matcher training data
- Index soundex forms for all names
- Make name length less important in name match quality
- Implement a specific "OFAC style matching mode"
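As a rough idea of what indexing soundex forms could look like, here is a minimal, self-contained sketch. It uses standard American Soundex on name tokens and an in-memory dictionary as the index. The function names and the sample entity IDs are made up for illustration, and none of this reflects how yente stores or queries its index.

```python
from collections import defaultdict

# Standard American Soundex digit groups.
_CODES = {}
for digit, letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for letter in letters:
        _CODES[letter] = str(digit)


def soundex(name: str) -> str:
    """Return the 4-character Soundex code; non A-Z characters are ignored."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        # H and W do not separate repeated codes; vowels do.
        if ch not in "HW":
            prev = digit
    return (code + "000")[:4]


def build_index(names_by_entity):
    """Map each Soundex code to the set of entity IDs with a matching name token."""
    index = defaultdict(set)
    for entity_id, names in names_by_entity.items():
        for name in names:
            for token in name.split():
                key = soundex(token)
                if key:
                    index[key].add(entity_id)
    return index


# Hypothetical sample data, not taken from any real dataset.
index = build_index({
    "e1": ["Mohammed Hussein"],
    "e2": ["Muhamad Husain"],
    "e3": ["Robert Tymczak"],
})
candidates = index[soundex("Rupert")]
```

Here a query for "Rupert" finds `e3`, because "Rupert" and "Robert" share a code. Plain Soundex is tuned to English spellings and drops non-Latin characters entirely, so transliteration would have to happen first.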
pudo commented