opensanctions/yente

Improve matching API results

Closed · 1 comment

pudo commented

Some users of yente have reported issues with the matching API. The problems they describe are:

  • Entities that are true matches score much too low, especially when the names are very short.
  • Entities that have the same DOB are ranked very highly, irrespective of other features (like names).
  • The way `match` is set to false when there are two matches is unintuitive.
  • Regulators want phonetic matching.
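
To illustrate the first problem: one plausible way a short name ends up with a low score is a similarity measure that divides edit distance by name length. The sketch below assumes that kind of measure purely for illustration; it is not a description of how yente's scorer actually works, and `levenshtein` and `normalized_similarity` are hypothetical helper names.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Assumed scoring: 1 minus edit distance divided by the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# The same single-character difference costs a short name far more.
short_score = normalized_similarity("li", "lu")
long_score = normalized_similarity("alexander", "alexandar")
print(short_score, long_score)
```

Under this assumption a one-letter variation halves the score of a two-letter name but barely affects a nine-letter one, which is the kind of length sensitivity the report describes.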

Here are some of the steps we're going to explore:

  • Remove some sparse entities from the matcher training data
  • Index Soundex forms for all names
  • Make name length less important in name match quality
  • Implement a specific "OFAC style matching mode"
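
As a rough sketch of what indexing Soundex forms could look like, the snippet below implements standard American Soundex and derives one code per name token. This is an illustration only: `soundex` and `name_phonetic_keys` are hypothetical names, not yente functions, and the sketch handles only ASCII letters, so real-world names would first need transliteration.

```python
# Consonant groups of standard American Soundex.
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """Return the four-character Soundex code, or "" if there are no ASCII letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    encoded = [letters[0]]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            encoded.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept
    return ("".join(encoded) + "000")[:4]


def name_phonetic_keys(name: str) -> set[str]:
    """Hypothetical helper: one Soundex key per whitespace-separated token."""
    return {key for key in (soundex(token) for token in name.split()) if key}


print(soundex("Robert"), soundex("Rupert"))
print(name_phonetic_keys("Muhammad Ali") & name_phonetic_keys("Mohammed Aly"))
```

Storing such keys alongside each name would let a query retrieve candidates whose spelling differs but whose pronunciation is close. How those keys are weighted against exact name matches is a separate design decision.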