usgo/agagd

Improve search based on player name

Closed this issue · 2 comments

Is your feature request related to a problem?

The current player name search seems to be based on very naive substring matching, which causes a number of simple and straightforward search queries to fail.

I'll use the top ranked player's name (Albert Yen) as an example, also because it showcases additional issues which arise when someone has very common first and last names:

  • "yen, albert" finds the correct match.
  • "yen,  albert" (note the two spaces after the comma) finds no matches.
  • " yen, albert" (note the space before the surname) finds no matches.
  • "yen, albert " (note the space after the name) finds no matches.
  • "yen albert" (note no comma) finds no matches.
  • "albert yen" finds no matches.
  • "albert" finds many matches, which must be scanned manually to find the right one.
  • "yen" finds many matches, which must be scanned manually to find the right one.

Especially when someone has a very common name or surname, the only way to find their profile quickly seems to be matching the exact format "surname, name", with exactly the right spaces and the comma in the right place. At a minimum, the search query should be sanitized to remove unnecessary spaces and commas.

Describe the feature you'd like to see on the AGAGD.

I don't know much about best practices when it comes to finding fuzzy substring matches, but off the top of my head the following would be a simple improvement which would address the biggest of the above issues:

  • take the search query, sanitize it and split it into tokens of alphanumeric characters (ignore spaces and commas, but allow dashes and apostrophes since those do appear in names)
  • for each token, run a search based on simple substring matching (or better yet, based on a version of substring matching which ignores accents and matches "o" with "ò" for example)
  • join the results obtained by each token, ignoring duplicates
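
The three steps above could be sketched roughly as follows. This is only an illustration in plain Python, not the AGAGD implementation: the `PLAYERS` list and the `fold`, `tokenize`, and `search` helpers are hypothetical names invented for the example, and a real version would query the database instead of a list. Sorting by the number of matched tokens is an extra assumption that goes beyond the proposal, added so that the most specific match is listed first.

```python
import re
import unicodedata

# Hypothetical sample data in the "surname, name" format described above.
PLAYERS = ["Yen, Albert", "Yen, Bob", "Smith, Albert", "Ròssi, Marco", "O'Brien, Mary-Ann"]


def fold(text):
    """Lowercase and strip accents so that "ò" compares equal to "o"."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def tokenize(query):
    """Split a query into tokens; spaces and commas act only as separators."""
    # Word characters plus apostrophes and dashes, since those appear in names.
    return re.findall(r"[\w'-]+", fold(query))


def search(query, names=PLAYERS):
    """Return every name matched by at least one token, without duplicates."""
    tokens = set(tokenize(query))
    scored = []
    for name in names:
        folded = fold(name)
        # Visiting each name once is equivalent to joining the per-token
        # result lists and dropping duplicates.
        matched = sum(1 for token in tokens if token in folded)
        if matched:
            scored.append((matched, name))
    # Assumption beyond the proposal: names matching more tokens come first.
    scored.sort(key=lambda pair: -pair[0])
    return [name for _, name in scored]


print(search("albert yen"))
print(search("  yen,  albert "))
print(search("rossi"))
```

With this approach, "albert yen", "yen albert", and the variants with stray spaces all reduce to the same two tokens, so they return the same results regardless of word order or punctuation.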

Just realized this is related to #164, might warrant closing this.

fixed by #249