fmhall/relevant-post-bot

Improve accuracy by adding a step when comparing titles

mattbruv opened this issue · 1 comments

I noticed that this thread, which should be an easy 100% match, was scored as only a 75% match:

https://i.imgur.com/SYNE3ou.png

The problem seems to be that even though the words are the same, the difference in letter case throws off the comparison. Perhaps converting both titles to the same case (all lowercase or all uppercase) before comparing them would fix this problem?
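A minimal sketch of that idea follows. The bot's actual comparison code isn't shown in this thread, so `difflib.SequenceMatcher` is used here only as a stand-in, and the function names and example titles are hypothetical:

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1]; the bot may compute this differently."""
    return SequenceMatcher(None, a, b).ratio()


def similarity_ignoring_case(a: str, b: str) -> float:
    """Fold both titles to a common case before comparing them."""
    # casefold() is a more aggressive lower() intended for caseless matching.
    return similarity(a.casefold(), b.casefold())


original = "My Dog Ate My Homework"
parody = "my dog ate my homework"

print(similarity(original, parody))                # below 1.0 because of case
print(similarity_ignoring_case(original, parody))  # identical after folding
```

With this change, titles that differ only in capitalization would compare as identical.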

Adding to this, normalizing the apostrophe characters could also be good; I've encountered a case where the original title used a different apostrophe character, and the parody used a plain '.
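Something along these lines might work for the apostrophes. This is only a sketch: the set of characters treated as apostrophe variants is my own assumption, and `normalize_apostrophes` is a hypothetical helper, not something that exists in the bot:

```python
# Characters commonly typed or auto-substituted in place of a plain apostrophe.
# This list is an assumption and may need extending.
APOSTROPHE_VARIANTS = (
    "\u2018"  # left single quotation mark
    "\u2019"  # right single quotation mark (typographic apostrophe)
    "\u02bc"  # modifier letter apostrophe
    "\u0060"  # grave accent / backtick
    "\u00b4"  # acute accent
)

# Translation table mapping every variant to the ASCII apostrophe.
_APOSTROPHE_TABLE = str.maketrans({ch: "'" for ch in APOSTROPHE_VARIANTS})


def normalize_apostrophes(title: str) -> str:
    """Replace apostrophe look-alikes with a plain ' before comparing titles."""
    return title.translate(_APOSTROPHE_TABLE)


print(normalize_apostrophes("I don\u2019t like Mondays") == "I don't like Mondays")
```

Running both titles through this step before the comparison would make them differ only where the words themselves differ.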

Perhaps also introduce a mechanism to rank similarity based on individual character differences, like:

  • case variance (uppercase vs lowercase variants of the same word)
  • apostrophe existence (dont vs don't)
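One way such a ranking could look, purely as a sketch: classify how two titles differ, and give near-identical pairs a score just under 100%. The penalty values, the category names, and the `difflib` fallback are all assumptions made for illustration, not how the bot works:

```python
from difflib import SequenceMatcher

APOSTROPHES = "'\u2018\u2019\u02bc"

# Hypothetical penalties; the numbers are arbitrary and would need tuning.
PENALTY = {
    "identical": 0.0,
    "case only": 0.01,
    "apostrophe only": 0.02,
    "case and apostrophe": 0.03,
}


def strip_apostrophes(text: str) -> str:
    """Drop apostrophes so that "dont" and "don't" compare equal."""
    return "".join(ch for ch in text if ch not in APOSTROPHES)


def classify_difference(a: str, b: str) -> str:
    """Name the kind of difference between two titles."""
    if a == b:
        return "identical"
    if a.casefold() == b.casefold():
        return "case only"
    if strip_apostrophes(a) == strip_apostrophes(b):
        return "apostrophe only"
    if strip_apostrophes(a).casefold() == strip_apostrophes(b).casefold():
        return "case and apostrophe"
    return "other"


def ranked_similarity(a: str, b: str) -> float:
    """Score near-identical titles just below 1.0; otherwise fall back to a ratio."""
    kind = classify_difference(a, b)
    if kind != "other":
        return 1.0 - PENALTY[kind]
    # Fallback: compare the normalized forms with a stand-in similarity measure.
    return SequenceMatcher(
        None, strip_apostrophes(a).casefold(), strip_apostrophes(b).casefold()
    ).ratio()


print(classify_difference("Dont Stop", "dont stop"))
print(classify_difference("dont", "don't"))
```

Exact matches would still rank first, followed by case-only and apostrophe-only variants. Note that for long titles the fallback ratio can exceed the fixed near-identical scores, so a real implementation would probably want to cap it.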