Improve accuracy by adding a step when comparing titles
mattbruv opened this issue · 1 comments
mattbruv commented
I noticed that this thread, which should be an easy 100% match, was only a 75% match:
https://i.imgur.com/SYNE3ou.png
The problem seems to be that even though the words are the same, the difference in letter case throws the comparison off. Perhaps converting both titles to the same case (all lowercase or all uppercase) before comparing them would fix this problem?
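A rough sketch of the idea, in Python for illustration only. I don't know how the project actually scores titles, so `difflib.SequenceMatcher` is just a stand-in for the real comparison, and `title_similarity` is a made-up name:

```python
from difflib import SequenceMatcher


def title_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1], ignoring letter case.

    SequenceMatcher is only a placeholder for whatever metric the
    project really uses; the point is the case-folding step before it.
    """
    # casefold() is a more aggressive lower() intended for caseless matching.
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


original = "My Cat Is Cute"
parody = "my cat is cute"

# Without folding, the case differences lower the score.
raw_score = SequenceMatcher(None, original, parody).ratio()
folded_score = title_similarity(original, parody)
print(raw_score < folded_score, folded_score)
```

With the folding step, two titles that differ only in case compare as identical, so the score can no longer drop because of capitalization alone.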
Rekkonnect commented
Adding to this, normalizing the apostrophe characters could also be good; I've encountered a case where the original title had ’ apostrophes, and the parody had ' apostrophes.
Perhaps also introduce a mechanism to rank similarity based on individual character differences, such as:
- case variance (uppercase vs lowercase variants of the same word)
- apostrophe existence (dont vs don't)
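Something along these lines, as a hedged sketch in Python rather than the project's own code. Every name here (`APOSTROPHES`, `normalize_apostrophes`, `classify_difference`) and the category labels are hypothetical, and how each category should affect the final score is left open:

```python
# Map common typographic apostrophe variants to the plain ASCII one.
# The set of characters here is an assumption and can be extended.
APOSTROPHES = str.maketrans({"\u2019": "'", "\u2018": "'", "\u02bc": "'"})


def normalize_apostrophes(title: str) -> str:
    """Replace curly/modifier apostrophes with the ASCII apostrophe."""
    return title.translate(APOSTROPHES)


def strip_apostrophes(title: str) -> str:
    """Remove apostrophes entirely, so "don't" and "dont" compare equal."""
    return normalize_apostrophes(title).replace("'", "")


def classify_difference(a: str, b: str) -> str:
    """Name the smallest kind of difference that separates two titles."""
    if a == b:
        return "exact"
    a, b = normalize_apostrophes(a), normalize_apostrophes(b)
    if a == b:
        return "apostrophe-style"      # ’ vs '
    if a.casefold() == b.casefold():
        return "case"                  # uppercase vs lowercase
    if strip_apostrophes(a).casefold() == strip_apostrophes(b).casefold():
        return "apostrophe-existence"  # dont vs don't (case ignored too)
    return "other"


print(classify_difference("Don\u2019t Stop", "Don't Stop"))
print(classify_difference("Hello World", "hello world"))
print(classify_difference("Dont stop", "don't stop"))
```

The returned label could then be mapped to a small penalty (or none at all) instead of letting these character-level differences count the same as different words.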