TheUltimateC0der/listrr

Consider weighting the title more than the year when processing lists of names

A recent list resulted in the following:

Green Room (2015)
matched Shelter 2015
should have been Green Room (2016)

The Conformist (1971)
missed
should have been The Conformist (1970)

Seven (1995)
matched Seven Landscapes 1995
should have been Se7en (1995)

Looper (2015)
matched Little Loopers 2015
should have been Looper (2012)

Shallow Grave (1995)
matched Clueless (1995)
should have been Shallow Grave (1994)

The Place Beyond The Pines (2012)
missed
should have been The Place Beyond The Pines (2013)

In all of these cases, the matched title is plainly not the requested one. It seems like searching on the title first, then comparing years across the results, would have a higher hit rate. For example, there is only one result for "The Place Beyond the Pines", and searching for most of the others by title turns up the correct title within a year of the requested one. That seems like a better guess than choosing "Clueless" in place of "Shallow Grave".
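To make the suggestion concrete, here is a minimal sketch of title-first matching in Python. It is not listrr's actual implementation: the function names, the `(title, year)` candidate tuples, and the thresholds (`min_title_score=0.75`, `max_year_gap=1`) are all assumptions chosen for illustration, and the similarity measure is simply `difflib.SequenceMatcher` from the standard library.

```python
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase the title and replace punctuation with spaces."""
    cleaned = "".join(c if c.isalnum() else " " for c in title.lower())
    return " ".join(cleaned.split())


def title_score(a, b):
    """Similarity of two titles in [0, 1] after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(title, year, candidates, min_title_score=0.75, max_year_gap=1):
    """Pick the candidate whose title matches best, using the year as a tie-breaker.

    `candidates` is an iterable of (title, year) tuples, assumed to come
    from a title-only search (hypothetical input shape). Both thresholds
    are illustrative assumptions, not tuned values.
    """
    scored = []
    for cand_title, cand_year in candidates:
        score = title_score(title, cand_title)
        gap = abs(cand_year - year)
        # Reject candidates whose title is too different or whose year is too far off.
        if score >= min_title_score and gap <= max_year_gap:
            # Sort key: higher title score first, then smaller year gap.
            scored.append((score, -gap, cand_title, cand_year))
    if not scored:
        return None
    _, _, matched_title, matched_year = max(scored)
    return (matched_title, matched_year)


if __name__ == "__main__":
    results = [("Clueless", 1995), ("Shallow Grave", 1994)]
    print(best_match("Shallow Grave", 1995, results))
```

With this ordering, a same-year entry with an unrelated title (such as "Clueless") is rejected by the title threshold before the year is ever considered. A case like "Looper (2015)" versus the 2012 film would need a wider `max_year_gap`, so the tolerance is a trade-off rather than a fixed answer.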

Calculating the similarity between words is a pretty CPU-intensive task, but that is generally not a problem. I will take a look at what I can do.