To calculate similarity score of Football Team name, we will use couple of strategies:
-
We clean both input strings
-
We assume that order of words in alternative name is the same as in original. For example alternative to West Bromwich will not be Bromwich West
-
Next we expand alternative string by capital letter as if it was separate word For example: ABeC becomes A Be C
-
We iterate over original words and checking how much of alternative word is in its original counterpart B - Bromwich, that is 1/8 If we have partial match, we can assume it is abbreviation of the word We fit the weight to the range of 1.0>=weight>=0.5 If there is no match then we put 0 weight
-
We calculate the similarity as sum of weights divided by number of match attempts performed
-
If resulting similarity is >=0.5 we can assume we have a likely match If similarity is 1.0 then it is a perfect match If similarity is <0.5 then match is unlikely If similarity is 0.0 then for sure there is no match
I assume you have Go installed (tested with 1.7 version). Run:
go get github.com/elcct/name-matcher
Name matcher should be in your $GOPATH/bin. You can try it by issuing:
./name-matcher -original="West Bromwich Albion Football Club" -alternative="WBA FC"
Should give you:
0.5916666666666667
Which is a likely match.