Can fuzzywuzzy be used in this case?
jonathanmv opened this issue · 2 comments
I have the description of a YouTube video and I want to find out whether a specific word appears in the text, allowing for typos. For example, take the following description:
If you tell me you're super busy, I'm going to ask to see your written plan.\n\nMy book "10 Steps to Earning Awesome Grades" is now out and it's free! Get it here:\n\nhttp://collegeinfogeek.com/get-better-grades/\n\nIf you want to get even more strategies and tips on becoming a more productive, successful student, subscribe to my channel right here:\n\nhttp://buff.ly/1vQP5ar\n\nConnect with me on Twitter!\n\nhttps://twitter.com/TomFrankly\n\nCompanion blog post with notes and resource links: \n\nhttp://collegeinfogeek.com/massive-workloads/
I would like to know if the word twitter is present in the description. I would then do:
FuzzySearch.extractOne(videoDescription, Arrays.asList("Twitter"))
// (string: Twitter, score: 57, index: 0)
And if the text has typos, the score decreases as expected.
Is this a good use for the library?
I wouldn't recommend matching against such long strings. Try splitting your description into words or sentences, then look for a word or sentence that matches with a score of around 90 (higher or lower depending on how many false positives you can accept).
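As a rough sketch of the word-by-word approach (not code taken from the library): the class below lowercases the description, splits it on non-alphanumeric characters, and scores each word against the target. The `similarity` method is a hypothetical stand-in based on the longest common subsequence, included only so the example runs without dependencies. With the library on the classpath you would presumably call `FuzzySearch.ratio(token, target)` there instead, and its scores may differ. The names `FuzzyWordSearch` and `bestWordScore`, and the threshold of 90 in `main`, are assumptions for illustration.

```java
import java.util.Locale;

final class FuzzyWordSearch {

    // Hypothetical stand-in for a fuzzy ratio in the range 0-100.
    // Assumption: in real use, replace this with FuzzySearch.ratio(a, b).
    static int similarity(String a, String b) {
        int total = a.length() + b.length();
        if (total == 0) {
            return 100; // two empty strings are treated as identical
        }
        // Classic dynamic-programming table for the longest common subsequence.
        int[][] lcs = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    lcs[i][j] = lcs[i - 1][j - 1] + 1;
                } else {
                    lcs[i][j] = Math.max(lcs[i - 1][j], lcs[i][j - 1]);
                }
            }
        }
        // Share of characters that the two strings have in common, as a percentage.
        return Math.round(100f * 2 * lcs[a.length()][b.length()] / total);
    }

    // Returns the highest score of any single word in the text against the target word.
    static int bestWordScore(String text, String word) {
        String target = word.toLowerCase(Locale.ROOT);
        int best = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!token.isEmpty()) {
                best = Math.max(best, similarity(token, target));
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String description = "Connect with me on Twiter!"; // note the typo
        int score = bestWordScore(description, "Twitter");
        int threshold = 90; // assumed cutoff; tune for your tolerance of false positives
        System.out.println("score=" + score + ", match=" + (score >= threshold));
    }
}
```

Scoring word by word keeps the rest of the description from diluting the result, so a one-letter typo still lands above the cutoff while unrelated words stay well below it.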
Perfect. Thank you @xdrop for the suggestion.