KPI-2 and KPI-3 spellchecking
tomkralidis opened this issue · 0 comments
Both KPI-2 and KPI-3 provide a point for passing a basic spellcheck.
pywcmp uses pyspellchecker to perform a basic spellcheck. In both the KPI-2 and KPI-3 implementations, we send the title or abstract to pyspellchecker as shown below (a passing spellcheck returns an empty set).
>>> from spellchecker import SpellChecker
>>> s = SpellChecker()
>>> s.unknown('this is a test'.split())
set()
>>> s.unknown('this is a test (hi)'.split())
{'(hi)'}
>>> s.unknown('this is a test hi.'.split())
{'hi.'}
>>> s.unknown('this is a sentence. This is another sentence.'.split())
{'sentence.'}
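As we can see, words in parentheses and words followed by a period (that is, the last word of any ordinary sentence) are reported as unknown, because splitting on whitespace leaves the punctuation attached to the token. These are false positives.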
We need to update the spellchecking strategy, then apply it by refactoring the KPI-2 and KPI-3 spellchecking into a single function for reuse and consistency.
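One possible direction, as a rough sketch only: tokenize with a regular expression instead of `str.split()`, so punctuation never reaches the spellchecker, and route both KPIs through one helper. The names `WORD_RE`, `tokenize` and `misspelled_words` are hypothetical and not part of pywcmp. The checker is passed in as a callable so the helper does not depend on a particular library.

```python
import re

# Hypothetical helpers for illustration; not existing pywcmp code.
# A word is a run of letters, optionally joined by internal apostrophes
# or hyphens (e.g. "don't", "near-real-time").
WORD_RE = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def tokenize(text):
    """Return lowercase word tokens with surrounding punctuation removed."""
    return [match.group(0).lower() for match in WORD_RE.finditer(text)]


def misspelled_words(text, unknown):
    """Return the set of tokens that `unknown` reports as misspelled.

    `unknown` is any callable that accepts an iterable of words and
    returns the unrecognised ones, for example SpellChecker().unknown.
    """
    return set(unknown(tokenize(text)))
```

With pyspellchecker this could be called as `misspelled_words(title, SpellChecker().unknown)` from both the KPI-2 and KPI-3 code paths. Note that this sketch drops digits and standalone symbols entirely; whether that is acceptable for titles and abstracts is an open question.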