wmo-im/pywcmp

KPI-2 and KPI-3 spellchecking

tomkralidis opened this issue · 0 comments

Both KPI-2 and KPI-3 provide a point for passing a basic spellcheck.

pywcmp uses pyspellchecker to perform a basic spellcheck. In both the KPI-2 and KPI-3 implementations, we send the title or abstract to pyspellchecker as shown below (a passing spellcheck returns an empty set).

>>> from spellchecker import SpellChecker
>>> s = SpellChecker()
>>> s.unknown('this is a test'.split())
set()
>>> s.unknown('this is a test (hi)'.split())
{'(hi)'}
>>> s.unknown('this is a test hi.'.split())
{'hi.'}
>>> s.unknown('this is a sentence. This is another sentence.'.split())
{'sentence.'}

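As we can see, words in brackets and words at the end of a sentence (with a trailing period) are reported as unknown because the punctuation stays attached to the token when splitting on whitespace, resulting in false positives.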

We need to update the spellchecking strategy and then apply it by refactoring both KPI-2 and KPI-3 spellchecking into a single function for reuse and consistency.
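One possible direction, as a minimal sketch rather than a settled design: extract words with a regular expression instead of `str.split()`, so brackets and trailing periods never reach the spellchecker, and expose one shared helper that both KPIs call. The names `WORD_RE`, `tokenize` and `find_misspelled` are hypothetical and not part of pywcmp. The checker is passed in as a callable (for example `SpellChecker().unknown`) so the helper itself has no hard dependency.

```python
import re

# Hypothetical pattern: runs of letters, optionally joined by an
# internal apostrophe or hyphen (e.g. "don't", "re-use").
WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def tokenize(text):
    """Return the words in text, without surrounding punctuation."""
    return WORD_RE.findall(text)


def find_misspelled(text, unknown):
    """Return the set of words in text that the checker does not know.

    unknown: a callable that takes a list of words and returns the
    unrecognized ones, e.g. SpellChecker().unknown (assumed usage).
    """
    return set(unknown(tokenize(text)))
```

With pyspellchecker installed, both KPI-2 and KPI-3 could then call `find_misspelled(text, SpellChecker().unknown)` on the title or abstract and award the point when the result is empty. Numbers and tokens containing digits are skipped entirely by this pattern, which may or may not be the desired behaviour.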