phatpiglet/autocorrect

Time out for function

Opened this issue · 3 comments

Given a list of words, I'm looping through them to correct any misspelled words. Right now it has been stuck on the 5738th word for more than five minutes, with memory usage up to 12 GB of RAM and disk usage of 120 MB/s.
It would be nice to have a timeout parameter to abort if the search is taking too long. It is probably possible to optimize the memory usage as well.

I was able to work around the time issue with the stopit package.

CODE SAMPLE:

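from autocorrect import spell
from tqdm import tqdm
import stopit

# Returns 'aborted' if spell() does not finish within the timeout.
@stopit.threading_timeoutable(default='aborted')
def spell_corrector(word):
    return spell(word)

aborted = []     # words whose correction timed out
to_replace = {}  # misspelled word -> correction
for word in tqdm(results):  # results: the list of words to check
    new_word = spell_corrector(word, timeout=20)  # timeout in seconds

    if new_word == 'aborted':
        aborted.append(word)
    elif new_word != word:
        to_replace[word] = new_word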

@paulaceccon Thanks for reporting this issue and posting your resolution!

I also encountered this. For example, this spell() invocation took 10 seconds to resolve on my machine:
spell('................................................................')

One workaround is to only run spell() on strings composed entirely of word characters (alphanumerics and underscore):

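import re
from autocorrect import spell

def safe_spell(word):  # hypothetical wrapper name
    # \w matches letters, digits, and underscore
    if re.fullmatch(r"\w+", word):
        return spell(word)
    return word  # leave any other string unchanged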
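
For anyone who wants to apply that filter to a whole word list, here is a rough sketch that combines it with the loop from the original post. The names correct_words, corrector, and skipped are made up for illustration and are not part of autocorrect. The corrector is passed in as a plain callable (in practice it would presumably be spell, possibly wrapped with the stopit timeout above), and the example at the bottom uses a dictionary-backed stand-in so it runs without the library. Each distinct word is also checked only once, which is an addition of mine rather than something from this thread.

```python
import re

# Strings made up only of word characters (letters, digits, underscore).
WORD_RE = re.compile(r"\w+")


def correct_words(words, corrector):
    """Return (to_replace, skipped) for an iterable of words.

    corrector is any callable mapping a word to its correction.
    Words that fail the regex filter are never passed to it.
    """
    to_replace = {}  # misspelled word -> correction
    skipped = []     # words that failed the filter
    seen = set()     # avoid checking the same word twice
    for word in words:
        if word in seen:
            continue
        seen.add(word)
        if not WORD_RE.fullmatch(word):
            skipped.append(word)
            continue
        new_word = corrector(word)
        if new_word != word:
            to_replace[word] = new_word
    return to_replace, skipped


# Stand-in corrector for demonstration only; swap in the real one.
fake_corrections = {"teh": "the"}
to_replace, skipped = correct_words(
    ["teh", "cat", "....", "teh"],
    lambda w: fake_corrections.get(w, w),
)
print(to_replace)  # {'teh': 'the'}
print(skipped)     # ['....']
```

Note that the filter only avoids the punctuation-heavy inputs mentioned above. It does nothing for an ordinary-looking word that happens to be slow, so the timeout is still worth keeping.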

This seems to be related to #18