Breaking change in nltk 3.8.2
yifanmai opened this issue · 1 comments
Upstream issue: nltk/nltk#3293
We can work around this by pinning nltk to version 3.8.1.
nltk 3.8.2 causes errors in the tests such as:
E LookupError:
E **********************************************************************
E Resource punkt_tab not found.
E Please use the NLTK Downloader to obtain the resource:
E
E >>> import nltk
E >>> nltk.download('punkt_tab')
E
E For more information see: https://www.nltk.org/data.html
E
E Attempted to load tokenizers/punkt_tab/english/
E
E Searched in:
E - '/home/runner/nltk_data'
E - '/opt/hostedtoolcache/Python/3.9.19/x64/nltk_data'
E - '/opt/hostedtoolcache/Python/3.9.19/x64/share/nltk_data'
E - '/opt/hostedtoolcache/Python/3.9.19/x64/lib/nltk_data'
E - '/usr/share/nltk_data'
E - '/usr/local/share/nltk_data'
E - '/usr/lib/nltk_data'
E - '/usr/local/lib/nltk_data'
E - 'benchmark_output/perturbations/synonym'
E **********************************************************************
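Besides pinning (for example `nltk==3.8.1` in a requirements file), the error message itself points at a second option: download `punkt_tab` before tokenizing. Below is a minimal sketch of that idea, not a tested fix for this repository. The helper names `parse_release`, `needs_punkt_tab`, and `ensure_tokenizer_data` are hypothetical, and the version threshold assumes, based only on this issue, that releases from 3.8.2 onward look up `punkt_tab`.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_string: str) -> tuple:
    """Return the leading numeric components, e.g. '3.8.1' -> (3, 8, 1)."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def needs_punkt_tab(version_string: str) -> bool:
    # Assumption from this issue: 3.8.2 is the first release that
    # looks for the punkt_tab resource.
    return parse_release(version_string) >= (3, 8, 2)


def ensure_tokenizer_data() -> None:
    """Download punkt_tab if the installed nltk needs it and cannot find it."""
    try:
        installed = version("nltk")
    except PackageNotFoundError:
        return  # nltk is not installed, nothing to prepare
    if not needs_punkt_tab(installed):
        return  # pinned to 3.8.1 or older, no punkt_tab lookup expected

    import nltk

    try:
        # Same resource path that appears in the LookupError above.
        nltk.data.find("tokenizers/punkt_tab/english/")
    except LookupError:
        nltk.download("punkt_tab")
```

Calling `ensure_tokenizer_data()` once before the first `word_tokenize` call (for instance in a test fixture) would be one place to hook this in. It needs network access on the first run, so pinning may still be the simpler choice for CI.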
Example stack trace:
src/helm/benchmark/metrics/test_bias_metrics.py:16: in check_test_cases
bias_score = bias_func(test_case.texts)
src/helm/benchmark/metrics/bias_metrics.py:157: in evaluate_stereotypical_associations
tokens = word_tokenize(text.lower())
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/nltk/tokenize/__init__.py:129: in word_tokenize
sentences = [text] if preserve_line else sent_tokenize(text, language)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/nltk/tokenize/__init__.py:106: in sent_tokenize
tokenizer = PunktTokenizer(language)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/nltk/tokenize/punkt.py:1744: in __init__
self.load_lang(lang)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/nltk/tokenize/punkt.py:1749: in load_lang
lang_dir = find(f"tokenizers/punkt_tab/{lang}