Lowercase of all languages needed in utils.py

Question

Closed a year ago · 1 comment

In utils.py, I had to change normalize_language so that it returns language.lower():

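from pycountry import languages  # assumed import; not shown in the original snippet


def normalize_language(language):
    # Resolve ISO 639 codes such as "pl" or "pol" to a lowercase language name.
    for lookup_key in ("alpha_2", "alpha_3"):
        try:
            lang = languages.get(**{lookup_key: language})
            if lang:
                language = lang.name.lower()
        except KeyError:
            pass

    # The change: also lowercase names passed in directly, such as "Polish".
    return language.lower()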

This avoids cryptic errors when the language name is capitalized, for example:

sumy text-rank --format=html --language=Polish
sumy text-rank --format=html --language=French

These and similar invocations fail with:

>  LookupError: NLTK tokenizers are missing or the language is not supported.
> Download them by following command: python -c "import nltk; nltk.download('punkt')"
> Original error was:
> 
> **********************************************************************
>   Resource punkt not found.
>   Please use the NLTK Downloader to obtain the resource:
> 
>   >>> import nltk
>   >>> nltk.download('punkt')
>   
>   For more information see: https://www.nltk.org/data.html
> 
>   Attempted to load tokenizers/punkt/PY3/Polish.pickle
>
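
To illustrate why the case matters, here is a minimal, dependency-free sketch. The `ISO_CODES` table is a hypothetical stand-in for the pycountry lookup, `punkt_resource` is a made-up helper, and the path shape is copied from the error message above. It assumes the tokenizer files are stored under lowercase names, which is what the failed attempt to load `Polish.pickle` suggests.

```python
# Hypothetical stand-in for the pycountry lookup used in utils.py.
ISO_CODES = {"pl": "Polish", "pol": "Polish", "fr": "French", "fra": "French"}


def normalize_language(language):
    """Map an ISO code or a language name to a lowercase language name."""
    name = ISO_CODES.get(language, language)
    # Lowercasing at the end also covers names passed in directly.
    return name.lower()


def punkt_resource(language):
    # Hypothetical helper; the path shape is taken from the error message.
    return "tokenizers/punkt/PY3/%s.pickle" % normalize_language(language)


for value in ("Polish", "polish", "pl", "French"):
    print(value, "->", punkt_resource(value))
```

With the final `.lower()` in place, `--language=Polish` and `--language=polish` resolve to the same resource name.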

Otherwise the package is great.

Answer 1 · 2024-02-11T14:03:13.000Z

Thank you, should be fixed in main now.