Unsupported YAKE languages
Artgit opened this issue · 1 comment
Artgit commented
Looks like the following languages are unsupported by YAKE:
TAGALOG
VIETNAMESE
BENGALI
BOKMAL
YORUBA
CZECH
SOTHO
URDU
PUNJABI
SWAHILI
ALBANIAN
BELARUSIAN
MACEDONIAN
AZERBAIJANI
AFRIKAANS
XHOSA
ICELANDIC
TAMIL
KAZAKH
MONGOLIAN
CATALAN
GEORGIAN
LATIN
MAORI
MALAY
NYNORSK
GUJARATI
TSWANA
BOSNIAN
ZULU
TELUGU
ESPERANTO
SERBIAN
SOMALI
TSONGA
GANDA
BASQUE
HEBREW
WELSH
THAI
IRISH
SHONA
KOREAN
MARATHI
Is there any particular reason why they are unsupported?
arianpasquali commented
No particular reason. They were just not tested.
In theory the only language resource you need is a list of stopwords.
If you have access to a stopword list for that language, you can just specify it using the stopwords argument; in this case the language argument is simply ignored.
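As a rough sketch of what that could look like with the Python package: the file name `stopwords_cs.txt`, the helper names, and the `n`/`top` values below are made up for illustration, and I'm assuming the `KeywordExtractor` constructor takes `lan`, `n`, `top` and `stopwords` keyword arguments, so please check against the version you have installed.

```python
from pathlib import Path


def load_stopwords(path):
    """Read one stopword per line, lowercased, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def extract_keywords(text, stopwords, top=10):
    """Run YAKE with a user-supplied stopword collection (hypothetical helper)."""
    import yake  # third-party package: pip install yake

    # Assumption: when `stopwords` is given, no built-in list is loaded,
    # so the `lan` value has no effect on which stopwords are used.
    extractor = yake.KeywordExtractor(lan="cs", n=2, top=top, stopwords=stopwords)
    return extractor.extract_keywords(text)


if __name__ == "__main__":
    stopword_file = Path("stopwords_cs.txt")  # hypothetical path, one word per line
    if stopword_file.exists():
        czech_stopwords = load_stopwords(stopword_file)
        sample = "Praha je hlavní město České republiky a leží na řece Vltavě."
        for keyword, score in extract_keywords(sample, czech_stopwords):
            print(keyword, score)
```

The stopwords are lowercased on load since YAKE's bundled lists appear to be handled that way too, but that is my reading rather than a documented guarantee.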
If you have access to an annotated dataset for any of these languages and want to contribute, please take a look at the evaluation datasets repository that we maintain here: https://github.com/LIAAD/KeywordExtractor-Datasets.