6/stopwords-json

Coverage for African languages

dohliam opened this issue · 3 comments

Great project! I noticed there are currently no African languages included, so I've started the more-stoplists project to rectify that.

We are creating approximately 50-60 stopword lists from the ASP corpus. Swahili and Afrikaans are already complete; the rest will follow gradually as we manually check each of the automatically generated lists.

Would it be okay if I submitted a PR with the extra languages? In commit 900b3fa, I added af.json and sw.json and updated stopwords-all.json and README.md. I could start with these and submit the others as they are completed. Let me know what you think!
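As a rough sketch of the bookkeeping step described above (adding per-language files and updating the combined file), the Python below regenerates a combined file from individual lists. It assumes, without confirmation from this thread, that each `<code>.json` file is a JSON array of strings named by its language code and that `stopwords-all.json` is a JSON object keyed by those codes. `build_combined`, `write_combined`, and `COMBINED_NAME` are hypothetical names for illustration, not part of the repository.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical constant: the name of the combined output file.
COMBINED_NAME = "stopwords-all.json"


def build_combined(dist_dir: Path) -> dict[str, list[str]]:
    # Hypothetical helper: assumes every "<code>.json" file in dist_dir holds a
    # JSON array of stopword strings, with the file stem as the language code.
    combined: dict[str, list[str]] = {}
    for path in sorted(dist_dir.glob("*.json")):
        if path.name == COMBINED_NAME:
            continue  # never read the output file back in as a language
        words = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(words, list) or not all(isinstance(w, str) for w in words):
            raise ValueError(f"{path.name}: expected a JSON array of strings")
        # Drop duplicate entries while keeping the list's original order.
        combined[path.stem] = list(dict.fromkeys(words))
    return combined


def write_combined(dist_dir: Path) -> Path:
    # Rewrites the (assumed) combined file as an object keyed by language code.
    out = dist_dir / COMBINED_NAME
    combined = build_combined(dist_dir)
    out.write_text(
        json.dumps(combined, ensure_ascii=False, indent=2) + "\n", encoding="utf-8"
    )
    return out


if __name__ == "__main__":
    # Toy demo with tiny sample lists, not the real af.json / sw.json contents.
    with tempfile.TemporaryDirectory() as tmp:
        dist = Path(tmp)
        (dist / "af.json").write_text(json.dumps(["die", "en", "die"]), encoding="utf-8")
        (dist / "sw.json").write_text(json.dumps(["na", "ya"]), encoding="utf-8")
        print(write_combined(dist).read_text(encoding="utf-8"))
```

Deduplicating with `dict.fromkeys` keeps each list's original order, so a hand-curated ordering survives; sorting the words instead would be an equally reasonable choice if the repository expects alphabetized lists.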

6 commented

Looks great! Yes, please feel free to submit a PR with additional languages, I would be happy to review.

dohliam commented

Great to see that this project is still active! I've submitted a PR adding 7 new languages from the corpus. Background on the corpus and the methodology is in the more-stoplists repo, where we are creating more stopword lists. I can submit further pull requests as new languages are completed.

6 commented