
Extra stopword lists for use with NLTK.


extra-stopwords

This repository contains the set of stopwords I used with NLTK for the WbSrch search engine.

It contains some stopword lists from NLTK and ones cobbled together from other sources. The word lists are of varying quality. Feel free to modify them to suit your own needs -- I make no claim about their level of usefulness.

To use them with NLTK, copy them into NLTK's stopwords directory after you have set up NLTK and downloaded its own stopword list. The copy.sh script does this, assuming NLTK's data directory is in your home directory.
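
For readers who want to do the copy by hand, the step described above might look roughly like the sketch below. This is not the contents of copy.sh. The function name `copy_stopwords`, the `./lists` source directory, and the default destination `~/nltk_data/corpora/stopwords` are assumptions made for illustration; adjust them to match your checkout and your NLTK data path.

```shell
#!/bin/sh
# Hypothetical helper for illustration; NOT the repository's copy.sh.
# Copies every regular file in a source directory into NLTK's stopwords
# directory so the lists sit next to NLTK's own.
copy_stopwords() {
    src_dir="$1"
    # Assumed default: NLTK's data directory is ~/nltk_data.
    dest_dir="${2:-$HOME/nltk_data/corpora/stopwords}"

    # The destination only exists once NLTK's own stopword list is downloaded.
    if [ ! -d "$dest_dir" ]; then
        echo "stopwords directory not found: $dest_dir" >&2
        return 1
    fi

    for f in "$src_dir"/*; do
        # Skip subdirectories and the unexpanded glob of an empty directory.
        [ -f "$f" ] || continue
        cp "$f" "$dest_dir/"
    done
}

# Example call (the ./lists path is an assumption about where the lists live):
# copy_stopwords ./lists
```

The function refuses to run when the destination is missing rather than creating it, since a missing directory usually means NLTK's stopword list has not been downloaded yet (for example with `python -m nltk.downloader stopwords`). Point the source directory at the word lists only, so that unrelated files do not end up among the stopword lists.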

License

MIT license. See the "LICENSE" file for full text.

Contributors

The following people have contributed to improving this library:

  • Jason Champion
  • Mohammed Gholami
  • Jan Pipek
  • Pavle Vidanović