AnarchistCorpus

web scraping texts from the Anarchist Library for some natural language processing exploration

thanks to the Anarchist Library (theanarchistlibrary.org/) for compiling texts and having an incredibly well-organized site!

ideas:

  • tf-idf - what words are most significant in a given text?
  • LDA - what topic clusters can I find across documents?
  • recommendation engine based on tf-idf
  • NER - what entities appear most often? who or what is being talked about?
  • sentiment analysis - look at the 10 most positive and 10 most negative docs overall
  • refactor all code into classes/methods?
  • parallelize code to run faster?
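
a rough sketch of how the tf-idf and recommendation ideas above could fit together, in plain Python with no dependencies. this is not code from this repo: the function names (`tokenize`, `tfidf_vectors`, `top_terms`, `cosine`, `recommend`) and the three sample sentences are made up for illustration, and the weighting is the plain unsmoothed tf * log(N / df) variant, which is only one of several common tf-idf formulas.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / document length
    idf = log(N / df), unsmoothed (an assumption; libraries often smooth this)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero on empty docs
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def top_terms(vector, k=3):
    """The k highest-weighted terms; ties are broken alphabetically."""
    return sorted(vector, key=lambda t: (-vector[t], t))[:k]


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(index, vectors, k=1):
    """Indices of the k documents most similar to vectors[index]."""
    scores = [
        (cosine(vectors[index], v), i)
        for i, v in enumerate(vectors)
        if i != index
    ]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scores[:k]]


# placeholder sentences, not texts from the library
docs = [
    "mutual aid builds solidarity among neighbors",
    "solidarity and mutual aid in the workplace",
    "the garden needs water and sunlight",
]
vectors = tfidf_vectors(docs)
print(top_terms(vectors[0]))   # most distinctive terms of the first doc
print(recommend(0, vectors))   # index of the doc closest to the first one
```

on a real corpus each entry in `docs` would be the full text of one scraped document, and a library such as scikit-learn would likely be a faster choice than hand-rolled dictionaries.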