charlottemcclintock/AnarchistCorpus

web scraping texts from the Anarchist Library for some natural language processing exploration

Python

AnarchistCorpus

thanks to the Anarchist Library (theanarchistlibrary.org/) for compiling texts and having an incredibly well-organized site!

ideas:

tf-idf - what words are most significant in a given text?
LDA - what topic clusters can I find across documents?
recommendation engine based on tf-idf
NER - what entities appear most often? who or what is being talked about?
sentiment analysis - look at the 10 most positive and 10 most negative docs overall
refactor all code into classes/methods?
parallelize code to run faster?
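
a rough sketch of the tf-idf and recommendation ideas above, using only the standard library. this is not the code in this repo: the function names (`tokenize`, `tfidf_vectors`, `top_terms`, `recommend`) and the three toy documents are made up for illustration, and the weighting is plain term frequency times `log(N / df)`, which is only one of several common tf-idf variants.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # lowercase and keep runs of letters only (a deliberately crude tokenizer)
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """docs: dict of title -> raw text. returns dict of title -> {term: weight}."""
    tokens = {title: tokenize(text) for title, text in docs.items()}
    n_docs = len(tokens)

    # document frequency: in how many documents does each term appear?
    df = Counter()
    for toks in tokens.values():
        df.update(set(toks))

    vectors = {}
    for title, toks in tokens.items():
        counts = Counter(toks)
        total = len(toks) or 1  # avoid dividing by zero on empty documents
        vectors[title] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return vectors


def top_terms(vector, k=5):
    # highest weight first; ties broken alphabetically so output is stable
    return sorted(vector, key=lambda term: (-vector[term], term))[:k]


def cosine(a, b):
    # cosine similarity between two sparse {term: weight} vectors
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, vectors, k=3):
    # rank every other document by similarity to the given one
    scored = [
        (cosine(vectors[title], vec), other)
        for other, vec in vectors.items()
        if other != title
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


# toy documents, invented for this example (not taken from the library)
docs = {
    "doc_a": "mutual aid and mutual support in the commune",
    "doc_b": "the commune organizes mutual aid for workers",
    "doc_c": "the state and the prison and the police",
}
vectors = tfidf_vectors(docs)

print(top_terms(vectors["doc_c"], k=3))
print(recommend("doc_a", vectors, k=1))
```

a word that shows up in every document (like "the" here) gets a weight of zero, which is the point of the idf part. for the real corpus a library vectorizer would probably be a better fit than this hand-rolled version, but the ranking logic would look about the same.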