Web scraping texts from the Anarchist Library for some natural language processing (NLP) exploration.
Thanks to the Anarchist Library (theanarchistlibrary.org/) for compiling texts and having an incredibly well-organized site!
- tf-idf - what words are most significant in a given text?
- LDA - what topic clusters can I find across documents?
- recommendation engine based on tf-idf
- NER - what entities appear most often? who or what is being talked about?
- sentiment analysis - look at the 10 most positive and 10 most negative docs overall
- refactor all code into classes/methods?
- parallelize code to run faster?
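
As a rough illustration of the tf-idf and recommendation items above, here is a minimal sketch in plain Python (standard library only). It is not the code used in this project: the function names (`tokenize`, `tfidf_vectors`, `top_terms`, `cosine`, `recommend`), the smoothed idf formula, and the three toy strings in `docs` are all assumptions made up for the example. Real texts scraped from the library would take the place of `docs`.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep alphabetic words (apostrophes allowed); an assumed, very simple tokenizer.
    return re.findall(r"[a-z']+", text.lower())

def tfidf_vectors(docs):
    # Return one {term: weight} dict per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency: count each term once per doc
    # Smoothed idf (an assumption; other variants exist).
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty docs
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors

def top_terms(vector, k=3):
    # The k highest-weighted terms: "what words are most significant in a given text?"
    return sorted(vector, key=vector.get, reverse=True)[:k]

def cosine(a, b):
    # Cosine similarity between two sparse {term: weight} vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(vectors, index, k=1):
    # Indices of the k documents most similar to vectors[index].
    scores = [(cosine(vectors[index], v), i)
              for i, v in enumerate(vectors) if i != index]
    return [i for _, i in sorted(scores, reverse=True)[:k]]

# Toy documents, invented for this example (not taken from the library).
docs = [
    "mutual aid and solidarity in the commune",
    "solidarity and mutual aid among workers",
    "the garden needs water and sunlight",
]

vectors = tfidf_vectors(docs)
print(top_terms(vectors[0]))
print(recommend(vectors, 0))
```

With a real corpus, a library such as scikit-learn would likely be a better fit than hand-rolled dictionaries, but the sketch shows the idea: weight terms by how distinctive they are, then recommend the documents whose weighted vectors point in the most similar direction.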