Inspired by Between the Words, I decided to make a similar project. On 1/15/16 I went to Project Gutenberg's Top 100 EBooks list and downloaded the text files for the top 10.
First, I stripped the Gutenberg boilerplate and any tables of contents from the downloaded files. Then, with some help from Readability Score and using Text-Statistics, I calculated statistics for each book, e.g. word count, sentence count, and grade level.
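The cleanup and statistics steps above could look roughly like the following minimal sketch. It is not the code this project uses: `stripGutenbergBoilerplate`, `countSyllables`, and `textStatistics` are hypothetical names, the header/footer detection assumes the usual `*** START OF ...` / `*** END OF ...` marker lines, and the syllable counter is a rough heuristic swapped in for whatever Readability Score and Text-Statistics do internally. The grade level uses the standard Flesch-Kincaid formula: 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59.

```javascript
// Hypothetical helper: drops the Project Gutenberg header and license footer.
// Assumes the usual "*** START OF ..." and "*** END OF ..." marker lines;
// if a marker is missing, that end of the text is kept as-is.
function stripGutenbergBoilerplate(text) {
  const lines = text.split(/\r?\n/);
  const start = lines.findIndex((line) => /^\*\*\*\s*START OF/i.test(line));
  const end = lines.findIndex((line) => /^\*\*\*\s*END OF/i.test(line));
  const from = start === -1 ? 0 : start + 1;
  const to = end === -1 ? lines.length : end;
  return lines.slice(from, to).join("\n").trim();
}

// Rough heuristic syllable counter (an assumption, not Text-Statistics' algorithm):
// strip a likely-silent ending, drop a leading "y", then count vowel groups.
function countSyllables(word) {
  let w = word.toLowerCase().replace(/[^a-z]/g, "");
  if (w.length === 0) return 0;
  if (w.length <= 3) return 1;
  w = w.replace(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, "").replace(/^y/, "");
  const groups = w.match(/[aeiouy]{1,2}/g);
  return groups ? groups.length : 1;
}

// Word count, sentence count, syllable count, and Flesch-Kincaid grade level.
function textStatistics(text) {
  // Words are letter runs, allowing internal apostrophes ("don't").
  const words = text.match(/[A-Za-z]+(?:'[A-Za-z]+)*/g) || [];
  // A sentence is any chunk between terminators that contains a letter.
  const sentences = text.split(/[.!?]+/).filter((s) => /[A-Za-z]/.test(s));
  const wordCount = words.length;
  const sentenceCount = Math.max(sentences.length, 1);
  const syllableCount = words.reduce((sum, w) => sum + countSyllables(w), 0);
  const gradeLevel =
    wordCount === 0
      ? 0
      : 0.39 * (wordCount / sentenceCount) +
        11.8 * (syllableCount / wordCount) -
        15.59;
  return {
    wordCount,
    sentenceCount,
    syllableCount,
    gradeLevel: Math.round(gradeLevel * 10) / 10,
  };
}

// Usage on an in-memory sample (a real run would read each downloaded .txt file).
const sample =
  "Produced by a volunteer\n" +
  "*** START OF THIS PROJECT GUTENBERG EBOOK ***\n" +
  "Call me Ishmael. Some years ago I went to sea.\n" +
  "*** END OF THIS PROJECT GUTENBERG EBOOK ***\n" +
  "License text";
console.log(textStatistics(stripGutenbergBoilerplate(sample)));
```

Heuristic syllable counts make the grade level approximate, which is why dedicated libraries like Text-Statistics are worth using for the real numbers.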
I set the project aside for a few weeks, then came across a Vox article based on Adam Calhoun's Medium post, which reminded me to pick it back up.
I liked what Adam had done with graphs and "heatmaps" of punctuation, so I recreated them.
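The idea behind a punctuation "heatmap" is to throw away every letter, keep the punctuation marks in reading order, and draw each mark as a colored cell. The sketch below shows that data preparation under stated assumptions: the `PALETTE` colors and the `punctuationSequence`, `punctuationCounts`, and `heatmapRows` names are hypothetical, not Adam's or this repo's, and curly quotes are normalized to straight ones because Gutenberg texts often use them. Drawing the cells (canvas, SVG, etc.) is left out.

```javascript
// Hypothetical palette: one color per punctuation mark (not Adam Calhoun's exact colors).
const PALETTE = {
  ".": "#e41a1c",
  ",": "#377eb8",
  ";": "#4daf4a",
  ":": "#984ea3",
  "!": "#ff7f00",
  "?": "#ffff33",
  "'": "#a65628",
  '"': "#f781bf",
  "-": "#999999",
};

// Keep only the marks we have colors for, in reading order.
// Curly double and single quotes are mapped to their straight equivalents first.
function punctuationSequence(text) {
  const normalized = text
    .replace(/[\u201C\u201D]/g, '"')
    .replace(/[\u2018\u2019]/g, "'");
  return Array.from(normalized).filter((ch) => ch in PALETTE);
}

// How often each mark appears (useful for the bar-graph style comparisons).
function punctuationCounts(seq) {
  const counts = {};
  for (const ch of seq) counts[ch] = (counts[ch] || 0) + 1;
  return counts;
}

// Split the sequence into fixed-width rows of colors; each entry is one cell.
function heatmapRows(seq, width = 50) {
  const rows = [];
  for (let i = 0; i < seq.length; i += width) {
    rows.push(seq.slice(i, i + width).map((ch) => PALETTE[ch]));
  }
  return rows;
}

// Usage on a short sample.
const marks = punctuationSequence("Hello, world! \u201CYes,\u201D she said.");
console.log(punctuationCounts(marks));
console.log(heatmapRows(marks, 4));
```

Note that apostrophes in contractions are counted as punctuation here; filtering them out would be a reasonable alternative if they drown out the other marks.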
I wrote about this on Medium.
To do:
- Improve the layout
- Adapt the script so that arbitrary text can be input and analyzed, since that seems to be what many of the Medium responses are asking for.
To run it locally:
- Clone this repo
$ npm install
$ gulp
- Note: I probably left a step or two out of the gulpfile, which I still need to correct.