/goodreads_categories_scrapping

First web scraping experiment (shell and Python)

I wanted to investigate Goodreads' category numbers and play a little with Python's HTML parsing libraries (Beautiful Soup in this case).

To download the book category HTML pages from Goodreads:

./download_script

Then, to extract the data and populate a CSV file with it:

./assemble_csv
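
As a rough illustration of the extract-and-write step, here is a minimal sketch. It is not the actual assemble_csv script: it swaps in the standard library's html.parser instead of Beautiful Soup so that it runs without extra dependencies, and the markup it expects (an `a` tag with class "category" followed by a `span` with class "count"), the column names, and the sample numbers are all made up for the example.

```python
import csv
import io
from html.parser import HTMLParser


class CategoryParser(HTMLParser):
    """Collect (name, book_count) pairs from the hypothetical markup."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished (name, count) tuples
        self._field = None   # "name" or "count" while inside a matching tag
        self._name = None    # last category name seen, waiting for its count

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "a" and css_class == "category":
            self._field = "name"
        elif tag == "span" and css_class == "count":
            self._field = "count"

    def handle_data(self, data):
        if self._field == "name":
            self._name = data.strip()
        elif self._field == "count" and self._name is not None:
            # Keep only the digits, so "1,234 books" becomes 1234.
            digits = "".join(ch for ch in data if ch.isdigit())
            if digits:
                self.rows.append((self._name, int(digits)))
            self._name = None

    def handle_endtag(self, tag):
        # Any closing tag ends the field currently being read.
        self._field = None


def parse_categories(html):
    """Return a list of (category, books) tuples found in an HTML string."""
    parser = CategoryParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def write_csv(rows, out):
    """Write a header line and one line per category to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["category", "books"])  # assumed column names
    writer.writerows(rows)


# Made-up sample input; the real Goodreads markup is likely different.
sample = """
<div><a class="category" href="/genres/fantasy">Fantasy</a>
<span class="count">1,234 books</span></div>
<div><a class="category" href="/genres/poetry">Poetry</a>
<span class="count">56 books</span></div>
"""

rows = parse_categories(sample)
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

In practice the same parse_categories call would be applied to the contents of each downloaded file in list_html, and the rows written to a real file instead of an in-memory buffer. With Beautiful Soup, the parser class would be replaced by a few lookups on the parsed tree.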

or to do both:

./download_script && ./assemble_csv

Folders

examples: Diagrams of the most and least popular categories (made after importing the generated CSV into a Google Docs spreadsheet).

list_html: Downloaded HTML files. The folder's content is committed in case anyone wants to experiment without retrieving the data.

Notes

I did not explore the Goodreads API, as I was more interested in experimenting with web scraping.