/GoodReads-Scraping-with-Scrapy

Goodreads Scraper with Scrapy and an analysis of the scraped data (Quotes related to Infosec, security, computer-science, hacking)

Primary Language: Python

GoodReads Website 📖 Scraping with Scrapy

(and Analysis of the Result with SandDance to get the quotes for InfoSec, Hacking, Computer-Science and Security)

  • The JSON and CSV results of the scraping process are included in the repo, if you need them.

I am learning more about Data Science and do small projects like this one as a hobby. In this project, I scraped the GoodReads website for quotes about InfoSec, Hacking, Computer-Science and Security, then made a simple analysis of the quotes on these topics, which I am interested in. Even this simple, non-academic project can give you a lot of insight. If you are interested, you can check whether there is a correlation between the number of likes and the length of a quote, whether the author of a quote affects the result, etc.

  • The Python code collects the likes, tags, author, etc. for each quote. Then you let Scrapy save the output in JSON or CSV format (scrapy crawl quotes -o quotes.csv).

Note: "quotes" is the name I gave the spider, and it is the name used in the scrapy crawl command. Scrapy needs a unique name to work well, so if you reuse my code under a different name, adjust the command and the code accordingly. Let me know if you run into an issue; I can help.

  • You can examine the data with Pandas, etc. in Jupyter. Alternatively, there is a useful extension for Visual Studio Code from Microsoft Research called SandDance, which is easy and effective to use. I added some screenshots and a short GIF of this analysis below, so you can get an idea.
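If you want a quick number before opening Jupyter or SandDance, the likes-versus-length question can be sketched with the standard library alone (Pandas would do the same in fewer lines). The column names text and likes are assumptions about the CSV header, and the sample rows are made up for illustration; they are not scraped data.

```python
import csv
import io
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long number lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def likes_vs_length(csv_file):
    """Correlate quote length (in characters) with the number of likes."""
    lengths, likes = [], []
    for row in csv.DictReader(csv_file):
        lengths.append(len(row["text"]))   # assumed column name
        likes.append(int(row["likes"]))    # assumed column name
    return pearson(lengths, likes)


# Made-up rows standing in for quotes.csv (not real results).
sample = io.StringIO(
    "text,author,likes\n"
    "Short quote,Author A,30\n"
    "A somewhat longer quote,Author B,20\n"
    "The longest quote in this made-up sample,Author C,10\n"
)
r = likes_vs_length(sample)
print(f"Pearson r between length and likes: {r:.2f}")
```

For the real file, pass an open file handle instead of the sample, for example open("quotes.csv", newline="", encoding="utf-8"). A value near 0 would suggest that length and likes are unrelated; a value near 1 or -1 would suggest a strong linear relationship.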

[Screenshots and a short GIF of the SandDance analysis]

"Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety." said Benjamin Franklin. What a lovely quote. It also seems a popular quote in GoodReads Website as the the number of likes might imply.

[Screenshots of the SandDance analysis]

To sum up

  • I agree with 🤝 the Swedish author Stieg Larsson, “We need to have a talk on the subject of what's yours and what's mine.”, with the conclusion that the data is ours! :) So feel free to use this hobby project. Scraping without getting insights from the scraped data has little value, and the simple SandDance extension can give meaningful insights very easily. That's why I've added screenshots: to inspire those who have not used this free extension/tool yet.
