GoodReads Website 📖 Scraping with Scrapy
(and Analysis of the Results with SandDance, Focusing on Quotes about InfoSec, Hacking, Computer Science and Security)
- The JSON and CSV results of the scraping process are included in the repo if you need them.
I am trying to learn more about data science, and I practice with projects built around my hobbies. In this project, I scraped the GoodReads website for quotes about InfoSec, Hacking, Computer Science and Security, then made a simple analysis of the quotes on these topics, which I am interested in. Even a simple, non-academic project like this can give you lots of insight. If you are interested, you can check whether there is a correlation between the number of likes and the length of a quote, whether the author of a quote affects its number of likes, and so on.
- The Python code collects each quote's likes, tags, author, etc. Then Scrapy saves the output in JSON and CSV formats (scrapy crawl quotes -o quotes.csv; Scrapy picks the format from the file extension, so -o quotes.json gives JSON).
Note: "quotes" is the name I gave the spider, and it is the name you pass to scrapy crawl. Scrapy requires every spider in a project to have a unique name, so if you reuse my code, adjust the names to fit your own project. If you run into an issue, let me know and I can help.
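Instead of passing -o on every run, Scrapy 2.1 and later can also read the export targets from the FEEDS setting in the project's settings.py. A minimal sketch, assuming output files named quotes.json and quotes.csv and hypothetical item fields text, author, tags and likes (use the keys your spider actually yields):

```python
# settings.py (excerpt): export every crawl to both JSON and CSV.
# The file names and field names below are assumptions for illustration.
FEEDS = {
    "quotes.json": {
        "format": "json",
        "encoding": "utf8",
        "overwrite": True,  # replace the file instead of appending (Scrapy 2.4+)
    },
    "quotes.csv": {
        "format": "csv",
        "fields": ["text", "author", "tags", "likes"],  # fixes the column order
        "overwrite": True,
    },
}
```

With this in place, scrapy crawl quotes writes both files without any -o option. This matters because -o appends to an existing file, and appending a second run to a JSON file leaves it invalid; on Scrapy 2.4+ the -O flag (or "overwrite": True above) replaces the file instead.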
- You can examine the data with Pandas, etc. in Jupyter. Alternatively, there is a useful Visual Studio Code extension from Microsoft Research called SandDance, which is easy and effective to use. I added some screenshots and a short GIF of this analysis here so you can get an idea.
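Before opening the data in Pandas or SandDance, it can help to narrow the rows to the topics of interest. Scrapy's CSV exporter joins list fields such as tags with commas by default, so a sketch like the following splits them back and keeps only quotes that carry at least one target tag. The tag spellings (infosec, hacking, computer-science, security) and the tags column name are assumptions; check them against your own export.

```python
from collections import Counter

# Hypothetical tag spellings; check them against the tags in your export.
TARGET_TAGS = {"infosec", "hacking", "computer-science", "security"}


def split_tags(cell):
    """Turn a comma-joined CSV cell like 'hacking,security' into a set of tags."""
    return {tag.strip().lower() for tag in cell.split(",") if tag.strip()}


def filter_by_tags(rows, wanted=TARGET_TAGS):
    """Keep only rows whose 'tags' column shares at least one tag with `wanted`."""
    return [row for row in rows if split_tags(row["tags"]) & wanted]


def tag_counts(rows):
    """Count how often each tag occurs across the given rows."""
    counts = Counter()
    for row in rows:
        counts.update(split_tags(row["tags"]))
    return counts
```

The rows can come from csv.DictReader over the exported file; tag_counts(rows).most_common(10) then gives a quick overview of the most frequent tags before you move on to a richer tool.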
"Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety." said Benjamin Franklin. What a lovely quote. Judging by its number of likes, it also seems to be a popular quote on the GoodReads website.
To sum up
- I agree 🤝 with the Swedish author Stieg Larsson: “We need to have a talk on the subject of what's yours and what's mine.” My conclusion is that the data is ours! :) So feel free to use this hobby project. Scraping alone, without getting insights from the scraped data, has little value, and the simple SandDance extension can give meaningful insights very easily. That's why I've added screenshots: to inspire those who have not yet used this free extension/tool.