arindammitra1/webscraping_goodreads

Web Scraping with Pandas and BeautifulSoup and Requests

Jupyter Notebook

Web Scraping Top 500 Book Details from Goodreads Using Python

With BeautifulSoup and Requests

Goodreads.com hosts a comprehensive list of top-rated books, as voted on by the general Goodreads community.

We will use Python, BeautifulSoup, and Requests to scrape the first 5 pages of that list and build a list of the top 500 books, along with some interesting information about each of them.

Outline of the Project:

Part A:

Exploration and scraping information from one page
Download a single page from goodreads.com and store it
Scrape the stored page, and extract the required data from the page with BeautifulSoup
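
The two Part A steps might look roughly like the sketch below. It is not the notebook's actual code: the function names `download_page` and `extract_titles` are hypothetical, and the `a.bookTitle` selector is an assumption about the page markup that should be checked against the downloaded HTML.

```python
from pathlib import Path

import requests
from bs4 import BeautifulSoup


def download_page(url, path):
    """Fetch `url` and store the raw HTML at `path` (hypothetical helper)."""
    # A browser-like User-Agent is an assumption; some sites reject the default one.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()  # stop early on 4xx/5xx responses
    out = Path(path)
    out.write_text(response.text, encoding="utf-8")
    return out


def extract_titles(html):
    """Return the text of every title link found in the stored page."""
    soup = BeautifulSoup(html, "html.parser")
    # "a.bookTitle" is an assumed selector, not confirmed by the README.
    return [a.get_text(strip=True) for a in soup.select("a.bookTitle")]
```

A typical run would call `download_page(list_url, "page1.html")` once, then pass `Path("page1.html").read_text(encoding="utf-8")` to `extract_titles`. Working from the stored copy means the selectors can be explored repeatedly without requesting the page again.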

Part B: Put things together - scalable code for any number of pages

Create a dictionary to store the book information
Write separate functions to scrape each piece of information from the BeautifulSoup document and add it to the dictionary
Repeat this for any number of pages, appending the new items to the dictionary
Store the result in a pandas DataFrame
Save the DataFrame to a CSV file
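
The Part B steps could be put together along the following lines. This is a minimal sketch under stated assumptions, not the notebook's implementation: the helper names, the two example fields (`title`, `author`), and the `a.bookTitle` / `a.authorName` selectors are all assumptions. It also assumes each book sits in its own table row, and extracts fields row by row so that the lists in the dictionary stay the same length even when a field is missing.

```python
import pandas as pd
from bs4 import BeautifulSoup


def book_rows(soup):
    """Return the table rows that contain a title link (assumed page layout)."""
    return [tr for tr in soup.find_all("tr") if tr.select_one("a.bookTitle")]


def add_title(row, books):
    """Append this row's title to the dictionary."""
    books["title"].append(row.select_one("a.bookTitle").get_text(strip=True))


def add_author(row, books):
    """Append this row's first listed author, or None if no author link exists."""
    tag = row.select_one("a.authorName")
    books["author"].append(tag.get_text(strip=True) if tag else None)


def scrape_pages(html_pages):
    """Build one dictionary of lists from any number of HTML pages."""
    books = {"title": [], "author": []}
    for html in html_pages:
        soup = BeautifulSoup(html, "html.parser")
        for row in book_rows(soup):
            add_title(row, books)
            add_author(row, books)
    return books


def save_books(books, csv_path):
    """Turn the dictionary into a DataFrame and write it to a CSV file."""
    df = pd.DataFrame(books)
    df.to_csv(csv_path, index=False)
    return df
```

Here `html_pages` would be the HTML of the five stored pages, and a call such as `save_books(scrape_pages(html_pages), "books.csv")` produces the final file. Adding another field means adding one key to the dictionary and one more `add_...` function.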