Books_to_Scrape

[Scraping Techniques using Regular Expressions (Regex)]

The learning objective of this exercise is to practice Python with the Beautiful Soup library by scraping book information from the website https://books.toscrape.com.

Deliverable result:

The result must be a '.csv' file containing a table with 6 columns: title, price, rate, availability in stock, category, and date/time of scraping.

Process:

  • create an empty pattern (template) dataframe containing the 6 columns listed above
  • for each category (Classics, Science Fiction, Humor, and Business), create a Beautiful Soup object to extract the information from that category's webpage, then build a dataframe to concatenate with the pattern dataframe. Missing information is filled with 'NA'.
  • concatenate all dataframes into a final dataframe
  • export the final dataframe to a '.csv' file
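
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names `availability` and `scraped_at`, the helper `parse_category`, and the inline `SAMPLE_HTML` (hypothetical markup modeled on a category listing page, including its class names) are all assumptions made for the example. In a real run the HTML would come from fetching each category webpage.

```python
import re
from datetime import datetime

import pandas as pd
from bs4 import BeautifulSoup

# Assumed short names for the 6 required columns.
COLUMNS = ["title", "price", "rate", "availability", "category", "scraped_at"]

# Hypothetical sample markup; the second book deliberately lacks
# a rating and an availability tag to show the 'NA' fill.
SAMPLE_HTML = """
<article class="product_pod">
  <p class="star-rating Three"></p>
  <h3><a href="#" title="Example Book One">Example Book ...</a></h3>
  <p class="price_color">£51.77</p>
  <p class="instock availability"> In stock </p>
</article>
<article class="product_pod">
  <h3><a href="#" title="Example Book Two">Example Book ...</a></h3>
  <p class="price_color">£12.50</p>
</article>
"""


def parse_category(html, category):
    """Parse one category page into a dataframe with the pattern columns."""
    soup = BeautifulSoup(html, "html.parser")
    scraped_at = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    rows = []
    for book in soup.find_all("article", class_="product_pod"):
        link = book.find("a", title=True)
        price_tag = book.find("p", class_="price_color")
        rating_tag = book.find("p", class_="star-rating")
        stock_tag = book.find("p", class_="availability")

        # Regex keeps only the numeric part, ignoring the currency symbol.
        price = re.search(r"\d+\.\d+", price_tag.get_text()) if price_tag else None

        # The rating is assumed to be the class next to "star-rating".
        rating = None
        if rating_tag:
            others = [c for c in rating_tag.get("class", []) if c != "star-rating"]
            rating = others[0] if others else None

        rows.append({
            "title": link["title"] if link else "NA",
            "price": float(price.group()) if price else "NA",
            "rate": rating or "NA",
            "availability": stock_tag.get_text(strip=True) if stock_tag else "NA",
            "category": category,
            "scraped_at": scraped_at,
        })
    return pd.DataFrame(rows, columns=COLUMNS)


# 1. Pattern dataframe: empty, with only the 6 column names.
pattern = pd.DataFrame(columns=COLUMNS)

# 2. One dataframe per category (the same sample HTML is reused here).
frames = [parse_category(SAMPLE_HTML, name) for name in ("Classics", "Humor")]

# 3. Concatenate everything into the final dataframe.
final = pd.concat([pattern, *frames], ignore_index=True)

# 4. Export; passing a file path instead would write the '.csv' to disk.
csv_text = final.to_csv(index=False)
```

Extracting the price with a regular expression, rather than slicing off the first character, keeps the parsing robust if the currency symbol is rendered with extra encoding bytes.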

Input:

Webpages for each category, as follows:

Output:

Snippet of the .csv file: