Web Scraping Books

My kind of book store! Source: 'Megan Markham', unsplash.com

Intro

In this repo I plan to explore web scraping techniques in order to become more familiar with the Python libraries Beautiful Soup and Selenium. The website I plan to scrape was designed as a practice site, and hopefully it covers some intentionally beginner-level concepts.

README Outline

Intro
README Outline
Project Summary
Repo Contents
Libraries & Prerequisites
Conclusions
Future Work
Built With, Contributing, Authors, License, Acknowledgments

I can't imagine trying to find a book in here. Source: 'Janko Ferlic', unsplash.com

Project Summary

I found this project to be pretty challenging in the end. I spent a lot of time dealing with HTML tags and bs4.element.Tag objects, which are quite different from some of the other coding I have done. That said, it certainly helped to be familiar with for loops, dictionaries, and pandas DataFrames.
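The tag-walking, loop-and-dictionary pattern described above can be sketched roughly as follows. This is a minimal example under stated assumptions, not the notebook's actual code: `SAMPLE_HTML` is a hand-written, hypothetical snippet assuming each book sits in an `<article class="product_pod">` with a `star-rating` class, a titled `<a>` inside `<h3>`, and a `price_color` paragraph, and `parse_books` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical, hand-written sample of listing markup; a real catalogue
# page is assumed to repeat the <article> block once per book.
SAMPLE_HTML = """
<ol class="row">
  <li>
    <article class="product_pod">
      <p class="star-rating Three"></p>
      <h3><a href="catalogue/a-light-in-the-attic_1000/index.html"
             title="A Light in the Attic">A Light in the ...</a></h3>
      <div class="product_price"><p class="price_color">£51.77</p></div>
    </article>
  </li>
  <li>
    <article class="product_pod">
      <p class="star-rating One"></p>
      <h3><a href="catalogue/tipping-the-velvet_999/index.html"
             title="Tipping the Velvet">Tipping the Velvet</a></h3>
      <div class="product_price"><p class="price_color">£53.74</p></div>
    </article>
  </li>
</ol>
"""


def parse_books(html: str) -> list:
    """Return one dict per book with its title, price text, and rating word."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    # Each match is a bs4.element.Tag wrapping one <article class="product_pod">.
    for pod in soup.find_all("article", class_="product_pod"):
        link = pod.h3.a
        rating_tag = pod.find("p", class_="star-rating")
        books.append({
            # The visible link text can be truncated, so read the title attribute.
            "title": link["title"],
            "price": pod.find("p", class_="price_color").get_text(strip=True),
            # "class" is multi-valued in bs4: ["star-rating", "Three"].
            "rating": rating_tag["class"][1],
        })
    return books
```

Because every `pod` is itself a `bs4.element.Tag`, the same attribute lookups (`link["title"]`, `rating_tag["class"]`) work on it just as they do on the whole parsed document; on a live page you would pass the fetched HTML instead of `SAMPLE_HTML`.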

Repo Contents

This repo contains the following:

README.md - this is where you are now!
Web_Scraping_Books.ipynb - the Jupyter Notebook containing the finalized code for this project.
LICENSE - the required license information.
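Website URL - "http://books.toscrape.com/index.html", the practice site scraped in this project.
CONTRIBUTING.md - guidelines for contributing.
Images - the pictures used in this README.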

Libraries & Prerequisites

These are the libraries that I used in this project.

numpy as np
pandas as pd
matplotlib.pyplot as plt
%matplotlib inline
BeautifulSoup from bs4
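Fetching pages and walking through the catalogue also needs an HTTP client alongside Beautiful Soup. The sketch below is one possible approach, assuming the standard-library `urllib` rather than whatever the notebook uses; the helper names (`fetch_soup`, `next_page_url`, `crawl`) and the pager selector `li.next > a` are assumptions about the site's markup, not confirmed details of this project.

```python
import time
from typing import Iterator, Optional
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Practice site named in this README.
START_URL = "http://books.toscrape.com/index.html"


def fetch_soup(url: str, timeout: float = 10.0) -> BeautifulSoup:
    """Download one page and parse it (urlopen raises HTTPError on 4xx/5xx)."""
    with urlopen(url, timeout=timeout) as response:
        # Hand Beautiful Soup the raw bytes so it can detect the encoding itself.
        return BeautifulSoup(response.read(), "html.parser")


def next_page_url(soup: BeautifulSoup, current_url: str) -> Optional[str]:
    """Return the absolute URL of the pager's "next" link, or None if absent."""
    # Assumed pager markup: <li class="next"><a href="...">next</a></li>
    link = soup.select_one("li.next > a")
    if link is None:
        return None
    # The href is relative, so resolve it against the page it came from.
    return urljoin(current_url, link["href"])


def crawl(start_url: str = START_URL, max_pages: int = 3,
          delay: float = 1.0) -> Iterator[BeautifulSoup]:
    """Yield parsed pages by following "next" links, pausing between requests."""
    url: Optional[str] = start_url
    for page in range(max_pages):
        if url is None:
            break
        if page > 0:
            time.sleep(delay)  # be polite: wait before every request after the first
        soup = fetch_soup(url)
        yield soup
        url = next_page_url(soup, url)
```

Usage might look like `for soup in crawl(max_pages=2): print(len(soup.select("article.product_pod")))`. Following the "next" link instead of hard-coding page numbers means the loop stops on its own at the last page, and `max_pages` keeps a first experiment small.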

Conclusions

I was able to scrape the site and pull together a list of books with their titles, prices, and ratings.
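Turning such a list into something you can sort and plot mostly means converting strings to numbers. Here is a hedged pandas sketch, assuming each scraped record is a dict with hypothetical `title`, `price` (e.g. "£51.77") and `rating` (a word such as "Three") keys; `books_to_frame`, `RATING_WORDS` and the two sample records are illustrative, not the project's actual data.

```python
import pandas as pd

# Rating words are assumed to come from classes like "star-rating Three".
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def books_to_frame(books):
    """Build a DataFrame and convert the price and rating strings to numbers."""
    df = pd.DataFrame(books, columns=["title", "price", "rating"])
    # Keep only digits and the decimal point: "£51.77" -> 51.77.
    # This also drops stray characters such as "Â" from a mis-decoded "£".
    df["price"] = df["price"].str.replace(r"[^0-9.]", "", regex=True).astype(float)
    # Unrecognized rating words become NaN rather than raising an error.
    df["rating"] = df["rating"].map(RATING_WORDS)
    return df


# Two hypothetical records in the shape described above.
sample = [
    {"title": "A Light in the Attic", "price": "£51.77", "rating": "Three"},
    {"title": "Tipping the Velvet", "price": "£53.74", "rating": "One"},
]
df = books_to_frame(sample)
print(df.sort_values("rating", ascending=False))
```

With numeric columns in place, something like `df.groupby("rating")["price"].mean()` compares average price across star ratings.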

Future Work

There is so much more I would like to do, and so many more websites to scrape!

This is what you get when you Google 'web scraping'. Kinda nice, really. Source: 'Vidar Nordli Mathisen', unsplash.com

Built With

Jupyter Notebook
Python 3
scikit-learn

Contributing

Please read CONTRIBUTING.md for details on contributing.

Authors

Thomas Whipple

License

Please read LICENSE for details.

Acknowledgments

Thanks to "http://books.toscrape.com/index.html" for providing a practice site, and to Jeff Herman for helping me out.

twhipple/Web_Scraping_Books