My kind of book store! Source: 'Megan Markham', unsplash.com
In this repo I plan to explore web scraping techniques in order to become more familiar with the Python libraries Beautiful Soup and Selenium. The website I plan to scrape was designed as a practice site, and hopefully includes some intentionally beginner-level concepts.
- Introduction
- Readme Outline
- Project Summary
- Repo Contents
- Libraries & Prerequisites
- Conclusions
- Future Work
- Built With, Contributors, Authors, Acknowledgments
I can't imagine trying to find a book in here. Source: 'Janko Ferlic', unsplash.com
I found this project to be pretty challenging in the end. I spent a lot of time dealing with HTML tags and bs4.element.Tag objects, which are pretty different from some of the other coding I have done. It certainly helped to be familiar with for loops, dictionaries, and pandas DataFrames, though.
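To illustrate what working with those Tag objects looks like, here is a minimal sketch (not the notebook's actual code) that loops over book listings and collects each one into a dictionary. It assumes the markup used on books.toscrape.com, where each book sits in an `<article class="product_pod">` with the full title in the link's `title` attribute, the price in `p.price_color`, and the rating as the second CSS class of `p.star-rating`. The `SAMPLE_HTML` string and the `parse_books` name are hypothetical, and the two sample entries are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down listing markup modeled on books.toscrape.com.
SAMPLE_HTML = """
<ol class="row">
  <li>
    <article class="product_pod">
      <p class="star-rating Three"></p>
      <h3><a href="catalogue/a-light-in-the-attic_1000/index.html"
             title="A Light in the Attic">A Light in the ...</a></h3>
      <div class="product_price"><p class="price_color">£51.77</p></div>
    </article>
  </li>
  <li>
    <article class="product_pod">
      <p class="star-rating One"></p>
      <h3><a href="catalogue/tipping-the-velvet_999/index.html"
             title="Tipping the Velvet">Tipping the Velvet</a></h3>
      <div class="product_price"><p class="price_color">£53.74</p></div>
    </article>
  </li>
</ol>
"""


def parse_books(html):
    """Hypothetical helper: return one dict per book in a listing page."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    # find_all returns bs4.element.Tag objects, one per matching <article>
    for article in soup.find_all("article", class_="product_pod"):
        # The visible link text may be truncated, so read the title attribute
        title = article.h3.a["title"]
        price = article.find("p", class_="price_color").get_text(strip=True)
        # "class" is multi-valued in bs4, e.g. ["star-rating", "Three"]
        rating_classes = article.find("p", class_="star-rating")["class"]
        rating = [c for c in rating_classes if c != "star-rating"][0]
        books.append({"title": title, "price": price, "rating": rating})
    return books


books = parse_books(SAMPLE_HTML)
for book in books:
    print(book)
```

Reading the `title` attribute instead of the link text avoids the "..." truncation that long titles get on the listing page.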
This repo contains the following:
- README.md - this is where you are now!
- Web_Scraping_Books.ipynb - the Jupyter Notebook containing the finalized code for this project.
- LICENSE - the required license information.
- website url - "http://books.toscrape.com/index.html"
- CONTRIBUTING.md
- Images
These are the libraries that I used in this project:
- BeautifulSoup from bs4
- numpy as np
- pandas as pd
- matplotlib.pyplot as plt
- %matplotlib inline (a Jupyter magic command rather than a library)
I was able to scrape the site and pull together a list of books with titles, prices, and ratings.
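Once the titles, prices, and ratings are scraped they arrive as strings, so they need some cleanup before they are useful in a DataFrame. The sketch below is one possible approach, not the notebook's code. It assumes prices look like "£51.77" and ratings are the words used in the site's star-rating CSS class ("One" through "Five"); the `rows` values and the `RATING_WORDS` mapping are hypothetical.

```python
import pandas as pd

# Hypothetical scraped rows: prices carry a currency sign, ratings are words.
rows = [
    {"title": "A Light in the Attic", "price": "£51.77", "rating": "Three"},
    {"title": "Tipping the Velvet", "price": "£53.74", "rating": "One"},
]

# Assumed mapping from the rating words in the CSS class to numbers
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}

df = pd.DataFrame(rows)
# Strip everything except digits and the decimal point, then convert to float
df["price"] = df["price"].str.replace(r"[^0-9.]", "", regex=True).astype(float)
# Turn rating words into integers so they can be sorted and averaged
df["rating"] = df["rating"].map(RATING_WORDS)

print(df.sort_values("price", ascending=False))
```

With numeric columns, questions like "which books are cheapest" or "what is the average rating" become one-line pandas operations.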
There is so much more I would like to do - and so many more websites to scrape!
This is what you get when you Google 'web-scraping'. Kinda nice really. Source: Vidar Nordli Mathisen, unsplash.com
- Jupyter Notebook
- Python 3.0
- scikit-learn
Please read CONTRIBUTING.md for details
Thomas Whipple
Please read LICENSE.md for details
Thanks to the practice website "http://books.toscrape.com/index.html", and to Jeff Herman for helping me out.