WebScraper

By Dexter Carpenter

This repository tracks the versions and progress of my web scraper.

Below are the instructions for installing my web scraper.

Here is the tutorial I followed:

https://medium.freecodecamp.org/how-to-scrape-websites-with-python-and-beautifulsoup-5946935d93fe


There are several folders in my repository:

'Notes'

  • Class notes and other notes that I used in making the scraper; these are mostly for myself.
  • My flowchart is also kept in this folder (and shown below).
    • Both the PNG and the XML file are kept here.

'Versions'

  • A folder holding all my previous versions. I keep these because I back up everything each time I push from my local machine.

'environment'

  • This folder was used to host my virtual environment.
  • I recommend you use this folder as well for your virtual environment.
  • This folder also contains 'WebScraper8.py', the current version of my scraper. This is the file you should run to use the scraper.

Other

  • The repository also includes a .gitignore file. It ignores the .lpvenv file generated in the 'environment' folder, which makes 'environment' a great place to host your virtual environment.

Here is my flowchart, which outlines what my program does:

Flowchart


Install Instructions:

I am using Python 2.7.14

Before running, be sure to install BeautifulSoup 4 and Twilio in your virtual environment:

pip install BeautifulSoup4
pip install twilio
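
If you want to confirm that both installs worked before running the scraper, a quick check like the one below may help. This is only a sketch and is not part of WebScraper8.py: the helper name `missing_packages` is made up for illustration, and it assumes the packages are imported under the module names `bs4` (for BeautifulSoup4) and `twilio`.

```python
from __future__ import print_function  # keeps print() consistent on Python 2.7

import importlib


def missing_packages(module_names):
    """Return the module names from module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The module is not installed in the active environment.
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed import names: "bs4" for BeautifulSoup4, "twilio" for Twilio.
    missing = missing_packages(["bs4", "twilio"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Run it with the virtual environment activated. If it reports a missing package, repeat the matching pip command above.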

You're all set! Happy Scraping!


Other Attributions

I also would like to link a peer's webscraper that I looked at to help form my own:

https://github.com/ErgoShrimp/webscrape


WebScraper created by:

Dexter Carpenter