This repository keeps track of my webscraper versions and progress.
Below are the instructions on how to install my webscraper.
Here is the tutorial I followed:
https://medium.freecodecamp.org/how-to-scrape-websites-with-python-and-beautifulsoup-5946935d93fe
There are several folders in my repository:
'Notes'
- Class notes and other notes that I used in making the scraper; these are mostly for myself.
- My flowchart is also kept in this folder (as well as down below).
- Both the PNG and the XML file are kept here.
'Versions'
- A folder holding all my previous versions. I keep these because I back up everything each time I push from my local machine.
'environment'
- This folder was used to host my virtual environment.
- I recommend you use this folder as well for your virtual environment.
- This folder also contains a file called 'WebScraper8.py'. This is the current version of my scraper, and it is the file you should run to use it.
Other
- Also included in the repository is a .gitignore file. This file ignores the .lpvenv file generated in the 'environment' folder, which makes 'environment' a great place to host your virtual environment.
Here is my flowchart. It outlines what my program does:
Install Instructions:
I am using Python 2.7.14.
Before running, be sure to install BeautifulSoup 4 and twilio in your virtual environment:
pip install BeautifulSoup4
pip install twilio
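If you want to confirm that both installs worked before running the scraper, a small check like the one below may help. This is only a sketch and is not part of 'WebScraper8.py': the function name missing_modules is made up for illustration, and it assumes the packages are imported as bs4 and twilio, which are the import names those two pip packages normally provide.

```python
# Hypothetical helper for checking the install; not part of WebScraper8.py.
import importlib


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The module is not installed in the active environment.
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumption: "bs4" is the import name for BeautifulSoup4,
    # and "twilio" is the import name for the twilio package.
    missing = missing_modules(["bs4", "twilio"])
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All dependencies are installed.")
```

Run it with the virtual environment activated. If anything is listed as missing, repeat the matching pip install command.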
You're all set! Happy Scraping!
Other Attributions
I would also like to link a peer's webscraper that I looked at to help form my own:
https://github.com/ErgoShrimp/webscrape
WebScraper created by:
Dexter Carpenter