The objective of this project is to create a web scraping application that covers the most common use cases.
- Geckodriver
- Cloud deployment (Heroku)
- Django web scraping app
- Unit tests
- Python package
- Write to CSV files
- CI/CD
- REST API (Flask, SQL or MongoDB)
- -u url (pass one URL on the command line)
- -l list of URLs (path to a text file of URLs)
- -f findAll (pass arguments like h1 or p or similar)
- -s selector (to be done later)
- -r regex (to be done later)
- -p preset + preset name (phonenum, email, )
- -w write to file (give file path and file name)
- -t telephone number
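The options above could be wired up with the standard library's argparse. This is only a minimal sketch, not the project's actual parser: the function name `build_parser`, the long option names, and the choice to make -u and -l mutually exclusive are all assumptions. The options marked "to be done later" (-s, -r) and -t are left out because their behaviour is not specified yet.

```python
import argparse


def build_parser():
    """Build a parser for a subset of the options (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="webscrape.py")

    # Assumption: exactly one URL source is given, either -u or -l.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-u", "--url", help="single URL to scrape")
    source.add_argument("-l", "--list", dest="url_list",
                        help="path to a text file of URLs")

    # Long names below are illustrative; only the short flags come from the notes.
    parser.add_argument("-f", "--find-all", dest="find_all",
                        help="tag name to collect, e.g. h1 or p")
    parser.add_argument("-p", "--preset", help="preset name, e.g. phonenum or email")
    parser.add_argument("-w", "--write", help="output file path and name")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["-u", "site.com", "-f", "h1"])
print(args.url, args.find_all)
```

Keeping the parser in its own function makes it easy to cover with unit tests, since a test can pass any argument list without touching the real command line.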
python3 webscrape.py -u site.com -f "h1"
or
python3 webscrape.py -l /usr/share/listofsites -f "p"
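To illustrate what the -f option in these commands might do once a page has been downloaded, here is a rough sketch that collects the text of every occurrence of a tag. It uses the standard library's html.parser as a stand-in so it runs without extra packages; the project itself may well use a different library. The names `TagTextCollector` and `find_all_text` are hypothetical, and the sample page is made up for the example.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text inside every occurrence of one tag name.

    Assumes matching tags are explicitly closed; nested tags of the
    same name are merged into a single result.
    """

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0       # greater than 0 while inside a matching tag
        self.results = []    # one string per matching element
        self._buffer = []    # text pieces of the element being read

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            if self.depth == 0:
                self._buffer = []
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self.depth > 0:
            self._buffer.append(data)


def find_all_text(html, tag):
    """Return the text of every <tag> element found in the html string."""
    collector = TagTextCollector(tag)
    collector.feed(html)
    collector.close()
    return collector.results


# Made-up page content standing in for a downloaded site.
page = "<html><body><h1>Title</h1><p>First</p><p>Second <b>bold</b></p></body></html>"
print(find_all_text(page, "p"))  # ['First', 'Second bold']
```

With -l, the same function would presumably be called once per URL read from the list file.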