
An open-source web scraper written in Python with Selenium WebDriver and Beautiful Soup.


Scrape n Bake

Python 3 only (no Python 2 support). MIT license.

Objective

The objective of this project is to create a web scraping application that covers the most common use cases.

Prerequisites

Python 3

Beautiful Soup

Selenium WebDriver

Geckodriver

Future updates

  • Cloud deployment (Heroku)
  • Django web scraping app
  • Unit tests
  • Python package
  • Write to CSV files
  • CI/CD
  • REST API (Flask, SQL or MongoDB)

Arguments

  • -u url (pass one URL on the command line)
  • -l list of URLs (path to a text file of URLs)
  • -f findAll (pass a tag name such as h1 or p)
  • -s selector (to be done later)
  • -r regex (to be done later)
  • -p preset + preset name (e.g. phonenum, email)
  • -w write to file (give the file path and file name)
  • -t telephone number
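As a rough illustration of how the flags listed above could be declared, here is a minimal sketch using Python's standard argparse module. It is not taken from webscrape.py: the function name build_parser, the long option names (--url, --list, and so on) and the help strings are assumptions made for this example. The -s, -r and -t flags are left out because their behaviour is not specified above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare a subset of the flags listed above.

    The long option names are hypothetical; only the short
    flags (-u, -l, -f, -p, -w) come from the README.
    """
    parser = argparse.ArgumentParser(prog="webscrape.py")
    parser.add_argument("-u", "--url", help="single URL to scrape")
    parser.add_argument("-l", "--list", dest="url_list",
                        help="path to a text file of URLs")
    parser.add_argument("-f", "--findall",
                        help="tag name to search for, e.g. h1 or p")
    parser.add_argument("-p", "--preset",
                        help="preset name, e.g. phonenum or email")
    parser.add_argument("-w", "--write", help="output file path")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch
# can be run on its own.
args = build_parser().parse_args(["-u", "site.com", "-f", "h1"])
print(args.url, args.findall)
```

Passing a list to parse_args keeps the sketch independent of the real command line; a script would normally call parse_args() with no arguments.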

Examples

python3 webscrape.py -u site.com -f "h1"

or

python3 webscrape.py -l /usr/share/listofsites -f "p"
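
To show what a -f lookup such as "h1" or "p" amounts to, here is a minimal sketch built on Beautiful Soup's find_all. It assumes the bs4 package is installed and works on an inline HTML string rather than a live page fetched with Selenium. The helper name find_tags and the sample markup are hypothetical, and the project's own implementation may differ.

```python
from bs4 import BeautifulSoup


def find_tags(html: str, tag: str) -> list:
    """Return the stripped text of every <tag> element in html.

    find_tags is a hypothetical helper, not part of webscrape.py.
    """
    # "html.parser" is the parser bundled with the standard library,
    # so no extra parser package is needed.
    soup = BeautifulSoup(html, "html.parser")
    return [element.get_text(strip=True) for element in soup.find_all(tag)]


# Inline sample markup standing in for a downloaded page.
sample = "<html><body><h1>Title</h1><p>First</p><p>Second</p></body></html>"
print(find_tags(sample, "p"))
```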