WebScraper

Python web scraper built with Selenium and Beautiful Soup 4 (beautifulsoup4). It automatically detects the local Chrome installation and downloads the matching driver.
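How the detection works is not described here. The following is a minimal sketch that assumes Chrome is on the PATH under one of a few common Linux/macOS executable names. It reads the installed version from `--version` output and extracts the major number, which is what a driver downloader would match on. `find_chrome_binary`, `parse_major_version`, `detect_chrome_major_version` and the candidate name list are hypothetical, not the project's code.

```python
import re
import shutil
import subprocess
from typing import Optional

# Hypothetical executable names to probe; real installs may differ
# (on Windows, for example, Chrome is usually not on PATH at all).
CANDIDATE_BINARIES = ("google-chrome", "google-chrome-stable", "chromium", "chromium-browser")


def find_chrome_binary() -> Optional[str]:
    """Return the path of the first Chrome/Chromium executable found on PATH."""
    for name in CANDIDATE_BINARIES:
        path = shutil.which(name)
        if path:
            return path
    return None


def parse_major_version(version_output: str) -> Optional[int]:
    """Extract the major version from output like 'Google Chrome 114.0.5735.90'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    return int(match.group(1)) if match else None


def detect_chrome_major_version() -> Optional[int]:
    """Run '<chrome> --version' and return the major version, or None if not found."""
    binary = find_chrome_binary()
    if binary is None:
        return None
    result = subprocess.run([binary, "--version"], capture_output=True, text=True, check=False)
    return parse_major_version(result.stdout)
```

`detect_chrome_major_version()` returns `None` when no candidate binary is on the PATH, so the caller can fail with a clear message instead of downloading a driver blindly.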

To run with Poetry:

  • Make sure Poetry is installed (version 1.1.1).
  • Make sure Python 3.8.0 is set as the global or local version (pyenv can manage this).
  • Clone the repository and run "poetry config virtualenvs.create false && poetry install". This installs the dependencies declared in the existing pyproject.toml file; with virtualenvs.create set to false, Poetry installs them into the current Python environment instead of creating a new virtualenv.
  • Run "poetry run python scraper.py", or start "poetry shell" and then run "python scraper.py" inside the new shell.
  • The script takes a few seconds and returns the parsed data from the target website.
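A scraper built this way typically splits the work in two: Selenium renders the page and exposes its HTML through the WebDriver `page_source` property, and a parser pulls data out of that string. The sketch below shows only the parsing half. To stay runnable without third-party packages it uses the standard library's `html.parser.HTMLParser` as a stand-in for BeautifulSoup, and it extracts links purely for illustration; `LinkCollector` and `extract_links` are hypothetical names, and the real scraper's target data is not specified here.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._current_href: Optional[str] = None
        self._current_text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; value may be None.
            self._current_href = dict(attrs).get("href") or ""
            self._current_text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a> tag.
        if self._current_href is not None:
            self._current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._current_text).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


def extract_links(page_source: str) -> List[Tuple[str, str]]:
    """Parse an HTML string (e.g. driver.page_source) and return its links."""
    collector = LinkCollector()
    collector.feed(page_source)
    collector.close()
    return collector.links
```

With BeautifulSoup the same extraction is roughly `[(a.get("href"), a.get_text(strip=True)) for a in BeautifulSoup(html, "html.parser").find_all("a")]`, which is shorter and more tolerant of malformed markup.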

To run as a Docker image:

  • Clone the repository and make sure Docker is installed.
  • Uncomment line 60 in scraper.py so that the scraper uses the standalone Chrome browser.
  • Run "sudo docker run -d -p 4444:4444 selenium/standalone-chrome". This starts Chrome in a separate container and exposes its Selenium server on port 4444.
  • Run "sudo docker build --no-cache --network=host -t <image-name> ." to build the image from the Dockerfile, replacing <image-name> with a name for the image; -t needs a value before the build context ".".
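`docker run -d` returns as soon as the container starts, which can be before the Selenium server inside it accepts sessions. A minimal readiness check is sketched below. It assumes the container answers at `http://localhost:4444/status` (Selenium Grid 4 images; older Grid 3 images use `/wd/hub/status`) with JSON containing a `value.ready` flag. `is_ready` and `wait_for_selenium` are hypothetical helpers, not part of the repository.

```python
import json
import time
import urllib.request


def is_ready(payload: dict) -> bool:
    """Return True if a Selenium /status JSON payload reports the server as ready."""
    value = payload.get("value")
    return isinstance(value, dict) and value.get("ready") is True


def wait_for_selenium(status_url: str = "http://localhost:4444/status",
                      timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll the status endpoint until it reports ready or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(status_url, timeout=interval) as response:
                if is_ready(json.load(response)):
                    return True
        except (OSError, ValueError):
            # OSError covers refused connections, timeouts and HTTP errors;
            # ValueError covers a reply that is not valid JSON. Retry either way.
            pass
        time.sleep(interval)
    return False
```

Once `wait_for_selenium()` returns True, the scraper can open a remote session against the container (for example with Selenium's `webdriver.Remote`).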

The repository also contains two tests in the test_scraper.py file. To run them, activate the Poetry environment with "poetry shell" and then run "pytest" inside it, or run "poetry run pytest" directly.
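The contents of test_scraper.py are not shown here. One common way to test scraping logic without launching Chrome is to pass in a stand-in object that mimics the two WebDriver members the code relies on, `get(url)` and `page_source`. The sketch below assumes the scraping code accepts the driver as a parameter; `FakeDriver` and `scrape_title` are hypothetical names, and pytest would collect the `test_*` function automatically.

```python
import re
from typing import List, Optional


class FakeDriver:
    """Minimal stand-in for a Selenium WebDriver: records URLs, serves canned HTML."""

    def __init__(self, html: str) -> None:
        self.page_source = html
        self.visited: List[str] = []

    def get(self, url: str) -> None:
        self.visited.append(url)


def scrape_title(driver, url: str) -> Optional[str]:
    """Load url with the given driver and return the page's <title> text, if any."""
    driver.get(url)
    # A regex is enough for this toy example; real parsing would use BeautifulSoup.
    match = re.search(r"<title>(.*?)</title>", driver.page_source, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


def test_scrape_title_uses_driver_and_parses_html():
    driver = FakeDriver("<html><head><title> Example Page </title></head></html>")
    assert scrape_title(driver, "https://example.com") == "Example Page"
    assert driver.visited == ["https://example.com"]
```

Injecting the driver keeps these tests fast and deterministic, while a separate test can still exercise the real Chrome setup when it is available.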