Web scraping means automatically opening a website and extracting the data on it that matters to you. It's fundamental to the internet, search engines, data science, automation, machine learning, and much more.
Opening websites and extracting data are only part of what makes web scraping useful; most of the value comes from parsing that data into something you can work with.
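To make the "parsing is where the value is" point concrete, here is a minimal sketch that turns raw HTML into structured data. It uses only the standard library's `html.parser` (not necessarily what this project uses), and the sample page, `LinkAndTitleParser`, and `parse_page` are hypothetical names for illustration.

```python
from html.parser import HTMLParser


class LinkAndTitleParser(HTMLParser):
    """Collect the page <title> text and every <a href="..."> value."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            # attrs is a list of (name, value) pairs
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_page(html):
    """Hypothetical helper: return the title and links found in an HTML string."""
    parser = LinkAndTitleParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "links": parser.links}


# Hypothetical sample page; in a real scraper this string would come from an HTTP response.
sample_html = """
<html><head><title> Example Store </title></head>
<body>
  <a href="/products/1">Widget</a>
  <a href="/products/2">Gadget</a>
  <a>No link</a>
</body></html>
"""

result = parse_page(sample_html)
print(result)  # {'title': 'Example Store', 'links': ['/products/1', '/products/2']}
```

Fetching the page is the easy part; deciding which elements to keep and how to shape them (here, a dict of title and links) is what makes the scraped data usable.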
This project will cover:
- Basic web scraping with Python
- Web scraping with Selenium
- Sync vs Async
- Asynchronous web scraping with asyncio
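The sync vs async difference can be sketched without touching the network. In this sketch, "downloads" are simulated with `time.sleep` and `asyncio.sleep`; the URLs and the 0.2 s delay are assumptions, not real measurements. The synchronous version waits for each page in turn, while the asyncio version starts all waits at once with `asyncio.gather`. Note that `asyncio.run` needs Python 3.7+ (on 3.6, use `asyncio.get_event_loop().run_until_complete(...)`), and inside a Jupyter notebook, where an event loop is already running, you would `await scrape_async(URLS)` in a cell instead.

```python
import asyncio
import time

URLS = [f"https://example.com/page/{i}" for i in range(5)]  # hypothetical URLs
DELAY = 0.2  # simulated network latency in seconds (assumption)


def fetch_sync(url):
    """Pretend to download a page: blocks the whole program while waiting."""
    time.sleep(DELAY)
    return f"<html>{url}</html>"


async def fetch_async(url):
    """Pretend to download a page: yields to the event loop while waiting."""
    await asyncio.sleep(DELAY)
    return f"<html>{url}</html>"


def scrape_sync(urls):
    # One request at a time: total time is about len(urls) * DELAY
    return [fetch_sync(u) for u in urls]


async def scrape_async(urls):
    # gather runs the coroutines concurrently and returns results in input order
    return await asyncio.gather(*(fetch_async(u) for u in urls))


start = time.perf_counter()
sync_pages = scrape_sync(URLS)
sync_elapsed = time.perf_counter() - start

start = time.perf_counter()
async_pages = asyncio.run(scrape_async(URLS))
async_elapsed = time.perf_counter() - start

print(f"sync:  {sync_elapsed:.2f}s")   # roughly len(URLS) * DELAY
print(f"async: {async_elapsed:.2f}s")  # roughly DELAY
```

Both approaches return the same pages; the async one finishes in roughly the time of the slowest single request, because waiting on I/O no longer blocks the other requests.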
Requirements:
- Python experience (at least the first 15 days of this project).
- Selenium and chromedriver installed (a video walkthrough shows how).
1. Clone
git clone https://github.com/codingforentrepreneurs/Supercharged-Web-Scraping-with-Asyncio supercharged
2. Create Virtual Environment
cd supercharged
python3.6 -m venv .
3. Activate the virtual environment and install requirements
Mac/Linux:
source bin/activate
Windows:
.\Scripts\activate
If using pipenv, run
pipenv install
pipenv shell
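To confirm that activation worked (with either venv or pipenv), a small check like the following can be run with the environment's `python`, for example saved as a hypothetical `check_env.py` and run with `python check_env.py`. It relies on the standard behavior that inside a virtual environment `sys.prefix` points at the environment while `sys.base_prefix` points at the base installation; `in_virtualenv` is an illustrative name.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # venv (and modern virtualenv) set sys.prefix to the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    # Legacy virtualenv releases instead defined sys.real_prefix.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```

If it prints `False`, the packages you install will go into the system Python rather than the project environment.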
4. Run Jupyter
jupyter notebook
or
python -m jupyter notebook
If using pipenv, run
pipenv run jupyter notebook