concurrent-web-scraping

Concurrent Web Scraping with Python and Selenium

Want to use this project?

1. Fork/Clone
2. Create and activate a virtual environment
3. Install the requirements

Run the scrapers:

# sync
(env)$ python script.py headless

# parallel with multiprocessing
(env)$ python script_parallel_1.py headless

# parallel with concurrent.futures
(env)$ python script_parallel_2.py headless

# concurrent with concurrent.futures (should be the fastest!)
(env)$ python script_concurrent.py headless

# parallel with concurrent.futures and concurrent with asyncio
(env)$ python script_asyncio.py headless
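
The scraper variants listed above differ mainly in how the individual scraping tasks are scheduled. As a rough illustration only (this is not the code in this repository), the sketch below contrasts a synchronous loop with a thread pool from concurrent.futures. The names `fetch_page`, `run_sync`, and `run_concurrent` are hypothetical, and the Selenium work is replaced by a short `time.sleep` so the example runs without a browser.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_page(page_number):
    """Hypothetical stand-in for one scraping task; sleep simulates I/O wait."""
    time.sleep(0.1)
    return f"page-{page_number}"


def run_sync(pages):
    # Handle one page after another, as a plain loop would.
    return [fetch_page(page) for page in pages]


def run_concurrent(pages, max_workers=4):
    # Threads overlap the waiting time; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fetch_page, pages))


if __name__ == "__main__":
    pages = range(1, 9)
    for runner in (run_sync, run_concurrent):
        start = time.perf_counter()
        results = runner(pages)
        elapsed = time.perf_counter() - start
        print(f"{runner.__name__}: {len(results)} pages in {elapsed:.2f}s")
```

Because the simulated task spends its time waiting rather than computing, the threaded version would be expected to finish sooner here; whether that carries over to the real scrapers depends on the site, the browser driver, and the machine.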

Run the tests:

(env)$ python -m pytest test/test_scraper.py
(env)$ python -m pytest test/test_scraper_mock.py
