proxy_web_crawler

This script automates the process of searching for a website via keywords and scraped proxy IPs.


Search for a website with a different proxy each time.

This script automates the process of searching for a website via a keyword and the Bing search engine, paging through the results page after page.

Pass a complete URL and at least one keyword as command line arguments:
python proxy_crawler.py -u https://www.example.com -k keyword

or, if you would like to see the browser while crawling, add the -d flag:
python proxy_crawler.py -u https://www.example.com -k keyword -d

It first scrapes a list of proxies from the web using the SSL Proxies site.
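The scraping step can be sketched roughly as follows. This is an illustrative sketch, not the script's actual code: it assumes the proxies appear in an HTML table as adjacent IP and port cells, and pulls IP:port pairs out of the raw page source with a regex.

```python
import re

# Assumption: proxies are listed as adjacent <td>IP</td><td>PORT</td> cells.
PROXY_RE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3})</td><td>(\d{2,5})")

def parse_proxies(html):
    """Return a list of (ip, port) pairs found in the page source."""
    return PROXY_RE.findall(html)

# Fetching the live list would look like this (not run here):
# import requests
# html = requests.get("https://www.sslproxies.org/").text
# proxies = parse_proxies(html)

sample = "<td>203.0.113.5</td><td>8080</td>"
print(parse_proxies(sample))  # [('203.0.113.5', '8080')]
```

A proper HTML parser would be more robust than a regex, but this keeps the sketch dependency-free.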

Then, using a new proxy socket for each iteration, it searches Bing for the specified keyword until the desired website is found.
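Switching proxies per iteration amounts to restarting Firefox with new manual-proxy preferences. A minimal sketch, assuming the FirefoxProfile API of the Selenium/Firefox 60 era (the preference names are standard Firefox proxy settings; the example IP is illustrative):

```python
def proxy_prefs(host, port):
    """Firefox preferences that route all traffic through one proxy."""
    return {
        "network.proxy.type": 1,          # 1 = manual proxy configuration
        "network.proxy.http": host,
        "network.proxy.http_port": port,
        "network.proxy.ssl": host,
        "network.proxy.ssl_port": port,
    }

# Applying them to a fresh driver for each iteration (not run here):
#
# from selenium import webdriver
# profile = webdriver.FirefoxProfile()
# for name, value in proxy_prefs("203.0.113.5", 8080).items():
#     profile.set_preference(name, value)
# driver = webdriver.Firefox(firefox_profile=profile)
# driver.get("https://www.bing.com/search?q=" + keyword)
```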

The website is then visited, and one random link within it is clicked.

The bot is slowed down on purpose.
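The random-link step and the deliberate slowdown can be sketched together. The helper names and delay bounds below are illustrative assumptions, not taken from the actual script:

```python
import random

def human_delay(low=2.0, high=6.0):
    """Pick a randomized pause length so page visits are spread out."""
    return random.uniform(low, high)

def pick_random_link(hrefs):
    """Choose one link from those found on the target site, if any."""
    return random.choice(hrefs) if hrefs else None

# With a live driver (not run here):
#
# import time
# links = [a.get_attribute("href")
#          for a in driver.find_elements_by_tag_name("a")]
# target = pick_random_link([h for h in links if h])
# if target:
#     time.sleep(human_delay())   # the deliberate slowdown
#     driver.get(target)
```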

Along with Python 3 and geckodriver, the following are also required:

apt-get install xvfb
pip install pyvirtualdisplay
pip install selenium
pip install requests

I use this version of geckodriver on Ubuntu:

wget https://github.com/mozilla/geckodriver/releases/download/v0.23.0/geckodriver-v0.23.0-linux64.tar.gz

geckodriver should be saved somewhere in your PATH, e.g. /usr/local/bin.
This was developed on Ubuntu 16.04.4 LTS with Selenium, geckodriver, and Firefox 60.0.
Also tested on Ubuntu 18.04.
Author: James Loye Colley 22MAY2018