Webscraping

Selenium, python, proxies.

Primary language: Jupyter Notebook

::Python: start Mozilla Firefox with a checked proxy from a list::

<Firefox does not offer a way to programmatically start the browser with a specified proxy on Windows.

The proxy settings can, however, be specified by writing them directly to Firefox's stored profile file.

*Launching Firefox with easily specified proxy settings is already possible through Selenium; this approach, however, uses the vanilla Firefox browser, which cannot be called with Selenium. >
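A minimal Python sketch of this idea, not the project's own code. It assumes the proxy list is plain text with one `host:port` entry per line, and that the Firefox executable path and profile directory are supplied by the caller (all hypothetical here). Each proxy is checked by fetching a test URL through it, and the first working one is written as standard Firefox proxy preferences (`network.proxy.type`, `network.proxy.http`, `network.proxy.ssl` and their `_port` counterparts) into the profile's `user.js`, which Firefox reads at startup. The original project may edit a different file in the profile.

```python
import json
import os
import subprocess
import urllib.request


def parse_proxy_list(text):
    """Parse 'host:port' lines (assumed format) into (host, port) tuples."""
    proxies = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        host, _, port = line.rpartition(":")
        if host and port.isdigit():
            proxies.append((host, int(port)))
    return proxies


def proxy_works(host, port, test_url="http://example.com", timeout=5.0):
    """Return True if test_url can be fetched through the proxy."""
    address = f"http://{host}:{port}"
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": address, "https": address})
    )
    try:
        with opener.open(test_url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):  # URLError, HTTPError and timeouts are OSErrors
        return False


def first_working_proxy(proxies, check=proxy_works):
    """Return the first (host, port) that passes check(), or None."""
    for host, port in proxies:
        if check(host, port):
            return host, port
    return None


def proxy_prefs(host, port):
    """Render user.js lines for a manual HTTP/HTTPS proxy."""
    prefs = {
        "network.proxy.type": 1,  # 1 = manual proxy configuration
        "network.proxy.http": host,
        "network.proxy.http_port": port,
        "network.proxy.ssl": host,
        "network.proxy.ssl_port": port,
    }
    # json.dumps quotes strings and leaves ints bare, matching user.js syntax.
    return "".join(f'user_pref("{k}", {json.dumps(v)});\n' for k, v in prefs.items())


def launch_with_proxy(firefox_exe, profile_dir, host, port):
    """Write the proxy prefs into the profile's user.js, then start Firefox."""
    with open(os.path.join(profile_dir, "user.js"), "w", encoding="utf-8") as f:
        f.write(proxy_prefs(host, port))  # note: replaces any existing user.js
    return subprocess.Popen([firefox_exe, "-profile", profile_dir])
```

The check is passed in as a function so a different test (another URL, or a stricter timeout) can be swapped in without touching the selection loop. Writing `user.js` rather than editing `prefs.js` in place is a design choice here: Firefox rewrites `prefs.js` itself, while `user.js` is only read.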

::crawl3 git::

<Reads a text file containing a large body of unstructured text with embedded website links. The links are identified, each website is requested, and the returned pages are scraped for contact email addresses.

For example, a large consumer report on a particular industry that lists many firms and their websites can be turned into a list of contact email addresses.

Known problems: pages that list multiple email addresses, and anti-scraping measures or email obfuscation.>
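A minimal Python sketch of that pipeline under stated assumptions, not the project's actual implementation: links are found with a regular expression, each page is fetched with `urllib`, and every distinct address on the page is kept (one possible answer to the multiple-address case). The `deobfuscate` helper is a hypothetical heuristic that only handles the common `[at]`/`[dot]` style; addresses rendered by JavaScript or shown as images would still be missed.

```python
import re
import urllib.request

# Links: explicit http(s):// URLs or bare "www." hosts; a bare "acme.com" is not caught.
URL_RE = re.compile(r"""(?:https?://|www\.)[^\s<>"'()\[\]]+""", re.IGNORECASE)
# Emails: a simple pattern; it can also match things like "logo@2x.png".
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_links(text):
    """Return unique website links in order of appearance."""
    links = []
    for match in URL_RE.findall(text):
        url = match.rstrip(".,;:")  # drop trailing sentence punctuation
        if not url.lower().startswith(("http://", "https://")):
            url = "http://" + url
        if url not in links:
            links.append(url)
    return links


def deobfuscate(text):
    """Heuristic: turn 'info [at] acme [dot] com' into 'info@acme.com'."""
    text = re.sub(r"\s*[\[(]\s*at\s*[\])]\s*", "@", text, flags=re.IGNORECASE)
    return re.sub(r"\s*[\[(]\s*dot\s*[\])]\s*", ".", text, flags=re.IGNORECASE)


def find_emails(page):
    """Return every distinct email address on a page, in order."""
    return list(dict.fromkeys(EMAIL_RE.findall(deobfuscate(page))))


def fetch_page(url, timeout=10.0):
    """Download a page as text using a browser-like User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(text, fetcher=fetch_page):
    """Map each link found in text to the email addresses on its page."""
    results = {}
    for url in find_links(text):
        try:
            results[url] = find_emails(fetcher(url))
        except (OSError, ValueError):  # unreachable site, HTTP error, timeout
            results[url] = []
    return results
```

Because the page download is a parameter, `fetch_page` could be replaced by a Selenium-driven browser for sites that reject plain HTTP clients, without changing the link or email extraction.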