oxylabs/selenium-proxy-integration-python

SSL Proxy Support

Chetan11-dev opened this issue · 0 comments

Hi, I have created a package named botasaurus-proxy-authentication, which enables SSL support for proxies requiring authentication.

For instance, if you use an authenticated proxy through a tool like seleniumwire to scrape a Cloudflare-protected website such as G2.com, the connection is not SSL end to end, and the request is typically blocked.
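Part of what makes authenticated proxies awkward in Chrome is that, as far as I know, its `--proxy-server` switch only takes `scheme://host:port` and has no field for a username and password, so the credentials have to be supplied some other way. The sketch below uses a hypothetical helper, `split_proxy_url` (not part of seleniumwire or botasaurus-proxy-authentication), to show how a proxy URL splits into the part Chrome accepts and the credentials that must be handled separately.

```python
from urllib.parse import urlsplit, unquote


def split_proxy_url(proxy_url):
    """Split 'scheme://user:pass@host:port' into (server, username, password).

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(proxy_url)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"proxy URL needs a host and a port: {proxy_url!r}")
    # The credential-free part, in the form Chrome's --proxy-server switch accepts.
    server = f"{parts.scheme}://{parts.hostname}:{parts.port}"
    # Credentials are percent-decoded, so '%40' in a password becomes '@'.
    username = unquote(parts.username) if parts.username else None
    password = unquote(parts.password) if parts.password else None
    return server, username, password


print(split_proxy_url("http://user:p%40ss@proxy.example.com:8000"))
```

Percent-encoding matters here: a password containing `@` or `:` must be written as `%40` or `%3A` inside the URL, otherwise the host and port are parsed incorrectly.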

To illustrate, first install the required packages:

python -m pip install selenium_wire chromedriver_autoinstaller

Then, execute this Python script:

from seleniumwire import webdriver
from selenium.webdriver.chrome.service import Service
from chromedriver_autoinstaller import install

# Define the proxy
proxy_options = {
    'proxy': {
        'http': 'http://username:password@proxy-provider-domain:port',  # Replace with your proxy
        'https': 'http://username:password@proxy-provider-domain:port',  # Replace with your proxy
    }
}

# Install a matching chromedriver and set up the driver
# (Selenium 4.10+ no longer accepts the driver path as a positional argument)
driver_path = install()
driver = webdriver.Chrome(service=Service(driver_path), seleniumwire_options=proxy_options)

# Navigate to the desired URL
link = 'https://www.g2.com/products/github/reviews'
driver.get("https://www.google.com/")
driver.execute_script(f'window.location.href = "{link}"')

# Wait for user input
input("Press Enter to exit...")

# Clean up
driver.quit()

You'll likely be blocked by Cloudflare:

[Screenshot: blocked]
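Instead of judging the result by eye, a script can check the page title after navigation. The sketch below is a rough heuristic, assuming Cloudflare's interstitial pages use titles such as "Just a moment..." or "Attention Required! | Cloudflare"; that title list is my assumption from observed pages, not an official or complete list, and `looks_like_cloudflare_block` is a hypothetical helper.

```python
# Title prefixes commonly seen on Cloudflare challenge/block pages.
# Assumption based on observed pages; not an official or complete list.
CHALLENGE_TITLE_PREFIXES = ("just a moment", "attention required")


def looks_like_cloudflare_block(title):
    """Rough heuristic: does this page title look like a Cloudflare interstitial?"""
    normalized = title.strip().lower()
    return any(normalized.startswith(prefix) for prefix in CHALLENGE_TITLE_PREFIXES)


print(looks_like_cloudflare_block("Just a moment..."))
```

With Selenium, this would be called as `looks_like_cloudflare_block(driver.title)` after the page has finished loading. A title check can miss pages that Cloudflare renders differently, so treat a `False` result as "probably not blocked" rather than proof.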

Using botasaurus_proxy_authentication with the same proxy avoids this problem. To see the difference, first install the package:

python -m pip install botasaurus-proxy-authentication

Then run the following code (it also uses chromedriver_autoinstaller, installed in the previous step):

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from chromedriver_autoinstaller import install
from botasaurus_proxy_authentication import add_proxy_options

# Define the proxy settings
proxy = 'http://username:password@proxy-provider-domain:port'  # Replace with your proxy

# Set Chrome options; add_proxy_options configures the authenticated proxy
chrome_options = Options()
add_proxy_options(chrome_options, proxy)

# Install a matching chromedriver and set up the driver
driver_path = install()
driver = webdriver.Chrome(service=Service(driver_path), options=chrome_options)

# Navigate to the desired URL
link = 'https://www.g2.com/products/github/reviews'
driver.get("https://www.google.com/")
driver.execute_script(f'window.location.href = "{link}"')

# Wait for user input
input("Press Enter to exit...")

# Clean up
driver.quit()
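Both scripts load Google first and then navigate to the target by interpolating the URL into JavaScript with an f-string. That works for this URL, but a quote or backslash in the link would break the script. A safer variant, sketched below with a hypothetical helper `build_navigation_script`, uses `json.dumps` to produce a properly escaped JavaScript string literal.

```python
import json


def build_navigation_script(url):
    """Return JavaScript that sends the current page to `url`.

    json.dumps emits a valid, escaped string literal, so characters such as
    quotes or backslashes in the URL cannot break out of the string.
    """
    return f"window.location.href = {json.dumps(url)};"


print(build_navigation_script("https://www.g2.com/products/github/reviews"))
# window.location.href = "https://www.g2.com/products/github/reviews";
```

In either script, `driver.execute_script(build_navigation_script(link))` would replace the f-string call.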

Result:

[Screenshot: not blocked]
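Both examples hard-code the proxy credentials in the source. If you want to share or commit such a script, one option is to read the proxy URL from an environment variable instead. The sketch below assumes a hypothetical variable named `PROXY_URL` and a hypothetical helper `proxy_from_env`; neither comes from the packages above.

```python
import os


def proxy_from_env(env=None, var="PROXY_URL"):
    """Return the proxy URL stored in an environment variable.

    PROXY_URL is a hypothetical name chosen for this sketch. `env` defaults
    to os.environ but can be any mapping, which makes the helper easy to test.
    """
    env = os.environ if env is None else env
    proxy = env.get(var, "").strip()
    if not proxy:
        raise RuntimeError(f"set {var} to something like http://username:password@host:port")
    return proxy
```

In the second script, `proxy = proxy_from_env()` would replace the hard-coded `proxy = '...'` line.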

I suggest adopting botasaurus_proxy_authentication for its SSL support with authenticated proxies. It should improve the success rate of scraping Cloudflare-protected websites and, in turn, increase revenue for Oxylabs.
Also, thank you, Oxylabs, for your great work on proxies.
Good luck to the team.