SSL Proxy Support
Chetan11-dev opened this issue · 0 comments
Hi, I have created a package named botasaurus-proxy-authentication, which enables SSL support for proxies requiring authentication.
For instance, when using an authenticated proxy with a tool like seleniumwire to scrape a Cloudflare-protected website such as G2.com, a non-SSL connection typically results in being blocked.
To illustrate, run this code:
First, install the required packages:
python -m pip install selenium_wire chromedriver_autoinstaller
Then, execute this Python script:
from seleniumwire import webdriver
from chromedriver_autoinstaller import install
# Define the proxy
proxy_options = {
    'proxy': {
        'http': 'http://username:password@proxy-provider-domain:port',  # Replace with your proxy
        'https': 'http://username:password@proxy-provider-domain:port',  # Replace with your proxy
    }
}
# Install and set up the driver
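install()  # Downloads a matching chromedriver and adds it to PATH
# Selenium 4.10+ no longer accepts the driver path as a positional argument
driver = webdriver.Chrome(seleniumwire_options=proxy_options)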
# Navigate to the desired URL
link = 'https://www.g2.com/products/github/reviews'
driver.get("https://www.google.com/")
driver.execute_script(f'window.location.href = "{link}"')
# Wait for user input
input("Press Enter to exit...")
# Clean up
driver.quit()
You'll likely be blocked by Cloudflare.
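To check for the block programmatically rather than by eye, the page title can be inspected before quitting the driver. This is a minimal sketch: `looks_like_cloudflare_block` is a hypothetical helper, and the marker strings are assumptions based on titles Cloudflare commonly shows, not an official or complete list.

```python
# Hypothetical helper; the marker strings are assumptions, not an official list.
CHALLENGE_TITLE_MARKERS = (
    "just a moment",       # title commonly shown on Cloudflare's interstitial challenge
    "attention required",  # title commonly shown on Cloudflare's block page
)


def looks_like_cloudflare_block(page_title: str) -> bool:
    """Return True if the page title resembles a Cloudflare challenge or block page."""
    title = page_title.strip().lower()
    return any(marker in title for marker in CHALLENGE_TITLE_MARKERS)
```

In the script above, it could be called as `looks_like_cloudflare_block(driver.title)` once the page has loaded. A title check is only a heuristic, so a False result does not guarantee the real page was served.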
However, using botasaurus_proxy_authentication with proxies circumvents this problem. To see the difference, first install the required package:
python -m pip install botasaurus-proxy-authentication
Then, execute this Python script:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from chromedriver_autoinstaller import install
from botasaurus_proxy_authentication import add_proxy_options
# Define the proxy settings
proxy = 'http://username:password@proxy-provider-domain:port' # Replace with your proxy
# Set Chrome options
chrome_options = Options()
add_proxy_options(chrome_options, proxy)
# Install and set up the driver
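install()  # Downloads a matching chromedriver and adds it to PATH
# Selenium 4.10+ no longer accepts the driver path as a positional argument
driver = webdriver.Chrome(options=chrome_options)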
# Navigate to the desired URL
link = 'https://www.g2.com/products/github/reviews'
driver.get("https://www.google.com/")
driver.execute_script(f'window.location.href = "{link}"')
# Wait for user input
input("Press Enter to exit...")
# Clean up
driver.quit()
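Both scripts embed the credentials directly in the proxy URL. If a username or password contains characters such as `@`, `:` or `/`, the URL becomes ambiguous unless they are percent-encoded. The sketch below shows one way to build the string safely; `build_proxy_url` and its parameters are hypothetical names introduced here, and whether a given proxy library decodes the encoded credentials again is worth verifying before relying on it.

```python
from urllib.parse import quote


def build_proxy_url(username: str, password: str, host: str, port: int,
                    scheme: str = "http") -> str:
    """Build scheme://user:pass@host:port with percent-encoded credentials."""
    # safe="" encodes every reserved character, including '@', ':' and '/',
    # so the credentials cannot be mistaken for URL delimiters.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"
```

For example, `build_proxy_url("user", "p@ss:word", "proxy.example.com", 8080)` produces a URL in which the password appears as `p%40ss%3Aword`.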
I suggest using botasaurus_proxy_authentication for its SSL support for authenticated proxies. It improves the success rate of scraping Cloudflare-protected websites, which in turn should increase revenue for Oxylabs.
Also, thanks to Oxylabs for your great work on proxies.
Good luck to the team.