Kubernetes DNS and Remote Webdriver
prabhatverma286 opened this issue · 13 comments
Thank you for such an amazing tool you have built!
If I understand the functionality correctly, selenium-wire opens up a proxy socket on the addr FQDN (and a random port), and any requests from selenium are routed through this proxy. This allows the proxy to capture and process all the requests.
It took me a while to understand that the addr FQDN is used for both purposes: creating the proxy socket, and letting selenium connect to the proxy.
I have set up a remote selenium grid on my kubernetes cluster, and I am trying to connect to it from another pod within the same cluster. The services for these pods are of type ClusterIP, meaning that the IP is randomly generated with each deployment. Kubernetes has intelligent DNS resolution where you can specify http://service-name:port, and it will resolve it to the IP address. So I should be able to open a port with service-name in the addr option; however, when I try to do that, I get the following error:
seleniumwire.thirdparty.mitmproxy.exceptions.ServerException: Error starting mitmproxy server: gaierror(-5, 'No address associated with hostname')
I tried using 127.0.0.1 or 0.0.0.0, and although I am able to create the proxy server, selenium is of course unable to connect to it and it fails with
Message: unknown error: net::ERR_PROXY_CONNECTION_FAILED
It would be beneficial if I could define, for example, an addr to start the proxy server (where I could use 127.0.0.1) and another option that tells selenium how to reach that proxy server (where I can leverage the kubernetes DNS resolution).
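To illustrate the distinction being asked for, here is a minimal sketch that uses only Python's standard socket module, not Selenium Wire itself. It assumes a listener bound to 0.0.0.0 on an OS-chosen port, and a client that reaches it under a different address (127.0.0.1 here, standing in for a Kubernetes service name). The function names start_listener and can_reach are hypothetical.

```python
import socket


def start_listener(bind_addr: str = "0.0.0.0") -> socket.socket:
    """Bind a TCP listener on bind_addr and an OS-chosen free port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((bind_addr, 0))  # port 0 asks the OS for any free port
    server.listen(1)
    return server


def can_reach(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


server = start_listener("0.0.0.0")
port = server.getsockname()[1]
# The address used to connect differs from the address used to bind.
reachable = can_reach("127.0.0.1", port)
server.close()
print(reachable)
```

The point of the sketch is only that the bind address and the connect address are two separate values; whether a given service name actually routes to the pod depends on the cluster setup.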
Thanks for raising this. You're right, Selenium Wire sends all browser traffic through an internal proxy it spins up in the background, and it uses the same address for both the proxy server and for Selenium itself when it configures the browser. There isn't the ability to separate these currently in Selenium Wire itself, but I think there may be a workaround.
If you're using the latest version of Selenium Wire, then there's an option called auto_config. This tells Selenium Wire to configure the browser, via Selenium, with the IP/port of its internal proxy. The option is set to True by default, but if you set it to False then Selenium Wire won't configure the browser and will assume you will do it manually. You can configure the browser by passing a browser-specific option (I'm assuming Chrome here) and specify the Kubernetes service-name at this point.
Here's some code that demonstrates how to do it:
from seleniumwire import webdriver

sw_options = {
    'auto_config': False,  # Ensure this is set to False
    'addr': '0.0.0.0',  # The address the proxy will listen on
    'port': 8087,
}

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=service-name:8087')  # Specify your Kubernetes service-name here
chrome_options.add_argument('--ignore-certificate-errors')

driver = webdriver.Remote(
    desired_capabilities=chrome_options.to_capabilities(),
    seleniumwire_options=sw_options,
)
driver.get(...)
Would that allow you to take advantage of the Kubernetes DNS resolution?
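In the workaround above, the port appears twice: once in the Selenium Wire options and once in the Chrome --proxy-server argument, and the two must agree. A small sketch of one way to keep them in sync follows. The helper name build_proxy_settings is hypothetical and not part of Selenium Wire; it only builds plain Python values using the option names already shown in the example.

```python
def build_proxy_settings(advertised_host: str, port: int = 8087):
    """Return (seleniumwire_options, chrome_proxy_argument) sharing one port.

    advertised_host is the name the browser uses to reach the proxy,
    for example a Kubernetes service name (an assumption for illustration).
    """
    sw_options = {
        "auto_config": False,  # the browser is configured manually
        "addr": "0.0.0.0",     # address the proxy listens on
        "port": port,
    }
    proxy_arg = f"--proxy-server={advertised_host}:{port}"
    return sw_options, proxy_arg


sw_options, proxy_arg = build_proxy_settings("service-name")
print(proxy_arg)
```

The returned values would then be passed as seleniumwire_options and to chrome_options.add_argument respectively, as in the example above.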
@prabhatverma286 did the suggested workaround above work for you?
@wkeeling apologies for taking so long to reply - got a bit busy. I tried the workaround again today and it works, so thank you so much!
FWIW, my code below (I am using forward proxy as I am behind a firewall)
from selenium.webdriver.common.by import By
from seleniumwire import webdriver

options = {
    'suppress_connection_errors': False,
    'auto_config': False,
    'addr': '0.0.0.0',
    'port': 8087,
    'proxy': {
        # Forward proxy details, in the form scheme://user:pass@ip:port
        'http': 'scheme://user:pass@ip:port',
        'https': 'scheme://user:pass@ip:port',
    },
}

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=kubernetes-service-name:8087')
chrome_options.add_argument('--ignore-certificate-errors')

browser = webdriver.Remote('http://selenium-service-name:4444/wd/hub',
                           desired_capabilities=chrome_options.to_capabilities(),
                           seleniumwire_options=options)
print("Browser setup done.")

# Use try/finally so the browser quits even if there is an exception
try:
    print("Getting yt.")
    browser.get("https://www.youtube.com/")
    print("Saving screenshot for yt")
    browser.save_screenshot('yt.png')
    print("Extracting Xpath.")
    text = browser.find_element(By.XPATH, '/html/body/ytd-app/div/ytd-page-manager/ytd-browse/ytd-two-column-browse'
                                          '-results-renderer/div[1]/ytd-rich-grid-renderer/div['
                                          '6]/ytd-rich-item-renderer[1]/div/ytd-rich-grid-media/div[1]/div/div['
                                          '1]/h3/a/yt-formatted-string').text
    print(f'The title of the first video on youtube is : {text}')
except Exception as e:
    print(e)
finally:
    browser.quit()
    print(browser.requests)
Which produces the output:
python selenium_test.py
Browser setup done.
Getting yt.
Saving screenshot for yt
Extracting Xpath.
The title of the first video on youtube is : Positive Mood JAZZ - Sunny Jazz Cafe and Bossa Nova Music
[]
Took me a couple of tries though, because I remember trying it a few weeks ago and it didn't work. But most probably I was doing something wrong.
One question though: the requests dictionary at the end of the output is empty. Any idea why?
Great news that the workaround works! I may now look at adding the above example to the readme, as it may prove useful for other people running a container setup.
Regarding printing the requests, you just need to make sure that you print them before calling browser.quit(). Quitting the browser will shut down Selenium Wire and clear out all captured requests. So switching the statements around should fix it:
finally:
    print(browser.requests)
    browser.quit()  # Clears out request storage
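If the captured requests are needed after the browser has gone, one option is to copy them into a list first. The sketch below assumes only what is stated above: that driver.requests holds the captured requests and that quit() clears them. The helper capture_requests_then_quit and the FakeDriver stand-in are hypothetical and exist purely so the example runs without a browser.

```python
def capture_requests_then_quit(driver):
    """Copy captured requests into a list, then quit the driver."""
    try:
        captured = list(driver.requests)  # read while storage still exists
    finally:
        driver.quit()  # quitting clears the driver's request storage
    return captured


class FakeDriver:
    """Stand-in for a Selenium Wire driver, for illustration only."""

    def __init__(self):
        self.requests = ["GET https://example.com/"]

    def quit(self):
        self.requests = []


driver = FakeDriver()
captured = capture_requests_then_quit(driver)
print(captured)
```

With a real driver, the copied list would contain Selenium Wire request objects rather than strings.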
@prabhatverma286
I'm trying to use webdriver.Remote and a user:pass proxy with selenium wire in docker, but I'm still confused about some params in your code:
options = {
    ...
    'addr': '0.0.0.0',
    'port': 8087,
    'proxy': {
        'http': 'http://user:pass@ip:port',
        'https': 'https://user:pass@ip:port',
    },
}
What is the port here?
And
chrome_options.add_argument('--proxy-server=kubernetes-service-name:8087')
What is kubernetes-service-name here? If I'm using docker, what should I put there?
My docker-compose.yml looks like this:
...
  chrome:
    image: selenium/standalone-chrome:latest
    hostname: chrome
    ports:
      - "4444:4444"
    privileged: true
    shm_size: 2g
and my current code looks like this:
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")
self.browser = webdriver.Remote(command_executor='http://chrome:4444/wd/hub', options=chrome_options, desired_capabilities=capabilities)
Appreciate your help in advance
What is selenium-service-name here?
This is my working setup with proxy and chrome-node in Docker.
Docker containers:
selenium/node-chrome:115.0
selenium/hub:4.11
from seleniumwire import webdriver

PROXY_IP = 'your proxy ip'
PROXY_PORT = 'your proxy port'

sw_options = {
    'addr': "0.0.0.0",
    'auto_config': False,
    'port': 8087,  # You can choose any other, the main thing is that it is free
    # proxy settings
    'proxy': {
        'http': f'http://{PROXY_IP}:{PROXY_PORT}',
        'https': f'https://{PROXY_IP}:{PROXY_PORT}'
    }
}

chrome_options = webdriver.ChromeOptions()
# The port here should match sw_options['port']
chrome_options.add_argument('--proxy-server=host.docker.internal:8087')
chrome_options.add_argument('--ignore-certificate-errors')

driver = webdriver.Remote(
    command_executor="http://localhost:4444/wd/hub",  # docker selenium-hub address
    options=chrome_options,
    seleniumwire_options=sw_options
)
Hi, thanks @kaletvintsev for sharing.
Let's say I'm using jupyterlab in a docker, on a machine A. Then from a notebook, I want to call the hub on server B, which decides which node C to call on its own... So there are 3 machines, + docker proxys on each machine.
I am a bit confused about what addr, port, PROXY_IP and PROXY_PORT are supposed to target.
Because:
- A has ip: 192.168.0.a
- JupyterLab in docker container on A has ip: 172.17.0.xx
- B has ip: 192.168.0.b
- Hub in docker container on B has ip: 172.18.0.xx
- C has ip: (well, it depends on what Hub decides, but you get where I'm going with this)...
Just to be as clear as possible:
- web UI for jupyterlab is available at http://192.168.0.a:8888
- web UI for hub is available at http://192.168.0.b:4444
- and my node config in docker-compose is:
- "SE_EVENT_BUS_SUBSCRIBE_PORT=4443"
- SE_NODE_HOST=${SE_NODE_IP}
- "SE_NODE_PORT=5555"
- SE_EVENT_BUS_HOST=${SE_HUB_IP}
- "SE_EVENT_BUS_PUBLISH_PORT=4442"
Maybe I should open the PROXY_PORT in the docker-compose.yml config, so that the http request can go through?
'addr': "0.0.0.0", 'auto_config': False, 'port': 8087,
What is port 8087 for? @diyoyo
I have no clue @danztensai, I just tried to use the code previously posted.
what is port: 8087, what is it for? @diyoyo
This is the port for the seleniumwire proxy. You can choose any other, the main thing is that it is free
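Since the only stated requirement is that the port is free, a quick check before starting the driver may help. This is a minimal sketch using Python's standard socket module; the helper name is_port_free is hypothetical, and it only tests whether the port can be bound at that moment on the machine running the script.

```python
import socket


def is_port_free(port: int, addr: str = "0.0.0.0") -> bool:
    """Return True if addr:port can be bound right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((addr, port))
        except OSError:
            return False
    return True


# Demonstration: hold a port, then check that it is reported as taken.
holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
holder.bind(("0.0.0.0", 0))  # port 0 asks the OS for any free port
taken_port = holder.getsockname()[1]
free_while_held = is_port_free(taken_port)
holder.close()
print(free_while_held)
```

Calling is_port_free(8087) before building sw_options would show whether the port used in the examples above is available; another process could still claim it between the check and the proxy starting.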
Do I have to expose the port in Docker? @kaletvintsev