Selenium Wire extends Selenium's Python bindings to give you access to the underlying requests made by the browser. You author your code in the same way as you do with Selenium, but you get extra APIs for inspecting requests and responses and making changes to them on the fly.
from seleniumwire import webdriver # Import from seleniumwire
# Create a new instance of the Chrome driver
driver = webdriver.Chrome()
# Go to the Google home page
driver.get('https://www.google.com')
# Access requests via the `requests` attribute
for request in driver.requests:
if request.response:
print(
request.url,
request.response.status_code,
request.response.headers['Content-Type']
)
Prints:
https://www.google.com/ 200 text/html; charset=UTF-8
https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_120x44dp.png 200 image/png
https://consent.google.com/status?continue=https://www.google.com&pc=s×tamp=1531511954&gl=GB 204 text/html; charset=utf-8
https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png 200 image/png
https://ssl.gstatic.com/gb/images/i2_2ec824b0.png 200 image/png
https://www.google.com/gen_204?s=webaft&t=aft&atyp=csi&ei=kgRJW7DBONKTlwTK77wQ&rt=wsrt.366,aft.58,prt.58 204 text/html; charset=UTF-8
...
- Pure Python, user-friendly API
- HTTP and HTTPS requests captured
- Intercept requests and responses
- Modify headers, parameters, body content on the fly
- Capture websocket messages
- HAR format supported
- Proxy server support
- Python 3.6+
- Selenium 3.4.0+
- Chrome, Firefox and Remote Webdriver supported
- Installation
- Creating the Webdriver
- Accessing Requests
- Request Objects
- Response Objects
- Intercepting Requests and Responses
- Limiting Request Capture
- Request Storage
- Proxies
- Bot Detection
- Certificates
- All Options
- License
Install using pip:
pip install selenium-wire
If you get an error about not being able to build cryptography you may be running an old version of pip. Try upgrading pip with python -m pip install --upgrade pip
and then re-run the above command.
No specific configuration should be necessary except to ensure that you have downloaded the ChromeDriver and GeckoDriver for Chrome and Firefox to be remotely controlled, the same as if you were using Selenium directly. Once downloaded, these executables should be placed somewhere on your PATH.
Selenium Wire requires OpenSSL for decrypting HTTPS requests. This is normally already installed on most systems, but if it's not you can install it with:
Linux
# For apt based Linux systems
sudo apt install openssl
# For RPM based Linux systems
sudo yum install openssl
# For Linux alpine
sudo apk add openssl
MacOS
brew install openssl
Windows
No installation is required.
Ensure that you import webdriver
from the seleniumwire
package:
from seleniumwire import webdriver
For sub-packages of webdriver
, you should continue to import these directly from selenium
. For example, to import WebDriverWait
:
# Sub-packages of webdriver must still be imported from `selenium` itself
from selenium.webdriver.support.ui import WebDriverWait
Chrome and Firefox
For Chrome and Firefox you don't need to do anything special. Just instantiate the webdriver as you would normally with webdriver.Chrome()
or webdriver.Firefox()
passing in any desired capabilities and browser specific options for Chrome or Firefox , such as the executable path, headless mode etc. Selenium Wire also has it's own options that can be passed in the seleniumwire_options
attribute.
Remote
Selenium Wire has limited support for using the remote webdriver client. When you create an instance of the remote webdriver, you need to specify the hostname or IP address of the machine (or container) running Selenium Wire. This allows the remote instance to communicate back to Selenium Wire with its requests and responses.
options = {
'addr': 'hostname_or_ip' # Address of the machine running Selenium Wire. Explicitly use 127.0.0.1 rather than localhost if remote session is running locally.
}
driver = webdriver.Remote(
command_executor='http://www.example.com',
seleniumwire_options=options
)
If the machine running the browser needs to use a different address to talk to the machine running Selenium Wire you need to configure the browser manually. This issue goes into more detail.
Selenium Wire captures all HTTP/HTTPS traffic made by the browser1. The following attributes provide access to requests and responses.
driver.requests
The list of captured requests in chronological order.
driver.last_request
Convenience attribute for retrieving the most recently captured request. This is more efficient than using
driver.requests[-1]
.driver.wait_for_request(pat, timeout=10)
This method will wait until it sees a request matching a pattern. The
pat
attribute will be matched within the request URL.pat
can be a simple substring or a regular expression. Note thatdriver.wait_for_request()
doesn't make a request, it just waits for a previous request made by some other action and it will return the first request it finds. Also note that sincepat
can be a regular expression, you must escape special characters such as question marks with a slash. ATimeoutException
is raised if no match is found within the timeout period.For example, to wait for an AJAX request to return after a button is clicked:
# Click a button that triggers a background request to https://server/api/products/12345/ button_element.click() # Wait for the request/response to complete request = driver.wait_for_request('/api/products/12345/')
driver.har
A JSON formatted HAR archive of HTTP transactions that have taken place. HAR capture is turned off by default and you must set the
enable_har
option toTrue
before usingdriver.har
.driver.iter_requests()
Returns an iterator over captured requests. Useful when dealing with a large number of requests.
driver.request_interceptor
Used to set a request interceptor. See Intercepting Requests and Responses.
driver.response_interceptor
Used to set a response interceptor.
Clearing Requests
To clear previously captured requests and HAR entries, use del
:
del driver.requests
Request objects have the following attributes.
body
The request body as
bytes
. If the request has no body the value ofbody
will be empty, i.e.b''
.cert
Information about the server SSL certificate in dictionary format. Empty for non-HTTPS requests.
date
The datetime the request was made.
headers
A dictionary-like object of request headers. Headers are case-insensitive and duplicates are permitted. Asking for
request.headers['user-agent']
will return the value of theUser-Agent
header. If you wish to replace a header, make sure you delete the existing header first withdel request.headers['header-name']
, otherwise you'll create a duplicate.host
The request host, e.g.
wwww.example.com
method
The HTTP method, e.g.
GET
orPOST
etc.params
A dictionary of request parameters. If a parameter with the same name appears more than once in the request, it's value in the dictionary will be a list.
path
The request path, e.g.
/some/path/index.html
querystring
The query string, e.g.
foo=bar&spam=eggs
response
The response object associated with the request. This will be
None
if the request has no response.url
The request URL, e.g.
https://server/some/path/index.html?foo=bar&spam=eggs
ws_messages
Where the request is a websocket handshake request (normally with a URL starting
wss://
) thenws_messages
will contain a list of any websocket messages sent and received. See WebSocketMessage Objects.
Request objects have the following methods.
abort(error_code=403)
Trigger immediate termination of the request with the supplied error code. For use within request interceptors. See Example: Block a request.
create_response(status_code, headers=(), body=b'')
Create a response and return it without sending any data to the remote server. For use within request interceptors. See Example: Mock a response.
These objects represent websocket messages sent between the browser and server and vice versa. They are held in a list by request.ws_messages
on websocket handshake requests. They have the following attributes.
content
The message content which may be either
str
orbytes
.date
The datetime of the message.
from_client
True
when the message was sent by the client andFalse
when sent by the server.
Response objects have the following attributes.
body
The response body as
bytes
. If the response has no body the value ofbody
will be empty, i.e.b''
. Sometimes the body may have been compressed by the server. You can prevent this with thedisable_encoding
option. To manually decode an encoded response body you can do:
from seleniumwire.utils import decode
body = decode(response.body, response.headers.get('Content-Encoding', 'identity'))
date
The datetime the response was received.
headers
A dictionary-like object of response headers. Headers are case-insensitive and duplicates are permitted. Asking for
response.headers['content-length']
will return the value of theContent-Length
header. If you wish to replace a header, make sure you delete the existing header first withdel response.headers['header-name']
, otherwise you'll create a duplicate.reason
The reason phrase, e.g.
OK
orNot Found
etc.status_code
The status code of the response, e.g.
200
or404
etc.
As well as capturing requests and responses, Selenium Wire allows you to modify them on the fly using interceptors. An interceptor is a function that gets invoked with requests and responses as they pass through Selenium Wire. Within an interceptor you can modify the request and response as you see fit.
You set your interceptor functions using the driver.request_interceptor
and driver.response_interceptor
attributes before you start using the driver. A request interceptor should accept a single argument for the request. A response interceptor should accept two arguments, one for the originating request and one for the response.
def interceptor(request):
request.headers['New-Header'] = 'Some Value'
driver.request_interceptor = interceptor
driver.get(...)
# All requests will now contain New-Header
How can I check that a header has been set correctly? You can print the headers from captured requests after the page has loaded (using driver.requests
), or alternatively point the webdriver at https://httpbin.org/headers which will echo the request headers back to the browser so you can view them.
Duplicate header names are permitted in an HTTP request, so before setting the replacement header you must first delete the existing header using del
like in the following example, otherwise two headers with the same name will exist (request.headers
is a special dictionary-like object that allows duplicates).
def interceptor(request):
del request.headers['Referer'] # Remember to delete the header first
request.headers['Referer'] = 'some_referer' # Spoof the referer
driver.request_interceptor = interceptor
driver.get(...)
# All requests will now use 'some_referer' for the referer
def interceptor(request, response): # A response interceptor takes two args
if request.url == 'https://server.com/some/path':
response.headers['New-Header'] = 'Some Value'
driver.response_interceptor = interceptor
driver.get(...)
# Responses from https://server.com/some/path will now contain New-Header
Request parameters work differently to headers in that they are calculated when they are set on the request. That means that you first have to read them, then update them, and then write them back - like in the following example. Parameters are held in a regular dictionary, so parameters with the same name will be overwritten.
def interceptor(request):
params = request.params
params['foo'] = 'bar'
request.params = params
driver.request_interceptor = interceptor
driver.get(...)
# foo=bar will be added to all requests
import json
def interceptor(request):
if request.method == 'POST' and request.headers['Content-Type'] == 'application/json':
# The body is in bytes so convert to a string
body = request.body.decode('utf-8')
# Load the JSON
data = json.loads(body)
# Add a new property
data['foo'] = 'bar'
# Set the JSON back on the request
request.body = json.dumps(data).encode('utf-8')
# Update the content length
del request.headers['Content-Length']
request.headers['Content-Length'] = str(len(request.body))
driver.request_interceptor = interceptor
driver.get(...)
You can use request.abort()
to block a request and send an immediate response back to the browser. An optional error code can be supplied. The default is 403 (forbidden).
def interceptor(request):
# Block PNG, JPEG and GIF images
if request.path.endswith(('.png', '.jpg', '.gif')):
request.abort()
driver.request_interceptor = interceptor
driver.get(...)
# Requests for PNG, JPEG and GIF images will result in a 403 Forbidden
You can use request.create_response()
to send a custom reply back to the browser. No data will be sent to the remote server.
def interceptor(request):
if request.url == 'https://server.com/some/path':
request.create_response(
status_code=200,
headers={'Content-Type': 'text/html'}, # Optional headers dictionary
body='<html>Hello World!</html>' # Optional body
)
driver.request_interceptor = interceptor
driver.get(...)
# Requests to https://server.com/some/path will have their responses mocked
Have any other examples you think could be useful? Feel free to submit a PR.
To unset an interceptor, use del
:
del driver.request_interceptor
del driver.response_interceptor
Selenium Wire works by redirecting browser traffic through an internal proxy server it spins up in the background. As requests flow through the proxy they are intercepted and captured. Capturing requests can slow things down a little but there are a few things you can do to restrict what gets captured.
driver.scopes
This accepts a list of regular expressions that will match the URLs to be captured. It should be set on the driver before making any requests. When empty (the default) all URLs are captured.
driver.scopes = [ '.*stackoverflow.*', '.*github.*' ] driver.get(...) # Start making requests # Only request URLs containing "stackoverflow" or "github" will now be captured
Note that even if a request is out of scope and not captured, it will still travel through Selenium Wire.
seleniumwire_options.disable_capture
Use this option to switch off request capture. Requests will still pass through Selenium Wire and through any upstream proxy you have configured but they won't be intercepted or stored. Request interceptors will not execute.
options = { 'disable_capture': True # Don't intercept/store any requests } driver = webdriver.Chrome(seleniumwire_options=options)
seleniumwire_options.exclude_hosts
Use this option to bypass Selenium Wire entirely. Any requests made to addresses listed here will go direct from the browser to the server without involving Selenium Wire. Note that if you've configured an upstream proxy then these requests will also bypass that proxy.
options = { 'exclude_hosts': ['host1.com', 'host2.com'] # Bypass Selenium Wire for these hosts } driver = webdriver.Chrome(seleniumwire_options=options)
request.abort()
You can abort a request early by using
request.abort()
from within a request interceptor. This will send an immediate response back to the client without the request travelling any further. You can use this mechanism to block certain types of requests (e.g. images) to improve page load performance.def interceptor(request): # Block PNG, JPEG and GIF images if request.path.endswith(('.png', '.jpg', '.gif')): request.abort() driver.request_interceptor = interceptor driver.get(...) # Start making requests
Captured requests and responses are stored in the system temp folder by default (that's /tmp
on Linux and usually C:\Users\<username>\AppData\Local\Temp
on Windows) in a sub-folder called .seleniumwire
. To change where the .seleniumwire
folder gets created you can use the request_storage_base_dir
option:
options = {
'request_storage_base_dir': '/my/storage/folder' # .seleniumwire will get created here
}
driver = webdriver.Chrome(seleniumwire_options=options)
Selenium Wire also supports storing requests and responses in memory only, which may be useful in certain situations - e.g. if you're running short lived Docker containers and don't want the overhead of disk persistence. You can enable in-memory storage by setting the request_storage
option to memory
:
options = {
'request_storage': 'memory' # Store requests and responses in memory only
}
driver = webdriver.Chrome(seleniumwire_options=options)
If you're concerned about the amount of memory that may be consumed, you can restrict the number of requests that are stored with the request_storage_max_size
option:
options = {
'request_storage': 'memory',
'request_storage_max_size': 100 # Store no more than 100 requests in memory
}
driver = webdriver.Chrome(seleniumwire_options=options)
When the max size is reached, older requests are discarded as newer requests arrive. Keep in mind that if you restrict the number of requests being stored, requests may have disappeared from storage by the time you come to retrieve them with driver.requests
or driver.wait_for_request()
etc.
If the site you are accessing sits behind a proxy server you can tell Selenium Wire about that proxy server in the options you pass to the webdriver.
The configuration takes the following format:
options = {
'proxy': {
'http': 'http://192.168.10.100:8888',
'https': 'https://192.168.10.100:8888',
'no_proxy': 'localhost,127.0.0.1'
}
}
driver = webdriver.Chrome(seleniumwire_options=options)
To use HTTP Basic Auth with your proxy, specify the username and password in the URL:
options = {
'proxy': {
'https': 'https://user:pass@192.168.10.100:8888',
}
}
For authentication other than Basic, you can supply the full value for the Proxy-Authorization
header using the custom_authorization
option. For example, if your proxy used the Bearer scheme:
options = {
'proxy': {
'https': 'https://192.168.10.100:8888', # No username or password used
'custom_authorization': 'Bearer mytoken123' # Custom Proxy-Authorization header value
}
}
More info on the Proxy-Authorization
header can be found here.
The proxy configuration can also be loaded through environment variables called HTTP_PROXY
, HTTPS_PROXY
and NO_PROXY
:
$ export HTTP_PROXY="http://192.168.10.100:8888"
$ export HTTPS_PROXY="https://192.168.10.100:8888"
$ export NO_PROXY="localhost,127.0.0.1"
Using a SOCKS proxy is the same as using an HTTP based one but you set the scheme to socks5
:
options = {
'proxy': {
'http': 'socks5://user:pass@192.168.10.100:8888',
'https': 'socks5://user:pass@192.168.10.100:8888',
'no_proxy': 'localhost,127.0.0.1'
}
}
driver = webdriver.Chrome(seleniumwire_options=options)
You can leave out the user
and pass
if your proxy doesn't require authentication.
As well as socks5
, the schemes socks4
and socks5h
are supported. Use socks5h
when you want DNS resolution to happen on the proxy server rather than on the client.
Using Selenium Wire with Tor
See this example if you want to run Selenium Wire with Tor.
If you want to change the proxy settings for an existing driver instance, use the driver.proxy
attribute:
driver.get(...) # Using some initial proxy
# Change the proxy
driver.proxy = {
'https': 'https://user:pass@192.168.10.100:8888',
}
driver.get(...) # These requests will use the new proxy
To clear a proxy, set driver.proxy
to an empty dict {}
.
This mechanism also supports the no_proxy
and custom_authorization
options.
Selenium Wire will integrate with undetected-chromedriver if it finds it in your environment. This library will transparently modify ChromeDriver to prevent it from triggering anti-bot measures on websites.
If you wish to take advantage of this make sure you have undetected_chromedriver installed:
pip install undetected-chromedriver
Then in your code, import the seleniumwire.undetected_chromedriver
package:
import seleniumwire.undetected_chromedriver as uc
chrome_options = uc.ChromeOptions()
driver = uc.Chrome(
options=chrome_options,
seleniumwire_options={}
)
Selenium Wire uses it's own root certificate to decrypt HTTPS traffic. It is not normally necessary for the browser to trust this certificate because Selenium Wire tells the browser to add it as an exception. This will allow the browser to function normally, but it will display a "Not Secure" message (and/or unlocked padlock) in the address bar. If you wish to get rid of this message you can install the root certificate manually.
You can download the root certificate here. Once downloaded, navigate to "Certificates" in your browser settings and import the certificate in the "Authorities" section.
If you would like to use your own root certificate you can supply the path to the certificate and the private key using the ca_cert
and ca_key
options.
If you do specify your own certificate, be sure to manually delete Selenium Wire's temporary storage folder. This will clear out any existing certificates that may have been cached from previous runs.
A summary of all options that can be passed to Selenium Wire via the seleniumwire_options
webdriver attribute.
addr
The IP address or hostname of the machine running Selenium Wire. This defaults to 127.0.0.1. You may want to change this to the public IP of the machine (or container) if you're using the remote webdriver.
options = {
'addr': '192.168.0.10' # Use the public IP of the machine
}
driver = webdriver.Chrome(seleniumwire_options=options)
auto_config
Whether Selenium Wire should auto-configure the browser for request capture.
True
by default.ca_cert
The path to a root (CA) certificate if you prefer to use your own certificate rather than use the default.
options = {
'ca_cert': '/path/to/ca.crt' # Use own root certificate
}
driver = webdriver.Chrome(seleniumwire_options=options)
ca_key
The path to the private key if you're using your own root certificate. The key must always be supplied when using your own certificate.
options = {
'ca_key': '/path/to/ca.key' # Path to private key
}
driver = webdriver.Chrome(seleniumwire_options=options)
disable_capture
Disable request capture. When
True
nothing gets intercepted or stored.False
by default.
options = {
'disable_capture': True # Don't intercept/store any requests.
}
driver = webdriver.Chrome(seleniumwire_options=options)
disable_encoding
Ask the server to send back uncompressed data.
False
by default. WhenTrue
this sets theAccept-Encoding
header toidentity
for all outbound requests. Note that it won't always work - sometimes the server may ignore it.
options = {
'disable_encoding': True # Ask the server not to compress the response
}
driver = webdriver.Chrome(seleniumwire_options=options)
enable_har
When
True
a HAR archive of HTTP transactions will be kept which can be retrieved withdriver.har
.False
by default.
options = {
'enable_har': True # Capture HAR data, retrieve with driver.har
}
driver = webdriver.Chrome(seleniumwire_options=options)
exclude_hosts
A list of addresses for which Selenium Wire should be bypassed entirely. Note that if you have configured an upstream proxy then requests to excluded hosts will also bypass that proxy.
options = {
'exclude_hosts': ['google-analytics.com'] # Bypass these hosts
}
driver = webdriver.Chrome(seleniumwire_options=options)
ignore_http_methods
A list of HTTP methods (specified as uppercase strings) that should be ignored by Selenium Wire and not captured. The default is
['OPTIONS']
which ignores all OPTIONS requests. To capture all request methods, setignore_http_methods
to an empty list:
options = {
'ignore_http_methods': [] # Capture all requests, including OPTIONS requests
}
driver = webdriver.Chrome(seleniumwire_options=options)
port
The port number that Selenium Wire's backend listens on. You don't normally need to specify a port as a random port number is chosen automatically.
options = {
'port': 9999 # Tell the backend to listen on port 9999 (not normally necessary to set this)
}
driver = webdriver.Chrome(seleniumwire_options=options)
proxy
The upstream proxy server configuration if you're using a proxy.
options = {
'proxy': {
'http': 'http://user:pass@192.168.10.100:8888',
'https': 'https://user:pass@192.168.10.100:8889',
'no_proxy': 'localhost,127.0.0.1'
}
}
driver = webdriver.Chrome(seleniumwire_options=options)
request_storage
The type of storage to use. Selenium Wire defaults to disk based storage, but you can switch to in-memory storage by setting this option to
memory
:
options = {
'request_storage': 'memory' # Store requests and responses in memory only
}
driver = webdriver.Chrome(seleniumwire_options=options)
request_storage_base_dir
The base location where Selenium Wire stores captured requests and responses when using its default disk based storage. This defaults to the system temp folder (that's
/tmp
on Linux and usuallyC:\Users\<username>\AppData\Local\Temp
on Windows). A sub-folder called.seleniumwire
will get created here to store the captured data.
options = {
'request_storage_base_dir': '/my/storage/folder' # .seleniumwire will get created here
}
driver = webdriver.Chrome(seleniumwire_options=options)
request_storage_max_size
The maximum number of requests to store when using in-memory storage. Unlimited by default. This option currently has no effect when using the default disk based storage.
options = {
'request_storage': 'memory',
'request_storage_max_size': 100 # Store no more than 100 requests in memory
}
driver = webdriver.Chrome(seleniumwire_options=options)
suppress_connection_errors
Whether to suppress connection related tracebacks.
True
by default, meaning that harmless errors that sometimes occur at browser shutdown do not alarm users. When suppressed, the connection error message is logged at DEBUG level without a traceback. Set toFalse
to allow exception propagation and see full tracebacks.
options = {
'suppress_connection_errors': False # Show full tracebacks for any connection errors
}
driver = webdriver.Chrome(seleniumwire_options=options)
verify_ssl
Whether SSL certificates should be verified.
False
by default, which prevents errors with self-signed certificates.
options = {
'verify_ssl': True # Verify SSL certificates but beware of errors with self-signed certificates
}
driver = webdriver.Chrome(seleniumwire_options=options)
MIT