This step-by-step tutorial demonstrates how to use Playwright to bypass CAPTCHA challenges using Python. The tutorial will also discuss the perks of using Oxylabs’ Web Unblocker instead of the playwright-stealth library.
- 1. Install dependencies
- 2. Import modules
- 3. Create a headless browser instance
- 4. Apply the stealth settings
- 6. Take a screenshot
- 7. Execute and test
Install the Playwright library and the stealth package.
pip install playwright playwright-stealth
Use the synchronous version of the Playwright library for a straightforward and linear program flow.
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync
Define the capture_screenshot() function that encapsulates the whole code to open a headless browser instance, visit the url, and capture the screenshot. In this function, create a new sync_playwright instance and then use it to launch the Chromium browser in headless mode.
# Define the function to capture the screenshot
def capture_screenshot():
# Create a playwright instance
with sync_playwright() as play_wright:
browser = play_wright.chromium.launch(headless=True)
# Create a new context and page
context = browser.new_context()
page = context.new_page()
After creating the browser context, enable Playwright CAPTCHA bypasses by applying the stealth settings to the page using the playwright-stealth package. Stealth settings help in reducing the chances of automated access detection by hiding the browsers’ automated behavior.
# Apply the stealth settings
stealth_sync(page)
- Navigate to the page
In the next step, navigate to the target URL by specifying your required URL and navigating to it using the
goto()page method.
# Navigate to the website
url = "http://sandbox.oxylabs.io/products"
page.goto(url)
Wait for the page to load completely, take the screenshot, and close the browser.
# Wait for the webpage to load completely
page.wait_for_load_state("load")
# Take a screenshot
screenshot_filename = "oxylabs_screenshot.png"
page.screenshot(path=screenshot_filename)
# Close the browser
browser.close()
print("Done! You can check the screenshot...")
capture_screenshot()
Here is what our complete code looks like:
# Import the required modules
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync
# Define the function to capture the screenshot
def capture_screenshot():
# Create a playwright instance
with sync_playwright() as play_wright:
browser = play_wright.chromium.launch(headless=True)
# Create a new context and page
context = browser.new_context()
page = context.new_page()
# Apply the stealth settings
stealth_sync(page)
# Navigate to the website
url = "http://sandbox.oxylabs.io/products"
page.goto(url)
# Wait for the webpage to load completely
page.wait_for_load_state("load")
# Take a screenshot
screenshot_filename = "oxylabs_screenshot.png"
page.screenshot(path=screenshot_filename)
# Close the browser
browser.close()
print("Done! You can check the screenshot...")
capture_screenshot()
Note: Executing the code saves the screenshot.
Oxylabs’ Web Unblocker employs AI techniques to help you access publicly available information behind the CAPTCHA. You just need to send a simple query and Web Unblocker will automatically choose the fastest CAPTCHA proxy, attach all essential headers, and return the response HTML bypassing any anti-bots of the target websites.
To use Web Unblocker, you'll need an active subscription. You can either get a paid plan or a 7-day free trial here.
After successfully creating your account, you can set your API key username and password from the dashboard. These API key credentials will be used later in the code.
You should use a library that can help perform HTTP requests. We will use the requests to send HTTP requests to Web Unblocker API and capture the response.
pip install requests
In your Python script file, import the modules using the following import statement:
import requests
Create the proxies dictionary to connect to Web Unblocker and then define the headers dictionary that’ll instruct Web Unblocker to use JavaScript rendering. See the documentation for more details.
# Define proxy dict. Don't forget to pass your Web Unblocker credentials (username and password)
proxies = {
"http": "http://USERNAME:PASSWORD@unblock.oxylabs.io:60000",
"https": "http://USERNAME:PASSWORD@unblock.oxylabs.io:60000",
}
headers = {
"X-Oxylabs-Render": "html"
}
Perform your request by specifying the URL, request type, and proxy by using the following code.
response = requests.request(
"GET",
"http://sandbox.oxylabs.io/products",
verify=False, # Ignore the certificate
proxies=proxies,
)
Write the following code to print the response and save it in an HTML file.
# Print result page to stdout
print(response.text)
# Save returned HTML to result.html file
with open("result.html", "w") as f:
f.write(response.text)
Execute the code and test the output. If the output HTML file has actual page contents, the script successfully bypassed the CAPTCHA. Here is what our complete code looks like.
# Import the modules
import requests
# Define proxy dict. Don't forget to put your real user and pass here as well.
proxies = {
"http": "http://USERNAME:PASSWORD@unblock.oxylabs.io:60000",
"https": "http://USERNAME:PASSWORD@unblock.oxylabs.io:60000",
}
headers = {
"X-Oxylabs-Render": "html"
}
response = requests.request(
"GET",
"http://sandbox.oxylabs.io/products",
verify=False, # Ignore the certificate
proxies=proxies,
headers=headers,
)
# Print result page to stdout
print(response.text)
# Save returned HTML to result.html file
with open("result.html", "w") as f:
f.write(response.text)
And that's it! For a more detailed tutorial with images, you can check out this article.
