Capture a URL with Playwright

Playwright Capture

A simple replacement for Splash, built on Playwright.

Install

pip install playwrightcapture

Usage

A very basic example:

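import asyncio

from playwrightcapture import Capture

async def main(url):
    async with Capture() as capture:
        await capture.initialize_context()
        return await capture.capture_page(url, max_depth_capture_time=90)

# 'https://example.com' is a placeholder URL
entries = asyncio.run(main('https://example.com'))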

The entries variable is a dictionary that contains (if all goes well) the HAR, the screenshot, all the cookies of the session, the URL as it is in the browser at the end of the capture, and the full HTML page as rendered.
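As a rough sketch of what you might do with that dictionary, the helper below writes whichever parts are present to a directory. The key names used here (har, png, html, cookies, last_redirected_url) and the save_capture function itself are assumptions made for illustration, not something this README documents; inspect the dictionary returned by capture_page to confirm the actual keys.

```python
import json
from pathlib import Path


def save_capture(entries, out_dir):
    """Write the parts of a capture result that are present to out_dir.

    The key names below are assumed for illustration; a failed capture
    may omit any of them, so each one is checked before writing.
    Returns the list of file names that were written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []

    # JSON-serializable parts (assumed keys: "har", "cookies")
    for key, name in (("har", "capture.har"), ("cookies", "cookies.json")):
        if entries.get(key) is not None:
            (out / name).write_text(json.dumps(entries[key]), encoding="utf-8")
            written.append(name)

    # Screenshot as raw bytes (assumed key: "png")
    if entries.get("png") is not None:
        (out / "screenshot.png").write_bytes(entries["png"])
        written.append("screenshot.png")

    # Text parts (assumed keys: "html", "last_redirected_url")
    for key, name in (("html", "page.html"), ("last_redirected_url", "final_url.txt")):
        if entries.get(key) is not None:
            (out / name).write_text(entries[key], encoding="utf-8")
            written.append(name)

    return written
```

Calling save_capture(entries, 'capture_output') after a capture would then leave one file per available part in that directory.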

reCAPTCHA bypass

No black magic: it is just a reimplementation of a well-known technique, as implemented there, and there.

This module will try to bypass reCAPTCHA-protected websites if you install it this way:

pip install playwrightcapture[recaptcha]

This will install requests, pydub and SpeechRecognition. To work, pydub requires ffmpeg or libav; see the install guide for more details. SpeechRecognition uses the Google Speech Recognition API to turn the audio file into text (I hope you appreciate the irony).
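Before relying on the reCAPTCHA extra, it can help to check that one of the audio tools pydub needs is actually on your PATH. The following is a minimal sketch using only the standard library; the find_audio_backend helper is hypothetical, and it assumes the executables are named ffmpeg (for ffmpeg) and avconv (for libav).

```python
import shutil


def find_audio_backend(candidates=("ffmpeg", "avconv")):
    """Return the path of the first candidate executable found on PATH.

    Returns None when none of them is installed. The default names are
    an assumption: "ffmpeg" for ffmpeg and "avconv" for libav.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    backend = find_audio_backend()
    if backend is None:
        print("Neither ffmpeg nor avconv was found; install one of them first.")
    else:
        print(f"Found audio backend: {backend}")
```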