pyppeteer2
Note: this is a continuation of the original, apparently abandoned pyppeteer project
Unofficial Python port of puppeteer JavaScript (headless) chrome/chromium browser automation library.
- Free software: MIT license (including the work distributed under the Apache 2.0 license)
- Documentation: https://pyppeteer.github.io/pyppeteer2/
Installation
pyppeteer2 requires Python >= 3.6
Install with pip
from PyPI:
pip install pyppeteer2
Or install the latest version from this GitHub repo:
pip install -U git+https://github.com/pyppeteer/pyppeteer2@dev
Usage
Note: When you run pyppeteer2 for the first time, it downloads the latest version of Chromium (~150MB) if it is not found on your system. If you would rather avoid this behavior, ensure that a suitable Chrome binary is installed. One way to do this is to run the pyppeteer2-install command before using this library.
Full documentation can be found here. Puppeteer's documentation and its troubleshooting guide are also great resources for pyppeteer2 users.
Examples
Open web page and take a screenshot:
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('https://example.com')
    await page.screenshot({'path': 'example.png'})
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())
Evaluate JavaScript on a page:
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('https://example.com')
    await page.screenshot({'path': 'example.png'})

    dimensions = await page.evaluate('''() => {
        return {
            width: document.documentElement.clientWidth,
            height: document.documentElement.clientHeight,
            deviceScaleFactor: window.devicePixelRatio,
        }
    }''')

    print(dimensions)
    # >>> {'width': 800, 'height': 600, 'deviceScaleFactor': 1}
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())
Differences between puppeteer and pyppeteer2
pyppeteer2 strives to replicate the puppeteer API as closely as possible; however, fundamental differences between JavaScript and Python make this difficult to do precisely. More information on specifics can be found in the documentation.
Keyword arguments for options
puppeteer passes options to functions and methods as an object. pyppeteer2 functions and methods accept options either as a dictionary (the Python equivalent of a JavaScript object) or as keyword arguments.
Dictionary style options (similar to puppeteer):
browser = await launch({'headless': True})
Keyword argument style options (more pythonic, isn't it?):
browser = await launch(headless=True)
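To illustrate how one signature can accept both calling styles, here is a minimal sketch. The helper name merge_options is hypothetical and is not part of the pyppeteer2 API; the sketch also assumes that keyword arguments take precedence when the same option is given both ways.

```python
from typing import Any, Dict, Optional


def merge_options(options: Optional[Dict[str, Any]] = None, **kwargs: Any) -> Dict[str, Any]:
    """Combine a puppeteer-style options dict with Python keyword arguments.

    Hypothetical helper for illustration only.
    """
    merged = dict(options or {})  # copy so the caller's dict is not mutated
    merged.update(kwargs)         # assumption: keyword arguments win on conflict
    return merged


print(merge_options({'headless': True}))                 # dictionary style
print(merge_options(headless=True))                      # keyword style
print(merge_options({'headless': True}, headless=False)) # both; keyword wins here
```

Both styles produce the same options mapping, which is why the two launch() calls above are interchangeable.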
Element selector method names
In Python, $ is not a valid identifier. The equivalent methods to Puppeteer's $, $$, and $x methods are listed below, along with some shorthand methods for your convenience:
puppeteer | pyppeteer2 | pyppeteer2 shorthand |
---|---|---|
Page.$() | Page.querySelector() | Page.J() |
Page.$$() | Page.querySelectorAll() | Page.JJ() |
Page.$x() | Page.xpath() | Page.Jx() |
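The relationship between the long names and the shorthand names in the table can be pictured as plain aliases. The sketch below uses a hypothetical FakePage class with an in-memory lookup instead of a real browser; it is not pyppeteer2's Page implementation, only an illustration of how a shorthand such as J can point at the same coroutine as querySelector.

```python
import asyncio
from typing import Dict, List, Optional


class FakePage:
    """Hypothetical stand-in for a page; it does not talk to a browser."""

    def __init__(self, elements: Dict[str, List[str]]) -> None:
        self._elements = elements  # maps a selector to the elements it matches

    async def querySelector(self, selector: str) -> Optional[str]:
        matches = self._elements.get(selector, [])
        return matches[0] if matches else None

    async def querySelectorAll(self, selector: str) -> List[str]:
        return list(self._elements.get(selector, []))

    # Shorthand names are class-level aliases of the long method names.
    J = querySelector
    JJ = querySelectorAll


async def demo():
    page = FakePage({'h1': ['<h1>Title</h1>'], 'p': ['<p>a</p>', '<p>b</p>']})
    first = await page.J('h1')   # same call as page.querySelector('h1')
    all_p = await page.JJ('p')   # same call as page.querySelectorAll('p')
    return first, all_p


first, all_p = asyncio.run(demo())
print(first, all_p)
```

With real pyppeteer2 pages, either name from the same table row should be usable wherever the other is.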
Page.evaluate() and Page.querySelectorEval()
puppeteer's version of evaluate() takes a JavaScript function or a string representation of a JavaScript expression. pyppeteer2's version takes a string representation of a JavaScript expression or function. pyppeteer2 tries to detect automatically whether the string is a function or an expression, but this detection sometimes fails. If an expression is erroneously treated as a function and an error is raised, try setting force_expr to True to force pyppeteer2 to treat the string as an expression.
Examples:
Get a page's textContent:
content = await page.evaluate('document.body.textContent', force_expr=True)
Get an element's textContent:
element = await page.querySelector('h1')
title = await page.evaluate('(element) => element.textContent', element)
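To give a feel for why automatic detection can be unreliable, here is a rough sketch of one possible heuristic. The function looks_like_function and its regular expression are assumptions made for illustration; pyppeteer2's actual detection logic may differ. Only the force_expr name comes from the library.

```python
import re

# Hypothetical pattern: matches strings that start like a JavaScript function.
_FUNCTION_PATTERN = re.compile(
    r'^\s*(async\s+)?('
    r'function\b'               # function () { ... }
    r'|\([^)]*\)\s*=>'          # (a, b) => ...
    r'|[A-Za-z_$][\w$]*\s*=>'   # x => ...
    r')'
)


def looks_like_function(source: str, force_expr: bool = False) -> bool:
    """Guess whether a JavaScript source string is a function.

    When force_expr is True the guess is skipped and the string is
    always treated as an expression.
    """
    if force_expr:
        return False
    return bool(_FUNCTION_PATTERN.match(source))


print(looks_like_function('document.body.textContent'))
print(looks_like_function('(element) => element.textContent'))
print(looks_like_function('() => 1', force_expr=True))
```

Any text-based guess of this kind can misjudge unusual input, which is the situation force_expr is meant to handle.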
Roadmap
See projects