Mojo-Chrome - A Mojo interface to Chrome DevTools Protocol
# This is the example from https://medium.com/@lagenar/using-headless-chrome-via-the-websockets-interface-5f498fb67e0f
# of fetching the news headline from Google News. It should not be used as anything but an example.
# It is archived at https://web.archive.org/web/20171020022803/https://medium.com/@lagenar/using-headless-chrome-via-the-websockets-interface-5f498fb67e0f
use Mojo::Base -strict;
use Mojo::Chrome;
use Mojo::IOLoop;

binmode(STDOUT, ":utf8");
$|++;

my $chrome = Mojo::Chrome->new->catch(sub{ warn pop });
my $url = 'https://news.google.com/news/?ned=us&hl=en';

Mojo::IOLoop->delay(
  # load the page, then continue to the next step
  sub { $chrome->load_page($url, shift->begin) },
  # collect the text of every headline on the page
  sub {
    my ($delay, $err) = @_;
    die $err if $err;
    $chrome->evaluate(<<'JS', $delay->begin);
      var sel = '[role="heading"][aria-level="2"]';
      var headings = document.querySelectorAll(sel);
      [].slice.call(headings).map((link)=>{return link.innerText});
JS
  },
  # print one headline per line
  sub {
    my ($delay, $err, $result) = @_;
    die Mojo::Util::dumper $err if $err;
    say for @$result;
  }
)->catch(sub{ warn pop })->wait;
Mojo::Chrome is an interface to the Chrome DevTools Protocol which allows interaction with a (possibly headless) chrome instance. While Mojo::Chrome is primarily intended as a backbone for Test::Mojo::Role::Chrome, this is not its only purpose.
Communication is bidirectional and asynchronous via an internal websocket. Both request/response and push-events are commonplace, though this module does its best to simplify things.
This module is the spiritual successor to Mojo::Phantom which interfaced with the headless phantomjs application. That project was abandoned after the headless chrome functionality was announced.
Mojo::Phantom had many short-cuts that were intended to smooth out the experience, since communication was essentially unidirectional after the page load and the process, or at least the page state, was ephemeral. Because of the robust communication afforded by the Chrome DevTools Protocol, many of those short-cuts will not be replicated for Mojo::Chrome. However, with the increased power the author suspects that new short-cuts will be desirable; suggestions are welcome.
WARNING: Until released to CPAN this module is considered pre-alpha and absolutely no support or stability is promised. Not even what follows in this section of the documentation, which will only apply upon the first CPAN release.
This module is new and changes may occur. High level functionality should be fairly stable.
The protocol itself is fairly new and largely undocumented, especially with regard to usage. If this module diverges from the protocol in newer versions of chrome, please alert the author via the bug tracker. Incompatibilities can hopefully be smoothed out in the module; however, where this isn't possible, the author intends to target newer versions of chrome rather than support a long tail of chrome versions.
Errors are basically the wild west. While methods have error slots where errors should arrive, whether they actually do is up in the air. This is especially true of errors that emanate from within the protocol itself. Certainly this will need to be improved, but it is difficult with the protocol documentation in its current state. Pull requests and other constructive comments are always welcome.
This module attempts to connect and/or reconnect to Chrome's DevTools Protocol, and even to spawn an instance of Chrome, so as to make the process as seamless as possible for the user. Any method that sends a command will first check for a connection and, if none exists, attempt to create one. Further, if a connection can't be made or if a port to connect on hasn't been specified, it will spawn a new instance. If no port was specified, a random free port will be used. (Note that an additional randomly selected free port is used during startup and is then dropped once the startup is complete.)
All this should be as transparent and "do what I mean" as possible.
Mojo::Chrome inherits all of the events from Mojo::EventEmitter. Further it emits events that arrive from the protocol as they arrive. Per the protocol most events are disabled initially, though some methods will enable and subscribe to events as a matter of course.
Eventually this documentation might suggest best practices or contain other functionality to moderate events. For the time being simply consider that fact, especially when disabling protocol events.
Mojo::Chrome inherits all of the attributes from Mojo::EventEmitter and implements the following new ones.
An array reference of command line arguments passed to the "executable" if a chrome process is spawned. The default contains only --headless. A useful option to consider is --disable-gpu, which is not enabled by default. Note that --remote-debugging-port should not be given; use the "target"'s port value instead.
A base url used to make relative urls absolute. Must be an instance of Mojo::URL or an API-compatible class.
The name of the chrome executable (if it is in the $PATH) or an absolute path to the chrome executable. Default is to use "detect_chrome_executable" to discover it. If unset and not detectable, throws an exception when used.
The Mojo::Transaction object maintaining the websocket connection to chrome.
The Mojo::UserAgent object used to open the connection to chrome if necessary.
An instance of Mojo::URL (or an API-compatible class) used to contact a running process of chrome. If one is not specified, a new chrome process will be spawned on a random port. If the port is specified but cannot be contacted, then a new chrome process will be spawned using that port. Default is http://127.0.0.1.
my $path = Mojo::Chrome->detect_chrome_executable;
Returns the path of the chrome executable to be used. The following heuristic is used:
- If the environment variable MOJO_CHROME_EXECUTABLE is set, that is immediately returned; no check is performed.
- If an executable file named google-chrome exists in your PATH (as determined by "can_run" in IPC::Cmd) and is executable, then that path is returned.
- If the system is darwin (i.e. Mac), then if /Applications/Google Chrome.app/Contents/MacOS/Google Chrome exists and is executable, then that path is returned.
- Otherwise returns undef.
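The order of checks above can be sketched in plain Perl. This is only an illustration of the documented heuristic, not the module's actual implementation; the function name guess_chrome_executable is hypothetical and exists only for this example.

```perl
use strict;
use warnings;
use IPC::Cmd 'can_run';

# Hypothetical helper: mirrors the documented order of checks only.
sub guess_chrome_executable {
  # 1. An explicit override wins and is returned without any check.
  return $ENV{MOJO_CHROME_EXECUTABLE} if $ENV{MOJO_CHROME_EXECUTABLE};

  # 2. Look for google-chrome in the PATH.
  my $path = can_run('google-chrome');
  return $path if $path && -f $path && -x _;

  # 3. On macOS, fall back to the standard application bundle.
  if ($^O eq 'darwin') {
    my $app = '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome';
    return $app if -f $app && -x _;
  }

  # 4. Nothing was found.
  return undef;
}

my $found = guess_chrome_executable();
print defined $found ? "found: $found\n" : "no chrome executable detected\n";
```

Setting MOJO_CHROME_EXECUTABLE is the simplest way to point at a differently named binary, since that value is returned unchecked.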
Mojo::Chrome inherits all of the methods from Mojo::EventEmitter and implements the following new ones.
$chrome->evaluate(<<'JS', sub { my ($chrome, $error, $value) = @_; ... });
  Array.from(document.getElementsByTagName('p')).map(e => e.innerText);
JS
Evaluate a javascript snippet and return the result of the last statement. If passed a hash reference, this is assumed to be the arguments passed to DevTools' Runtime.evaluate. Otherwise the value is assumed to be the expression (and the returnByValue option will be set to true). The callback will receive the invocant, any error, then the value of the last evaluated statement.
Note that other complex behaviors are possible when explicitly passing your own arguments, so please investigate those if this behavior seems limiting.
my $chrome = Mojo::Chrome->new->from_url($url);
A shortcut to use a string or Mojo::URL to set the arguments for this class (see also "new").
The scheme, host, and port portions set the "target" indicating where to connect to chrome's DevTools Protocol.
Query parameters are available to control the spawned chrome process. If given, the executable parameter is used to set the "executable"; otherwise the default is not changed.
All other parameters are interpreted as command line switches and used to set the "arguments". The parameter headless is considered a default and is appended unless the parameter headless or no-headless is explicitly given. Note that no-headless is not an official parameter but is added here to prevent the default of adding headless. remote-debugging-port should not be given; pass it as the port part of the url instead.
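To make the mapping concrete, here is a rough sketch of how such a url could be decomposed using Mojo::URL. It only illustrates the rules described above and is not the module's own code. The function name split_chrome_url, the example urls, and the exact "--name=value" rendering of switches are assumptions made for this example.

```perl
use strict;
use warnings;
use Mojo::URL;

# Hypothetical helper: splits a url into target, executable and arguments
# following the rules described in the text (an assumption-laden sketch).
sub split_chrome_url {
  my $url = Mojo::URL->new(shift);

  # Scheme, host and port describe where to reach the DevTools Protocol.
  my $target = Mojo::URL->new->scheme($url->scheme)->host($url->host);
  $target->port($url->port) if defined $url->port;

  my $query      = $url->query;
  my @names      = @{ $query->names };
  my $executable = $query->param('executable');    # undef if not given

  # Every other parameter becomes a command line switch.
  my @arguments;
  for my $name (@names) {
    next if $name eq 'executable' || $name eq 'no-headless';
    my $value = $query->param($name);
    push @arguments, length $value ? "--$name=$value" : "--$name";
  }

  # headless is the default unless headless or no-headless was given.
  push @arguments, '--headless'
    unless grep { $_ eq 'headless' || $_ eq 'no-headless' } @names;

  return ($target, $executable, \@arguments);
}

my ($target, $executable, $arguments)
  = split_chrome_url('http://127.0.0.1:9222?no-headless&disable-gpu');
print "target: $target\n";
print "arguments: @$arguments\n";
```

The port stays in the url itself, which is why no debugging-port switch appears among the arguments.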
$chrome->load_page($url, sub { my ($chrome, $error) = @_; ... });
Request a page and load the result, evaluating any initial javascript in the process. This subscribes to Page events and then requests the page with Page.navigate. It then invokes the callback when the appropriate Page.frameStoppedLoading event is caught.
If passed a hash reference, this is assumed to be the arguments passed to the Page.navigate method. Otherwise the value is assumed to be the url to load. If the url (given either way) is relative, it will be made absolute using the "base" url.
my $chrome = Mojo::Chrome->new(%attributes);
my $chrome = Mojo::Chrome->new(\%attributes);
my $chrome = Mojo::Chrome->new($url);
Construct a new instance of Mojo::Chrome. If given a single argument which is not a hash reference, that argument is passed to "from_url" to create an instance from a url. Otherwise the usual "new" in Mojo::Base behavior is followed.
$chrome->send_command($method, $params, sub { my ($chrome, $error, $result) = @_; ... });
A lower level method to send a command via the protocol. The arguments are a method and a hash reference of parameters. If given, a callback will be invoked when a response is received (N.B. issuing ids and watching for responses is handled transparently internally). The callback is passed the invocant, any error, and the result.
This method lets you interact with the protocol and while it does simplify some of that process it is still quite low level.
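As a minimal sketch of a direct protocol call, the following sends Runtime.evaluate (a method defined by the DevTools Protocol) through send_command using the signature shown above. It assumes a chrome executable can be detected or spawned. The exact shape of $result is not documented here, so the example dumps it rather than assuming particular keys.

```perl
use Mojo::Base -strict;
use Mojo::Chrome;
use Mojo::IOLoop;
use Mojo::JSON 'true';
use Mojo::Util 'dumper';

my $chrome = Mojo::Chrome->new;

# Parameters follow the protocol's Runtime.evaluate definition.
my $params = {expression => '6 * 7', returnByValue => true};

$chrome->send_command('Runtime.evaluate', $params, sub {
  my ($chrome, $error, $result) = @_;
  if ($error) { warn dumper $error }
  else        { say dumper $result }    # structure assumed to follow the protocol
  Mojo::IOLoop->stop;
});

# Run the event loop until the callback stops it.
Mojo::IOLoop->start unless Mojo::IOLoop->is_running;
```

For simple expressions, "evaluate" is the more convenient entry point; send_command is mainly useful for protocol methods that have no higher-level wrapper.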
- https://chromedevtools.github.io/devtools-protocol
- https://developers.google.com/web/updates/2017/04/headless-chrome
http://github.com/jberger/Mojo-Chrome
Joel Berger, <joel.a.berger@gmail.com>
Copyright (C) 2017 by "AUTHOR" and "CONTRIBUTORS". This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.