electron-microscope

Use electron to load websites and extract data. Intended for automation, testing, web scraping, etc.

Loads URLs inside an electron webview tag, allows you to execute code on them and stream data from the pages back to your main process.

Run this headlessly on Linux using xvfb-run.

Please note that this is intended to be a fairly low-level library that adds little on top of what Electron does under the hood. Things that seem simple can turn out to be relatively complex because of the way web browser events work.

usage

Use this in an electron app:

var electron = require('electron')
var createMicroscope = require('electron-microscope')

electron.app.on('ready', function () {
  createMicroscope(function (err, scope) {
    if (err) throw err
    // use your new microscope
  })
})

Run it with electron:

$ npm install electron-prebuilt -g
$ electron my-code.js

examples

See the test/ and examples/ folders

API

require('electron-microscope')(options, ready)

Requiring the module returns a constructor function that you use to create a new instance. Pass it an options object and a ready callback that will be called with (error, scope). scope is your new instance, ready to use. The usage example above passes only the callback, so options appears to be optional.

scope.window

The electron BrowserWindow instance, AKA the renderer, which contains the <webview> that pages are loaded in.

Currently because there are three node processes at play (main, renderer, webview), to access webview APIs you have to go through the window, e.g.:

scope.window.webContents.executeJavaScript("document.querySelector('webview').goBack()")
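As a rough sketch of that pattern, the helper below builds the script string for an arbitrary webview method and sends it through the window. The names webviewCall and callWebview are hypothetical and not part of electron-microscope; the sketch assumes any arguments are JSON-serializable.

```javascript
// Hypothetical helper (not part of electron-microscope): build the script
// string that calls a method on the <webview> element inside the window.
function webviewCall (method, args) {
  var serialized = (args || []).map(function (arg) {
    // assumes every argument survives JSON serialization
    return JSON.stringify(arg)
  }).join(', ')
  return "document.querySelector('webview')." + method + '(' + serialized + ')'
}

// Hypothetical helper: send that script to the renderer through scope.window.
function callWebview (scope, method, args) {
  return scope.window.webContents.executeJavaScript(webviewCall(method, args))
}
```

With a real scope, callWebview(scope, 'goBack') would send the same string as the example above.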

scope.loadURL(url, cb)

Load a url and call cb with (err) when loading is done. If there was a problem loading the page, err will be the error; otherwise the page loaded successfully.
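A minimal sketch of handling the loadURL callback: it only starts running code (via scope.run, documented below) once the page has loaded, and passes any load error along instead. The helper name loadAndRun is hypothetical.

```javascript
// Hypothetical helper (not part of electron-microscope): load a page, then
// run code on it. Calls cb with (err) if loading failed, otherwise with
// (null, outputStream) where outputStream is whatever scope.run returns.
function loadAndRun (scope, url, code, cb) {
  scope.loadURL(url, function (err) {
    if (err) return cb(err) // the page did not load, so do not run anything
    cb(null, scope.run(code))
  })
}
```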

var outputStream = scope.run(code)

Run code on the currently loaded page. Run this after calling loadURL. Code should be a string; if it is a function, .toString() will be called on it. scope.run returns a readable stream that emits data generated by your code.

Uses the webview.executeJavaScript electron API, which doesn't provide an error handling mechanism. Electron microscope wraps your code in a try/catch, and if an error occurs it will be emitted on the stream. However, a syntax error will likely not be caught, so it may appear that nothing is happening.
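Since syntax errors can fail silently, one possible workaround is to parse the code in the main process before handing it to scope.run. The sketch below uses the Function constructor, which compiles its body without executing it. checkSyntax is a hypothetical helper, and it assumes the main process and the page accept the same JavaScript syntax, which may not hold for every Electron version.

```javascript
// Hypothetical helper (not part of electron-microscope): returns null when
// `code` parses as a function expression, or the SyntaxError otherwise.
// The Function constructor only compiles the body; nothing in `code` runs.
function checkSyntax (code) {
  try {
    new Function('return (' + String(code) + ')') // eslint-disable-line no-new-func
    return null
  } catch (err) {
    return err
  }
}
```

A caller could then refuse to call scope.run when checkSyntax returns an error, and report that error instead.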

Your code must be a function that has this template:

function (send, done) {
  // put your custom code here
  // call 'send(data)' to write data to the stream
  // call 'done()' to end the stream
  // calling send is optional, but you must eventually call done to end the stream
}

For example:

var code = `function (send, done) {
  for (var i = 0; i < 5; i++) send(i)
  done()
}`

var output = scope.run(code)

output.on('data', function (data) {
  // will get called for every time send is called above
  // data will be the value passed to send
  // in this case 5 times: 0, 1, 2, 3, 4
})

output.on('error', function (error) {
  // will get called if your code throws an exception
  // error will be an object with .message and .stack from the thrown error object
})
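When only the final list of values matters, the stream can be gathered into an array. This is a sketch under the assumption that the stream emits 'end' once done() has been called, as a standard readable stream would; collect is a hypothetical name.

```javascript
// Hypothetical helper (not part of electron-microscope): gather everything a
// scope.run stream emits and call cb with (err, results) exactly once.
function collect (output, cb) {
  var results = []
  var finished = false
  function finish (err) {
    if (finished) return // guard against both 'error' and 'end' firing
    finished = true
    cb(err, results)
  }
  output.on('data', function (data) { results.push(data) })
  output.on('error', finish)
  // assumption: 'end' fires after the page code calls done()
  output.on('end', function () { finish(null) })
}
```

For the example above, collect(scope.run(code), cb) would be expected to call cb with the five values sent by the loop.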

scope.on('will-navigate', cb)

Emitted when the page wants to start navigation. It can happen when the window.location object is changed or a link is clicked in the page.

Calls cb with (url), forwarded from this event.

scope.on('did-finish-load', cb)

Fired when the navigation is done, i.e. the spinner of the tab will stop spinning and the onload event is dispatched.

Calls cb with no arguments, forwarded from this event.

scope.on('did-fail-load', cb)

This event is like did-finish-load, but fired when the load failed or was cancelled.

Calls cb with (error), forwarded from this event.

scope.on('did-start-loading', cb)

Corresponds to the points in time when the spinner of the tab starts spinning.

Calls cb with no arguments, forwarded from this event.

scope.on('did-stop-loading', cb)

Corresponds to the points in time when the spinner of the tab stops spinning.

Calls cb with no arguments, forwarded from this event.
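To see how these events interleave for a given page, it can help to log all of them. The sketch below subscribes to each event listed above and forwards the event name plus its single argument (if any) to a logger. traceNavigation and the log callback are hypothetical names.

```javascript
// The event names documented above.
var NAVIGATION_EVENTS = [
  'will-navigate',
  'did-start-loading',
  'did-stop-loading',
  'did-finish-load',
  'did-fail-load'
]

// Hypothetical helper (not part of electron-microscope): call log(name, arg)
// every time one of the navigation events fires on the scope.
function traceNavigation (scope, log) {
  NAVIGATION_EVENTS.forEach(function (name) {
    scope.on(name, function (arg) {
      // arg is the url for will-navigate, the error for did-fail-load,
      // and undefined for the events that pass no arguments
      log(name, arg)
    })
  })
}
```

For instance, traceNavigation(scope, console.log) prints each event as it happens.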

scope.destroy()

Call when you don't want to use the scope anymore. Causes the browser-window electron-microscope uses internally to close, which may cause your electron app to exit if you do not have any other active windows.
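One way to make sure destroy is always called is to wrap the scope's lifetime in a helper. This is only a sketch: withScope is a hypothetical name, and createMicroscope is passed in and called with just a callback, as in the usage example.

```javascript
// Hypothetical helper (not part of electron-microscope): create a scope,
// hand it to `work`, and destroy it once `work` reports it is finished.
function withScope (createMicroscope, work, cb) {
  createMicroscope(function (err, scope) {
    if (err) return cb(err)
    work(scope, function (workErr, result) {
      scope.destroy() // always clean up, whether or not the work succeeded
      cb(workErr, result)
    })
  })
}
```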