
Node proxy server attempting to fetch readable contents from any provided URL.


Proxy server that retrieves a readable version of any provided URL, powered by Node, PhantomJS and Readability.js.

Installation


$ git clone https://github.com/n1k0/readable-proxy
$ cd readable-proxy
$ npm install


Start the server on localhost:3000:

$ npm start

Note about CORS: by design, the server will allow any origin to access it, so browsers can consume it from pages hosted on a different domain.
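To illustrate what allowing any origin means in practice, a permissive CORS policy usually comes down to a single response header. The sketch below is not taken from the readable-proxy source: `allowAnyOrigin` is a hypothetical helper, and it only assumes a response object exposing a Node-style `setHeader` method.

```javascript
// Hypothetical helper (not from readable-proxy): mark a response as readable
// by pages served from any origin.
function allowAnyOrigin(res) {
  // "*" tells browsers that any origin may read the response.
  res.setHeader("Access-Control-Allow-Origin", "*");
  return res;
}

// Minimal stand-in for an http.ServerResponse, used only to show the effect.
const headers = {};
const fakeRes = {
  setHeader: function (name, value) { headers[name] = value; }
};

allowAnyOrigin(fakeRes);
console.log(headers);
```

With a header like this in place, a page hosted on another domain can call the API from browser code without the request being blocked by the same-origin policy.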


By default, the proxy server will use the Readability.js version it ships with; to override this, you can set the READABILITY_LIB_PATH environment variable to the absolute path to the library file on your local system:

$ READABILITY_LIB_PATH=/path/to/my/own/version/of/Readability.js npm start
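The override logic described above could be sketched roughly as follows. This is an illustration, not the project's actual code: `resolveReadabilityPath` and its `bundledPath` argument are hypothetical, and rejecting relative paths is an assumption made here because the variable is documented as an absolute path.

```javascript
const path = require("path");

// Hypothetical sketch: choose which Readability.js file to load.
// `env` is an object shaped like process.env; `bundledPath` stands for the
// copy shipped with the server (its real location is not assumed here).
function resolveReadabilityPath(env, bundledPath) {
  const custom = env.READABILITY_LIB_PATH;
  if (!custom) {
    return bundledPath; // no override: use the bundled version
  }
  if (!path.isAbsolute(custom)) {
    // Assumption: fail early rather than guess what a relative path means.
    throw new Error("READABILITY_LIB_PATH must be an absolute path: " + custom);
  }
  return custom;
}

const bundled = "/bundled/Readability.js"; // placeholder path
console.log(resolveReadabilityPath({}, bundled));
console.log(resolveReadabilityPath(
  { READABILITY_LIB_PATH: "/path/to/my/own/version/of/Readability.js" },
  bundled
));
```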


Web UI

Just head to http://localhost:3000/, enter some URL and start enjoying both original and readable renderings side by side.


The HTTP REST API is available under /api.

Disclaimer: this implementation is probably far from being truly RESTful.

GET /api/get

Required parameters
  • url: The URL to retrieve readable contents from, e.g. https://nicolas.perriault.net/code/2013/get-your-frontend-javascript-code-covered/.
Optional parameters
  • sanitize: A boolean string to enable HTML sanitization (valid truthy boolean strings: "1", "on", "true", "yes", "y"; everything else will be considered falsy).
  • userAgent: A custom User Agent string. By default, it will use the PhantomJS one.
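The truthy-string rule for `sanitize` can be pictured with a small sketch. The `isTruthy` helper below is hypothetical and only mirrors the list of accepted strings given above; it compares values exactly, and whether the real server is case-sensitive is not stated here.

```javascript
// The accepted truthy strings, as listed in the parameter description.
const TRUTHY_STRINGS = ["1", "on", "true", "yes", "y"];

// Hypothetical helper: anything outside the list is treated as falsy.
function isTruthy(value) {
  return TRUTHY_STRINGS.includes(String(value));
}

console.log(isTruthy("y"));
console.log(isTruthy("no"));
console.log(isTruthy(undefined)); // parameter omitted
```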

Note: enabling content sanitization strips Readability.js-specific HTML semantics, but is probably safer for users if you plan to publish retrieved contents on a public website.
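A client needs to URL-encode the target address before passing it as the `url` parameter. The following sketch builds a request URL for the endpoint above; `buildApiUrl` is a hypothetical helper, and the base address assumes the server is running locally on port 3000.

```javascript
// Hypothetical helper: build a GET /api/get request URL.
// URLSearchParams takes care of percent-encoding the target URL.
function buildApiUrl(base, targetUrl, options) {
  const opts = options || {};
  const params = new URLSearchParams({ url: targetUrl });
  if (opts.sanitize) {
    params.set("sanitize", "yes"); // one of the accepted truthy strings
  }
  if (opts.userAgent) {
    params.set("userAgent", opts.userAgent);
  }
  return base + "/api/get?" + params.toString();
}

const target = "https://nicolas.perriault.net/code/2013/get-your-frontend-javascript-code-covered/";
const apiUrl = buildApiUrl("http://localhost:3000", target, { sanitize: true });
console.log(apiUrl);
```

The resulting string can be passed to any HTTP client; the server decodes the `url` parameter back to the original address.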


Content sanitization enabled:

$ curl http://localhost:3000/api/get\?sanitize=y\&url\=https://nicolas.perriault.net/code/2013/get-your-frontend-javascript-code-covered/
{
  "byline":"Nicolas Perriault —",
  "content":"<p><strong>So finally you&#39;re <a href=\"https://nicolas.perriault.net/code/2013/testing-frontend-javascript-code-using-mocha-chai-and-sinon/\">testing",
  "title":"Get your Frontend JavaScript Code Covered | Code",
  "isProbablyReaderable": true
}

Content sanitization disabled (default):

$ curl http://localhost:3000/api/get\?url\=https://nicolas.perriault.net/code/2013/get-your-frontend-javascript-code-covered/
{
  "byline":"Nicolas Perriault —",
  "content":"<div id=\"readability-page-1\" class=\"page\"><section class=\"\">\n<p><strong>So finally you're…",
  "title":"Get your Frontend JavaScript Code Covered | Code",
  "isProbablyReaderable": true
}

Note: the isProbablyReaderable property indicates whether Readability determined that the page contents were parseable.
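A consumer may want to check this flag before displaying the content. The sketch below shows one possible way to do so; `readableContentOrNull` is a hypothetical helper, and the sample object uses placeholder values shaped like the responses above rather than real server output.

```javascript
// Hypothetical helper: return the readable HTML, or null when Readability
// flagged the page as probably not parseable.
function readableContentOrNull(result) {
  return result.isProbablyReaderable ? result.content : null;
}

// Placeholder object with the same fields as the API response.
const sample = {
  byline: "Some Author",
  content: "<p>placeholder content</p>",
  title: "Some Title",
  isProbablyReaderable: true
};

console.log(readableContentOrNull(sample));
```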

Usage from node

scrape() function

The scrape function scrapes a URL and returns a Promise that resolves with the JSON result object described above:

var scrape = require("readable-proxy").scrape;
var url = "https://nicolas.perriault.net/code/2013/get-your-frontend-javascript-code-covered/";

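// Log the result object on success, or the error on failure.
scrape(url, {sanitize: true, userAgent: "My custom User-Agent string"})
  .then(console.log.bind(console))
  .catch(console.error.bind(console));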


Run the test suite:

$ npm test


License: MPL 2.0.