phantomjs-html

The main goal of this project is to let you retrieve the rendered HTML output of a URL. It also provides ExpressJS middleware and methods that build on this to implement SEO for single-page apps.

Development state

This project is in its infancy. I tried prerender, which is nice, but found it too complex for such a simple task. It also made our server crash a lot, so I made this one. It is much simpler, and surely slower, because it doesn't keep multiple instances of PhantomJS running.

How to use

var phantomjsHtml = require('phantomjs-html');

Examples:

1. Get HTML from a URL

phantomjsHtml.getHTML('https://google.com', function(err, output){
  if(err){
    console.log('ERROR', err);
  } else {
    console.log('SUCCESS', output);
  }
});
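
Since the module does not keep multiple PhantomJS instances running, rendering several pages is probably best done one at a time. The following is a minimal sketch of that idea, not part of the module: getAllHTML is a hypothetical helper, and it only assumes that the function passed in has the (url, callback) shape of getHTML shown above.

```javascript
// Hypothetical helper, not part of phantomjs-html.
// Renders the URLs one after another and collects the outputs in order.
// `getHTML` is assumed to take (url, callback) and to call back with (err, output).
function getAllHTML(getHTML, urls, callback) {
  var results = [];

  function next(index) {
    // All URLs rendered: hand back the collected outputs.
    if (index >= urls.length) {
      return callback(null, results);
    }
    getHTML(urls[index], function (err, output) {
      // Stop at the first failure and report it.
      if (err) {
        return callback(err);
      }
      results.push(output);
      next(index + 1);
    });
  }

  next(0);
}
```

It could be called as getAllHTML(phantomjsHtml.getHTML, ['https://google.com'], function(err, pages){ ... }), assuming getHTML does not depend on being called as a method of phantomjsHtml.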

2. Use the ExpressJS SEO middleware

This middleware needs to be registered before any routing is done. It detects whether the current request comes from a crawler by checking the user-agent and the _escaped_fragment_ query parameter. If a crawler is detected, it returns a PhantomJS rendering of the page, which crawlers will adore.

// output PhantomJS render of pages if the request is made by a crawler
app.use(phantomjsHtml.middleware.SEO({
  overLocalhost: true
}));
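
To illustrate the kind of check described above, here is a minimal sketch of crawler detection. It is not the module's actual implementation: isCrawlerRequest and CRAWLER_PATTERN are hypothetical names, the list of user-agent substrings is a small illustrative sample, and req is assumed to look like an Express request (headers and query).

```javascript
// Hypothetical sketch, not the module's real detection logic.
// Assumption: a short, illustrative list of crawler user-agent substrings.
var CRAWLER_PATTERN = /googlebot|bingbot|yandex|baiduspider|facebookexternalhit|twitterbot/i;

function isCrawlerRequest(req) {
  var userAgent = (req.headers && req.headers['user-agent']) || '';
  var query = req.query || {};
  // The parameter may be present with an empty value, so test for the key itself.
  var hasEscapedFragment = Object.prototype.hasOwnProperty.call(query, '_escaped_fragment_');
  return hasEscapedFragment || CRAWLER_PATTERN.test(userAgent);
}
```

A request such as /?_escaped_fragment_= would be treated as a crawler request here regardless of its user-agent.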

3. Delay the PhantomJS output

In single-page apps, JavaScript often runs after the page has loaded in order to fetch or display data. You can tell this module to wait until your app is ready, like so:

In the <head> section of your HTML pages:

<script type="text/javascript">
  window.phantomjsHtmlReady = false;
</script>

When your app is ready, set phantomjsHtmlReady to true:

window.phantomjsHtmlReady = true;
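
If your page waits on several asynchronous tasks, one way to decide when to flip the flag is to count them down. This is only a sketch under that assumption: createReadyTracker is a hypothetical helper, not part of this module, and it is written in plain ES5 so it does not rely on features the rendering browser may lack.

```javascript
// Hypothetical helper, not part of phantomjs-html.
// Sets win.phantomjsHtmlReady to true once `taskCount` tasks have reported completion.
function createReadyTracker(win, taskCount) {
  var remaining = taskCount;
  // With nothing to wait for, the page is ready immediately.
  win.phantomjsHtmlReady = remaining === 0;

  return function taskDone() {
    if (remaining > 0) {
      remaining -= 1;
    }
    if (remaining === 0) {
      win.phantomjsHtmlReady = true;
    }
  };
}
```

In a page, you would call var done = createReadyTracker(window, 2); and then call done() from each of your two data-loading callbacks.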