
Integration with prerender.io service for Zend Framework 2

Primary language: PHP. License: MIT.

ZfrPrerender


Are you using Backbone, Angular, EmberJS, etc., but you're unsure about the SEO implications?

This Zend Framework 2 module uses Prerender.io to dynamically render your JavaScript pages on your server using PhantomJS.

Installation

Install the module by typing (or add it to your composer.json file):

$ php composer.phar require zfr/zfr-prerender:3.*

Documentation

How it works

  1. Check to make sure we should show a prerendered page
    1. Check if the request is from a crawler (either by its User-Agent string or by detecting the _escaped_fragment_ query param)
    2. Check to make sure we aren't requesting a resource (js, css, etc.)
    3. (optional) Check to make sure the URL is in the whitelist
    4. (optional) Check to make sure the URL isn't in the blacklist
  2. Make a GET request to the prerender service (PhantomJS server) for the page's prerendered HTML
  3. Return that HTML to the crawler
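The checks in step 1 can be sketched in plain PHP as follows. This is a simplified illustration, not ZfrPrerender's actual implementation: the shouldShowPrerenderedPage() function is hypothetical, only the request path is matched against the ignored extensions and the whitelist/blacklist, and the patterns are assumed to be wrapped in "#" regex delimiters.

```php
<?php
// Hypothetical sketch of the "should we prerender?" checks; the function name,
// config layout and '#' regex delimiters are assumptions, not ZfrPrerender's code.
function shouldShowPrerenderedPage($url, $userAgent, array $config)
{
    // 1.1 Crawler detection: known User-Agent substring or _escaped_fragment_ param
    parse_str((string) parse_url($url, PHP_URL_QUERY), $query);
    $isCrawler = array_key_exists('_escaped_fragment_', $query);

    foreach ($config['crawler_user_agents'] as $crawler) {
        if (stripos((string) $userAgent, $crawler) !== false) {
            $isCrawler = true;
        }
    }

    if (!$isCrawler) {
        return false;
    }

    // 1.2 Skip static resources (case-sensitive check on the path, query string excluded)
    $path = (string) parse_url($url, PHP_URL_PATH);

    foreach ($config['ignored_extensions'] as $extension) {
        if (substr($path, -strlen($extension)) === $extension) {
            return false;
        }
    }

    // 1.3 (optional) Whitelist: when supplied, the path must match at least one pattern
    $whitelist = isset($config['whitelist_urls']) ? $config['whitelist_urls'] : array();

    if (!empty($whitelist)) {
        $isWhitelisted = false;

        foreach ($whitelist as $pattern) {
            if (preg_match('#' . $pattern . '#', $path) === 1) {
                $isWhitelisted = true;
                break;
            }
        }

        if (!$isWhitelisted) {
            return false;
        }
    }

    // 1.4 (optional) Blacklist: the path must not match any pattern
    $blacklist = isset($config['blacklist_urls']) ? $config['blacklist_urls'] : array();

    foreach ($blacklist as $pattern) {
        if (preg_match('#' . $pattern . '#', $path) === 1) {
            return false;
        }
    }

    return true;
}
```

The order matters: the cheap crawler check runs first, so regular browser traffic never reaches the regex matching.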

Customization

ZfrPrerender comes with sane defaults, but you can customize the module by copying the config/zfr_prerender.global.php.dist file to your autoload folder (removing the .dist extension) and modifying it to suit your needs.

Prerender URL

By default, ZfrPrerender uses the Prerender.io service deployed at http://service.prerender.io. However, you may want to deploy it on your own server. To that end, you can configure ZfrPrerender to use your server with the following configuration:

return array(
    'zfr_prerender' => array(
        'prerender_url' => 'http://myprerenderservice.com'
    )
);

With this config, here is how ZfrPrerender will proxy the "https://google.com" request:

GET http://myprerenderservice.com/https://google.com
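In other words, the proxied URL is the configured prerender_url, a slash, and the full requested URL. A minimal sketch, assuming a hypothetical buildPrerenderUrl() helper; the trimming of a trailing slash from the configured URL is an assumption, not documented behavior:

```php
<?php
// Hypothetical helper: concatenates the Prerender service URL and the full
// requested URL, as in the GET example above.
function buildPrerenderUrl($prerenderUrl, $requestedUrl)
{
    // Assumption: strip a trailing "/" so the result never contains "//https://"
    return rtrim($prerenderUrl, '/') . '/' . $requestedUrl;
}

echo buildPrerenderUrl('http://myprerenderservice.com', 'https://google.com'), "\n";
// http://myprerenderservice.com/https://google.com
```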

Crawler user-agents

ZfrPrerender decides whether to pre-render a page by checking the User-Agent string to determine if the request comes from a bot. By default, the following user agents are registered: baidu, facebookexternalhit and twitterbot.

GoogleBot, Yahoo and BingBot have not been in the list since ZfrPrerender 2.0, as those search engines support the _escaped_fragment_ approach, and we want to ensure people are not penalized for cloaking.

You can add other User-Agent strings to evaluate using this sample configuration:

return array(
    'zfr_prerender' => array(
        'crawler_user_agents' => array('yandex', 'msnbot')
    )
);

Note: ZfrPrerender also supports detecting a crawler through the use of the _escaped_fragment_ query param. You can learn more about this on Google's website.

Ignored extensions

ZfrPrerender is configured by default to ignore all requests for resources with the following extensions: .css, .gif, .jpeg, .jpg, .js, .png, .less, .pdf, .doc, .txt, .zip, .mp3, .rar, .exe, .wmv, .avi, .ppt, .mpg, .mpeg, .tif, .wav, .mov, .psd, .ai, .xls, .mp4, .m4a, .swf, .dat, .dmg, .iso, .flv, .m4v, .torrent. Those are never pre-rendered.

You can add your own extensions using this sample configuration:

return array(
    'zfr_prerender' => array(
        'ignored_extensions' => array('.less', '.pdf')
    )
);

Whitelist

Whitelist a single URL path or multiple URL paths. Paths are compared using regex, so be specific when possible. If a whitelist is supplied, only URLs containing a whitelist path will be pre-rendered.

Here is a sample configuration that only pre-renders URLs that contain "/users/":

return array(
    'zfr_prerender' => array(
        'whitelist_urls' => array('/users/*')
    )
);

Note: remember to specify URLs here, not ZF2 route names. This is because ZfrPrerender registers a listener that runs very early in the MVC process, before routing is actually done.
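Because whitelist entries are regular expressions rather than shell-style globs, the "*" in "/users/*" repeats the preceding "/" zero or more times instead of meaning "anything", and an unanchored pattern matches anywhere in the path. The sketch below illustrates this, assuming (not confirmed by this documentation) that each pattern is wrapped in "#" delimiters and tested against the request path with preg_match(); matchesAnyPattern() is a hypothetical name.

```php
<?php
// Hypothetical helper: returns true if the path matches at least one pattern.
// Assumes patterns are used as-is inside '#' delimiters.
function matchesAnyPattern($path, array $patterns)
{
    foreach ($patterns as $pattern) {
        if (preg_match('#' . $pattern . '#', $path) === 1) {
            return true;
        }
    }

    return false;
}

// "/users/*" means "/users" followed by zero or more "/" characters
var_dump(matchesAnyPattern('/users', array('/users/*')));         // bool(true)
// Unanchored patterns also match in the middle of a path
var_dump(matchesAnyPattern('/admin/users/1', array('/users/*'))); // bool(true)
// Anchoring with "^" restricts the match to the start of the path
var_dump(matchesAnyPattern('/admin/users/1', array('^/users/'))); // bool(false)
```

Anchoring a pattern with "^" is one straightforward way to be specific, as recommended above.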

Blacklist

Blacklist a single URL path or multiple URL paths. Paths are compared using regex, so be specific when possible. If a blacklist is supplied, all URLs will be pre-rendered except those containing a blacklist path. Please note that if the referer matches the blacklist, the page won't be pre-rendered either.

Here is a sample configuration that pre-renders all URLs except those that contain "/users/":

return array(
    'zfr_prerender' => array(
        'blacklist_urls' => array('/users/*')
    )
);

Note: remember to specify URLs here, not ZF2 route names. This is because ZfrPrerender registers a listener that runs very early in the MVC process, before routing is actually done.
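The referer rule means a blacklist entry can exclude a page even when the page's own URL does not match. A minimal sketch, assuming "#" regex delimiters and a hypothetical isBlacklisted() helper that receives the request path and the Referer header value (or null when the header is absent):

```php
<?php
// Hypothetical sketch of the blacklist rule: a request is skipped when either
// its path or its Referer header matches a blacklist pattern.
function isBlacklisted($path, $referer, array $blacklist)
{
    foreach ($blacklist as $pattern) {
        $regex = '#' . $pattern . '#';

        // The requested path itself is blacklisted
        if (preg_match($regex, $path) === 1) {
            return true;
        }

        // The page was reached from a blacklisted page (only if a referer exists)
        if ($referer !== null && $referer !== '' && preg_match($regex, $referer) === 1) {
            return true;
        }
    }

    return false;
}
```

For example, with array('/users/*') as the blacklist, a request for "/about" coming from "http://example.com/users/1" is not pre-rendered.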

Events

ZfrPrerender\Mvc\PrerenderListener triggers two events:

  1. ZfrPrerender\Mvc\PrerenderEvent::EVENT_PRERENDER_PRE: this event is triggered before actually making the request to the Prerender service. If you return a Zend\Http\Response object from a listener attached to this event, this response is returned immediately, avoiding a new request to the Prerender service.
  2. ZfrPrerender\Mvc\PrerenderEvent::EVENT_PRERENDER_POST: this event is triggered once the response from the Prerender service has been received. This allows you to cache it (for instance in Memcached).

Listeners attached to those two events receive an instance of ZfrPrerender\Mvc\PrerenderEvent. Here is an example that shows you how to register listeners using the shared event manager. In your Module.php class:

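// Module.php of your own module (the "Application" namespace is only an example)
namespace Application;

use Zend\Http\Response;
use Zend\Mvc\MvcEvent;
use ZfrPrerender\Mvc\PrerenderEvent;

class Module
{
    public function onBootstrap(MvcEvent $event)
    {
        $eventManager  = $event->getTarget()->getEventManager();
        $sharedManager = $eventManager->getSharedManager();

        // Listen to the events triggered by the PrerenderListener
        $sharedManager->attach(
            'ZfrPrerender\Mvc\PrerenderListener',
            PrerenderEvent::EVENT_PRERENDER_PRE,
            array($this, 'prerenderPre')
        );

        $sharedManager->attach(
            'ZfrPrerender\Mvc\PrerenderListener',
            PrerenderEvent::EVENT_PRERENDER_POST,
            array($this, 'prerenderPost')
        );
    }

    public function prerenderPre(PrerenderEvent $event)
    {
        $request = $event->getRequest();

        // Check from your cache if you already have the content for this request
        $content = null; // $content = ...

        // Cache miss: return nothing so that the Prerender service is queried
        if ($content === null) {
            return;
        }

        // Cache hit: returning a response skips the request to the Prerender service
        $response = new Response();
        $response->setStatusCode(200);
        $response->setContent($content);

        return $response;
    }

    public function prerenderPost(PrerenderEvent $event)
    {
        // This is the response we get from the Prerender service
        $response = $event->getResponse();

        // You could get the body ($response->getBody()) and put it in cache
        // ...
    }
}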
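Taken together, the two events let you build a read-through cache around the Prerender service. The sketch below models that flow in plain PHP rather than with Zend\EventManager: an array stands in for Memcached and a callable stands in for the HTTP request to the Prerender service. renderForCrawler() and its parameters are hypothetical.

```php
<?php
// Plain-PHP model of the PRE/POST flow (not ZF2 code): a cache hit in the
// "pre" step short-circuits the remote call, and the "post" step stores the
// fresh HTML for the next crawler request.
function renderForCrawler($url, array &$cache, callable $fetchFromPrerender)
{
    // EVENT_PRERENDER_PRE equivalent: return the cached copy if we have one
    if (isset($cache[$url])) {
        return $cache[$url];
    }

    // No cached copy: ask the Prerender service
    $html = $fetchFromPrerender($url);

    // EVENT_PRERENDER_POST equivalent: store the response body for next time
    $cache[$url] = $html;

    return $html;
}
```

With this design, only the first crawler request for a given URL pays the cost of rendering the page in PhantomJS.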

Testing

If you want to make sure your pages are rendering correctly:

  1. Open the Developer Tools in Chrome (Cmd + Alt + J)
  2. Click the Settings gear in the bottom right corner.
  3. Click "Overrides" on the left side of the settings panel.
  4. Check the "User Agent" checkbox.
  5. Choose "Other..." from the User Agent dropdown.
  6. Type twitterbot (or any other user agent from your crawler_user_agents list) into the input box.
  7. Refresh the page (make sure to keep the developer tools open).