Prerender Puppeteer Service

This is a web page prerendering service built on Puppeteer (the Node.js API for headless Chrome).

Useful for server-side rendering through a proxy. Outputs HTML, PDF, and PNG screenshots.
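
As a rough illustration of how the three output types could map onto Puppeteer page calls, here is a minimal sketch. It is not the code in `src/core/Renderer.js`: the function name `render`, the `networkidle2` wait condition, and the A4 and full-page options are assumptions chosen for the example. The function takes an already launched browser (for instance the result of `puppeteer.launch()`), so it can be exercised with a stub.

```javascript
// Hypothetical helper: render `url` with an already-launched Puppeteer browser.
// `type` is 'html' (assumed default), 'pdf', or 'screenshot'.
async function render(browser, url, type = 'html') {
  if (!['html', 'pdf', 'screenshot'].includes(type)) {
    throw new Error(`Unsupported type: ${type}`);
  }
  const page = await browser.newPage();
  try {
    // Assumption: wait until the network is mostly idle so client-side
    // rendering has had a chance to finish.
    await page.goto(url, { waitUntil: 'networkidle2' });
    if (type === 'pdf') {
      return await page.pdf({ format: 'A4' });
    }
    if (type === 'screenshot') {
      return await page.screenshot({ type: 'png', fullPage: true });
    }
    // Serialized DOM after scripts have run.
    return await page.content();
  } finally {
    // Always release the page, even if navigation or rendering failed.
    await page.close();
  }
}
```

A caller would typically launch one browser at startup and reuse it across requests, since starting Chrome for every render is expensive.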

.
├── README.MD                       <-- This instructions file
├── Dockerfile                      <-- Instructions for adding Puppeteer in Docker
└── src                             <-- Source code for the prerender service
    ├── lambda                      <-- CloudFront functions
    │   ├── addRenderHeaders.js     <-- Viewer Request function that adds a header if the request is from a crawler
    │   ├── RouteToRendered.js      <-- Origin Request function that sets the prerender service as a custom origin if the header is present
    │   └── template.js             <-- SAM template for the Lambda functions
    ├── core                        <-- Contains core classes for rendering
    │   └── Renderer.js             <-- Puppeteer renderer code
    ├── routes                      <-- Contains Express routes
    │   ├── health.js               <-- Health check route - prerender-service.io/_health
    │   └── renderer.js             <-- Renderer route
    ├── cache.js                    <-- Renderer in-memory cache
    ├── index.js                    <-- Express app entry point
    └── package.json                <-- Node.js dependencies and scripts
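
The two CloudFront functions listed above cooperate: the Viewer Request function tags crawler traffic, and the Origin Request function reroutes tagged requests to the prerender service. The sketch below shows one way this could look with the Lambda@Edge event format. It is not the repository's code: the header names, the crawler pattern, and the use of `prerender-service.io` as the origin domain are assumptions, and in a real distribution the custom headers would also need to be forwarded to the origin by the cache behavior. In a deployment each function would be exported as the `handler` of its own file.

```javascript
// Hypothetical values: header names, crawler list, and origin domain are
// illustrative assumptions, not taken from this repository.
const CRAWLER_PATTERN = /googlebot|bingbot|twitterbot|facebookexternalhit|slackbot/i;
const PRERENDER_HEADER = 'x-prerender';
const ORIGINAL_HOST_HEADER = 'x-prerender-host';
const PRERENDER_DOMAIN = 'prerender-service.io';

// Viewer Request: flag crawler traffic and remember the host the viewer asked for.
async function addRenderHeaders(event) {
  const request = event.Records[0].cf.request;
  const userAgent = request.headers['user-agent']?.[0]?.value ?? '';
  const host = request.headers.host?.[0]?.value;
  if (host && CRAWLER_PATTERN.test(userAgent)) {
    request.headers[PRERENDER_HEADER] = [{ key: 'X-Prerender', value: 'true' }];
    request.headers[ORIGINAL_HOST_HEADER] = [{ key: 'X-Prerender-Host', value: host }];
  }
  return request;
}

// Origin Request: if the request was flagged, replace the origin with the
// prerender service and pass the original page address as the `url` parameter.
async function routeToRenderer(event) {
  const request = event.Records[0].cf.request;
  const flagged = request.headers[PRERENDER_HEADER]?.[0]?.value === 'true';
  const originalHost = request.headers[ORIGINAL_HOST_HEADER]?.[0]?.value;
  if (!flagged || !originalHost) {
    return request; // Regular visitors go to the default origin untouched.
  }

  const query = request.querystring ? `?${request.querystring}` : '';
  const pageUrl = `https://${originalHost}${request.uri}${query}`;

  request.origin = {
    custom: {
      domainName: PRERENDER_DOMAIN,
      port: 443,
      protocol: 'https',
      path: '',
      sslProtocols: ['TLSv1.2'],
      readTimeout: 30,
      keepaliveTimeout: 5,
      customHeaders: {},
    },
  };
  // The Host header must match the new origin's domain.
  request.headers.host = [{ key: 'Host', value: PRERENDER_DOMAIN }];
  request.uri = '/';
  request.querystring = `url=${encodeURIComponent(pageUrl)}`;
  return request;
}
```

Splitting the work this way keeps the crawler check at the viewer stage, which runs on every request, while the origin switch only runs on cache misses.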

Getting Started

Install dependencies

npm install

Build Docker Image

docker build -t puppeteer-chrome-linux .

Start Docker Image

docker run -i --rm --cap-add=SYS_ADMIN --name puppeteer-chrome -p 8080:3000 puppeteer-chrome-linux

Deploying to Elastic Beanstalk

eb deploy

API

| Name | Required | Value | Description | Usage |
| --- | --- | --- | --- | --- |
| url | yes | | Target URL | http://puppeteer-renderer?url=http://www.google.com |
| type | no | pdf or screenshot | Render another output type | http://puppeteer-renderer?url=http://www.google.com&type=pdf |
| (Extra options) | no | | Extra options (see the Puppeteer API docs) | http://puppeteer-renderer?url=http://www.google.com&type=pdf&scale=2 |
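
When calling the service from code, it is safer to let the URL API encode the target address than to concatenate strings by hand. The helper below is a minimal sketch of that idea; `buildRenderUrl` is a hypothetical name, and `http://puppeteer-renderer` is the placeholder host used in the table above.

```javascript
// Hypothetical client-side helper: build a request URL for the render service.
// `options.type` may be 'pdf' or 'screenshot'; any other keys are passed
// through as extra query parameters (for example `scale`).
function buildRenderUrl(serviceBase, targetUrl, options = {}) {
  const { type, ...extra } = options;
  if (type !== undefined && !['pdf', 'screenshot'].includes(type)) {
    throw new Error(`Unsupported type: ${type}`);
  }
  const endpoint = new URL(serviceBase);
  // searchParams percent-encodes the value, so `?` and `&` inside the
  // target URL cannot be mistaken for the service's own parameters.
  endpoint.searchParams.set('url', targetUrl);
  if (type) {
    endpoint.searchParams.set('type', type);
  }
  for (const [name, value] of Object.entries(extra)) {
    endpoint.searchParams.set(name, String(value));
  }
  return endpoint.toString();
}

const pdfRequest = buildRenderUrl('http://puppeteer-renderer', 'http://www.google.com', {
  type: 'pdf',
  scale: 2,
});
console.log(pdfRequest);
```

Encoding matters most when the target URL carries its own query string, which would otherwise be split across the service's parameters.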

Testing

curl "http://puppeteer-renderer?url=http://www.google.com"

Testing the Lambda integration as a crawler

curl -A "Googlebot/2.1 (+http://www.google.com/bot.html)" https://example.com/property/condominium-sale-89-769134-+i+fJtDLo

PDF File Name Convention

Generated PDFs are returned with a Content-Disposition header that asks the browser to download the file instead of displaying it. The file name is derived from the rendered URL:

| URL | Filename |
| --- | --- |
| https://www.example.com/ | www.example.com.pdf |
| https://www.example.com:80/ | www.example.com.pdf |
| https://www.example.com/resource | resource.pdf |
| https://www.example.com/resource.extension | resource.pdf |
| https://www.example.com/path/ | path.pdf |
| https://www.example.com/path/to/ | pathto.pdf |
| https://www.example.com/path/to/resource | resource.pdf |
| https://www.example.com/path/to/resource.ext | resource.pdf |
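
The naming rules in the table can be reproduced with a few lines of code. The sketch below is inferred from the examples only and is not the service's actual implementation; `pdfFileName` and `contentDisposition` are hypothetical names, and the exact header format the service sends is an assumption.

```javascript
// Hypothetical reconstruction of the naming rules shown in the table.
function pdfFileName(targetUrl) {
  // `hostname` excludes the port, so ":80" never appears in the file name.
  const { hostname, pathname } = new URL(targetUrl);
  const segments = pathname.split('/').filter(Boolean);

  let base;
  if (segments.length === 0) {
    // Root URL: fall back to the host name.
    base = hostname;
  } else if (pathname.endsWith('/')) {
    // Directory-style URL: concatenate every path segment.
    base = segments.join('');
  } else {
    // Resource URL: last segment with its extension (if any) removed.
    const last = segments[segments.length - 1];
    const dot = last.lastIndexOf('.');
    base = dot > 0 ? last.slice(0, dot) : last;
  }
  return `${base}.pdf`;
}

// Assumed header value: "attachment" asks the browser to download the file.
function contentDisposition(targetUrl) {
  return `attachment; filename="${pdfFileName(targetUrl)}"`;
}

console.log(contentDisposition('https://www.example.com/path/to/resource.ext'));
```

In an Express route, a value like this would be set on the response with `res.set('Content-Disposition', ...)` before sending the PDF buffer.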

Links:

How to Get Server Side Rendering Benefits (Unfurling, Indexing, Search-ability) without Building SSR Logic

Headless Chrome: an answer to server-side rendering JS sites

Installing Puppeteer in Docker

Puppeteer Repo & Docs

Puppeteer on Lambda

Base Docker Image for Renderer 1

Base Docker Image for Renderer 2

Base Code

Try Puppeteer

Cloudfront Generate Full Path

License

MIT

Copyright (c) 2019-present, Jessie Cris Vicerra