SeoSnap

Server Side Rendering (SSR) for JavaScript applications



Setup for the whole SeoSnap stack, including the dashboard, cache server and cache warmer, used for prerendering and full-page caching of PWAs.

Installation

  • Clone the repo (note: the clone is recursive)
git clone --recursive git@github.com:experius/SeoSnap.git
  • IMPORTANT Update the admin username and password in the .env file. (These have a default value.)
  • Build, start and stop the containers
docker-compose up --build -d && docker-compose down
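
Putting those steps together, a typical first-time setup looks roughly like this (use whatever editor you prefer for the .env step; the admin credential variables are defined in the .env file that ships with the repo):

git clone --recursive git@github.com:experius/SeoSnap.git
cd SeoSnap
# set the dashboard admin username and password before the first build
nano .env
docker-compose up --build -d
docker-compose down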

Usage

Logs directory ./logs

Cache directory ./cache

Run cache warmer

Make sure you have created a website via the dashboard at http://127.0.0.1:8080/seosnap/website/add/

docker-compose run cachewarmer cache <website id>

Nginx

Check the nginx.conf in the example folder

How it works


Dashboard

In the dashboard you add the website URL, along with the website's sitemap, that you want to make 'SeoSnaps' of.

Crawler

When the crawler is started it connects to the dashboard API. It uses Scrapy to crawl the sitemap. The Scrapy results are sent to the administration/dashboard, and the Scrapy requests are sent to the cache server, in a similar way to how you would do a request to Rendertron.

Cache Server

The cache server is a simple file caching server. If a file with the content of the page exists, it serves the HTML from that file. If not, it renders the requested URL with Rendertron and saves the HTML output to a file. To refresh the cache, the cache warmer uses PUT requests instead of GET; a PUT forces the cache file to be updated.
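
As a rough illustration of that GET/PUT behaviour, the requests below sketch how a client could talk to the cache server. The host, port and URL format are assumptions for this example only (check docker-compose.yml and the nginx example for the real values):

# GET: serve the page from the cache file, rendering it with Rendertron first if no cache file exists yet
curl -X GET "http://127.0.0.1:5000/https://www.example.com/some-page"

# PUT: force a re-render and overwrite the existing cache file
curl -X PUT "http://127.0.0.1:5000/https://www.example.com/some-page"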

Built with


Cache warmer usage

Commands

Cache

Handles caching of the pages associated with the given website(s)

Usage: crawl.py cache [OPTIONS] WEBSITE_IDS

Options:
  --follow_next BOOLEAN  Follows rel-next links if enabled
  --recache BOOLEAN      Recaches all pages instead of only the not-yet-cached ones
  --use_queue BOOLEAN    Cache urls from the queue instead of the sitemap
  --load BOOLEAN         Whether already loaded urls should be scraped instead
  --help                 Show this message and exit.

Clean

Handles cleaning of the dashboard queue

Usage: crawl.py clean [OPTIONS] WEBSITE_IDS

Options:
  --help  Show this message and exit.

Examples

# Cache the sitemap of website 1
docker-compose run cachewarmer cache 1

# Cache requests in queue for websites 1 and 2
docker-compose run cachewarmer cache 1,2 --use_queue=true

# Clean the queue for websites 1 and 2
docker-compose run cachewarmer clean 1,2
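
A couple of further examples, assuming the option syntax from the options listing above:

# Recache all pages of website 1, not just the ones that are not yet cached
docker-compose run cachewarmer cache 1 --recache=true

# Cache the sitemap of website 1 and follow rel-next links
docker-compose run cachewarmer cache 1 --follow_next=true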