/docker-awstats-cloudfront

A Docker container to process and serve website analytics using CloudFront and AWStats.

Primary LanguageShellMIT LicenseMIT

Docker GoAccess CloudFront

A Docker container which syncs CloudFront logs from an S3 bucket, processes them with GoAccess, and serves them using Nginx.

Docker Image Version Docker Image Size

Usage

Environment Variables

This container uses the s6 overlay, so you can set the PUID, PGID and TZ environment variables to set the appropriate user, group and timezone.

The following environment variables are used in addition the the standard s6 overlay variables.

Variable Description
AWSTATS_HOSTNAME The hostname of the site for which stats should be generated.
BUCKET The S3 bucket to which CloudFront writes it's logs.
AWS_ACCESS_KEY_ID The AWS Access Key Id with read permissions for the log bucket. Alternatively, you can use a config file based credentials in /config/.aws/credentials.
AWS_SECRET_ACCESS_KEY The AWS Secret Access Key paired with the id. Alternatively, you can use a config file based credentials in /config/.aws/credentials.
CRON (Optional) The cron schedule to sync. If missing, the container will perform a one time sync on launch.
AWSTATS_ARGS (Optional) Additional arguments to add to the goaccess command.
HTML_FILENAME (Optional) The name of the html file to generate (without the extension). This can be used to generate analytics reports for multiple sites.
NO_SERVER (Optional) If this variable is set then the nginx server won't be started in this container.
PRUNE (Optional) If set, log files older than the specified number of days will be deleted from S3
POST_ACTION (Optional) Specify one of the possible post execution actions to take place after the sync script completes.
  • script: A shell script to execute, placed in /config.
  • command: A shell command to execute.
HEALTHCHECK_ID (Optional) The ID of a https://healthchecks.io/ check which will be pinged before and after sync and processing for monitoring duration and health of the container.

If you specify a POST_ACTION script, it will receive the generated analytics HMTL file as $1.

Volumes

The container uses three volumes: /logs, /config, and /output. Synced log files will be stored in the volume mounted at /logs. A customizable nginx.conf file will be written to /config, and the resulting analytics report will be written to an html file in /output.

Examples

You can run the container using Docker Compose, or using a standard Docker command.

One-off processing

The following command will sync the log files from my-access-logs into /tmp/goaccess-cloudfront/logs, process them with goaccess, writing the resulting html file to /tmp/goaccess-cloudfront/html/index.html, and exit.

docker run \
  -e "PUID=1000" \
  -e "PGID=998" \
  -e "AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}" \
  -e "AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}" \
  -e "BUCKET=my-access-logs" \
  -e "NO_SERVER=1" \
  -e "PRUNE=60" \
  -e "AWSTATS_HOSTNAME=example.com" \
  -v "/tmp/goaccess-cloudfront/logs:/logs:rw" \
  -v "/tmp/goaccess-cloudfront:/config:rw" \
  -v "/tmp/goaccess-cloudfront/html:/output:rw" \
  rharter/goaccess-cloudfront

With the PRUNE=60 environment variable set, logs older than 60 days will be remove from S3, keeping costs down. This means that the generated analytics will also only reflect the past 60 days.

Docker Compose

By adding the following configuration to a Docker Compose yaml file, the container will continuously run, syncing CloudFront access logs from S3 bucket my-access-logs every 5 minutes and updating the served html report. The generated analytics report can be accessed at http://server.address/index.html.

  analytics:
    container_name: analytics
    image: rharter/goaccess-cloudfront
    restart: unless-stopped
    environment:
      - PUID=${PUID}
      - PGID=${PGID}
      - TZ=${TZ}
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
      - BUCKET=my-access-logs
      - CRON=*/5 * * * *
    ports:
    - "80:80"
    volumes:
      - ${USERDIR}/docker/analytics/logs:/logs:rw
      - ${USERDIR}/docker/analytics:/config:rw
      - ${USERDIR}/docker/analytics/html:/output:rw

Multiple Sites with Docker Compose

To generate analytics for multiple sites that are served by a single service, run multiple instances of the container, but only have one of them serve the resulting files. Make sure that you separate the log directories.

  analytics-foo-com:
    container_name: analytics-foo-com
    image: rharter/goaccess-cloudfront
    restart: unless-stopped
    environment:
      - PUID=${PUID}
      - PGID=${PGID}
      - TZ=${TZ}
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
      - BUCKET=foo.com-access-logs
      - CRON=*/5 * * * *
      - NO_SERVER=1
      - HTML_FILENAME=foo
    volumes:
      - ${USERDIR}/docker/analytics/logs/foo.com:/logs:rw
      - ${USERDIR}/docker/analytics:/config:rw
      - ${USERDIR}/docker/analytics/html/foo.com:/output:rw
      
  analytics-main:
    container_name: analytics-main
    image: rharter/goaccess-cloudfront
    restart: unless-stopped
    environment:
      - PUID=${PUID}
      - PGID=${PGID}
      - TZ=${TZ}
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
      - BUCKET=ryanharter.com-access-logs
      - CRON=*/5 * * * *
      - HTML_FILENAME=ryanharter
    ports:
    - "80:80"
    volumes:
      - ${USERDIR}/docker/analytics/logs/ryanharter.com:/logs:rw
      - ${USERDIR}/docker/analytics:/config:rw
      - ${USERDIR}/docker/analytics/html/ryanharter:/output:rw

Analytics for foo.com will be available on the host at http://server.address/foo.html, and analytics for ryanharter.com will be available at http://server.address/ryanharter.html. By placing a custom file at ${USERDIR}/docker/analytics/index.html, you can have a landing page that directs users to your other analytics reports.

License

MIT. See LICENSE.txt

Copyright 2020 Ryan Harter