This repository contains tools to sync your GitHub traffic stats to PostHog.
It also includes a set of GitHub Actions workflows that run the sync automatically on a schedule. You can fork this repository, set the secrets in the GitHub UI, and then forget about it!
To get started with automation, fork this repository and configure any
environment variables as Secrets inside your repository. You can find
these by going to Settings > Security > Secrets and Variables > Secrets.
GitHub automatically masks all secrets in build logs, so your keys are protected out of the box. The section below documents the options you need to provide to get up and running; there are only a couple!
Please note that you will have to enable the workflow inside your repository, because GitHub disables workflows on forks by default.
There are several options you must provide before running this tool. Options
are provided as environment variables, either manually or via .env. You can
look inside .env.sample for the full list of supported options, along with
their defaults. The required options are as follows:
# Required GitHub options
GPS_GITHUB_KEY=github_pat_* # GitHub read token
# Required PostHog options
GPS_POSTHOG_PHC=phc_* # PostHog project client token
GPS_POSTHOG_PHX=phx_* # PostHog personal authz token
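As a rough sketch of the validation step, here is how the required options could be read and checked before any API calls are made. This is illustrative Python, not the tool's Elixir source: `load_required` and `REQUIRED` are hypothetical names, the `phc_`/`phx_` prefix checks simply mirror the sample values above, and loading from `.env` is left out.

```python
import os

# Required options, each mapped to the prefix its value is expected to carry.
# The PostHog prefixes mirror the sample values above; the GitHub key is only
# checked for presence ("" is a prefix of every string).
REQUIRED = {
    "GPS_GITHUB_KEY": "",
    "GPS_POSTHOG_PHC": "phc_",
    "GPS_POSTHOG_PHX": "phx_",
}


def load_required(env=os.environ):
    """Return the required options, raising ValueError if any are missing or malformed."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing required options: " + ", ".join(missing))
    for name, prefix in REQUIRED.items():
        if not env[name].startswith(prefix):
            raise ValueError(f"{name} should start with {prefix!r}")
    # Only the required keys are returned; optional settings are handled elsewhere.
    return {name: env[name] for name in REQUIRED}
```

Failing fast like this means a missing secret shows up as one clear error in the workflow log instead of an authentication failure halfway through a sync.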
By default this tool will run against the user authenticated by the provided
token, but you can point it at another user or organization via GPS_GITHUB_ID:
GPS_GITHUB_ID=whitfin
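For illustration, the choice of whose repositories to list could map onto GitHub's REST endpoints like this. It's a Python sketch, not the tool's actual logic: `repos_url` and the `is_org` flag are hypothetical, since how the tool tells a user ID from an organization ID isn't documented here.

```python
GITHUB_API = "https://api.github.com"


def repos_url(github_id=None, is_org=False):
    """Pick the GitHub REST endpoint that lists repositories for the target account."""
    if not github_id:
        # No GPS_GITHUB_ID: list repositories of the user who owns the token.
        return f"{GITHUB_API}/user/repos"
    if is_org:
        # Hypothetical flag: the caller must already know this ID is an organization.
        return f"{GITHUB_API}/orgs/{github_id}/repos"
    return f"{GITHUB_API}/users/{github_id}/repos"
```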
Your GitHub token must have the read-only Administration scope for the user
or organization you're running this tool against. This is required to access
the Traffic APIs (as they're admin-only).
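The Traffic APIs are plain authenticated GET requests. This Python sketch only builds the requests and doesn't send them; it assumes the tool reads views and clones with daily granularity (`per=day`), and `traffic_requests` is a hypothetical helper name.

```python
import urllib.request


def traffic_requests(owner, repo, token):
    """Build (but do not send) GitHub traffic requests for one repository."""
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    base = f"https://api.github.com/repos/{owner}/{repo}/traffic"
    # Assumption: views and clones are the stats being synced.
    return [
        urllib.request.Request(f"{base}/{kind}?per=day", headers=headers)
        for kind in ("views", "clones")
    ]
```

If the token lacks read access to Administration, these requests are rejected, which is the usual symptom of a missing scope.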
Although deduplication can be handled at the database level, this tool also
handles it at the API level. For this to work properly, make sure your PostHog
API token has project:read and query:read scopes.
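A minimal sketch of the API-level check, assuming it goes through PostHog's query endpoint (`POST /api/projects/:project_id/query/`) as a HogQL query. The helper names and the exact SQL are assumptions rather than the tool's implementation, so the query text may need adjusting for your PostHog instance.

```python
import uuid


def existing_uuid_query(candidate_uuids):
    """Build a HogQL query body asking which of these event UUIDs PostHog already stores."""
    # uuid.UUID() rejects anything that is not a UUID, so the interpolated
    # literals cannot smuggle extra SQL into the query.
    literals = ", ".join(f"'{uuid.UUID(str(u))}'" for u in candidate_uuids)
    if not literals:
        return None  # nothing to check; "IN ()" would be invalid SQL
    return {
        "query": {
            "kind": "HogQLQuery",
            "query": f"SELECT uuid FROM events WHERE uuid IN ({literals})",
        }
    }


def drop_existing(events, existing_uuids):
    """Keep only events whose uuid is not already present in PostHog."""
    seen = {str(u) for u in existing_uuids}
    return [e for e in events if e["uuid"] not in seen]
```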
The tool is simple; each run does the following:
- Pulls the repositories for the chosen user or organization
- Pulls the traffic stats for the last 14 days (the maximum available) for each repository
- Maps each statistic into a PostHog event
- Sends each event to PostHog
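The mapping step might look roughly like the following Python sketch, assuming the shape of GitHub's views/clones responses (a list of daily entries with `timestamp`, `count`, and `uniques`) and PostHog's batch capture body (`api_key` plus a `batch` list). The event name and the choice of `distinct_id` are hypothetical, and the `uuid` field is left out here.

```python
def map_traffic(repo, kind, payload):
    """Turn one GitHub traffic response (e.g. kind="views") into PostHog-style events."""
    events = []
    for day in payload.get(kind, []):
        events.append({
            "event": f"github_{kind}",   # hypothetical event name
            "distinct_id": repo,          # hypothetical: one PostHog "person" per repository
            "timestamp": day["timestamp"],
            "properties": {
                "repository": repo,
                "count": day["count"],
                "uniques": day["uniques"],
            },
        })
    return events


def batch_body(project_token, events):
    """Wrap events in the body shape used by PostHog's batch capture endpoint."""
    return {"api_key": project_token, "batch": events}
```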
This means that every day is indexed 14 times (i.e. once a day for two weeks) until it ages out. Fortunately, PostHog is built on ClickHouse, which deduplicates events (keeping the last one in) based on the combination of these four event fields:
- uuid
- name
- timestamp
- distinct_id
The sync generates deterministic values for these fields (via UUID v5) so that they are identical for the same day. ClickHouse provides eventual consistency when it comes to deduplication, so over time any "old" copies of the events will age out. This tool also deduplicates via API queries, just in case.
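To make the deduplication concrete, here is a small Python model: a deterministic UUID v5 per repository, stat, and day, plus a `collapse` function that mimics "last one in" on the four key fields. The UUID namespace, the name layout, the event name, and the `distinct_id` are all assumptions for illustration; the real deduplication happens inside ClickHouse, not in code like this.

```python
import uuid


def event_uuid(repo, kind, timestamp):
    """Deterministic UUID v5: identical inputs always give the identical UUID."""
    # Hypothetical namespace and name layout; any stable choice works.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"github-traffic:{repo}:{kind}:{timestamp}"))


def make_event(repo, kind, timestamp, count):
    return {
        "uuid": event_uuid(repo, kind, timestamp),
        "event": f"github_{kind}",   # hypothetical event name
        "timestamp": timestamp,
        "distinct_id": repo,         # hypothetical distinct_id choice
        "properties": {"count": count},
    }


def collapse(events):
    """Model of "last one in" deduplication on (uuid, name, timestamp, distinct_id)."""
    latest = {}
    for e in events:  # events are assumed to arrive in insertion order
        key = (e["uuid"], e["event"], e["timestamp"], e["distinct_id"])
        latest[key] = e  # a later copy overwrites an earlier one
    return list(latest.values())
```

Running the same day through two syncs with different counts yields the same uuid, and `collapse` keeps only the later copy.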
This also means that the time of day the sync runs is irrelevant, because the only value for a day which "matters" is the one written by the sync 14 days after it occurred. There is no point running the sync more than once per day, as GitHub appears to update traffic stats at that frequency, and running more often increases the chance of duplicates slipping through the cracks.
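The arithmetic behind "14 days after" can be sketched as follows, assuming each daily sync covers the 14 days before its run date. `syncs_covering` is a hypothetical helper; for a day of 2024-01-01 it lists the syncs from 2024-01-02 through 2024-01-15, and the last of those writes the value that survives.

```python
from datetime import date, timedelta

WINDOW_DAYS = 14  # GitHub exposes 14 days of traffic


def syncs_covering(day, window=WINDOW_DAYS):
    """Dates of the daily syncs whose window includes `day`.

    Assumes each sync covers the `window` days before the sync date, so a
    day is re-indexed by `window` consecutive syncs and the last one wins.
    """
    return [day + timedelta(days=offset) for offset in range(1, window + 1)]
```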
If you wish to run this tool manually, you will need to have Elixir installed. Once you have this installed, the easiest way to run it is via Mix:
$ mix deps.get
$ mix sync
The code is pretty simple, but if you have any issues please let me know! At some
point in the future I'd like to adopt the posthog-elixir
client, but for the time being it lacks support for the uuid field we require.