ghuser.io's database scripts

This repository provides scripts to update the database for the ghuser.io Reframe app. The database consists of JSON files. The production data is stored on AWS. The scripts expect the data at ~/data. To use another location, set the GHUSER_DBDIR environment variable.
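The directory lookup described above can be sketched as follows. This is an illustration, not code taken from the repository: `resolveDbDir` and `userFilePath` are hypothetical helpers. The lowercased user file name follows the `users/myuser.json` naming shown in the data-flow diagram under Implementation.

```javascript
const os = require('os');
const path = require('path');

// Hypothetical helper: GHUSER_DBDIR wins if set and non-empty,
// otherwise fall back to ~/data.
function resolveDbDir(env = process.env) {
  const override = env.GHUSER_DBDIR;
  return override ? path.resolve(override) : path.join(os.homedir(), 'data');
}

// Hypothetical helper: path of a tracked user's file, e.g. users/myuser.json
// for login "myUser" (file names are lowercased, as in the data-flow diagram).
function userFilePath(dbDir, login) {
  return path.join(dbDir, 'users', `${login.toLowerCase()}.json`);
}

console.log(resolveDbDir());
```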

The fetchBot, which runs daily on an EC2 instance, calls these scripts.

Setup
Usage
Implementation
Production JSON files
Contributors

Setup

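GitHub API keys (the client ID and client secret of a GitHub OAuth app, exported below as GITHUB_CLIENT_ID and GITHUB_CLIENT_SECRET) can be created in GitHub's developer settings.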

$ npm install
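The full refresh under Usage reads the API key from GITHUB_CLIENT_ID and GITHUB_CLIENT_SECRET and the account credentials from GITHUB_USERNAME and GITHUB_PASSWORD, and its log starts by reporting which were found. A minimal sketch of such a pre-flight check is below. The function name `checkGitHubEnv` and its logic are assumptions; only the two messages are copied from the sample log.

```javascript
// Hypothetical pre-flight check mirroring the first lines of the sample log.
// Returns the messages instead of printing them so the logic is easy to test.
function checkGitHubEnv(env = process.env) {
  const messages = [];
  if (env.GITHUB_CLIENT_ID && env.GITHUB_CLIENT_SECRET) {
    messages.push('GitHub API key found.');
  }
  if (env.GITHUB_USERNAME && env.GITHUB_PASSWORD) {
    messages.push('GitHub credentials found.');
  }
  return messages;
}

for (const line of checkGitHubEnv()) console.log(line);
```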

Usage

Start tracking a user

$ ./addUser.js USER

Stop tracking a user

$ ./rmUser.js USER "you asked us to remove your profile in https://github.com/ghuser-io/ghuser.io/issues/666"

Refresh and clean data for all tracked users

$ export GITHUB_CLIENT_ID=0123456789abcdef0123
$ export GITHUB_CLIENT_SECRET=0123456789abcdef0123456789abcdef01234567
$ export GITHUB_USERNAME=AurelienLourot
$ export GITHUB_PASSWORD=********
$ ./fetchAndCalculateAll.sh
GitHub API key found.
GitHub credentials found.
...
/home/ubuntu/data/users
  2654 users
  largest: gdi2290.json (26 KB)
  total: 5846 KB
/home/ubuntu/data/contribs
  largest: orta.json (144 KB)
  total: 14 MB
/home/ubuntu/data/repos
  112924 repos
  65706 significant repos
  largest: jlord/patchwork.json (712 KB)
  total: 203 MB
/home/ubuntu/data/repoCommits
  largest: CocoaPods/Specs.json (3965 KB)
  total: 397 MB
/home/ubuntu/data/orgs
  11072 orgs
  largest: google-certified-mobile-web-specialists.json (445 B)
  total: 3520 KB
/home/ubuntu/data/nonOrgs.json: 252 KB
/home/ubuntu/data/meta.json: 49 B
total: 623 MB

=> 240 KB/user

real    449m19.774s
user    15m52.644s
sys     2m21.976s
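The size report at the end of the run lists, for each database directory, the number of files, the largest file and the total size, then divides the grand total by the number of users. The sketch below shows one way to compute such a summary. It is not the script's implementation: `listJsonFiles`, `dirStats` and `kbPerUser` are hypothetical names, and sizes are apparent file sizes with 1 KB assumed to be 1024 bytes. Tools such as `du` report disk usage, which can differ.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively collect every .json file under `dir`
// (repo files are nested as <owner>/<repo>.json).
function listJsonFiles(dir) {
  const files = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) files.push(...listJsonFiles(full));
    else if (entry.isFile() && entry.name.endsWith('.json')) files.push(full);
  }
  return files;
}

// Summarize one database directory: file count, largest file, total bytes.
function dirStats(dir) {
  const files = listJsonFiles(dir);
  let largest = null;
  let total = 0;
  for (const file of files) {
    const { size } = fs.statSync(file);
    total += size;
    if (!largest || size > largest.size) {
      largest = { name: path.relative(dir, file), size };
    }
  }
  return { count: files.length, largest, total };
}

// Average footprint per tracked user, rounded to whole KB.
function kbPerUser(totalBytes, userCount) {
  return Math.round(totalBytes / 1024 / userCount);
}
```

With the figures from the sample run, `kbPerUser(623 * 1024 * 1024, 2654)` gives 240, matching the reported 240 KB/user.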

Implementation

Several scripts form a pipeline for updating the database. Here is the data flow:

[ ./addUser.js myUser ]   [ ./rmUser.js myUser ]
                 │             │
                 v             v
              ┌───────────────────┐
              │ users/myuser.json │<───────────┐
              └────────────────┬──┘ │─┐        │
                └──────────────│────┘ │        │                    ╔════════╗
                  └────┬───────│──────┘        │                    ║ GitHub ║
                       │       │               │                    ╚════╤═══╝
                       │       v               │                         │
                       │   [ ./fetchUserDetailsAndContribs.js myUser ]<──┤
                       │                                                 │
                       ├────────────>[ ./fetchOrgs.js ]<─────────────────┤
                       │                   ^     ^                       │
                       │                   │     │                       │
                       │                   v     v                       │
                       │      ┌──────────────┐ ┌─────────────────┐       │
                       │      │ nonOrgs.json │ │ orgs/myOrg.json │─┐     │
                       │      └──────────────┘ └─────────────────┘ │─┐   │
                       │                         └─────────────────┘ │   │
                       │                           └──────────┬──────┘   │
                       │                                      │          │
                       ├──>[ ./fetchRepos.js ]<──────────────────────────┘
                       │             ^                        │
                       │             │                        │
                       │             v                        │
                       │  ┌───────────────────────────┐       │
                       │  │ repo*/myOwner/myRepo.json │─┐     │
                       │  └───────────────────────────┘ │─┐   │
                       │    └───────────────────────────┘ │   │
                       │      └────┬──────────────────────┘   │
                       │           │                          │
                       │           │          ┌───────────────┘
                       │           │          │
                       v           v          v
                   [ ./calculateContribsAndMeta.js ]
                           │               │
                           v               v
       ┌──────────────────────┐         ┌───────────┐
       │ contribs/myuser.json │─┐       │ meta.json │
       └──────────────────────┘ │─┐     └───────────┘
         └──────────────────────┘ │
           └──────────────────────┘
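Read top to bottom, the diagram implies one valid execution order. First, `./fetchUserDetailsAndContribs.js` runs once per user and writes the user files. Next, `./fetchOrgs.js` and `./fetchRepos.js`, which both read the user files, run without arguments. Finally, `./calculateContribsAndMeta.js` runs last. The Node driver below is a sketch of that order under these assumptions. It is not a transcription of `./fetchAndCalculateAll.sh`, and `planPipeline` and `runPipeline` are hypothetical names.

```javascript
const { execFileSync } = require('child_process');

// Build the ordered list of steps implied by the data-flow diagram.
// Each step is [script, args]; only the per-user fetch takes an argument.
function planPipeline(users) {
  return [
    ...users.map((user) => ['./fetchUserDetailsAndContribs.js', [user]]),
    ['./fetchOrgs.js', []],
    ['./fetchRepos.js', []],
    ['./calculateContribsAndMeta.js', []],
  ];
}

// Run the steps sequentially from the repository root. execFileSync throws
// as soon as a script exits with a non-zero status, which stops the run.
function runPipeline(users, cwd = process.cwd()) {
  for (const [script, args] of planPipeline(users)) {
    execFileSync(script, args, { cwd, stdio: 'inherit' });
  }
}
```

For example, `runPipeline(['myUser'])` run from the repository root would refresh a single user and then recompute orgs, repos, contributions and meta data.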

NOTES:

These scripts also delete unreferenced data.

Instead of calling each of these scripts directly, you can call ./fetchAndCalculateAll.sh, which orchestrates them.
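Deleting unreferenced data amounts to a mark-and-sweep pass. First, collect the keys still referenced by tracked users. Then delete every file whose key is not in that set. The sketch below shows only the sweep step for the `repo*/myOwner/myRepo.json` layout from the diagram. The function `deleteUnreferencedRepos` is hypothetical, and building the referenced set (the mark step) depends on the file formats, so the set is taken as input here.

```javascript
const fs = require('fs');
const path = require('path');

// Sweep step: delete every <owner>/<repo>.json under reposDir whose
// "owner/repo" key is not in `referenced`. Returns the deleted keys, sorted.
function deleteUnreferencedRepos(reposDir, referenced) {
  const deleted = [];
  for (const owner of fs.readdirSync(reposDir)) {
    const ownerDir = path.join(reposDir, owner);
    if (!fs.statSync(ownerDir).isDirectory()) continue; // skip stray files
    for (const file of fs.readdirSync(ownerDir)) {
      if (!file.endsWith('.json')) continue;
      const key = `${owner}/${file.slice(0, -'.json'.length)}`;
      if (!referenced.has(key)) {
        fs.unlinkSync(path.join(ownerDir, file));
        deleted.push(key);
      }
    }
  }
  return deleted.sort();
}
```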

Production JSON files

The production JSON files are currently stored on S3 and exposed to the front end over HTTPS, e.g.

A daily backup named YYYY-MM-DD.tar.gz containing all the JSON files is also available, e.g. 2018-10-07.tar.gz.
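The backup naming pattern above can be expressed in a few lines, which helps when scripting a download of recent backups. This sketch assumes the date in the name is the UTC calendar date. The helpers `backupName` and `recentBackupNames` are hypothetical, and the storage location of the archives is left out.

```javascript
// Name of the daily backup for a given day, following YYYY-MM-DD.tar.gz.
// toISOString() is always UTC, so this assumes backups are named by UTC date.
function backupName(date) {
  return `${date.toISOString().slice(0, 10)}.tar.gz`;
}

// Names of the last `days` daily backups, most recent first.
function recentBackupNames(days, now = new Date()) {
  const names = [];
  for (let i = 0; i < days; i++) {
    names.push(backupName(new Date(now.getTime() - i * 24 * 60 * 60 * 1000)));
  }
  return names;
}

console.log(backupName(new Date(Date.UTC(2018, 9, 7)))); // prints "2018-10-07.tar.gz"
```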

Contributors

Thanks go to these wonderful people (emoji key):

Aurelien Lourot 💬 💻 📖 👀
Charles 💻 📖 🤔
Romuald Brillout 🤔

This project follows the all-contributors specification. Contributions of any kind welcome!
