Octomender

GitHub repo recommender system.

Octomender = Octocat + Recommender

Get repo recommendations based on your GitHub star history.

The recommendation algorithm is deployed and being tested on octomend.com.

Visit octomend.com to help improve the recommendation.

End of service: the site was discontinued after GitHub published its "Discover Repositories" service.

Dependencies

  • redis: An in-memory database that persists on disk

Core

  • hiredis: Minimalistic C client for Redis >= 1.2
  • OpenMP>=4.0: C/C++ API that supports multi-platform shared memory multiprocessing programming

Preprocessing

Website

  • Flask: A microframework for Python based on Werkzeug, Jinja 2 and good intentions
  • GitHub-Flask: Flask extension for authenticating users with GitHub and making requests to the API
  • gunicorn: A Python WSGI HTTP Server for UNIX
  • google-cloud-datastore: Low-level Java and Python client libraries for Google Cloud Datastore

Dataset

GitHub Archive

Build Core

cd core; make

Preprocessing

Parse raw JSON data files into three pickle data files.

  • output-data-basename.user: map of user id (str) to user name (str)
  • output-data-basename.repo: map of repo id (int) to repo name (str)
  • output-data-basename.edge: list of tuples of user-repo edge (str, int)

Usage: parse.py {-m|--member|-w|--watch} {<input-json-directory>|<input-json-file>} <output-data-basename>
  -m, --member      parse MemberEvent.
  -w, --watch       parse WatchEvent.
Ex:    parse.py -m 2017-06-01-0.json data
Ex:    parse.py --watch json/2017-05/ data/2017-05

For the raw JSON data format, refer to the GitHub API v3 documentation.
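As a rough illustration of the parsing step, the sketch below reads newline-delimited JSON events and builds the three structures listed above. It is not the actual parse.py. The field names (`type`, `actor.id`, `actor.login`, `repo.id`, `repo.name`) are assumptions based on the GitHub Archive event layout, and `parse_events` and `dump_data` are hypothetical helper names.

```python
import json
import pickle


def parse_events(lines, event_type="WatchEvent"):
    """Collect user/repo maps and user-repo edges from JSON event lines."""
    users, repos, edges = {}, {}, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("type") != event_type:
            continue
        # Assumed field names; the real parse.py may read different keys.
        user_id = str(event["actor"]["id"])   # user id kept as str
        repo_id = int(event["repo"]["id"])    # repo id kept as int
        users[user_id] = event["actor"]["login"]
        repos[repo_id] = event["repo"]["name"]
        edges.append((user_id, repo_id))
    return users, repos, edges


def dump_data(basename, users, repos, edges):
    """Write <basename>.user, <basename>.repo and <basename>.edge pickles."""
    for suffix, obj in ((".user", users), (".repo", repos), (".edge", edges)):
        with open(basename + suffix, "wb") as f:
            pickle.dump(obj, f)
```

Passing `event_type="MemberEvent"` would mirror the `-m` option, assuming MemberEvent records carry the same `actor` and `repo` fields.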

Same as parse.py, but runs with multiprocessing. The default number of processes is 16.

Usage: parse_mp.py {-m|--member|-w|--watch} {<input-json-directory>|<input-json-file>} <output-data-basename> [n-process]
  -m, --member      parse MemberEvent.
  -w, --watch       parse WatchEvent.
  n-process         number of processes when multiprocessing.
Ex:    parse_mp.py -m 2017-06-01-0.json data
Ex:    parse_mp.py --watch json/2017-05/ data/2017-05 32
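One plausible way to spread the parsing over several processes is to hand each input file to a worker and concatenate the per-file results. The sketch below assumes that approach; it is not taken from parse_mp.py, and `parse_file`, `parse_files` and the JSON field names are hypothetical.

```python
import json
from multiprocessing import Pool


def parse_file(path):
    """Return (user_id, repo_id) edges for the WatchEvent lines of one file."""
    edges = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            event = json.loads(line)
            if event.get("type") == "WatchEvent":
                # Assumed field names from the GitHub Archive event layout.
                edges.append((str(event["actor"]["id"]), int(event["repo"]["id"])))
    return edges


def parse_files(paths, n_process=16):
    """Parse files in parallel and concatenate the per-file edge lists."""
    if n_process <= 1 or len(paths) <= 1:
        per_file = [parse_file(p) for p in paths]  # serial fallback
    else:
        with Pool(processes=n_process) as pool:
            per_file = pool.map(parse_file, paths)  # results keep input order
    return [edge for edges in per_file for edge in edges]
```

On platforms that start workers by spawning a fresh interpreter, the call to `parse_files` should sit under an `if __name__ == "__main__":` guard.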

Merge multiple pickle data files into one.

Usage: mergedata.py <input-data-dir> <output-data-basename>
Ex:    mergedata.py data/2016-010203/ data/2016-Q1
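The merge step can be pictured as combining the per-file maps and edge lists. This is a minimal sketch under assumptions: `merge_data` is a hypothetical name, and dropping duplicate edges is a choice made here for illustration, not something the usage text confirms about mergedata.py.

```python
def merge_data(parts):
    """Merge an iterable of (users, repos, edges) triples into one triple."""
    users, repos, edges = {}, {}, []
    seen = set()
    for part_users, part_repos, part_edges in parts:
        users.update(part_users)  # later parts win if a name changed
        repos.update(part_repos)
        for edge in part_edges:
            if edge not in seen:  # de-duplication is an assumption
                seen.add(edge)
                edges.append(edge)
    return users, repos, edges
```

Each triple would come from unpickling one set of `.user`, `.repo` and `.edge` files.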

Insert graph data into redis database.

Usage: graph2redis.py <input-edgelist> <redis-port>
Ex:    graph2redis.py data/2016-Q1.edge 6379
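To give an idea of what loading the graph might look like, the sketch below stores each edge in two Redis sets so that neighbours can be looked up from either side. The key layout (`user:<id>`, `repo:<id>`) and the name `load_edges` are hypothetical; the schema actually used by graph2redis.py is not described here.

```python
def load_edges(client, edges):
    """Store each user-repo edge in both directions as Redis set members.

    `client` is any object with a redis-py style sadd(name, *values) method.
    """
    for user_id, repo_id in edges:
        # Hypothetical key layout: "user:<id>" -> repos, "repo:<id>" -> users.
        client.sadd("user:%s" % user_id, repo_id)
        client.sadd("repo:%d" % repo_id, user_id)
    return len(edges)
```

With the redis-py package installed, the client could be created as `redis.Redis(port=6379)` and the edges read from the `.edge` pickle file.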

Thanks

importpython and reddit.

License

MIT