Jared Briskman, Serena Chen, Anne Ku, Ian Paul (Olin College)
Under the direction of Jason Woodard (Olin College) and Jonathan Sims (Babson College).
This repo houses two separate but related research projects:
- Analyzing a set of machine learning projects (CNTK, TensorFlow, Theano, Caffe, Torch7, Deeplearning4j)
- Analyzing the differences between CloudStack and OpenStack
This repository is mostly used for data extraction and manipulation. We get our data from the GitHub API.
Tooling is written in Python 2.7, with a small amount of Mathematica 11.
The tooling is tested on Ubuntu 14.04 and 16.04; if running on Windows, YMMV.
Licensed under the MIT License.
Dependencies include Requests, an HTTP library for Python. Installation is as simple as:
$ pip install requests
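As a quick illustration of what Requests is used for here, the snippet below fetches repository metadata for one of the studied projects straight from the GitHub API. This is only a minimal, unauthenticated sketch for sanity-checking the installation; it is not part of the repository's tooling, and the chosen repository is arbitrary.
import requests

# Fetch public metadata for one of the studied repositories. Unauthenticated
# requests are subject to GitHub's low anonymous rate limit (60 per hour).
response = requests.get('https://api.github.com/repos/tensorflow/tensorflow')
response.raise_for_status()
data = response.json()
print data['full_name'], data['stargazers_count']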
Navigate to the desired local directory and clone the repository via HTTPS:
$ git clone https://github.com/IanOlin/github-research.git
or via SSH:
$ git clone git@github.com:IanOlin/github-research.git
Use of the software requires a keyfile, keyfile.txt, in the Scraping/ directory.
This should be a plain text document with at least one GitHub OAuth token in it. More tokens may be added, separated by newlines; an example of the expected layout is shown below.
Create the keyfile in an editor of your choice, and generate GitHub OAuth tokens by following these instructions: https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/
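For reference, the keyfile is simply one token per line; the values below are placeholders, not real tokens:
0123456789abcdef0123456789abcdef01234567
76543210fedcba9876543210fedcba9876543210
A token can be sanity-checked against the GitHub rate-limit endpoint with Requests. This is a minimal sketch for verification only, not part of the repository's tooling:
import requests

# Read the first token from the keyfile and query the rate-limit endpoint.
# A valid token returns HTTP 200 and a limit well above the anonymous 60/hour.
with open('Scraping/keyfile.txt') as keyfile:
    token = keyfile.readline().strip()
response = requests.get('https://api.github.com/rate_limit',
                        headers={'Authorization': 'token ' + token})
print response.status_code, response.json()['rate']['limit']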
Skip this step if you already have a dataset.
From the top level, run:
$ python run_scraping.py
to generate a dataset for the ML repositories studied in the research project. Be warned: this may take several hours.
From the top level, run:
$ python run_metrics.py
to print all calculated metrics to STDOUT. Redirecting the output to a text file may be useful, and can be accomplished via:
$ python run_metrics.py > yourpath/yourfile.txt
To add other repositories for analysis, extend Config/constants.py with another flag for the new set of repositories, then run scraping and analysis with that flag (see the sketch below).
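The actual structure of Config/constants.py is not reproduced here; the sketch below is purely hypothetical and only illustrates the general pattern of mapping a new flag name to a set of owner/repository pairs. Adapt the names to whatever convention the existing ML and Stack entries use.
# Hypothetical addition to Config/constants.py -- the flag name and the
# repositories listed here are illustrative only; match the structure of
# the existing entries in the file.
MYFLAG_REPOS = [
    ('apache', 'spark'),
    ('apache', 'hadoop'),
]
Scraping and metrics would then be run with that flag, e.g. $ python run_scraping.py MyFlag (hypothetical flag name).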
Scraping and metrics for the Stack repositories can be acquired by passing a command-line flag like so:
$ python run_scraping.py Stack
or
$ python run_metrics.py Stack
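For anyone wiring a new flag into the scripts, a common pattern is to dispatch on the optional positional argument. The snippet below is a generic sketch of that pattern, not the actual argument handling in run_scraping.py or run_metrics.py, and the default flag name is assumed.
import sys

# Generic sketch: read an optional positional flag and dispatch on it.
# The default 'ML' and the print messages are assumptions, not the
# repository's actual behaviour.
flag = sys.argv[1] if len(sys.argv) > 1 else 'ML'
if flag == 'Stack':
    print 'Scraping/analyzing the CloudStack and OpenStack repositories'
else:
    print 'Scraping/analyzing the machine learning repositories'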