Contribution Complexity

What is this?

This tool computes the complexity of a specified contribution to a git repository. A contribution is one or more commits, specified by their commit hashes. Alternatively, if commit messages reference issue numbers, a contribution can be specified by a regular expression that selects all commits whose messages match it.
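To show how an issue regex picks out a contribution, here is a minimal sketch that lists commit messages with git log and keeps the commits whose message matches. It is not the tool's implementation. read_log and select_commits are hypothetical helpers, and compiling the pattern with re.MULTILINE (so that $ also matches at the end of each line of a multi-line message) is our own choice.

```python
import re
import subprocess


def read_log(path_to_repo):
    """Return (sha, message) pairs for all commits reachable from HEAD."""
    # %x1f and %x1e emit ASCII unit/record separators, which are unlikely to
    # appear in commit messages and make the output easy to split.
    out = subprocess.run(
        ["git", "-C", path_to_repo, "log", "--format=%H%x1f%B%x1e"],
        capture_output=True, text=True, check=True,
    ).stdout
    entries = []
    for record in out.split("\x1e"):
        record = record.strip()  # drop the newline git adds after each record
        if record:
            sha, _, message = record.partition("\x1f")
            entries.append((sha, message))
    return entries


def select_commits(entries, issue_re):
    """Return the hashes of commits whose message matches issue_re."""
    # re.MULTILINE lets "$" match at the end of any line of a message.
    pattern = re.compile(issue_re, re.MULTILINE)
    return [sha for sha, message in entries if pattern.search(message)]
```

For example, select_commits(read_log("/tmp/cassandra"), "CASSANDRA-8099( |$)") returns the hashes of all commits whose messages mention that ticket, while a message mentioning CASSANDRA-80990 is not selected.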

The tool reports a contribution's complexity on the scale low, moderate, medium, elevated, high. This value indicates whether a contribution was simple to make (low) or consists of multiple intricate changes that were difficult to integrate into the system (high).

For example, the storage engine of Apache Cassandra (a DBMS) was refactored for version 3 to better support certain concepts of its query language and to allow for future performance optimizations (see ticket CASSANDRA-8099). The corresponding commit modifies almost 50k lines in 645 files and contains many non-trivial changes. In contrast, a bug that under certain circumstances prevented streaming between cluster nodes was fixed with a tiny patch modifying 15 lines in two files.
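The Cassandra example characterizes both contributions by size: lines modified and files touched. As a minimal sketch, both numbers can be collected for a set of commits from git show --numstat. This is not the tool's own metric computation, which this README does not describe; counting a modified line as added plus deleted lines is an assumption, and parse_numstat and contribution_size are hypothetical helpers.

```python
import subprocess


def parse_numstat(numstat_output):
    """Parse `git show --numstat --format=` output.

    Each non-empty line is "added<TAB>deleted<TAB>path". Returns the sum of
    added and deleted lines and the set of touched paths.
    """
    lines_changed = 0
    paths = set()
    for line in numstat_output.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank separator lines
        added, deleted, path = parts
        paths.add(path)
        # Binary files report "-" instead of line counts; count them as 0 lines.
        for count in (added, deleted):
            if count != "-":
                lines_changed += int(count)
    return lines_changed, paths


def contribution_size(path_to_repo, commit_shas):
    """Lines changed and distinct files touched across a set of commits."""
    total_lines, all_paths = 0, set()
    for sha in commit_shas:
        out = subprocess.run(
            ["git", "-C", path_to_repo, "show", "--numstat", "--format=", sha],
            capture_output=True, text=True, check=True,
        ).stdout
        lines_changed, paths = parse_numstat(out)
        total_lines += lines_changed
        all_paths |= paths  # a file touched by several commits counts once
    return total_lines, len(all_paths)
```

Size alone does not capture how intricate the changes are, which is why such raw counts are only a rough first indicator.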

For a human inspecting the two contributions, it is immediately clear that the former was far more complex to implement than the latter.

This tool is meant to automate identifying contributions of varying complexity, either as part of a CI setup or for research.

Installation

$ pip install contribution-complexity

Running

You can run the tool either by specifying a list of commits or by providing a regular expression that matches commit messages referencing an issue:

$ contribcompl commits <path_to_repo> <commit_shas>...
$ contribcompl issue <path_to_repo> <issue_regex>...

For example,

$ git clone git@github.com:apache/Cassandra.git /tmp/cassandra
$ contribcompl commits /tmp/cassandra 021df085074b761f2b3539355ecfc4c237a54a76 2f1d6c7254342af98c2919bd74d37b9944c41a6b
ContributionComplexity.LOW
$ contribcompl issue /tmp/cassandra 'CASSANDRA-8099( |$)'
ContributionComplexity.HIGH
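For a CI setup, the command's output can be turned into a pass/fail decision. The following is a minimal sketch, assuming the last output line has the form shown above (ContributionComplexity.<LEVEL>) and that the levels moderate, medium, and elevated are printed as MODERATE, MEDIUM, and ELEVATED; only LOW and HIGH appear in the examples. parse_level, exceeds, and gate are hypothetical helpers.

```python
import subprocess

# Ordered from least to most complex, following the scale in this README.
# MODERATE, MEDIUM, and ELEVATED are assumed names; only LOW and HIGH are shown.
LEVELS = ("LOW", "MODERATE", "MEDIUM", "ELEVATED", "HIGH")


def parse_level(cli_output):
    """Extract the level from output such as 'ContributionComplexity.HIGH'."""
    lines = cli_output.strip().splitlines()
    if not lines:
        raise ValueError("empty output")
    name = lines[-1].rsplit(".", 1)[-1].upper()
    if name not in LEVELS:
        raise ValueError(f"unexpected output: {lines[-1]!r}")
    return name


def exceeds(level, threshold):
    """True if level is strictly more complex than threshold."""
    return LEVELS.index(level) > LEVELS.index(threshold)


def gate(path_to_repo, commit_shas, threshold="MEDIUM"):
    """Run contribcompl and return a process exit code for a CI step."""
    out = subprocess.run(
        ["contribcompl", "commits", path_to_repo, *commit_shas],
        capture_output=True, text=True, check=True,
    ).stdout
    level = parse_level(out)
    print(f"contribution complexity: {level}")
    return 1 if exceeds(level, threshold) else 0
```

A CI step could call sys.exit(gate(repo_path, shas)) to fail when a contribution is rated above the threshold; MEDIUM as the default threshold is an arbitrary choice.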

Calling from Code

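from contribution_complexity.compute import find_commits_for_issue
from contribution_complexity.metrics import compute_contrib_compl


# "( |$)" requires a space or an end anchor after the issue number,
# so that longer issue numbers such as CASSANDRA-80990 are not matched
issue_re = "CASSANDRA-8099( |$)"
path_to_repo = "/tmp/cassandra"
# Collect the hashes of the commits whose messages match the issue regex
commit_shas = find_commits_for_issue(path_to_repo, issue_re)
# Compute the complexity of the contribution formed by these commits
contribcompl = compute_contrib_compl(path_to_repo, commit_shas)
print(contribcompl)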

Citing This Work

See CITATION.bib.

Recreating the Experiment

Requirements

Vagrant with DigitalOcean plugin
A DigitalOcean account
SSH keys registered with DigitalOcean
The SSH key name in the environment variable SSH_KEY_NAME
A DigitalOcean API token in the environment variable DIGITAL_OCEAN_TOKEN
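A small preflight check can catch missing requirements before vagrant up starts provisioning. This is a hypothetical helper, not part of the repository. It assumes the DigitalOcean provider is the vagrant-digitalocean plugin, and it cannot verify that your SSH key is actually registered with DigitalOcean.

```python
import os
import shutil
import subprocess

REQUIRED_ENV = ("SSH_KEY_NAME", "DIGITAL_OCEAN_TOKEN")
# Assumed plugin name; adjust if you use a different DigitalOcean provider.
PLUGIN = "vagrant-digitalocean"


def check(env, vagrant_path, plugin_list):
    """Return human-readable problems; an empty list means ready to go."""
    problems = [f"environment variable {name} is not set"
                for name in REQUIRED_ENV if not env.get(name)]
    if vagrant_path is None:
        problems.append("vagrant is not on PATH")
    elif PLUGIN not in plugin_list:
        problems.append(f"Vagrant plugin {PLUGIN} is not installed")
    return problems


def check_local_machine():
    """Gather the real environment, Vagrant binary, and plugin list."""
    vagrant_path = shutil.which("vagrant")
    plugins = ""
    if vagrant_path is not None:
        plugins = subprocess.run([vagrant_path, "plugin", "list"],
                                 capture_output=True, text=True).stdout
    return check(os.environ, vagrant_path, plugins)
```

Printing each entry of check_local_machine() before running vagrant up shows what is still missing.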

Run!

Set your GitHub API key in the Vagrantfile, i.e., replace <PUT_YOUR_KEY_HERE> on line 33 with your key.
Run vagrant up in this directory. This brings up and configures a VM, which then automatically starts recreating the experiment; the run takes several hours.
Once it is done, all results are in the directory /vagrant/data/ on the VM (log onto the machine with vagrant ssh).
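The first step can also be scripted. This is a minimal sketch, assuming the Vagrantfile contains the placeholder exactly once; fill_api_key and update_vagrantfile are hypothetical helpers.

```python
from pathlib import Path

PLACEHOLDER = "<PUT_YOUR_KEY_HERE>"


def fill_api_key(vagrantfile_text, api_key):
    """Replace the single placeholder with the key; fail loudly otherwise."""
    count = vagrantfile_text.count(PLACEHOLDER)
    if count != 1:
        raise ValueError(f"expected one {PLACEHOLDER}, found {count}")
    return vagrantfile_text.replace(PLACEHOLDER, api_key)


def update_vagrantfile(path, api_key):
    """Rewrite the Vagrantfile at path with the key filled in."""
    p = Path(path)
    p.write_text(fill_api_key(p.read_text(), api_key))
```

For example, update_vagrantfile("Vagrantfile", os.environ["GITHUB_API_KEY"]), where GITHUB_API_KEY is a hypothetical environment variable holding your key. Running it a second time raises an error, because the placeholder is gone.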

The experiment is described in experiment/run_experiment.sh.

Attribution

The logo is adapted from a [flaticon icon](https://www.flaticon.com/free-icon/puzzle_808497?term=contribution&page=1&position=16&page=1&position=16&related_id=808497&origin=search). Proper attribution to the original:

Icons made by mynamepong from www.flaticon.com