klangner/matrobot

Measuring pull requests

Closed this issue · 4 comments

It would be very interesting to see the number of pull requests over time and compare it to push events. It gives an indication of the contributions from the periphery of the community (i.e. people who don't have commit access, or those who feel unsure about their code and want others to review it).

I have 2 projects:

  1. https://github.com/klangner/github-analysis
  2. and this one

I have all the data on my disk and use the first program to create CSV files, which I then import into matrobot.com, load into Weka (http://www.cs.waikato.ac.nz/~ml/weka/), or test on my own classifiers (also in the github-analysis project).

I'm mentioning this because it is easier for me to provide a CSV file than to create it on matrobot.com. If you want to test a hypothesis, design the input file (as CSV) and I'll check whether I can create it for you.

As an example:

The question I want to try is:
Does it matter how long a pull request waits to be merged? Are outside committers willing to spend more time if their pull requests are merged faster?
I'll need to prepare data with the average time it takes to merge a pull request for each month and compare it to the next month's activity.
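
A rough sketch of how that preparation could look in Python. Everything here is an assumption for illustration: the function names are hypothetical, pull requests are given as (created_at, merged_at) timestamp pairs, each pull request is assigned to the month it was opened in, and `activity` is any per-month measure (for example a push count) keyed by "YYYY-MM".

```python
from collections import defaultdict
from datetime import datetime


def parse(ts):
    # Assumes timestamps formatted like "2012-03-01T10:00:00Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def avg_merge_hours_by_month(pulls):
    """pulls: iterable of (created_at, merged_at) timestamp strings.

    Returns {"YYYY-MM": average hours from open to merge}, grouped by
    the month in which the pull request was opened.
    """
    waits = defaultdict(list)
    for created_at, merged_at in pulls:
        created, merged = parse(created_at), parse(merged_at)
        month = created.strftime("%Y-%m")
        waits[month].append((merged - created).total_seconds() / 3600.0)
    return {month: sum(hours) / len(hours) for month, hours in waits.items()}


def next_month(month):
    # "2012-12" -> "2013-01", "2012-01" -> "2012-02"
    year, mon = map(int, month.split("-"))
    return f"{year + mon // 12}-{mon % 12 + 1:02d}"


def pair_with_next_month(avg_hours, activity):
    """Rows of (month, avg merge hours, activity in the following month)."""
    rows = []
    for month in sorted(avg_hours):
        following = next_month(month)
        if following in activity:
            rows.append((month, avg_hours[month], activity[following]))
    return rows
```

The resulting rows can be written out as CSV and inspected for a relationship between merge delay and the following month's activity.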

What is your question and what data do you need to check it? Maybe I can help with that.
Do you want to compare pull requests to the push events in the same month or in the next month?

BTW:
There is no correlation between an increasing number of committers and increasing activity. So even if there are more committers in October than there were in September, it doesn't mean that activity in November will be higher than in October.

Thanks for your comprehensive overview. Currently I am interested in doing a detailed case study of the rubinius/rubinius project. Basically I am interested in the following data, for a continuous time period that is as long as possible (ideally all of 2011 and 2012, but if that doesn't work, the longer the better). All data would be per month.

  • No. of push events
  • No. of pull requests
  • No. of active contributors
  • No. of forks

This would not necessarily be per month, but the raw event stream with timestamps:

If you can help me to get any of this data, I would be very grateful!
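
For reference, the monthly counts listed above could be derived from a raw event stream roughly as follows. This is only a sketch under assumptions: each event is a dict with `type`, `created_at` (an ISO timestamp string) and `actor` (a login string), the function and column names are made up for illustration, and an "active contributor" is taken to mean a distinct actor with at least one push in that month. Note that it counts PullRequestEvent entries, which is not necessarily the same as distinct pull requests.

```python
import csv
import io
from collections import defaultdict

COLUMNS = ["month", "pushes", "pull_requests", "active_contributors", "forks"]


def monthly_counts(events):
    """Aggregate an event stream into one row per month."""
    counts = defaultdict(lambda: {"pushes": 0, "pull_requests": 0, "forks": 0})
    actors = defaultdict(set)
    for event in events:
        month = event["created_at"][:7]  # "YYYY-MM" prefix of the timestamp
        row = counts[month]
        kind = event["type"]
        if kind == "PushEvent":
            row["pushes"] += 1
            actors[month].add(event["actor"])
        elif kind == "PullRequestEvent":
            row["pull_requests"] += 1
        elif kind == "ForkEvent":
            row["forks"] += 1
    return [
        {"month": m, **counts[m], "active_contributors": len(actors[m])}
        for m in sorted(counts)
    ]


def to_csv(rows):
    """Render the monthly rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Filtering the events down to a single repository (here rubinius/rubinius) would happen before calling `monthly_counts`.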

OK. Let's do it this way:

I have created an issue in the github-analysis project: klangner/github-analysis#1
I'll add the functionality to create the CSV there.
Since I already have all the data (17GB), I will also create the CSV file and send it to you. If you need more fields, just update the issue there or create a new one.

I'll also close this issue, since as I understand it, once you get your data as CSV this functionality won't be necessary in matrobot.com.

Thanks, much appreciated!