virtualenv -p python3 venv
source venv/bin/activate
To build a productivity model, we need to gather the following information for each aspect of the SPACE framework:
- Performance: Number of bugs (number of issues)
- Activity: Number of pull requests (opened and closed)
- Communication: Number of pull requests
As a baseline, we will use the number of commits to measure productivity:
- This is easy to obtain with PyDriller.
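A minimal sketch of the baseline count, assuming commits are grouped by author email and calendar month. The helper names `count_commits` and `commits_from_repo` and the month-based grouping are hypothetical; only `Repository(...).traverse_commits()`, `commit.author.email`, and `commit.author_date` come from PyDriller itself. The counting is kept separate from the traversal so it can be tested without cloning a repository.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Iterator, Tuple


def count_commits(commits: Iterable[Tuple[str, datetime]]) -> Counter:
    """Count commits per (author email, "YYYY-MM") pair (hypothetical monthly grouping)."""
    counts: Counter = Counter()
    for author_email, authored_at in commits:
        counts[(author_email, authored_at.strftime("%Y-%m"))] += 1
    return counts


def commits_from_repo(repo: str, since: datetime, to: datetime) -> Iterator[Tuple[str, datetime]]:
    """Yield (author email, author date) for commits between `since` and `to` using PyDriller."""
    from pydriller import Repository  # imported lazily; requires `pip install pydriller`

    for commit in Repository(repo, since=since, to=to).traverse_commits():
        yield commit.author.email, commit.author_date
```

For example, `count_commits(commits_from_repo("path/or/url/to/repo", datetime(2021, 1, 1), datetime(2022, 1, 1)))` would traverse one year of history. Note that a single developer may commit under several email addresses, so the keys may need to be merged before modelling.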
The page limit for the progress report.
We aim to develop a linear model to predict productivity. At this stage, the model has five predictors:
- Number of bugs reported
- Number of pull requests opened per month
- Number of pull requests closed per month
- Number of merged pull requests
- Number of unmerged pull requests
The outcome these predictors are used to predict is the number of commits.
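Written out, this is an ordinary linear regression. The coefficients $\beta_0, \dots, \beta_5$, the error term $\varepsilon_i$, and the index $i$ (one observation, e.g. one developer in one time window) are notation introduced here for illustration; the coefficients would be estimated by least squares.

```latex
\text{commits}_i = \beta_0
  + \beta_1 \,\text{bugs}_i
  + \beta_2 \,\text{PR}^{\text{opened}}_i
  + \beta_3 \,\text{PR}^{\text{closed}}_i
  + \beta_4 \,\text{PR}^{\text{merged}}_i
  + \beta_5 \,\text{PR}^{\text{unmerged}}_i
  + \varepsilon_i
```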
- Agnieszka and Mahyar will be responsible for extracting the data:
  - Agnieszka will gather information about pull requests.
  - Mahyar will gather information about issues (bugs).
  - Deliverable: a Python module that performs this extraction.
- Alicia and Veronica will work on developing the model:
  - Develop a linear model using Python's data analysis libraries.
  - Create dummy data and develop the model against it.
  - Deliverable: a Jupyter Notebook that builds the model and plots the graphs.
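A minimal sketch of the dummy-data workflow, assuming NumPy is available: generate predictor counts with known coefficients, fit ordinary least squares, and check that the fit recovers them. The Poisson-distributed counts, the column names in `PREDICTORS`, and the helpers `make_dummy_data` and `fit_linear_model` are hypothetical placeholders until the extracted data is ready.

```python
import numpy as np

# Hypothetical column order for the five predictors.
PREDICTORS = ["bugs", "pr_opened", "pr_closed", "pr_merged", "pr_unmerged"]


def make_dummy_data(n: int, coef: np.ndarray, intercept: float, noise: float = 1.0, seed: int = 0):
    """Generate fake predictor counts and a commit count that depends linearly on them."""
    rng = np.random.default_rng(seed)
    X = rng.poisson(lam=5.0, size=(n, len(PREDICTORS))).astype(float)
    y = intercept + X @ coef + rng.normal(0.0, noise, size=n)
    return X, y


def fit_linear_model(X: np.ndarray, y: np.ndarray):
    """Ordinary least squares; returns (intercept, coefficients, R^2)."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend a column of ones for the intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    r2 = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)
    return beta[0], beta[1:], r2
```

One caveat when moving to real data: if `pr_unmerged` is derived as `pr_closed - pr_merged`, the predictors are perfectly collinear and the individual coefficients cannot be identified, so one of the three would need to be dropped.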
Never push directly to the master branch. Instead, create a branch (named after yourself, for example) and open a pull request.
For each project, we gather the following data:
- developer_id
- time_window
- issue_opened
- issue_closed
- pr_opened
- pr_closed
- pr_merged
- commits
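A minimal sketch of how these fields could be assembled, assuming issues and pull requests are fetched as JSON from the GitHub REST API (the `user.login`, `created_at`, `closed_at`, `merged_at`, and `pull_request` keys follow that API's response format) and that time windows are calendar months counted from a chosen start date. `Record`, `window_of`, and `build_records` are hypothetical names. Closures and merges are credited to the item's author, which is a simplification, and `commits` is left at zero to be filled in from the commit counts.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass
class Record:
    """One row of the dataset: one developer in one time window."""
    developer_id: str
    time_window: int
    issue_opened: int = 0
    issue_closed: int = 0
    pr_opened: int = 0
    pr_closed: int = 0
    pr_merged: int = 0
    commits: int = 0  # filled in separately from the commit history


def window_of(timestamp: Optional[str], start: datetime) -> Optional[int]:
    """Map a GitHub ISO-8601 timestamp to a monthly window index counted from `start`."""
    if timestamp is None:
        return None
    ts = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return (ts.year - start.year) * 12 + (ts.month - start.month)


def build_records(issues: List[dict], pulls: List[dict], start: datetime, n_windows: int = 12) -> List[Record]:
    rows: Dict[Tuple[str, int], Record] = {}

    def bump(login: str, timestamp: Optional[str], field: str) -> None:
        window = window_of(timestamp, start)
        if window is None or not 0 <= window < n_windows:
            return  # event missing or outside the observed period
        rec = rows.setdefault((login, window), Record(login, window))
        setattr(rec, field, getattr(rec, field) + 1)

    for issue in issues:
        if "pull_request" in issue:  # the issues endpoint also returns PRs; skip them
            continue
        login = issue["user"]["login"]
        bump(login, issue["created_at"], "issue_opened")
        bump(login, issue.get("closed_at"), "issue_closed")
    for pr in pulls:
        login = pr["user"]["login"]
        bump(login, pr["created_at"], "pr_opened")
        bump(login, pr.get("closed_at"), "pr_closed")
        bump(login, pr.get("merged_at"), "pr_merged")
    return sorted(rows.values(), key=lambda r: (r.developer_id, r.time_window))
```

Joining these rows with commit counts also requires mapping commit author emails to GitHub logins, which this sketch does not attempt.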
- Statistics for the selected GitHub projects (language, issues, PRs, commits).
- Statistics on the productivity metrics over 12 time windows: mean, standard deviation, min, median, and max for issue_opened, issue_closed, pr_opened, pr_closed, pr_merged, and commits.
- Multiple linear regression summary table for all repositories.
- Possibly: linear regression summaries and plots (to be decided).
- Results
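The per-metric statistics for the report can be computed with the standard library alone. This sketch assumes each row is one developer-window record keyed by the column names listed earlier, and uses the sample standard deviation (a choice, not something the plan specifies). `summarize` and `METRICS` are hypothetical names; if the notebook uses pandas, `DataFrame.agg(["mean", "std", "min", "median", "max"])` on the same columns gives an equivalent table.

```python
import statistics
from typing import Dict, List, Mapping, Sequence

METRICS = ["issue_opened", "issue_closed", "pr_opened", "pr_closed", "pr_merged", "commits"]


def summarize(rows: Sequence[Mapping[str, float]], metrics: Sequence[str] = METRICS) -> Dict[str, Dict[str, float]]:
    """Return mean, sample standard deviation, min, median, and max for each metric."""
    table: Dict[str, Dict[str, float]] = {}
    for metric in metrics:
        values: List[float] = [row[metric] for row in rows]
        table[metric] = {
            "mean": statistics.mean(values),
            # stdev needs at least two values; report 0.0 for a single observation
            "st_dev": statistics.stdev(values) if len(values) > 1 else 0.0,
            "min": min(values),
            "median": statistics.median(values),
            "max": max(values),
        }
    return table
```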