This repository contains a command-line tool, written in Python, that tracks developers' contributions to one or more Git repositories within a particular time range. GitHub's Insights tools and charts are of limited use for this purpose: they often omit contributors or give misleading statistics.
This script calculates the following, for each developer in each repository:
- number of merges
- number of commits
- number of lines added
- number of lines deleted
- number of files changed
The results can be formatted as csv, json, or markdown.
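This README does not describe how the script collects these numbers. The following is a minimal sketch, assuming the per-developer totals are derived from git log --numstat output. The helper names read_log and summarize_numstat and the "@@%an|%P" header marker are hypothetical. %an is the author name as configured in git, and %P lists the parent hashes. A commit is counted as a merge when it has more than one parent, and a file is counted once per commit that touches it.

```python
import subprocess
from collections import defaultdict

# Hypothetical marker: each commit header line looks like "@@<author name>|<parent hashes>".
LOG_FORMAT = "--pretty=format:@@%an|%P"


def read_log(repo_dir):
    """Return raw `git log --numstat` output for a local clone (requires git on PATH)."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", "--numstat", LOG_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def summarize_numstat(log_text):
    """Aggregate merges, commits, lines added/deleted and files changed per author."""
    stats = defaultdict(lambda: {"merges": 0, "commits": 0, "added": 0, "deleted": 0, "files": 0})
    author = None
    for line in log_text.splitlines():
        if line.startswith("@@"):
            # Commit header: author name, then space-separated parent hashes.
            author, _, parents = line[2:].partition("|")
            stats[author]["commits"] += 1
            if len(parents.split()) > 1:  # more than one parent means a merge commit
                stats[author]["merges"] += 1
        elif line.strip() and author is not None:
            # numstat line: "<added>\t<deleted>\t<path>"
            added, deleted, _path = line.split("\t", 2)
            entry = stats[author]
            entry["files"] += 1
            # Binary files report "-" instead of line counts.
            if added != "-":
                entry["added"] += int(added)
            if deleted != "-":
                entry["deleted"] += int(deleted)
    return dict(stats)
```

By default git log prints no numstat lines for merge commits. Merges therefore add to the merge and commit counts but not to the line or file counts in this sketch.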
- Fork this repository and clone it to your local machine.
- Grant yourself execute permissions to the Python script, e.g. chmod u+x *.py.
The command ./git-analysis.py --help shows the usage instructions:
usage: git-analysis.py [-h] (-r REPOSITORY | -rf REPOFILE) [-u USER] [-s START] [-e END] [-x EXCLUSIONS] [-f {csv,json,markdown}] [-v]
optional arguments:
-h, --help show this help message and exit
-r REPOSITORY, --repository REPOSITORY
the public URL of the repository whose logs to parse
-rf REPOFILE, --repofile REPOFILE
the path to simple text file with a list of repository URLs to parse
-u USER, --user USER The git username to report. Default is all contributing users
-s START, --start START
Start date in mm/dd/yyyy format
-e END, --end END End date in mm/dd/yyyy format
-x EXCLUSIONS, --exclusions EXCLUSIONS
A comma-separated string of files to exclude, e.g. --exclusions "foo.zip, *.jpg, *.json"
-f {csv,json,markdown}, --format {csv,json,markdown}
The format in which to output the results
-v, --verbose Whether to output debugging info
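The -x value is a comma-separated list that mixes exact file names and wildcard patterns. The script's matching rules are not documented. The sketch below assumes shell-style wildcards, checked against both the full path and the bare file name with Python's fnmatch module. parse_exclusions and is_excluded are hypothetical names.

```python
from fnmatch import fnmatchcase


def parse_exclusions(raw):
    """Split a string like "foo.zip, *.jpg, *.json" into trimmed patterns."""
    return [pattern.strip() for pattern in raw.split(",") if pattern.strip()]


def is_excluded(path, patterns):
    """True if the path or its bare file name matches any exclusion pattern."""
    name = path.rsplit("/", 1)[-1]
    return any(fnmatchcase(path, p) or fnmatchcase(name, p) for p in patterns)
```

fnmatchcase is used instead of fnmatch so that matching is case-sensitive on every platform.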
The -r and -rf flags control whether the script looks at a single repository or at a batch of repositories listed in a simple text file.
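The usage text above resembles the output of Python's argparse module. The parenthesized (-r REPOSITORY | -rf REPOFILE) is how argparse prints a required mutually exclusive group. Assuming that, a parser with the same interface could be sketched as follows. It is reconstructed from the help text only, not taken from the script's source. It also omits the -c flag described later, because the help output does not list it.

```python
import argparse


def build_parser():
    """Recreate the documented command-line interface (reconstruction, not the original code)."""
    parser = argparse.ArgumentParser(prog="git-analysis.py")
    # Exactly one of -r / -rf must be supplied.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-r", "--repository", help="the public URL of the repository whose logs to parse")
    source.add_argument("-rf", "--repofile", help="the path to a text file with a list of repository URLs")
    parser.add_argument("-u", "--user", help="the git username to report; default is all users")
    parser.add_argument("-s", "--start", help="start date in mm/dd/yyyy format")
    parser.add_argument("-e", "--end", help="end date in mm/dd/yyyy format")
    parser.add_argument("-x", "--exclusions", help="comma-separated files to exclude")
    parser.add_argument("-f", "--format", choices=["csv", "json", "markdown"], default="csv")
    parser.add_argument("-v", "--verbose", action="store_true", help="output debugging info")
    return parser
```

argparse tries an exact match on an option string before anything else, so -rf is not mistaken for -r followed by the value f.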
Output the contributions of all developers to a single repository:
./git-analysis.py -r https://github.com/bloombar/git-developer-contribution-analysis.git
Output the contributions of all developers to a set of repositories stored in a file named repos.txt (see example file):
./git-analysis.py -rf repos.txt
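Only the repository's example file shows the format of repos.txt. Assuming one repository URL per line, a minimal reader that trims surrounding whitespace and skips blank lines might look like this. parse_repo_list and read_repo_file are hypothetical helpers.

```python
from pathlib import Path


def parse_repo_list(text):
    """Return one repository URL per non-blank line, with whitespace trimmed."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_repo_file(path):
    """Read a repository list file such as repos.txt."""
    return parse_repo_list(Path(path).read_text(encoding="utf-8"))
```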
By default, the statistics of all contributors are calculated. The -u flag can be used to limit the analysis to a single contributor by referencing their git username.
Output the contributions of only the contributor named bloombar to a single repository:
./git-analysis.py -u bloombar -r https://github.com/bloombar/git-developer-contribution-analysis.git
The same, but for a batch of repositories listed in the repos.txt file:
./git-analysis.py -u bloombar -rf repos.txt
By default, contributions from a year ago until today are analyzed. Use the -s and -e flags to specify a different start and end date, respectively.
Output the contributions to a single repository for a specific date range, inclusive:
./git-analysis.py -s 11/15/2021 -e 12/15/2021 -r https://github.com/bloombar/git-developer-contribution-analysis.git
The same, but for a batch of repositories listed in the repos.txt file:
./git-analysis.py -s 11/15/2021 -e 12/15/2021 -rf repos.txt
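The -s and -e values use the mm/dd/yyyy format, which corresponds to the strptime pattern %m/%d/%Y. Below is a minimal sketch of resolving the date range. It assumes that "a year ago" means 365 days before today, though the script may compute it differently. resolve_date_range is a hypothetical name, and its today parameter exists only to make the example testable.

```python
from datetime import date, datetime, timedelta


def parse_date(text):
    """Parse an mm/dd/yyyy string such as "11/15/2021" into a date."""
    return datetime.strptime(text, "%m/%d/%Y").date()


def resolve_date_range(start=None, end=None, today=None):
    """Return (start, end); by default, from 365 days ago through today."""
    today = today or date.today()
    end_date = parse_date(end) if end else today
    start_date = parse_date(start) if start else today - timedelta(days=365)
    if start_date > end_date:
        raise ValueError("start date must not be after end date")
    return start_date, end_date
```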
Results can be filtered to show only contributors with activity. Use the -c flag to filter the results.
./git-analysis.py -rf repos.txt -c
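This README does not define what counts as "activity". This sketch assumes one plausible reading: a contributor is active if any of their counters is non-zero. active_only is a hypothetical helper that operates on a per-author dictionary of counts.

```python
def active_only(stats):
    """Keep only authors with at least one non-zero counter."""
    return {author: counts for author, counts in stats.items() if any(counts.values())}
```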
The results can be formatted as csv, json, or a markdown table. The default is csv. Use the -f flag to control the output format.
./git-analysis.py -s 11/15/2021 -e 12/15/2021 -rf repos.txt -f markdown
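To illustrate the three output formats, the sketch below renders the same rows as csv, json, or a markdown table using only the standard library. The column names in FIELDS and the render function are hypothetical. The script's real columns and their order may differ.

```python
import csv
import io
import json

# Hypothetical column names for one row per developer per repository.
FIELDS = ["repository", "author", "merges", "commits", "added", "deleted", "files"]


def render(rows, fmt="csv"):
    """Render a list of row dictionaries as csv, json, or a markdown table."""
    if fmt == "json":
        return json.dumps(rows, indent=2)
    if fmt == "markdown":
        header = "| " + " | ".join(FIELDS) + " |"
        divider = "| " + " | ".join("---" for _ in FIELDS) + " |"
        body = ["| " + " | ".join(str(row[f]) for f in FIELDS) + " |" for row in rows]
        return "\n".join([header, divider, *body])
    if fmt == "csv":
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt}")
```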
Flags can be combined to provide more targeted analysis, e.g. a specific contributor over a specific date range:
./git-analysis.py -u bloombar -s 11/15/2021 -e 12/15/2021 -r https://github.com/bloombar/git-developer-contribution-analysis.git -f json
The same, but for a batch of repositories listed in the repos.txt file:
./git-analysis.py -u bloombar -s 11/15/2021 -e 12/15/2021 -rf repos.txt -f json
If a particular contributor shows a very large number of additions or deletions, typically on the order of many hundreds or thousands of lines, this could be a sign of poor version control practice.
Most likely, the contributor has not configured version control to ignore platform or third-party code (i.e. has not updated their .gitignore file before adding such code), and is therefore tracking additions and deletions of code that is not theirs. In this case, ignore that user's entire contribution for the date range until the developer fixes the problem.
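Such cases can be surfaced automatically for review. This sketch flags any author whose added or deleted line count reaches a threshold. The default of 1,000 lines and the name flag_suspicious are arbitrary choices for illustration, not values used by the script.

```python
def flag_suspicious(stats, threshold=1000):
    """Return, sorted, the authors whose added or deleted line counts reach the threshold."""
    return sorted(
        author
        for author, counts in stats.items()
        if counts.get("added", 0) >= threshold or counts.get("deleted", 0) >= threshold
    )
```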
The output of the script sometimes lists the same contributor under more than one git username. This is most likely due to different username settings in the various git and GitHub clients that person uses.
If a single developer has multiple usernames that all show the same statistics, count those statistics only once. Otherwise, if the usernames show different statistics, add them together to get the total for that developer.
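The rule for duplicate usernames can be expressed as a small function. Identical statistics are treated as duplicates and counted once, while differing ones are summed. When some aliases match and others differ, this sketch counts each distinct set of numbers once. That is one reasonable extension of the rule, not something this README specifies. combine_aliases is a hypothetical name. Deciding which usernames belong to the same person remains a manual step.

```python
KEYS = ("merges", "commits", "added", "deleted", "files")


def combine_aliases(stats, aliases):
    """Total the statistics of one developer who appears under several usernames."""
    distinct = []
    for name in aliases:
        counts = stats.get(name)
        # Skip unknown names and exact duplicates of statistics already counted.
        if counts is not None and counts not in distinct:
            distinct.append(counts)
    return {key: sum(counts.get(key, 0) for counts in distinct) for key in KEYS}
```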