GHPR Raw Data
Raw data for the GHPR dataset.
This repository contains JSON files that represent issues and pull requests included in the GHPR dataset. These files are generated by the GHPR Crawler.
Structure
The naming pattern for the files is as follows:
/repos/OWNER/REPO/issue-N.json for issues, e.g., /repos/goharbor/harbor/issue-8319.json
/repos/OWNER/REPO/pull-N.json for pull requests, e.g., /repos/goharbor/harbor/pull-8425.json
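The naming pattern above can be built and parsed mechanically. The following is a minimal sketch, not part of the GHPR tooling; the function names `record_path` and `parse_record_path` are hypothetical and chosen only for illustration.

```python
import re

# Matches the naming pattern described above; the leading slash is optional
# so that paths relative to the repository root also parse.
_RECORD_RE = re.compile(
    r"^/?repos/(?P<owner>[^/]+)/(?P<repo>[^/]+)/"
    r"(?P<kind>issue|pull)-(?P<number>\d+)\.json$"
)


def record_path(owner, repo, kind, number):
    """Build the path of an issue or pull request file (hypothetical helper)."""
    if kind not in ("issue", "pull"):
        raise ValueError("kind must be 'issue' or 'pull'")
    return f"/repos/{owner}/{repo}/{kind}-{number}.json"


def parse_record_path(path):
    """Split a path into (owner, repo, kind, number), or return None if it does not match."""
    match = _RECORD_RE.match(path)
    if match is None:
        return None
    return (match["owner"], match["repo"], match["kind"], int(match["number"]))


if __name__ == "__main__":
    print(record_path("goharbor", "harbor", "issue", 8319))
    print(parse_record_path("/repos/goharbor/harbor/pull-8425.json"))
```

Keeping the pattern in a single regular expression means the builder and the parser cannot drift apart unnoticed: any path produced by `record_path` should parse back to the same four values.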
These JSON objects come from the GitHub REST API, with one change:
Pull request objects in this repository have an additional linked_issue_numbers property, which lists the numbers of the issues in the same repository that the pull request links to using a GitHub keyword.
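As an illustration of how linked_issue_numbers might be used, the sketch below reads one pull request file and maps its linked issue numbers to the corresponding issue file paths. It assumes a local checkout whose root directory contains the repos/ tree; the function name `linked_issue_files` and the `root` parameter are hypothetical. It does not check that the linked issue files actually exist in the dataset.

```python
import json
from pathlib import Path


def linked_issue_files(root, owner, repo, pull_number):
    """Return the issue file paths linked from one pull request (hypothetical helper).

    `root` is assumed to be the directory that contains the `repos/` tree.
    """
    repo_dir = Path(root) / "repos" / owner / repo
    with open(repo_dir / f"pull-{pull_number}.json", encoding="utf-8") as handle:
        pull = json.load(handle)

    # linked_issue_numbers is the one property added on top of the GitHub REST
    # API object; fall back to an empty list if it is absent.
    numbers = pull.get("linked_issue_numbers", [])
    return [repo_dir / f"issue-{number}.json" for number in numbers]
```

A call such as `linked_issue_files(".", "goharbor", "harbor", 8425)` would return the candidate issue paths for that pull request; whether each path is present on disk can then be checked with `Path.exists()`.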
Data
This version of GHPR contains 13,247 issues and 13,601 pull requests. The data was collected in October 2020 from CNCF graduated projects, specifically from the following repositories:
Repository | # issues | # PRs |
---|---|---|
containerd/containerd | 332 | 331 |
coredns/coredns | 220 | 207 |
envoyproxy/envoy | 1,171 | 1,181 |
fluent/fluentd | 161 | 161 |
goharbor/harbor | 521 | 466 |
helm/helm | 835 | 808 |
jaegertracing/jaeger | 278 | 390 |
kubernetes/kubernetes | 7,758 | 7,968 |
prometheus/prometheus | 533 | 526 |
rook/rook | 847 | 914 |
theupdateframework/specification | 13 | 13 |
tikv/tikv | 373 | 437 |
vitessio/vitess | 205 | 199 |
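
The per-repository counts in the table can be recomputed from a local checkout by counting files that follow the naming pattern. This is a rough sketch under the assumption that `root` is the directory containing the repos/ tree; `count_records` is a hypothetical name, not part of the GHPR tooling.

```python
from collections import Counter
from pathlib import Path


def count_records(root):
    """Count issue and pull request files per OWNER/REPO (hypothetical helper)."""
    counts = {}
    for path in Path(root).glob("repos/*/*/*.json"):
        # File names look like issue-N.json or pull-N.json.
        kind = path.name.split("-", 1)[0]
        if kind not in ("issue", "pull"):
            continue  # ignore any file that does not follow the naming pattern
        repo = f"{path.parent.parent.name}/{path.parent.name}"
        counts.setdefault(repo, Counter())[kind] += 1
    return counts


if __name__ == "__main__":
    for repo, kinds in sorted(count_records(".").items()):
        print(f"{repo} | {kinds['issue']} | {kinds['pull']} |")
```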