/ghpr-dataset-raw

Raw data for the GHPR dataset

Primary language: Shell. License: Creative Commons Attribution 4.0 International (CC-BY-4.0).

GHPR Raw Data

This repository contains JSON files that represent issues and pull requests included in the GHPR dataset. These files are generated by the GHPR Crawler.

Structure

The naming pattern for the files is as follows:

These JSON objects come from the GitHub REST API, with one change: pull request objects in this repository have an additional linked_issue_numbers property. Its value is a list of the numbers of issues in the same repository that the pull request links to using a GitHub keyword.
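As a rough illustration of the extra property, the sketch below parses one pull request object and reads linked_issue_numbers. The sample JSON is made up for this example (its number, title, and issue numbers are not taken from the dataset), and the helper name linked_issues is hypothetical. A real object would be loaded from one of the repository's JSON files instead.

```python
import json
from typing import List

# Made-up sample for illustration only; real objects carry the full set of
# GitHub REST API pull request fields plus linked_issue_numbers.
SAMPLE_PR_JSON = """
{
  "number": 101,
  "title": "Fix crash on empty config",
  "linked_issue_numbers": [97, 99]
}
"""


def linked_issues(pr: dict) -> List[int]:
    """Return the issue numbers linked by a pull request object.

    Falls back to an empty list if the property is absent, which is a
    defensive assumption rather than documented dataset behavior.
    """
    return list(pr.get("linked_issue_numbers", []))


pr = json.loads(SAMPLE_PR_JSON)
print(pr["number"], linked_issues(pr))
```

Because the linked issues live in the same repository as the pull request, each number can be matched against the issue objects collected for that repository.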

Data

This version of GHPR contains 13,247 issues and 13,601 pull requests. The data was collected in October 2020 from CNCF graduated projects, specifically from the following repositories:

| Repository | # issues | # PRs |
| --- | ---: | ---: |
| containerd/containerd | 332 | 331 |
| coredns/coredns | 220 | 207 |
| envoyproxy/envoy | 1,171 | 1,181 |
| fluent/fluentd | 161 | 161 |
| goharbor/harbor | 521 | 466 |
| helm/helm | 835 | 808 |
| jaegertracing/jaeger | 278 | 390 |
| kubernetes/kubernetes | 7,758 | 7,968 |
| prometheus/prometheus | 533 | 526 |
| rook/rook | 847 | 914 |
| theupdateframework/specification | 13 | 13 |
| tikv/tikv | 373 | 437 |
| vitessio/vitess | 205 | 199 |
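
The per-repository counts can be checked against the stated totals with a short script. This is only a sanity-check sketch: the COUNTS dictionary is transcribed by hand from the table above, and the variable names are chosen for this example.

```python
# (issues, pull requests) per repository, transcribed from the table above.
COUNTS = {
    "containerd/containerd": (332, 331),
    "coredns/coredns": (220, 207),
    "envoyproxy/envoy": (1171, 1181),
    "fluent/fluentd": (161, 161),
    "goharbor/harbor": (521, 466),
    "helm/helm": (835, 808),
    "jaegertracing/jaeger": (278, 390),
    "kubernetes/kubernetes": (7758, 7968),
    "prometheus/prometheus": (533, 526),
    "rook/rook": (847, 914),
    "theupdateframework/specification": (13, 13),
    "tikv/tikv": (373, 437),
    "vitessio/vitess": (205, 199),
}

# Sum each column across all repositories.
total_issues = sum(issues for issues, _ in COUNTS.values())
total_prs = sum(prs for _, prs in COUNTS.values())

print(total_issues, total_prs)  # prints: 13247 13601
```

The sums match the 13,247 issues and 13,601 pull requests reported for this version of the dataset.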