Artifact package for our paper "How do Developers Talk about GHA?". This repository includes our data and scripts.
- Data collection (October 1, 2018 to October 31, 2022)
- SO data, i.e., posts, from the official SO data dump
- GitHub data, i.e., issues, using the GitHub Search API
- Data for manual classification: SO post and GitHub issue
- This data includes:
- 6,590 SO questions (Q_S) with 2,471 accepted SO answers (A_S)
- 315 GitHub issues (Q_G) with 217 closed GitHub issues (A_G)
- The results of manual classification can be found in all_post_issue_category.csv
- Data structure: (id, type, phase, category)
- id: the number used in this paper. "P1" and "I1" represent the first SO post and the first GitHub issue in our dataset, respectively.
- type: "github issue" or "so post"
- phase: phase of a post or an issue
- category: category of a post or an issue
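To illustrate the (id, type, phase, category) structure described above, here is a minimal Python sketch that counts entries per type and category. It assumes the CSV has a header row with exactly these four column names, which this README does not confirm. The phase and category values in the sample are invented placeholders, not labels from the paper.

```python
import csv
import io
from collections import Counter


def count_by_type_and_category(csv_file):
    """Count rows per (type, category) pair from an open CSV file object.

    Assumes a header row naming the columns id, type, phase, category.
    """
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for row in reader:
        counts[(row["type"], row["category"])] += 1
    return counts


# Tiny in-memory stand-in for the real file; phase and category values
# here are made-up placeholders.
sample = io.StringIO(
    "id,type,phase,category\n"
    "P1,so post,phase-a,category-x\n"
    "P2,so post,phase-a,category-x\n"
    "I1,github issue,phase-b,category-y\n"
)
counts = count_by_type_and_category(sample)
for (kind, category), n in sorted(counts.items()):
    print(kind, category, n)
```

To run it on the real data, the same function could be called with `open("all_post_issue_category.csv", newline="", encoding="utf-8")` in place of the in-memory sample, provided the header assumption holds.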
- Data for characteristics analysis
- The data for characteristics analysis can be found in so_post_popularity.csv and so_post_difficulty.csv
- Popularity metrics include:
- avgView, the average number of views for all the questions of a category;
- avgFav, the average number of favorites for all the questions of a category;
- avgScore, the average score for all the questions of a category;
- avgAns, the average number of answers for all the questions of a category.
- Difficulty metrics include:
- ansRate, the percentage of questions of a category with at least one answer;
- acceptRate, the percentage of questions of a category that have accepted answers;
- timeFA, the median time needed for questions of a category to receive the first answers, in hours;
- timeAA, the median time needed for questions of a category to receive the accepted answers, in hours;
- textSize, the average number of description characters for questions of a category.
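The popularity and difficulty metrics listed above could be computed for one category roughly as follows. This is a sketch under stated assumptions, not the scripts used for the paper: the per-question field names (`views`, `favorites`, `score`, `answers`, `hours_to_first_answer`, `hours_to_accepted_answer`, `body_chars`) are hypothetical, and the medians are taken only over questions that actually received a first or accepted answer.

```python
from statistics import mean, median


def category_metrics(questions):
    """Compute the nine metrics for the questions of a single category.

    Each question is a dict with hypothetical keys; a time field is None
    when the question has no first answer or no accepted answer.
    """
    n = len(questions)
    first = [q["hours_to_first_answer"] for q in questions
             if q["hours_to_first_answer"] is not None]
    accepted = [q["hours_to_accepted_answer"] for q in questions
                if q["hours_to_accepted_answer"] is not None]
    return {
        # Popularity
        "avgView": mean(q["views"] for q in questions),
        "avgFav": mean(q["favorites"] for q in questions),
        "avgScore": mean(q["score"] for q in questions),
        "avgAns": mean(q["answers"] for q in questions),
        # Difficulty (rates as percentages, times in hours)
        "ansRate": 100.0 * sum(1 for q in questions if q["answers"] > 0) / n,
        "acceptRate": 100.0 * len(accepted) / n,
        "timeFA": median(first) if first else None,
        "timeAA": median(accepted) if accepted else None,
        "textSize": mean(q["body_chars"] for q in questions),
    }


# Made-up questions for one category.
questions = [
    {"views": 100, "favorites": 2, "score": 4, "answers": 2,
     "hours_to_first_answer": 1.0, "hours_to_accepted_answer": 3.0,
     "body_chars": 500},
    {"views": 300, "favorites": 0, "score": 0, "answers": 0,
     "hours_to_first_answer": None, "hours_to_accepted_answer": None,
     "body_chars": 300},
    {"views": 200, "favorites": 1, "score": 2, "answers": 1,
     "hours_to_first_answer": 5.0, "hours_to_accepted_answer": None,
     "body_chars": 400},
]
metrics = category_metrics(questions)
print(metrics)
```

Returning `None` for a median when no question in the category has an answer avoids reporting a misleading zero.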
- The accepted answer examples and detailed discussion of each solution strategy can be found in solution_strategies.md
We seek to analyze the characteristics of the identified problem categories in terms of popularity and difficulty.
- Spearman's rank correlation coefficient: cor.R
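For readers who do not use R, the following Python sketch shows how Spearman's rank correlation coefficient can be computed. It is not a port of cor.R, whose contents are not shown here. It ranks both inputs (ties get the average of their positions) and then takes the Pearson correlation of the ranks. The function names and the sample values are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (sx * sy)


# Made-up per-category values of one popularity and one difficulty metric.
avg_view = [120.0, 340.0, 90.0, 510.0]
time_fa = [4.0, 2.5, 6.0, 1.0]
print(spearman_rho(avg_view, time_fa))
```

This sketch returns only the coefficient. A significance test, if needed, would require additional code or a statistics library.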
- Figure 1: The trend of GHA discussed on Stack Overflow
- Figure 2: The taxonomy of GHA problems
- Table 1: Popularity of GHA problem categories
- Table 2: Difficulty of GHA problem categories
- Table 3: Correlation between Popularity and Difficulty of GHA problem categories
- Table 4: Difficulty of GHA problem categories (GitHub issues)