/CandidateGithubProfiler

📃 Research In Software Engineering

Primary language: TeX • License: MIT


Git Profiler v1.0.2 (PBR21M1)

Automatic profiler based on GitHub Profile

Features • Usage • Data • Paper • Reproduction info for M2/M3/M4 groups

Features

  • A GitHub GraphQL query that returns all the profile data that could be useful in this research.

Usage

Requirements:

  • RStudio - an integrated development environment for the R language.
  • R Language - a programming language and free software environment for statistical computing and graphics.
  • GHQL - a GraphQL client for R.
  • MegaLinter - an all-in-one linter solution.

Running the R Scripts.

Running:

Launch a new project in RStudio.

Navigate to the directory containing the R scripts (./src/gitprofiler/r_scripts/).

Open one of the scripts and modify line 10, which holds the GitHub token value. You can generate a new personal access token on the Personal Access Token page.

After generating a token, replace the <token> placeholder in the line token <- "<token>" with it so that the script can access the GitHub GraphQL API.
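The project's scripts use R and GHQL. For readers who want to check that a freshly generated token works outside RStudio, here is a minimal Python sketch of an equivalent authenticated request. It is an illustration only: EXAMPLE_QUERY is a placeholder and not the project's query, and the function names are hypothetical.

```python
import json
import urllib.request

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# Placeholder query (NOT the project's query): asks only for the token owner's login.
EXAMPLE_QUERY = "query { viewer { login } }"


def build_request(token: str, query: str = EXAMPLE_QUERY) -> urllib.request.Request:
    """Build an authenticated POST request for the GitHub GraphQL API."""
    if not token or token == "<token>":
        raise ValueError("replace the <token> placeholder with a real personal access token")
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_query(token: str, query: str = EXAMPLE_QUERY) -> dict:
    """Send the query and return the decoded JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(token, query)) as response:
        return json.load(response)
```

Calling run_query with a valid token should return a JSON object containing the login of the token's owner. Reading the token from an environment variable instead of hard-coding it keeps it out of version control.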

Run the script. The Console window shows the query running (v0.1.0).

Results can be found in the Environment tab in the right pane (v0.1.0).

Running the Mega Linter.

Current State

We are currently investigating how to incorporate Docker into the project so that Mega Linter can be used locally. As of v0.1.0 it has been tested through GitHub CI.

Setup & Run

Choose any repository of yours and clone it to your machine using the git clone command. Then run:

cd <your_project_name>
mkdir .github && cd .github
mkdir workflows && cd workflows
notepad mega-linter.yml

Then paste the snippet below and save the file.

name: Mega-Linter

on:
  push:
  pull_request:
    branches: [master, main]

jobs:
  cancel_duplicates:
    name: Cancel duplicate jobs
    runs-on: ubuntu-latest
    steps:
      - uses: fkirc/skip-duplicate-actions@master
        with:
          github_token: ${{ secrets.PAT || secrets.GITHUB_TOKEN }}

  build:
    name: Mega-Linter
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Code
        uses: actions/checkout@v2
        with:
          token: ${{ secrets.PAT || secrets.GITHUB_TOKEN }}
          fetch-depth: 0
      - name: Mega-Linter
        id: ml
        uses: nvuillam/mega-linter@v4
        env:
          VALIDATE_ALL_CODEBASE: ${{ github.event_name == 'push' && github.ref == 'refs/heads/master' }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Archive production artifacts
        if: ${{ success() || failure() }}
        uses: actions/upload-artifact@v2
        with:
          name: Mega-Linter reports
          path: |
            report
            mega-linter.log

Lastly, push the new workflow to your remote GitHub repository with:

git add .
git commit -m "MegaLinter"
git push

Now you can open your project in a web browser and navigate to the "Actions" tab. You should see the Mega Linter job there.

Here's an example result from Mega Linter.

Mega Linter Results Table

Running the Mega Linter locally.

Requirements

Important notice: Mega Linter requires a lot of disk space (40 GB+).

As a prerequisite, Docker must be installed on your computer.

Windows

First, download the Linux Kernel Update Package, which Docker needs in order to work on your machine. Then download the Docker installer and install it like any other application. A restart is required after the installation.

Unix

Depending on your distribution, a command analogous to this one (shown here for apt-based systems such as Debian or Ubuntu) should do the job:

sudo apt-get install docker-ce docker-ce-cli containerd.io

Running

If you already have Docker installed:

  • clone a fresh copy of the repository you would like to examine using the git clone command
  • navigate to the repository
  • run this command: npx mega-linter-runner --flavor all -e 'ENABLE=,DOCKERFILE,MARKDOWN,YAML' -e 'SHOW_ELAPSED_TIME=true'

A new directory called reports should be created in the repository.
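The scrape script described in the next section expects the linter's console output in a text file. As a rough sketch, the command above can also be launched from Python with its standard output saved to a log file. The function names and the default log file name are assumptions made for illustration; the flags are copied from the command above.

```python
import subprocess
from pathlib import Path
from typing import List


def build_command(flavor: str = "all",
                  enable: str = ",DOCKERFILE,MARKDOWN,YAML") -> List[str]:
    """Return the mega-linter-runner command from the README as an argument list."""
    return [
        "npx", "mega-linter-runner",
        "--flavor", flavor,
        "-e", f"ENABLE={enable}",
        "-e", "SHOW_ELAPSED_TIME=true",
    ]


def run_and_capture(repo_dir: str, log_path: str = "output.txt") -> int:
    """Run the linter inside repo_dir and write its standard output to log_path.

    Requires Node.js (npx) and Docker. On Windows the executable is usually
    npx.cmd, so the first element of the command may need adjusting.
    """
    with Path(log_path).open("w", encoding="utf-8") as log_file:
        completed = subprocess.run(
            build_command(),
            cwd=repo_dir,
            stdout=log_file,   # same effect as appending "> output.txt" in a shell
            check=False,       # a non-zero exit code is expected when lint errors exist
        )
    return completed.returncode
```

A relative log_path is resolved against the current working directory, not against repo_dir, so the log does not end up among the files being linted.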

Running Mega Linter Scrape Script.

Requirements

As a prerequisite, Python must be installed on your computer. The script was written with Python 3.9.4.

Running

Navigate to the /src/gitprofiler/py_scripts/ directory and copy your output log file into it. You can generate the log by appending > output.txt to the Mega Linter command, which redirects the standard output stream into a text file. Then open a console and run:

python scrape.py -f output.txt

This generates an output.json file in the same directory. It contains the logs in JSON format as a list in which each element is a dictionary of the following form:

{
  "language": str,
  "linter": str,
  "files": int or str,  # number of files in the given language detected by the linter
  "fixed": int,         # number of errors fixed automatically by the linter
  "errors": int         # number of errors the linter could not fix
},

or

{
  "language": str,
  "files": int,                      # number of files detected in the given language
  "lines": int,                      # number of lines detected in the given language
  "tokens": int,                     # number of tokens ("chars") detected in the given language
  "clones": int,
  "duplicate_lines_num": int,
  "duplicate_tokens_num": int,
  "duplicate_lines_percent": float,
  "duplicate_tokens_percent": float
},
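To illustrate how the two record shapes can be consumed, here is a minimal sketch that loads output.json, separates linter records from duplication records by their keys, and totals the unfixed errors per language. The function names are hypothetical and the approach assumes only the fields listed above.

```python
import json
from typing import Dict, List, Tuple

Record = Dict[str, object]


def load_records(path) -> List[Record]:
    """Load the list of dictionaries written by scrape.py."""
    with open(path, encoding="utf-8") as handle:
        records = json.load(handle)
    if not isinstance(records, list):
        raise ValueError("expected a JSON list at the top level")
    return records


def split_records(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Separate linter records (with a 'linter' key) from duplication records (with a 'clones' key)."""
    linter_records = [r for r in records if "linter" in r]
    duplication_records = [r for r in records if "clones" in r]
    return linter_records, duplication_records


def errors_per_language(linter_records: List[Record]) -> Dict[str, int]:
    """Sum the 'errors' field of the linter records for each language."""
    totals: Dict[str, int] = {}
    for record in linter_records:
        language = str(record["language"])
        totals[language] = totals.get(language, 0) + int(record["errors"])
    return totals
```

A typical call would be errors_per_language(split_records(load_records("output.json"))[0]). The "files" field is deliberately left untouched because it may be either an int or a str.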

Data

All available data can be found in the ./data directory. Most importantly, cleaned_data.csv contains all the information that was used in the machine learning model. It is preformatted and adjusted, ready to use out of the box.

Paper

Research Paper

The paper itself can be found in the repository's main directory.

Code

The LaTeX code of the research paper can be found under ./paper. You need a LaTeX distribution installed (for example, MiKTeX) in order to rebuild the .pdf file.

Reproduction

TL;DR research reproduction instructions:

  • Locate the script and the data file used for reproduction: the script is ./src/gitprofiler/r_scripts/model_script.r and the data file is ./data/model_data_no_labels.csv.
  • Open the script and set the correct path to the data file.
  • If you want, you can label the data yourself (use the isOk variable).
  • Test the model performance for different parameters.
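For the optional labeling step, the sketch below shows one way to add an isOk column to a CSV file before loading it in R. It rests on several assumptions: the labeling rule is supplied by the caller, the 1/0 encoding is a guess, and whether model_script.r expects exactly this column format is not confirmed here. The column name used in the usage note is hypothetical.

```python
import csv
from typing import Callable, Dict, List

Row = Dict[str, str]


def label_rows(rows: List[Row], is_ok: Callable[[Row], bool]) -> List[Row]:
    """Return copies of the rows with an added isOk column ("1" = ok, "0" = not ok)."""
    return [{**row, "isOk": "1" if is_ok(row) else "0"} for row in rows]


def label_csv(src_path: str, dst_path: str, is_ok: Callable[[Row], bool]) -> int:
    """Read src_path, label every row, write the result to dst_path, return the row count."""
    with open(src_path, newline="", encoding="utf-8") as src:
        rows = list(csv.DictReader(src))
    labeled = label_rows(rows, is_ok)
    if not labeled:
        raise ValueError("no data rows found in " + src_path)
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=list(labeled[0].keys()))
        writer.writeheader()
        writer.writerows(labeled)
    return len(labeled)
```

For example, label_csv("model_data_no_labels.csv", "model_data_labeled.csv", lambda row: int(row["commits"]) >= 50) would mark rows by a threshold on a column named commits, which is a made-up column used only to show the call shape.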