/gitdoorcheck

Static code analyser for backdoors and malicious code in git repos using OpenAI compatible LLM APIs

Primary LanguagePythonApache License 2.0Apache-2.0

gitdoorcheck - Static code analyser for git repos using OpenAI compatible LLMs

image

  • I find myself pulling way too many random git repos, this is just a simple means of doing some static analysis.
  • Uses GPT-4 (or another openAI compatible API endpoint) to check for any backdoors, malicious code and so on within compilable or executable code.
  • Outputs as a json file
  • Can be used in CI pipeline and fail build if threshold is exceeded
  • Uses list of programming languages extensions from: https://github.com/aymen-mouelhi

Hive mind

  • The day this was published there was also an article on arstechnica about git repos containing backdoor code and mimicking common libraries (typosquatting etc.)

Setup

  • Place your openAI API key in the .env file (if required)

Usage

usage: gitdoorcheck.py [-h] --repo-url REPO_URL --local-repo-path LOCAL_REPO_PATH [--output OUTPUT]

Example usage

python3 gitdoorcheck.py --repo-url="https://github.com/Inplex-sys/BlackCap-Grabber-NoDualHook" --local-repo-path ./blackcap-grabber
{
  "project_name": "BlackCap-Grabber-NoDualHook",
  "project_url": "https://github.com/Inplex-sys/BlackCap-Grabber-NoDualHook",
  "analysis_date": "2024-03-01T05:11:35.762317",
  "total_files_analyzed": 8,
  "total_threats_detected": 18,
  "detected_threats": [
    {
      "type": "Credential Theft",
      "file_name": "builder.py",
      "line_number": "23",
      "code_snippet": "self.webhook = input(...)",
      "confidence": "100",
      "severity": "High",
      "description": "Collects a webhook URL from the user, which could be used to exfiltrate data."
    },
...
    {
      "type": "Credential Theft",
      "file_name": "inject.js",
      "line_number": "N/A",
      "code_snippet": "var tokenScript = ...getToken()",
      "confidence": "100",
      "severity": "Critical",
      "description": "Extracts Discord tokens, potentially leading to account compromise."
    },
    {
      "type": "Data Exfiltration",
      "file_name": "inject.js",
      "line_number": "N/A",
      "code_snippet": "const post = async (params) => {...}",
      "confidence": "100",
      "severity": "Critical",
      "description": "Sends collected data to an external server, indicating data exfiltration."
    },
    {
      "type": "Malicious Code",
      "file_name": "inject.js",
      "line_number": "N/A",
      "code_snippet": "electron.session.defaultSession.webRequest.onBeforeRequest(...)",
      "confidence": "100",
      "severity": "High",
      "description": "Intercepts and potentially alters web requests, which can be used for malicious purposes."
    },
    {
      "type": "Malicious Code",
      "file_name": "inject.js",
      "line_number": "N/A",
      "code_snippet": "electron.session.defaultSession.webRequest.onHeadersReceived(...)",
      "confidence": "90",
      "severity": "High",
      "description": "Modifies HTTP headers to bypass content security policies."
    }
  ],
  "overall_confidence": "90"
}

Integrating as a Github Action

name: Security Analysis Workflow

on: [push, pull_request]

jobs:
  security-analysis:
    runs-on: ubuntu-latest

    steps:
    - name: Checkout code
      uses: actions/checkout@v2

    - name: Set up Python
      uses: actions/setup-python@v2
      with:
        python-version: '3.8'

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install gitpython openai dotenv shutil

    - name: Run Security Analysis
      run: |
        python gitdoorcheck.py --repo-url ${{ github.event.repository.html_url }} --local-repo-path ./repo --threshold 50
      env:
        OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

    - name: Save Analysis Report
      if: always()
      uses: actions/upload-artifact@v2
      with:
        name: security-analysis-report
        path: analysis-report.json