/dependabot-scraper

Python / Github CLI - Github dependabot alert scraper - Software Composition Analysis (SCA), Vulnerability Management, Patching, Supply Chain Security

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0)

This repository is no longer maintained. Visit this repository instead:

Dependabot Scraper

Dependabot Information scraper for Github

Introduction

The two scripts scrape and parse, respectively, information regarding dependabot alerts for Github repositories belonging to an organization.

The primary data points parsed are open, fixed, and dismissed vulnerabilities, along with the ecosystem (programming language) of each vulnerability.
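
As a rough illustration of these data points, the sketch below tallies alerts by state and by ecosystem. It assumes each alert is a JSON object shaped like the GitHub REST API's Dependabot alert response, with a top-level `state` and the ecosystem under `dependency.package.ecosystem`. The function name `summarize_alerts` and the sample data are hypothetical and are not taken from this repository's scripts.

```python
from collections import Counter


def summarize_alerts(alerts):
    """Count alerts by state and by package ecosystem.

    Assumes each alert is a dict with a "state" key and a nested
    "dependency" -> "package" -> "ecosystem" key. This is an assumption
    about the JSON layout, not the scripts' actual parsing code.
    """
    states = Counter()
    ecosystems = Counter()
    for alert in alerts:
        states[alert.get("state", "unknown")] += 1
        package = alert.get("dependency", {}).get("package", {})
        ecosystems[package.get("ecosystem", "unknown")] += 1
    return states, ecosystems


# Hypothetical sample data, for illustration only.
sample_alerts = [
    {"state": "open", "dependency": {"package": {"ecosystem": "pip"}}},
    {"state": "fixed", "dependency": {"package": {"ecosystem": "npm"}}},
    {"state": "open", "dependency": {"package": {"ecosystem": "pip"}}},
]

states, ecosystems = summarize_alerts(sample_alerts)
print(dict(states))
print(dict(ecosystems))
```

A `Counter` returns 0 for states that never appear, so a repository with no dismissed alerts needs no special handling.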

Prerequisites

  • Bash or ZSH Shell
  • Github CLI
    • To read all repos, including private repositories, a Github token with the security_events scope is required.
  • JQ
  • Python 3 - Developed and tested with Python 3.10. Likely to work with Python 3.6 and above (f-strings are used in print statements).

Quick Start

Log in to Github via the gh CLI

  1. gh auth login

  2. ./get_all_dependabot.sh <name of organization>
    Eg: ./get_all_dependabot.sh procurify

  3. Run dependa2.py rather than dependa.py; it is a better implementation with less use of loops.
    python3 dependa2.py

  4. Output (CSV) files are written to the current folder.

    • JSON files for each repo are saved to the ./output folder, in case manual review is needed. This data can also be viewed via Github, assuming appropriate permissions are granted.
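
To make the last step more concrete, here is a minimal sketch of turning a folder of per-repo JSON files into a single CSV. It is not the code in dependa2.py. The function name `write_summary_csv`, the one-file-per-repo naming, the CSV columns, and the assumption that each file holds a JSON list of alert objects with a `state` key are all hypothetical.

```python
import csv
import json
from pathlib import Path

# States counted in this sketch; any other state value is ignored.
COUNTED_STATES = ("open", "fixed", "dismissed")


def write_summary_csv(json_dir, csv_path):
    """Write one CSV row per JSON file with alert counts per state."""
    rows = []
    for json_file in sorted(Path(json_dir).glob("*.json")):
        with json_file.open(encoding="utf-8") as handle:
            alerts = json.load(handle)
        # Skip files that do not contain a list of alerts, for example
        # an error object saved for a repo without Dependabot enabled.
        if not isinstance(alerts, list):
            continue
        row = {"repo": json_file.stem}
        row.update({state: 0 for state in COUNTED_STATES})
        for alert in alerts:
            state = alert.get("state")
            if state in COUNTED_STATES:
                row[state] += 1
        rows.append(row)

    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["repo", *COUNTED_STATES])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Under these assumptions, calling `write_summary_csv("./output", "summary.csv")` would write a header row followed by one line per repository file.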

Notes

  1. jq is not required by either the bash or the Python script. It is used only to provide convenient, human-readable review of the JSON files, if needed. (Otherwise, each JSON response file is a single line.)

  2. Optimization considerations:

    • Query Github via GraphQL
    • Vectorization via NumPy or Pandas (Pandas is built on top of NumPy)
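
Regarding note 1, if jq is not installed, the Python standard library can do the same pretty-printing of a single-line JSON file. The sketch below is one way to do it. The helper name `pretty_json` and the sample string are hypothetical.

```python
import json


def pretty_json(single_line_text):
    """Return an indented, human-readable version of a JSON string."""
    return json.dumps(json.loads(single_line_text), indent=2, sort_keys=True)


# Hypothetical single-line JSON, similar in spirit to a saved response.
raw = '[{"state": "open", "number": 1}]'
print(pretty_json(raw))
```

From the shell, `python3 -m json.tool <file>` gives a similar result without writing any code.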

TODO

  1. Remove the dependency on the gh CLI command and amalgamate both scripts into a single Python script. (Potentially run this as an AWS Lambda function, triggered by a scheduled EventBridge event, and forward the results to a platform such as Slack.)
  2. Provide a way to set input/output file and folder names via command-line parameters.
  3. Optimize code (reduce some repetitive code).
  4. Generate graphics with Plotly or an alternative Python graphing module (?).
  5. Add docstrings and type hints to the Repo class, methods, and functions.
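
For item 2, one possible approach is the standard library's argparse module. The sketch below is illustrative only: the flag names `--input-dir` and `--output-csv` and the default CSV file name are hypothetical, and only the `./output` default reflects the folder mentioned in the Quick Start.

```python
import argparse


def build_parser():
    """Build a parser for hypothetical input/output options."""
    parser = argparse.ArgumentParser(
        description="Parse saved Dependabot alert JSON files into a CSV."
    )
    parser.add_argument(
        "--input-dir",
        default="./output",
        help="folder containing the per-repo JSON files",
    )
    parser.add_argument(
        "--output-csv",
        default="dependabot_summary.csv",  # hypothetical default name
        help="path of the CSV file to write",
    )
    return parser


# Parse an explicit argument list here so the example does not depend
# on how the script is invoked.
args = build_parser().parse_args(["--input-dir", "./alerts"])
print(args.input_dir, args.output_csv)
```

In a real script, `parse_args()` would be called with no arguments so that it reads from the actual command line.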

References

Github CLI login
List organization repos
List dependabot alerts
Working with Dependabot
Github Dependabot Blog

License

Released under the GPLv3

Contributing

For concerns or questions, open an issue. For improvements, please submit a pull request.