
The scanner component of Repository Scanner (RESC), a tool that detects secrets in source code management systems.

Primary language: Python. License: MIT.

Repository Scanner Version Control System Scanner (RESC-VCS-SCANNER)


Note

This component is part of Repository Scanner - resc

Table of contents

  1. About the component
  2. Getting started
  3. Testing

About the component

The RESC-VCS-Scanner component uses the Gitleaks binary file to scan the source code for secrets.

Getting started

These instructions will help you to get a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

Run locally from source


Prerequisites:

  • RabbitMQ and the RESC web service must be up and running locally.
    If you have already deployed RESC through Helm in Kubernetes, RabbitMQ and the RESC web service are already running.
  • Install Gitleaks v8.18.0 on your system.
  • Download the rule config TOML file to /tmp/temp_resc_rule.toml by running the command below from a Git Bash terminal.
curl https://raw.githubusercontent.com/zricethezav/gitleaks/master/config/gitleaks.toml > /tmp/temp_resc_rule.toml
  • Send some repositories to the 'repositories' topic of the RabbitMQ server by following the README of the RESC-VCS-SCRAPER component.

Clone the repository, open a Git Bash terminal in the /components/resc-vcs-scanner folder, and run the commands below.

1. Create virtual environment:

cd components/resc-vcs-scanner
pip install virtualenv
virtualenv venv
source venv/Scripts/activate

2. Install resc_vcs_scanner package:

pip install -e .

3. Set the following environment variables:

export RESC_RABBITMQ_SERVICE_HOST=127.0.0.1     # The hostname/IP address of the RabbitMQ server
export RESC_RABBITMQ_SERVICE_PORT_AMQP=30902    # The AMQP port of the RabbitMQ server
export RABBITMQ_DEFAULT_VHOST=resc-rabbitmq     # The virtual host name of the RabbitMQ server
export RABBITMQ_USERNAME=queue_user             # The username used to connect to the RabbitMQ projects and repositories topics
export RABBITMQ_PASSWORD=""                     # The password used to connect to the RabbitMQ projects and repositories topics; use the value of the queues_password field in /deployment/kubernetes/example-values.yaml
export RABBITMQ_QUEUE=repositories              # The name of the queue from which the secret scanner reads repositories
export RESC_API_NO_AUTH_SERVICE_HOST=127.0.0.1  # The hostname/IP address where the RESC web service is running
export RESC_API_NO_AUTH_SERVICE_PORT=30900      # The port number where the RESC web service is running
export VCS_INSTANCES_FILE_PATH=""               # The absolute path to the vcs_instances_config.json file containing the VCS instance definitions
export GITHUB_PUBLIC_USERNAME=""                # Your GitHub username
export GITHUB_PUBLIC_TOKEN=""                   # Your GitHub personal access token
export GITLEAKS_PATH=""                         # The absolute path to the Gitleaks binary executable

You need to replace the following values with your custom values: RABBITMQ_PASSWORD, VCS_INSTANCES_FILE_PATH, GITHUB_PUBLIC_USERNAME, GITHUB_PUBLIC_TOKEN and GITLEAKS_PATH.
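
Before starting the worker, it can help to confirm that none of these variables were left unset or empty. The following is a minimal sketch and not part of the resc_vcs_scanner package: the variable names come from the list above, while the helper name missing_variables is hypothetical.

```python
import os

# Variable names copied from the export list above.
REQUIRED_VARS = [
    "RESC_RABBITMQ_SERVICE_HOST",
    "RESC_RABBITMQ_SERVICE_PORT_AMQP",
    "RABBITMQ_DEFAULT_VHOST",
    "RABBITMQ_USERNAME",
    "RABBITMQ_PASSWORD",
    "RABBITMQ_QUEUE",
    "RESC_API_NO_AUTH_SERVICE_HOST",
    "RESC_API_NO_AUTH_SERVICE_PORT",
    "VCS_INSTANCES_FILE_PATH",
    "GITHUB_PUBLIC_USERNAME",
    "GITHUB_PUBLIC_TOKEN",
    "GITLEAKS_PATH",
]


def missing_variables(env):
    """Return the required variables that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables(os.environ)
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        print("All required variables are set.")
```

Run it in the same shell session in which you exported the variables, so that it sees the same environment as the Celery worker.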

Structure of vcs instances config json

The vcs_instances_config.json file must have the following format.

Note: You can add multiple vcs instances.

Example:

{
  "vcs_instance_1": {
    "name": "GITHUB_PUBLIC",
    "scope": ["kubernetes"],
    "exceptions": [],
    "provider_type": "GITHUB_PUBLIC",
    "hostname": "github.com",
    "port": "443",
    "scheme": "https",
    "username": "GITHUB_PUBLIC_USERNAME",
    "token": "GITHUB_PUBLIC_TOKEN",
    "organization": ""
  }
}
  • scope: List of GitHub accounts you want to scan. For example, let's say you want to scan all the repositories of the following GitHub accounts:
    https://github.com/kubernetes
    https://github.com/docker

    Then you need to add those accounts to scope like: ["kubernetes", "docker"]. All the repositories from those accounts will be scanned.

  • exceptions (optional): If you want to exclude any account from the scan, add it to exceptions. Defaults to an empty list.
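
A quick structural check of the file can catch typos before the scanner reads it. This is a minimal sketch, not the scanner's own validation: the key list is taken from the example above, treating every key except exceptions as required is an assumption, and the function names are hypothetical.

```python
import json

# Keys taken from the example above. "exceptions" is documented as optional,
# so it is left out; treating the others as required is an assumption.
EXPECTED_KEYS = {
    "name", "scope", "provider_type", "hostname", "port",
    "scheme", "username", "token", "organization",
}


def check_vcs_instances(config):
    """Return a list of human-readable problems found in the parsed config."""
    problems = []
    for instance_id, instance in config.items():
        missing = EXPECTED_KEYS - set(instance)
        if missing:
            problems.append(f"{instance_id}: missing keys {sorted(missing)}")
        for list_field in ("scope", "exceptions"):
            if not isinstance(instance.get(list_field, []), list):
                problems.append(f"{instance_id}: {list_field} must be a list")
    return problems


def load_vcs_instances(path):
    """Parse the JSON file at path; raises json.JSONDecodeError if malformed."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, check_vcs_instances(load_vcs_instances(path)) returns an empty list when every instance in the file has the expected shape.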

The output messages of the collect_projects command have the following format:

{
  "project_key": "kubernetes",
  "vcs_instance_name": "GITHUB_PUBLIC"
}

4. Run the secret scan task:

This task reads the repositories from a RabbitMQ channel called 'repositories', runs a scan using Gitleaks, and saves the findings' metadata to the database.

This can be done via the following command:

celery -A vcs_scanner.secret_scanners.celery_worker worker --loglevel=INFO -E -Q repositories --concurrency=1 --prefetch-multiplier=1

Run locally using docker

Run the RESC VCS Scanner docker image locally by running the following commands:
  • Pull the docker image from registry:
docker pull rescabnamro/resc-vcs-scanner:latest
  • Alternatively, build the docker image locally by running:
docker build -t rescabnamro/resc-vcs-scanner:latest .
  • Run the vcs-scanner by using the command below:
docker run \
  -v <path to vcs_instances_config.json in your local system>:/tmp/vcs_instances_config.json \
  -e RESC_RABBITMQ_SERVICE_HOST="host.docker.internal" \
  -e RESC_RABBITMQ_SERVICE_PORT_AMQP=30902 \
  -e RABBITMQ_DEFAULT_VHOST=resc-rabbitmq \
  -e RABBITMQ_USERNAME=queue_user \
  -e RABBITMQ_PASSWORD="<the password of queue_user>" \
  -e RABBITMQ_QUEUE="repositories" \
  -e RESC_API_NO_AUTH_SERVICE_HOST="host.docker.internal" \
  -e RESC_API_NO_AUTH_SERVICE_PORT=30900 \
  -e VCS_INSTANCES_FILE_PATH="/tmp/vcs_instances_config.json" \
  -e GITHUB_PUBLIC_USERNAME="<your github username>" \
  -e GITHUB_PUBLIC_TOKEN="<your github personal access token>" \
  -e GITLEAKS_PATH="/vcs_scanner/gitleaks_config/seco-gitleaks-linux-amd64" \
  --name resc-vcs-scanner rescabnamro/resc-vcs-scanner:latest \
  celery -A vcs_scanner.secret_scanners.celery_worker worker --loglevel=INFO -E -Q repositories --concurrency=1 --prefetch-multiplier=1

To create the vcs_instances_config.json file, refer to the section "Structure of vcs instances config json" above.

Run locally as a CLI tool (Still in development)


It is also possible to run the component as a CLI tool to scan VCS repositories.

1. Create virtual environment:

cd components/resc-vcs-scanner
pip install virtualenv
virtualenv venv
source venv/bin/activate

2. Install resc_vcs_scanner package:

pip install -e .

3. Run CLI scanner:

The CLI has three modes of operation. Use the --help argument to see all the options for each mode:

  • Scanning a non-git directory:

    secret_scanner dir --help
    secret_scanner dir --gitleaks-rules-path=<path to gitleaks toml rule> --gitleaks-path=<path to gitleaks binary> --ignored-blocker-path=<path to resc-ignore.dsv file> --dir=<directory to scan>
  • Scanning an already cloned git repository:

    secret_scanner repo local --help
    secret_scanner repo local --gitleaks-rules-path=<path to gitleaks toml rule> --gitleaks-path=<path to gitleaks binary> --ignored-blocker-path=<path to resc-ignore.dsv file> --dir=<directory of repository to scan>
  • Scanning a remote git repository:

    secret_scanner repo remote --help
    secret_scanner repo remote --gitleaks-rules-path=<path to gitleaks toml rule> --gitleaks-path=<path to gitleaks binary> --ignored-blocker-path=<path to resc-ignore.dsv file> --repo-url=<url of repository to scan>

Most CLI arguments can also be provided by setting the corresponding environment variable. See the --help output for the arguments that can be provided this way and the expected environment variable names. These are always prefixed with RESC_.

Example: the argument --gitleaks-path can be provided using the environment variable RESC_GITLEAKS_PATH
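
The single example above suggests a naming pattern: drop the leading dashes, replace the remaining dashes with underscores, uppercase the result, and prepend RESC_. The sketch below encodes that pattern; the rule is inferred from this one example and is an assumption, and env_var_for is a hypothetical helper, so confirm the actual names in the --help output.

```python
def env_var_for(argument):
    """Guess the environment variable name for a CLI argument.

    Assumed rule, inferred from one documented example: strip leading dashes,
    turn remaining dashes into underscores, uppercase, and prefix with RESC_.
    """
    return "RESC_" + argument.lstrip("-").replace("-", "_").upper()


# The one mapping that the documentation states explicitly.
assert env_var_for("--gitleaks-path") == "RESC_GITLEAKS_PATH"
```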

Ignoring findings


It is possible to ignore some blocker findings (e.g. false positives) by providing a resc-ignore.dsv file. The blockers will be downgraded to warning level and marked as ignored. The file has the following structure:

# This is a comment
finding_path|finding_rule|finding_line_number|expiration_date
finding_path_2|finding_rule_2|finding_line_number_2
  • finding_path contains the path to the file with the blocking finding.
  • finding_rule contains the name of the blocking rule.
  • finding_line_number contains the line number of the finding.
  • expiration_date is optional and contains the date, in ISO 8601 format, until which this ignore rule should be considered valid.

For example, to ignore the finding in file /etc/passwd for rule root_value_found on line 1 until April 1st 2024 at 23:59, use the following line:

/etc/passwd|root_value_found|1|2024-04-01T23:59:00

To ignore this finding indefinitely, omit the expiration date:

/etc/passwd|root_value_found|1
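
To make the file format concrete, the sketch below parses lines of this shape. It is an illustration under stated assumptions, not the scanner's actual parser: the function names are hypothetical, and treating the expiration moment itself as still valid is an assumption.

```python
from datetime import datetime


def parse_ignore_line(line):
    """Parse one resc-ignore.dsv line; return None for comments and blanks."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields = stripped.split("|")
    if len(fields) not in (3, 4):
        raise ValueError(f"expected 3 or 4 fields, got {len(fields)}: {line!r}")
    path, rule, line_number = fields[:3]
    # The fourth field is optional; without it the entry never expires.
    expiration = datetime.fromisoformat(fields[3]) if len(fields) == 4 else None
    return {
        "path": path,
        "rule": rule,
        "line_number": int(line_number),
        "expiration": expiration,
    }


def is_active(entry, now):
    """Return True if the entry has no expiration or has not yet expired."""
    return entry["expiration"] is None or now <= entry["expiration"]
```

With the two example lines above, the first entry stops being active after 2024-04-01T23:59:00, while the second has no expiration and stays active.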

Testing

Run the commands below to make sure that the unit tests pass and that the code meets quality standards:

Note: To run these tests you need to install tox. This can be done on Linux, and on Windows with Git Bash.

pip install tox      # install tox locally

tox -v -e sort       # Run this command to validate the import sorting
tox -v -e lint       # Run this command to lint the code according to this repository's standard
tox -v -e pytest     # Run this command to run the unit tests
tox -v               # Run this command to run all of the above tests