OSS-Fuzz Public Corpora Crawler

This tool downloads corpora published by OSS-Fuzz.

The code was tested with Python 3.8.16 under Ubuntu 20.04.

Contributions are welcome :)

Usage

get the code

git clone https://github.com/VoodooChild99/oss-fuzz-crawler.git

install dependencies

pip install -r requirements.txt

pip install requests toml rich

run crawler.py

usage: crawler.py [-h] [-s] -d DIRECTORY [-m MAX_RETRIES] corpuses

OSS-Fuzz Public Corpora Crawler

positional arguments:
  corpuses              The TOML file containing corpuses to download

optional arguments:
  -h, --help            show this help message and exit
  -s, --skip-existed    Download corpuses only when it's not in local
  -d DIRECTORY, --directory DIRECTORY
                        Directory where the corpuses are stored locally
  -m MAX_RETRIES, --max-retries MAX_RETRIES
                        Max retires when downloading corpuses, always retry if not specified
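The -m behaviour described above (retry forever unless a limit is given) can be sketched roughly as follows. This is only an illustration, not the code in crawler.py: with_retries, fetch and delay are hypothetical names, and it assumes MAX_RETRIES counts the retries made after the first attempt.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(fetch: Callable[[], T],
                 max_retries: Optional[int] = None,
                 delay: float = 0.0) -> T:
    """Call fetch() until it succeeds (hypothetical helper).

    max_retries=None mirrors "always retry if not specified".
    Otherwise at most max_retries retries follow the first attempt
    (an assumption about how the option is counted).
    """
    retries_done = 0
    while True:
        try:
            return fetch()
        except Exception:
            # Give up once the retry budget is used up.
            if max_retries is not None and retries_done >= max_retries:
                raise
            retries_done += 1
            time.sleep(delay)
```

In a real download, fetch would wrap the HTTP request for a single corpus, and delay would probably be non-zero to avoid hammering the server.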

Target Corpora

corpora.toml already covers several OSS-Fuzz projects used by FuzzBench.

You can add more corpora to corpora.toml as follows, where project is the OSS-Fuzz project name and each list entry is one of its fuzz targets:

project = [ "target1", "target2" ]
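To check a new entry before running the crawler, a file in this format can be loaded and flattened into (project, target) pairs. The snippet below is a minimal sketch under that assumption. list_targets and EXAMPLE are hypothetical names, and it uses the standard-library tomllib when available (Python 3.11+), falling back to the toml package from the dependencies.

```python
try:
    import tomllib as toml_parser  # standard library, Python 3.11+
except ModuleNotFoundError:
    import toml as toml_parser  # third-party package, e.g. on Python 3.8

# Same shape as the entry shown above.
EXAMPLE = '''
project = [ "target1", "target2" ]
'''


def list_targets(text):
    """Return (project, target) pairs from a corpora.toml-style string."""
    data = toml_parser.loads(text)
    return [(project, target)
            for project, targets in data.items()
            for target in targets]


print(list_targets(EXAMPLE))
# prints [('project', 'target1'), ('project', 'target2')]
```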