
Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Vulnerability Data Tools

This project is meant to help enrich vulnerability data. It was created because of the uncertainty surrounding NVD. The project currently consists of three repositories:

  • Vulnerability Data Tools
    • This repo. This is where the primary development and planning will happen
  • Vulnerability Data
    • The vulnerability data repo will hold the primary data files. This data can be transformed into various other data formats such as NVD, cve5, and OSV. No discussions should happen in this repository.
  • NVD Data Overrides
    • The transformed NVD data. Today this data is a standalone dataset; soon it will be generated from the contents of the vulnerability-data repo. No discussions should happen in this repository.

Please feel free to browse the current issues and submit new ones with ideas and questions. There is also a public Slack channel, #vulnerability-data-project, that can be used for questions and comments.

Please see the FAQ below, which answers some of the questions about how this project works now and how it will work in the future. The FAQ is currently very specific to NVD, although the project aims to be broader than just NVD.

Project layout

nvd - Scripts that turn the cve5 data into NVD-compatible CPE data. Please see the readme in that directory for more details.

annotation_format_examples - This is where the data storage format is being worked out. While NVD is the current output of the project, we need a storage format that is human-editable and detailed enough that other ecosystems can be enriched as well. Please see the readme in that directory for more details.

Future efforts

We have a lot of ideas on how to do this better in the future. We envision a data format capable of generating the data currently stored in this repository. The NVD format is very constrained. By capturing the same data in a more flexible format, it will be possible to output any format needed: NVD, OSV, cve5, and more. Think of this repository as a place to learn what we don't know yet.

Regardless of the data format used, this override data is expected to be generated and available for the foreseeable future.

Data repositories overview

NVD Overrides Repository

The nvd-data-overrides repository contains the data for the NVD overrides. This is data meant to enrich the JSON currently being returned by NVD.

The .snapshot directory captures the original state of the NVD record for any properties we override, as of the time of the override. If any of those properties later change on the upstream record, we can detect that our overridden values need to be reconciled. For the moment, it will only be useful if NVD starts adding CPE configuration nodes again.
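The reconciliation check described above could be sketched roughly as follows. This is only an illustration: the function name `changed_properties` and the idea that a snapshot is a flat mapping of property name to original value are assumptions, not the repository's actual layout.

```python
def changed_properties(snapshot, upstream):
    """Return the names of snapshotted properties whose upstream value changed.

    Assumption: `snapshot` maps each overridden property name to the value
    NVD had when the override was written, and `upstream` is the current
    NVD record as a dict. Both shapes are hypothetical.
    """
    missing = object()  # sentinel so a removed property also counts as changed
    return sorted(
        name
        for name, original in snapshot.items()
        if upstream.get(name, missing) != original
    )


# Usage: NVD had no configuration nodes at override time, but has one now.
snapshot = {"configurations": []}
upstream = {"configurations": [{"nodes": []}], "descriptions": []}
print(changed_properties(snapshot, upstream))  # ['configurations']
```

Any property returned by this check would be a candidate for manual review against the overridden value.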

In the data directory, the override files are separated by year. The JSON in these files is meant to be inserted into the JSON from NVD for a given CVE ID. The CVE ID is not recorded in the JSON file; it should be extracted from the filename. Think of this as additional data that can be inserted into the NVD records as returned by the NVD API.
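As a minimal sketch of how a consumer might apply an override, the example below extracts the CVE ID from a filename and merges the override JSON into an NVD record. The path `data/2024/CVE-2024-12345.json`, the helper names, and the recursive merge strategy are all assumptions made for illustration; the project may combine the data differently.

```python
import copy
import json
import re
from pathlib import Path

# Assumption: the override filename contains the CVE ID, e.g. "CVE-2024-12345.json".
CVE_ID_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


def cve_id_from_filename(path):
    """Extract the CVE ID from an override file's name."""
    match = CVE_ID_PATTERN.search(Path(path).name)
    if match is None:
        raise ValueError(f"no CVE ID found in filename: {path}")
    return match.group(0).upper()


def deep_merge(base, extra):
    """Return a copy of `base` with `extra` merged in; values from `extra` win."""
    merged = copy.deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def apply_override_file(nvd_record, override_path):
    """Load one override file and merge it into the given NVD record.

    Returns the CVE ID taken from the filename and the enriched record.
    The caller is responsible for passing the NVD record for that same CVE.
    """
    cve_id = cve_id_from_filename(override_path)
    override = json.loads(Path(override_path).read_text(encoding="utf-8"))
    return cve_id, deep_merge(nvd_record, override)
```

Because `deep_merge` copies its input, the original NVD record is left untouched, which makes it easy to compare the record before and after enrichment.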

At the moment the focus is on the CPE matching data. Additional data such as vendor severity and CWE would be welcome additions.

Vulnerability Data Repository

The vulnerability-data repository is where the enriched vulnerability data will exist.

The annotation format needs to be settled before this repository can be populated. The intention is that the data from this repo will be used to populate the nvd-data-overrides repo.

Contributing

If you would like to contribute to this project by opening a GitHub pull request ("PR") or issue, please make sure commits are signed off by passing -s or --signoff to the git commit command.

FAQ

Why are you doing this?

The data provided by NVD was used by Grype to match artifacts not covered by other data sources. We refer to this as the "matcher of last resort". As such, we need this data for a properly functioning Grype. Because we need this data, because Grype is an open source project, and because cooperating would be beneficial, creating an open source project seemed like the best option.

Can Anchore actually pull this off?

No, we can't. We need help. Open source is one of the most amazing ways to solve problems the world has ever seen. We know we can't do this alone, so please come help. Also tell all your friends.

What happens if NVD goes back to normal?

In the event NVD returns, or some other project takes over the current task of NVD, we expect to continue to maintain this project. Not every vulnerability database supports every ecosystem, so being able to enrich vulnerability data makes sense. However, the need to enrich everything would be greatly diminished. This project is meant to be downstream of something like NVD; we will defer to their data when possible.

For example, there could be vulnerability data about a binary shipped in a Linux distribution, but if that binary is also downloaded from the project directly, that information may not be tracked anywhere else.

Isn't PURL better than CPE? Why don't you just use PURL?

The intent of this repo is to mimic the data NVD provided. There are many tools that expect data in the same format as NVD.

Other data formats, such as OSV, can support PURL. One of our goals is to store metadata in a way that different formats can be the output of the project.
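To illustrate the idea of one stored record producing several outputs, here is a rough sketch that renders the same package information as both a CPE 2.3 string and a PURL. The `AffectedPackage` class and its fields are hypothetical, since the project's annotation format is not yet decided, and escaping of special characters is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AffectedPackage:
    # Hypothetical neutral record; not the project's actual annotation format.
    vendor: str
    product: str
    version: str
    ecosystem: str  # used as the PURL type, e.g. "pypi" or "npm"


def to_cpe23(pkg):
    """Render a CPE 2.3 formatted string for an application ("a") part.

    The seven trailing fields (update through other) are left as wildcards.
    Escaping of special characters is omitted in this sketch.
    """
    return f"cpe:2.3:a:{pkg.vendor}:{pkg.product}:{pkg.version}:*:*:*:*:*:*:*"


def to_purl(pkg):
    """Render a minimal PURL: pkg:<type>/<name>@<version>.

    Namespace, qualifiers, and subpath are omitted in this sketch.
    """
    return f"pkg:{pkg.ecosystem}/{pkg.product}@{pkg.version}"


# Usage with made-up values:
pkg = AffectedPackage("examplecorp", "examplelib", "1.2.3", "pypi")
print(to_cpe23(pkg))  # cpe:2.3:a:examplecorp:examplelib:1.2.3:*:*:*:*:*:*:*
print(to_purl(pkg))   # pkg:pypi/examplelib@1.2.3
```

Keeping the stored record independent of any one identifier scheme is what would let NVD-style and OSV-style outputs be generated from the same source.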

Is this meant to replace CVE?

Not at all. The purpose is to enrich only existing vulnerability identifiers. Every current vulnerability identification project has a constrained scope. This is meant to fill some of the gaps left by those constraints.

Shouldn't this project be part of some larger foundation?

Probably yes. However, the best way to create a successful open source project is to do the work. Finding a long-term home for this effort will come once we have proven our assumptions and have a functioning process.

How can I help?

You're welcome to submit PRs to this repo as well as the nvd-data-overrides repo. There is also a Slack channel in the Anchore Community Slack called #vulnerability-data-project. Feel free to join and ask questions or share ideas there.