Generic interface for associating package releases with vulnerabilities
oliverchang opened this issue · 8 comments
What's the problem this feature will solve?
PyPI currently does not associate any vulnerability information with projects hosted on it.
It would be a huge security benefit for users of PyPI if there was a way project releases can be associated with vulnerabilities.
Describe the solution you'd like
Implement an interface that can be called by vulnerability sources to POST vulnerability information for a package to PyPI. Requests can be signed in a similar way to how the existing GitHub token scanning integration does it.
The details provided would be fairly minimal and just include:
- Package name.
- Package versions affected.
- Advisory ID.
- Link to advisory details.
This information can then be displayed in the PyPI UI and returned in the PyPI API. The pip install command can also make use of this information to notify users of vulnerable packages they’re installing.
This will be built in a generic way that can support other vulnerability sources as well.
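A minimal sketch of what such a signed report could look like, for illustration only. The field names, the `build_report`, `sign` and `verify` helpers, the advisory ID and the URL are all hypothetical, and the HMAC shared-secret scheme is an assumption chosen for brevity; it is not necessarily how the GitHub token scanning integration signs its requests.

```python
import hashlib
import hmac
import json


def build_report(package, versions, advisory_id, link):
    """Assemble the minimal advisory payload (hypothetical field names)."""
    return {
        "package": package,
        "versions": sorted(versions),
        "advisory_id": advisory_id,
        "link": link,
    }


def sign(body: bytes, secret: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(body: bytes, signature: str, secret: bytes) -> bool:
    """Recompute the signature on the receiving side and compare in constant time."""
    return hmac.compare_digest(sign(body, secret), signature)


# Hypothetical report, serialized deterministically so both sides sign the same bytes.
report = build_report(
    "example-package",
    ["1.0.1", "1.0.0"],
    "EXAMPLE-2021-0001",
    "https://example.com/advisories/EXAMPLE-2021-0001",
)
body = json.dumps(report, sort_keys=True).encode("utf-8")
signature = sign(body, b"shared-secret")
```

The vulnerability source would send `body` together with `signature` (for example in a request header), and the receiver would call `verify` before accepting the report.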
Additional context
Google (where I work) is working on an open source vulnerability database for open source packages. In particular we’re working on building a community owned vulnerability database for Python packages. This will be ready soon and we’d like to make this information more readily accessible.
We’d be able to contribute the necessary patches here for this web hook as well.
Thanks for bringing up this issue Oliver. I strongly believe that a more visible security posture will significantly contribute to security awareness for both developers and maintainers.
I'm a developer advocate at Snyk, which works in the open source security space, and have my own history and experience working within the Node.js Foundation's ecosystem security working group so I'm well excited about integrating security into PyPI and pip install.
As a TL;DR, and with respect to Google's new vulnerability database, I'd like to propose and volunteer and make available Snyk's vulnerability database to the Python foundation, to power the security advisories of both PyPI and pip, and I'm confident we can sponsor any work involved for doing that.
As for the "Why" -
- The vulnerability database is curated and triaged by the team, which also actively employs security research and works within communities and academic research groups to uncover vulnerabilities, sometimes hundreds in bulk. Every report, even an established one from NVD with a CVE assigned to it, still gets triaged by the team for potential false positives. By far, I'd say one of Snyk's unique edges is the security intelligence of the database feed itself. I can give dozens of examples from the Node.js ecosystem of vulnerabilities that Snyk found first and that even today aren't being reported on by npm audit, for example.
- More than just the database - As an extension to the above, making a database available is one thing, but there's a long term relationship that would benefit the Python community in terms of discussing false positives, duplication and responsible disclosure of security issues, which Snyk has a dedicated team for and has been actively doing within other communities. We're happy to strengthen that relationship and help in triaging issues, and consulting maintainers about security fixes where we can be of help.
We greatly appreciate working within communities and projects to create that security awareness and my DevRel team at Snyk have done some great projects on this, to name a few:
- WebPageTest now shows a security score when you scan a website to track the JS libs and vulnerabilities. We worked fully in the open by contributing a pull request to the open source project repository, and we contributed Snyk's frontend database snapshot and keep it up to date.
- Google Chrome's Lighthouse project has a best practice for using libraries without any security vulnerabilities - that too is a database that is contributed and powered by Snyk.
- We've worked with several JS ecosystem projects like cdnjs, yarnpkg, jsdelivr and others to build an integration in which they display a security badge on each package page in the UI (similar to what we'd see interesting with PyPI)
- Lastly, but perhaps one of the most fun projects we're building now is the Advisor which provides developers with a package health score about popularity, maintenance, security, and community, to help them assess an open source package. And of course we support Python! Here's an example: https://snyk.io/advisor/python/django
I'm excited to further discuss this and see a built-in security integration in Python related tooling as discussed here.
@lirantal Thanks for responding!
As a TL;DR, and with respect to Google's new vulnerability database, I'd like to propose and volunteer and make available Snyk's vulnerability database to the Python foundation, to power the security advisories of both PyPI and pip, and I'm confident we can sponsor any work involved for doing that.
This is great to hear. To add some additional clarity, there are three separate things here that I want to tease apart:
1. a Python-specific community-maintained package vulnerability database (proposal here)
2. the broader open-source vulnerability database that is working towards aggregating multiple public and community vulnerability databases, like the database in #1 (this is https://osv.dev/)
3. This feature request, which is for providing the generic interface for a service like #2 to integrate with PyPI
It sounds like Snyk's database could easily be a part of any one of these:
- Snyk could contribute its database to the community-maintained database
- Snyk's database could be one of the databases that OSV aggregates
- Snyk could directly integrate with PyPI's generic interface
Would love to hear about which point in that pipeline you think would be best for Snyk to be involved!
You've outlined that well Dustin, all are indeed viable options.
Snyk would be happy to volunteer its database to the Python community but I think the wording and semantics matter, as in, I don't think there's an actual one-time "contribution", but rather the Snyk team is actively committed to surfacing, triaging and collaborating with maintainers about security vulnerabilities, and so volunteering in terms of creating some sort of integration for the database feed is more appropriate. That's semantics, and also details, but I thought it's worth mentioning.
To add to that, I think that while the database's quality, quantity and first-to-know properties are super important, the tooling also matters in order to leverage it. So, whatever database integration is decided here, making sure that it is integrated natively in Python's developer tooling is going to make the real change we'd like to see in the world, in terms of awareness. I'll reiterate my JS example here: running npm install to install dependencies results in a vulnerability report, which is a good example of that.
I'd be happy to drill down into the details of working with the Snyk vulnerabilities database and to start poc-ing that as needed.
Would be great to get the conversation going :-)
As part of Red Hat's project Thoth we use PyPA's advisory-db in a server side/cloud Python resolver. A demonstration of this feature is available in this recording. The resolution engine has a pluggable interface (see also adviser repo) so we can eventually plug what @lirantal and Snyk have - let's see how we can combine efforts in this area. We are releasing the resolver/recommendation engine publicly.
@di could this be considered as a topic for the planned SIG efforts (based on the call we had earlier)?
@fridex that looks awesome!
I'd also like to follow up on how we might be able to move forward with getting Snyk (and others) to contribute its database and what that might look like. Out of the options that @di outlined in #9407 (comment), I think these two options would be ideal:
- Contributing vulnerabilities directly to https://github.com/pypa/advisory-db
- A separate export of Snyk's python DB in the same format so that the existing OSV integration can help report Snyk vulnerabilities without much additional changes.
The above don't necessarily have to be mutually exclusive. For example, I think the ideal scenario would be for Snyk to export vulnerabilities into an independent place, and then we could have a pipeline to auto-merge/deduplicate entries into https://github.com/pypa/advisory-db.
My team at Google is trying to get vulnerability databases to agree on an export/interchange format, so that sharing and aggregating across databases is easy and we reduce potential duplicate triage work. This is the same format that the current python advisory-db uses. See our blog post released today for details!
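To make the auto-merge/deduplicate idea concrete, here is a rough sketch, assuming each record carries an `id` and an optional `aliases` list as in the OSV interchange format. The `merge_advisories` helper and all the IDs below are hypothetical; a real pipeline would also need to reconcile affected ranges, references and conflicting details.

```python
def merge_advisories(existing, incoming):
    """Merge incoming records into existing ones.

    Two records are treated as the same vulnerability when they share
    an id or an alias. Input lists and records are not mutated.
    """
    merged = [dict(rec) for rec in existing]
    index = {}  # any known id or alias -> merged record
    for rec in merged:
        for key in [rec["id"], *rec.get("aliases", [])]:
            index[key] = rec

    for rec in incoming:
        keys = [rec["id"], *rec.get("aliases", [])]
        match = next((index[k] for k in keys if k in index), None)
        if match is None:
            # Unknown vulnerability: add it as a new entry.
            new = dict(rec)
            merged.append(new)
            for k in keys:
                index[k] = new
        else:
            # Known vulnerability: record the other database's ids as aliases.
            known = {match["id"], *match.get("aliases", [])}
            extra = [k for k in keys if k not in known]
            match["aliases"] = [*match.get("aliases", []), *extra]
            for k in extra:
                index[k] = match
    return merged


# Hypothetical records from two databases describing overlapping issues.
existing = [{"id": "PYSEC-0000-1", "aliases": ["CVE-0000-0001"]}]
incoming = [
    {"id": "SNYK-EXAMPLE-1", "aliases": ["CVE-0000-0001"]},
    {"id": "SNYK-EXAMPLE-2", "aliases": []},
]
result = merge_advisories(existing, incoming)
```

Here the first incoming record is folded into the existing entry because both reference the same CVE alias, while the second is appended as a new entry.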
@lirantal WDYT? Would it make sense to have a call or start an email thread with interested folks on your side to discuss this? I think the collaboration details here could span beyond Python :)
I think this is maybe a pre-requisite for #798, which seems to be mostly about creating a notification mechanism, whereas this issue is about PyPI collecting vulnerability data (which was implemented in #9552), surfacing it in the JSON API (which was implemented in #10197), and surfacing it in the UI (which has not been implemented).
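For anyone wanting to consume the data from the JSON API, a small sketch of how a client might summarize it. This assumes the per-release JSON response carries a top-level `vulnerabilities` list whose entries have `id`, `link` and `fixed_in` fields; the `vulnerability_summaries` helper and the sample response are hypothetical, so check the actual API output before relying on these names.

```python
def vulnerability_summaries(release_json):
    """Return one human-readable line per vulnerability in a release response."""
    lines = []
    for vuln in release_json.get("vulnerabilities", []):
        fixed = ", ".join(vuln.get("fixed_in", [])) or "no fixed version listed"
        link = vuln.get("link", "no link")
        lines.append(f"{vuln['id']}: fixed in {fixed} ({link})")
    return lines


# Hypothetical response body, shaped like the assumed JSON API output.
sample = {
    "info": {"name": "example-package", "version": "1.0.0"},
    "vulnerabilities": [
        {
            "id": "EXAMPLE-2021-0001",
            "link": "https://example.com/advisories/EXAMPLE-2021-0001",
            "fixed_in": ["1.0.2"],
        }
    ],
}
summaries = vulnerability_summaries(sample)
```

A tool could print these lines at install time, which is roughly the notification mechanism discussed in #798.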