Lightweight library for accessing and querying NVD Vulnerability Feeds with ease.
The nvdlib library allows for easy fetching, comfortable exploration and lightweight querying of NVD Vulnerability Feeds.
It achieves this by providing a simple database-like interface and a custom object-oriented NVD model.
The default version can be installed in the usual way:
python3 setup.py install
or via pip
pip3 install .
The default architecture of nvdlib is lightweight and does not require any setup beyond the pip or python install.
The architecture of nvdlib is, however, designed so that additional adapters can be implemented to handle the backend differently, allowing for, e.g., a MongoDB database backend (currently scheduled for future work) while keeping the user-facing interface unchanged.
For more in-depth information about the architecture, the software design choices and how it can be extended, take a look at the docs/architecture document.
For a demonstration of basic usage, we recommend checking out our tutorial, which is provided as a Jupyter Notebook.
- We will fetch NVD feeds from the NVD database using FeedManager.
FeedManager is a context manager which takes control of asynchronous calls using an event loop.
It stores the JSON feeds locally for future use. NOTE: This tutorial does not cover any JSONFeed-related operations, as it is assumed that this is not the purpose of nvdlib. However, nvdlib is capable of handling raw JSONFeeds and their metadata in case the user needs that level of control.
- Create a collection from those feeds
Creating a collection from feeds parses each feed into its Document form (get familiar with our model) and produces a Collection object.
Collection is a user-facing facade which acts as a proxy to a set of documents and makes querying and operating on documents much easier. Collections can use different adapters based on user choice.
The default adapter, despite being very lightweight, provides limited functionality and lower performance.
FEED_NAMES = [2002, 2003, 2004, 2005] # choose whichever feeds you want to fetch
with FeedManager(data_dir=tmp_dir, n_workers=5) as feed_manager:
    feeds = feed_manager.fetch_feeds(FEED_NAMES)
    collection = feed_manager.collect(feeds) # create collection; optionally, custom feeds can be specified
    # [OPTIONAL] step
    collection.set_name('Tutorial') # choose whatever name you want for future identification
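To make the "context manager which takes control of asynchronous calls" idea concrete, here is a minimal, self-contained sketch. It is not nvdlib's implementation: the class `LoopManager`, its `fetch` method and the returned `feed-<name>` strings are hypothetical placeholders, and the download is simulated with a short sleep. It only illustrates how a `with` block can own an event loop and cap concurrency with something like `n_workers`.

```python
import asyncio


class LoopManager:
    """Hypothetical stand-in (not part of nvdlib) that owns an event loop."""

    def __init__(self, n_workers: int = 5):
        self.n_workers = n_workers
        self._loop = None

    def __enter__(self):
        # Create a private event loop for the lifetime of the `with` block.
        self._loop = asyncio.new_event_loop()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always close the loop, even if the body raised.
        self._loop.close()
        self._loop = None
        return False  # do not swallow exceptions

    async def _fetch_one(self, name, semaphore):
        # Placeholder for a real download: only simulates I/O latency.
        async with semaphore:
            await asyncio.sleep(0.01)
            return f"feed-{name}"

    async def _fetch_all(self, names):
        # The semaphore lets at most `n_workers` fetches run at once.
        semaphore = asyncio.Semaphore(self.n_workers)
        tasks = [self._fetch_one(name, semaphore) for name in names]
        return await asyncio.gather(*tasks)  # results keep input order

    def fetch(self, names):
        # Synchronous entry point: callers never touch the loop directly.
        return self._loop.run_until_complete(self._fetch_all(names))


with LoopManager(n_workers=2) as manager:
    fetched = manager.fetch([2002, 2003, 2004])
print(fetched)
```

The caller writes ordinary synchronous code while the manager hides loop creation, scheduling and cleanup, which is the kind of usage the `with FeedManager(...)` example suggests.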
NOTE: The model might change in the future by adding new attributes based on user feedback. The changes should not alter the model such that any attributes are removed, however.
It is important to get familiar with the Document model. Although it is similar to the NVD JSON Feed schema, there are subtle differences: some attributes are restructured for easier access and some might be left out. The model schema is defined in docs/model.md.
Spend some time exploring the Document model. Although a Document behaves somewhat like a dict, each attribute is accessible via dot notation, and Python attribute hints should help as well, making attribute access and exploration comfortable.
Example:
# let there be an instance of document
doc: Document
doc.pretty() # pretty print the document
doc.cve.pretty() # each attribute of the Document also has the `pretty` method
# project attributes via `project` method
doc.project({'cve.descriptions.data.value': 1}) # Note that even elements inside array can be accessed!
# project attributes via `project` method and hide the document 'id_'
doc.project({'id_': 0, 'cve.descriptions.data.value': 1}) # Note that even elements inside array can be accessed!
# let there be a collection
collection: Collection
cursor = collection.cursor() # create cursor
doc: Document = cursor.next() # return next document
doc
batch: list = cursor.next_batch() # return next batch of documents
batch
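The `project` calls shown earlier take dotted paths that reach into arrays. As a rough illustration of how such a projection can work, the sketch below operates on plain dicts rather than real Document objects. The helper `resolve`, the flat path-keyed output shape and the sample data are assumptions made for this example; nvdlib's actual behaviour and output format may differ.

```python
from typing import Any, Dict


def resolve(obj: Any, path: str) -> Any:
    """Follow a dotted path; lists are traversed element-wise."""
    if not path:
        return obj
    if isinstance(obj, list):
        # Apply the remaining path to every element of the array.
        return [resolve(item, path) for item in obj]
    key, _, rest = path.partition('.')
    if isinstance(obj, dict) and key in obj:
        return resolve(obj[key], rest)
    return None  # path does not exist in this document


def project(doc: Dict[str, Any], spec: Dict[str, int]) -> Dict[str, Any]:
    """Keep paths flagged 1; 'id_' is kept unless explicitly flagged 0."""
    result = {}
    if spec.get('id_', 1) and 'id_' in doc:
        result['id_'] = doc['id_']
    for path, keep in spec.items():
        if keep and path != 'id_':
            result[path] = resolve(doc, path)
    return result


# Made-up sample document, for illustration only.
sample_doc = {
    'id_': 'abc',
    'cve': {'descriptions': {'data': [{'value': 'first'}, {'value': 'second'}]}},
}
print(project(sample_doc, {'cve.descriptions.data.value': 1}))
print(project(sample_doc, {'id_': 0, 'cve.descriptions.data.value': 1}))
```

The second call mirrors the "hide the document 'id_'" example: setting a path to 0 removes it from the result, while paths set to 1 are included.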
Currently, the following query selectors are implemented (defined in the query_selectors.py module):
selector | operation |
---|---|
match | perform regex match operation |
search | perform regex search operation |
in_range | return whether element value lies within given range |
in_ | return whether element is contained in an array |
gt | compare two values using the greater than operator |
ge | compare two values using the greater than or equal to operator |
lt | compare two values using the less than operator |
le | compare two values using the less than or equal to operator |
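One way to think about the selectors in the table is as factories that return a predicate over a single value. The sketch below reimplements them from scratch under that assumption; it is not the code in query_selectors.py. Casting values to `str` before regex matching is an assumption of this sketch, and the inclusive bounds of `in_range` are inferred from the equivalence between `in_range(2001, 2003)` and `in_([2001, 2002, 2003])` described in this document.

```python
import operator
import re
from typing import Any, Callable, Iterable

Selector = Callable[[Any], bool]


def match(pattern: str) -> Selector:
    regex = re.compile(pattern)
    # re.match only succeeds if the pattern matches at the start of the value.
    return lambda value: regex.match(str(value)) is not None


def search(pattern: str) -> Selector:
    regex = re.compile(pattern)
    # re.search succeeds if the pattern matches anywhere in the value.
    return lambda value: regex.search(str(value)) is not None


def in_range(low: Any, high: Any) -> Selector:
    # Both bounds are treated as inclusive in this sketch.
    return lambda value: low <= value <= high


def in_(array: Iterable[Any]) -> Selector:
    members = list(array)
    return lambda value: value in members


def _compare(op: Callable[[Any, Any], bool], other: Any) -> Selector:
    return lambda value: op(value, other)


def gt(other: Any) -> Selector:
    return _compare(operator.gt, other)


def ge(other: Any) -> Selector:
    return _compare(operator.ge, other)


def lt(other: Any) -> Selector:
    return _compare(operator.lt, other)


def le(other: Any) -> Selector:
    return _compare(operator.le, other)
```

With this shape, `search('windows')('ms windows')` is true while `match('windows')('ms windows')` is false, which is the practical difference between the first two rows of the table.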
# again, let there be a collection
collection: Collection
- Querying by exact match
Note: This query implicitly uses the match selector.
usoft_collection: Collection = collection.find({'cve.affects.data.vendor_name': 'microsoft'}) # returns new Collection
usoft_collection.set_name('Microsoft collection') # optional step for user comfort
usoft_collection
Notice that we actually accessed the vendor_name attribute of each element in the data array using simple dot notation.
# draw sample from the microsoft collection
sample, = usoft_collection.sample(1)
sample.pretty()
- Querying by pattern matches
win_collection = collection.find({'cve.affects.data.product_name': search('windows')})
win_collection.set_name('Windows collection') # optional step for user comfort
win_collection
collection.find({'cve.year': match("200[1-3]{1}")}) # regex match
- Querying by range of values
The regex query above, although possible, is not very intuitive. For this purpose, we provide the selectors in_ and in_range.
collection.find({'cve.year': in_range(2001, 2003)})
In this context (as years are always integer values), the same query can be expressed with the in_ selector:
collection.find({'cve.year': in_([2001, 2002, 2003])})
- Querying by value comparisons
collection.find({'impact.cvss.base_score': gt(9)})
- Complex queries
# find pre-release CVEs published in December of any year with a CVSS score of at least 9
pre_release_december_collection = collection.find({
    'published_date.month': 12,
    'impact.cvss.base_score': ge(9.0),
    'cve.affects.data.versions': le('1.0.0')
})
pre_release_december_collection.set_name('December pre-release')
pre_release_december_collection
# yet again, draw a sample and print it
sample, = pre_release_december_collection.sample(1)
sample.pretty()
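The complex query above combines several conditions, and a document has to satisfy all of them. The following sketch shows one possible way to evaluate such a query over plain dicts. The functions `values_at`, `satisfies` and `find` are hypothetical and written for this illustration only. Two simplifications are worth noting: plain values are compared by equality here (whereas this document says exact matches implicitly use the `match` selector), and a path through an array is considered satisfied if any element matches.

```python
from typing import Any, Dict, List


def values_at(obj: Any, path: str) -> List[Any]:
    """Return every value reachable via a dotted path, flattening arrays."""
    if isinstance(obj, list):
        found = []
        for item in obj:
            found.extend(values_at(item, path))
        return found
    if not path:
        return [obj]
    key, _, rest = path.partition('.')
    if isinstance(obj, dict) and key in obj:
        return values_at(obj[key], rest)
    return []  # path does not exist in this document


def satisfies(value: Any, condition: Any) -> bool:
    # A callable condition acts as a selector; anything else means equality.
    return condition(value) if callable(condition) else value == condition


def find(documents: List[Dict[str, Any]], query: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep documents in which every queried path has a matching value."""
    return [
        doc for doc in documents
        if all(
            any(satisfies(value, condition) for value in values_at(doc, path))
            for path, condition in query.items()
        )
    ]


# Made-up sample data, for illustration only.
docs = [
    {'published_date': {'month': 12}, 'impact': {'cvss': {'base_score': 9.3}}},
    {'published_date': {'month': 12}, 'impact': {'cvss': {'base_score': 5.0}}},
    {'published_date': {'month': 6}, 'impact': {'cvss': {'base_score': 9.8}}},
]
hits = find(docs, {
    'published_date.month': 12,
    'impact.cvss.base_score': lambda score: score >= 9.0,
})
print(hits)
```

Only the first sample document satisfies both conditions, so `hits` contains that single document.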
- In order to use nvdlib in Jupyter Notebook, tornado==4.3.0 has to be used for asyncio to run properly. This is not an issue on the side of nvdlib and it is being worked on (see jupyter/notebook#3397). As this issue appears specifically in Jupyter Notebook, and tornado is an indirect requirement, we suggest creating a virtual environment with this tornado version for this purpose.
Author: Marek Cermak macermak@redhat.com
Collaborators: Michal Srb msrb@redhat.com