/SBOM-2023

Experimental Data about Java SBOMs https://arxiv.org/pdf/2303.11102.pdf


Code and data for Challenges of Producing Software Bill Of Materials for Java

Overview

This repository contains the code and data produced for the paper Challenges of Producing Software Bill Of Materials for Java (IEEE Security & Privacy, 2023).

@article{sbomchallenges,
 title = {Challenges of Producing Software Bill Of Materials for Java},
 journal = {IEEE Security \& Privacy},
 year = {2023},
 doi = {10.1109/MSEC.2023.3302956},
 author = {Musard Balliu and Benoit Baudry and Sofia Bobadilla and Mathias Ekstedt and Martin Monperrus and Javier Ron and Aman Sharma and Gabriel Skoglund and César Soto-Valero and Martin Wittlinger},
 url = {http://arxiv.org/pdf/2303.11102},
}

The structure of the repository is as follows:

  • sbom-production contains all scripts used for creating CycloneDX SBOM files for each of the 26 study subjects using 6 different SBOM producers.
  • ground-truth-production contains all scripts used for extracting a ground truth dataset of dependency trees for each study subject.
  • metrics-computation contains all code used for computing metrics relating to the performance of the SBOM tools.
  • results-march-2023 contains all experimental data.
  • sbom2023_plot contains additional code and resources related to the creation of figures for the paper.

SBOM Producers

The performance of 6 CycloneDX SBOM producers was studied. The following versions were the latest available as of Fri 5 May 2023 13:02:33 CEST:

Producer                 Version
Build Info Go            1.9.3
CycloneDX Generator      8.4.3
CycloneDX Maven Plugin   2.7.8
jbom                     1.2.1
OpenRewrite              4.45.0
Depscan                  4.1.2

Study Subjects

The following versions of 26 Java projects using Maven were selected as study subjects:

#   GitHub Repository              Commit Hash  Stable release as of 01.01.23
1   jenkins                        ce7e5d7      2.384
2   mybatis-3                      c195f12      3.5.11
3   flink                          c41c8e5      1.15.3
4   checkstyle                     233c91b      10.6.0
5   CoreNLP                        f7782ff      4.5.1
6   neo4j                          c082e80      5.3.0
7   async-http-client              7a370af      2.12.3
8   error-prone                    27de40b      2.17.0
9   alluxio                        d5919d8      2.9.0
10  javaparser                     1ae25f3      3.15.15
11  undertow                       f52b70c      2.3.2.Final
12  webcam-capture                 e19125c      0.3.12
13  handlebars.java                2afc50f      4.2.1
14  jooby                          f71b551      3.0.0.M1
15  tika                           41319f3      2.6.0
16  orika                          eef8209      1.5.4
17  spoon                          ee73f43      10.2.0
18  accumulo                       706612f      2.1.0
19  couchdb-lucene                 8554737      2.1.0
20  jHiccup                        a440bda      2.0.10
21  vulnerability-assessment-tool  3d261af      3.2.5
22  para                           41d9005      1.47.2
23  launch4j-maven-plugin          3f9818e      2.2.0
24  jacop                          1a395e6      4.9.0
25  selenese-runner-java           3e84e8e      4.2.0
26  commons-configuration          59e5152      2.8.0

Reproduction

If you are interested in reproducing our results, the script reproduce.sh is provided for your convenience. This script will do the following:

  • Generate SBOMs for each study subject and SBOM producer.
  • Extract ground truth dependency information from each study subject.
  • Calculate the accuracy/precision for each SBOM producer and compare these values with our results, outputting whether the values match or not.
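
The comparison in the last step can be illustrated with a minimal sketch. It is not the code in metrics-computation: it assumes the SBOM is CycloneDX JSON, that dependencies are identified as group:name:version strings, and that the metrics are plain set-based precision and recall. The function names sbom_components and precision_recall are hypothetical.

```python
import json


def sbom_components(sbom_json: str) -> set[str]:
    """Collect 'group:name:version' identifiers from a CycloneDX JSON document.

    Assumption: the top-level "components" array is the dependency list;
    missing "group" or "version" fields are treated as empty strings.
    """
    doc = json.loads(sbom_json)
    identifiers = set()
    for component in doc.get("components", []):
        group = component.get("group", "")
        name = component.get("name", "")
        version = component.get("version", "")
        identifiers.add(f"{group}:{name}:{version}")
    return identifiers


def precision_recall(reported: set[str], truth: set[str]) -> tuple[float, float]:
    """Set-based precision and recall of reported dependencies against ground truth."""
    true_positives = len(reported & truth)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Illustrative input only; these are not values from the study.
example_sbom = json.dumps({
    "bomFormat": "CycloneDX",
    "components": [
        {"group": "org.slf4j", "name": "slf4j-api", "version": "1.7.36"},
        {"group": "junit", "name": "junit", "version": "4.13.2"},
    ],
})
example_truth = {
    "org.slf4j:slf4j-api:1.7.36",
    "com.google.guava:guava:31.1-jre",
}
example_precision, example_recall = precision_recall(
    sbom_components(example_sbom), example_truth
)
```

In this example one of the two reported components is in the ground truth, and one of the two ground-truth dependencies is reported, so both values are 0.5.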

⚠️ Please note that this script can take a considerable amount of time (~2 hours on a laptop) since SBOM production needs to be carried out by 6 different producers on 26 different study subjects.

The following software is required for reproduction:

  • Java version 17 or newer
  • Apache Maven
  • Docker
  • Python 3.10 or newer
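
Before starting a long run, it can help to confirm that these tools are available. The sketch below is not part of the repository. It assumes the usual executable names java, mvn and docker, checks only that they are on the PATH, and does not verify the Java version. The function names missing_commands and python_ok are hypothetical.

```python
import shutil
import sys

# Assumed executable names for Java, Apache Maven and Docker.
REQUIRED_COMMANDS = ["java", "mvn", "docker"]


def missing_commands(commands, which=shutil.which):
    """Return the commands that cannot be found on the PATH."""
    return [command for command in commands if which(command) is None]


def python_ok(version_info=sys.version_info, minimum=(3, 10)):
    """Check that the running interpreter is at least the minimum version."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    missing = missing_commands(REQUIRED_COMMANDS)
    if missing:
        print("Missing commands:", ", ".join(missing))
    if not python_ok():
        print("Python 3.10 or newer is required")
    if not missing and python_ok():
        print("All required commands found; check the Java version manually")
```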