anchore/syft

"bom-ref" field shows escaped characters for special chars


We are running syft to generate SBOM files with the cyclonedx-json@1.5 output option.
We are seeing escaped characters in some fields of the SBOM, such as "bom-ref".
When SBOM.json is created, the components and their bom-ref values show escaped characters.

Is this the expected behavior? I searched for open issues, updated my syft version, checked the charset, etc.
I was expecting to see the actual module/package name.

Here is what it looks like when the SBOM is created. Note that the bom-ref, cpe and purl keys show escaped characters, each following a different "pattern", while the name field shows the package name properly.

"components": [
    {
      "bom-ref": "pkg:npm/%40aashutoshrathi/word-wrap@1.2.6?package-id=cf52c618c3862994",
      "type": "library",
      "name": "@aashutoshrathi/word-wrap",
      "version": "1.2.6",
      "cpe": "cpe:2.3:a:\\@aashutoshrathi\\/word-wrap:\\@aashutoshrathi\\/word-wrap:1.2.6:*:*:*:*:*:*:*",
      "purl": "pkg:npm/%40aashutoshrathi/word-wrap@1.2.6",
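For the cpe value, two layers of escaping appear to be stacked. Our understanding of the CPE 2.3 formatted-string binding (NISTIR 7695) is that punctuation other than "_", "-" and "." is quoted with a backslash, so @ and / become \@ and \/. JSON then escapes that backslash itself, which is why the file shows \\@. The sketch below is a simplified illustration, not Syft's code; the helper name cpe_fs_escape is hypothetical.

```python
import json


def cpe_fs_escape(value: str) -> str:
    """Hypothetical, simplified CPE 2.3 formatted-string quoting (not Syft's code)."""
    out = []
    for ch in value:
        # Alphanumerics plus "_", "-" and "." pass through unchanged;
        # any other character is quoted with a single backslash.
        if ch.isalnum() or ch in "_-.":
            out.append(ch)
        else:
            out.append("\\" + ch)
    return "".join(out)


name = cpe_fs_escape("@aashutoshrathi/word-wrap")
# Vendor and product both set to the escaped name, as in the SBOM above;
# the seven trailing "*" fields are the remaining CPE 2.3 attributes.
cpe = ":".join(["cpe", "2.3", "a", name, name, "1.2.6"] + ["*"] * 7)

print(cpe)              # the actual string value: single backslashes
print(json.dumps(cpe))  # how it looks serialized in JSON: doubled backslashes
```

Decoding the JSON (e.g. with json.loads) gives back the single-backslash form, so the \\ in the file is only JSON's representation of one backslash.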

Steps to reproduce the issue:

Consider this package-lock.json; note the module name in this section:

"node_modules/@aashutoshrathi/word-wrap": {
      "version": "1.2.6",
      "resolved": "https://<REMOVED>/@aashutoshrathi/word-wrap/-/word-wrap-1.2.6.tgz",
      "integrity": "sha512-1Yjs2SvM8TflER/OD3cOjhWWOZb58A2t7wpE2S9XfBYTiIl+XFhQG2bjy4Pu1I+EAlCNUzRDYDdFwFYUKvXcIA==",
      "dev": true,
      "engines": {
        "node": ">=0.10.0"
      }
    },

Command:

cd <repository_folder>
./syft . -o cyclonedx-json@1.5=testsbom2.json

Output:

{
  "$schema": "http://cyclonedx.org/schema/bom-1.5.schema.json",
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "serialNumber": "urn:uuid:4894d768-3c79-4d7d-b06d-f8dfdc615a4b",
  "version": 1,
  "metadata": {
    "timestamp": "2024-10-24T20:04:40Z",
    "tools": {
      "components": [
        {
          "type": "application",
          "author": "anchore",
          "name": "syft",
          "version": "1.14.2"
        }
      ]
    },
    "component": {
      "bom-ref": "5054cfdceff1cb35",
      "type": "file",
      "name": "<repository_folder>"
    }
  },
  "components": [
    {
      "bom-ref": "pkg:npm/%40aashutoshrathi/word-wrap@1.2.6?package-id=cf52c618c3862994",
      "type": "library",
      "name": "@aashutoshrathi/word-wrap",
      "version": "1.2.6",
      "cpe": "cpe:2.3:a:\\@aashutoshrathi\\/word-wrap:\\@aashutoshrathi\\/word-wrap:1.2.6:*:*:*:*:*:*:*",
      "purl": "pkg:npm/%40aashutoshrathi/word-wrap@1.2.6",
      "properties": [

Environment:

syft --version
syft 1.14.2

cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Hi @dszortyka, it looks like the @ character is being replaced by %40 in the PURL: a package named @aashutoshrathi/word-wrap results in the PURL pkg:npm/%40aashutoshrathi/word-wrap@1.2.6, and the same encoding shows up in the bom-ref, which is built from the PURL. The PackageURL spec is pretty clear that @ is a special character that must be encoded as %40 anywhere other than the version separator, explicitly stating: "the '@' version separator must be encoded as %40 elsewhere". I believe Syft is doing the correct thing here, but is there something else you think Syft should do?
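To illustrate the encoding described above, here is a minimal sketch that builds an npm PURL from a scoped package name. It assumes the npm scope (e.g. @aashutoshrathi) becomes the PURL namespace and that each segment is percent-encoded; the helper npm_purl is hypothetical and not Syft's implementation, and it ignores qualifiers and subpaths.

```python
from urllib.parse import quote, unquote


def npm_purl(name: str, version: str) -> str:
    """Hypothetical minimal npm PURL builder (not Syft's implementation)."""
    # Scoped npm names look like "@scope/pkg"; the scope is the PURL namespace.
    if name.startswith("@") and "/" in name:
        namespace, pkg = name.split("/", 1)
    else:
        namespace, pkg = "", name
    # safe="" percent-encodes everything outside the unreserved set,
    # so a literal "@" inside a segment becomes "%40".
    segments = [quote(seg, safe="") for seg in (namespace, pkg) if seg]
    # The only unencoded "@" is the version separator.
    return "pkg:npm/" + "/".join(segments) + "@" + quote(version, safe="")


purl = npm_purl("@aashutoshrathi/word-wrap", "1.2.6")
print(purl)
print(unquote("%40aashutoshrathi"))  # decoding restores the original scope
```

This produces the same string as the purl field in the SBOM, and percent-decoding the namespace gives back @aashutoshrathi, so the package name is preserved, just encoded.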

Hi @kzantow,
Thank you. I hadn't seen the PackageURL spec you linked.
Everything looks good and clear to me now. I appreciate your time answering this thread.