spdx/spdx-3-model

Core: Need class 'Dictionary' for listing multiple key-value pairs

VenkatTechnologist opened this issue · 8 comments

See #769.

For things like hyperparameters, metrics, and metric thresholds, we need a class called 'Dictionary' which can hold a list of 'DictionaryEntry' values.

zvr commented

As you write, we already have DictionaryEntry for exactly this reason and we have implemented "dictionaries" by having multiple instances of this type.

I'm not sure exactly why there is a need for a class holding them. Can you explain (maybe with a simple example)?

zvr commented

If the whole purpose is to simply group the entries together, I don't think there is a need for a "dictionary" class.

For example, when we record a build, it has a number of parameters. Therefore the Build class, has a parameters property (of type DictionaryEntry). Or we want to record all the environment -- again, a single property environment of type DictionaryEntry is enough, since you can have many of them.
Take a look at the definition.

This allows to record a "dictionary" of key/value pairs, without the need of a new class.

Some example data for such an object would be:

{
    "type": "Build",
    "buildId": "my_id",
    "buildType": "https://github.com/actions/deploy",
    "buildStartTime": "2024-05-10T12:34:45Z",
    "buildEndTime": "2024-05-10T12:34:54Z",
    "parameters": [
        { "key": "param1", "value": "val1" },
        { "key": "param2", "value": "val2" },
        { "key": "param3", "value": "val3" }
    ],
    "environment": [
        { "key": "user", "value": "guest" },
        { "key": "cwd", "value": "/home/guest" },
        { "key": "arch", "value": "x86_64" }
    ]
}

Is this clearer now?

zvr commented

OK, let me try and write the alternative explicitly.

So, in the current state where environment is a 0..* DictionaryEntry, it is serialized, as shown in the previous comment, this way:

    "environment": [
        { "key": "user", "value": "guest" },
        { "key": "cwd", "value": "/home/guest" },
        { "key": "arch", "value": "x86_64" }
    ]

If we had defined a Dictionary class, how would we have done it? We want to represent a list of DictionaryEntry entries. In the RDF object model, there are no lists -- we simply allow properties to appear more than once. Therefore the definition of Dictionary would only contain an "entry" property, that could appear multiple times:

# Dictionary

## Properties

- entry:
  - type: DictionaryEntry
  - minCount: 1 (or 0)
  - maxCount: *

If we had this definition, the type of environment would be a 0..1 Dictionary.

But then the actual data would be serialized as:

    "environment": {
	"type": "Dictionary",
	"entry": [
	    { "key": "user", "value": "guest" },
	    { "key": "cwd", "value": "/home/guest" },
	    { "key": "arch", "value": "x86_64" }
	]
    }

As you can see, by defining the Dictionary class we only introduce an extra level in JSON, without any real benefit. That's why we defined only the DictionaryEntry class and not a collection of these.

zvr commented

@VenkatTechnologist can we close this one?

Closed per above comments.