`hasDataFile` would benefit from a better description
The following line does not help the reader understand what it means or how it should be used. It lacks context, and "data file" is a very generic term.
Looks like it was just copied from the SPDX 2.X definitions.
@maxhbr If you want to create a PR, we can get this into 3.0, otherwise I'll target it for 3.1
If someone can explain to me what a "data file" is, sure. But then it might be even easier for them to create the PR than to explain it to me.
It has been a very long time since this was discussed.
I'll move it to 3.1 unless someone wants to volunteer to write a PR.
@rgopikrishnan91 - do you want to take a pass at this description?
In this example SBOM, I use it to document that a classifier script has a classifier model (data file).
[ predict.py ] --hasDataFile--> [ model.bin ]
{
  "type": "Relationship",
  "relationshipType": "hasDataFile",
  "from": "https://spdx.org/spdxdocs/File11",
  "to": [
    "https://spdx.org/spdxdocs/File10"
  ]
}
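For readers who generate such entries from code, here is a minimal Python sketch that builds the same relationship and serializes it with the standard `json` module, which avoids hand-written comma mistakes. The helper name `make_relationship` is hypothetical, and the field names are copied from the snippet in this comment; a complete SPDX 3 document would likely carry more fields than shown here.

```python
import json


def make_relationship(relationship_type, from_id, to_ids):
    """Build a dict shaped like the Relationship snippet in this thread.

    Hypothetical helper for illustration; it is not part of any SPDX library.
    """
    return {
        "type": "Relationship",
        "relationshipType": relationship_type,
        "from": from_id,
        "to": list(to_ids),
    }


# Assumption: File11 stands for predict.py and File10 for model.bin,
# matching the direction of the arrow in the diagram.
rel = make_relationship(
    "hasDataFile",
    "https://spdx.org/spdxdocs/File11",
    ["https://spdx.org/spdxdocs/File10"],
)

# json.dumps always emits strictly valid JSON (no missing or trailing commas).
serialized = json.dumps(rel, indent=2)
print(serialized)
```

Parsing `serialized` back with `json.loads` is a cheap way to confirm the output is well-formed before adding it to an SBOM.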
I left a review on the PR, but here are my thoughts:
Generic database fie --> File
I am unsure whether `hasDataFile` is the right name. Should it be `hasAsset`? Using `hasDataFile` for a model executable and a log file seems odd and non-intuitive.
Also, maybe we should clarify how it's different from the `dependsOn` relationship?
Should it be `hasAsset`?
In other parts of the SPDX spec, we've used the term `Artifact` to mean something more general than a file.
I agree with @rgopikrishnan91 that we should clarify how `hasDataFile` (or `hasAsset` / `hasArtifact`) is different from `dependsOn`.
I can see that `hasDataFile` doesn't imply dependency, while `dependsOn` explicitly does.
`hasDataFile` also suggests that the `to` Element should be a `File` (although I don't think it is enforced by the model), while the `to` of `dependsOn` can be anything.
In this sense, `dependsOn` works more at the abstract level, while `hasDataFile` details the implementation level.
At yesterday's AI team meeting (2024-07-31), we settled on this:
- `hasDataFile`: The `from` Element treats each `to` Element as a data file. A data file is an artifact that stores data required or optional for the `from` Element's functionality. A data file can be a database file, an index file, a log file, an AI model file, a calibration data file, a temporary file, a backup file, and more. For AI training dataset, test dataset, test artifact, configuration data, build input data, and build output data, please consider using the more specific relationship types: `trainedOn`, `testedOn`, `hasTest`, `configures`, `hasInputs`, and `hasOutputs`, respectively. This relationship does not imply dependency.
(See #815.) @maxhbr, do you think this is sufficient? Thank you.
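The guidance in the proposed description (prefer a more specific relationship type where one exists, otherwise use `hasDataFile`) can be sketched as a small lookup. This is only an illustration: the function name `suggest_relationship_type` and the lowercase category labels are hypothetical, the relationship type names are copied verbatim from the comment above and should be checked against the published SPDX vocabulary, and the sketch says nothing about which Element belongs in `from` and which in `to`.

```python
# Categories and relationship type names as listed in the proposed description.
# The names are taken from this thread, not verified against the final spec.
SPECIFIC_TYPES = {
    "ai training dataset": "trainedOn",
    "test dataset": "testedOn",
    "test artifact": "hasTest",
    "configuration data": "configures",
    "build input data": "hasInputs",
    "build output data": "hasOutputs",
}


def suggest_relationship_type(kind: str) -> str:
    """Return the more specific type for a known category, else hasDataFile.

    Simplifying assumption: anything not in the table is treated as a
    generic data file (database, index, log, model, backup, and so on).
    """
    return SPECIFIC_TYPES.get(kind.strip().lower(), "hasDataFile")


for kind in ("AI training dataset", "log file", "configuration data"):
    print(kind, "->", suggest_relationship_type(kind))
```

Falling back to `hasDataFile` mirrors the wording of the description, where the generic type covers whatever the specific types do not.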