Schema.org profile for specifying software input/output data possibilities, used for software metadata descriptions.
Authors: Maarten van Gompel and Daniel Garijo
Profile available at: https://w3id.org/software-iodata
Supported serializations: JSON-LD (application/ld+json), Turtle (text/turtle) and HTML. The snippet below shows how to retrieve the profile in Turtle with a curl command:
curl -sH "accept:text/turtle" -L https://w3id.org/software-iodata
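The same content negotiation can be sketched in Python with only the standard library. The names `build_profile_request` and `fetch_profile` are illustrative and not part of any existing tool, and the sketch assumes the w3id.org redirect honours the Accept header the same way it does for curl.

```python
import urllib.request

PROFILE_URL = "https://w3id.org/software-iodata"


def build_profile_request(media_type: str = "text/turtle") -> urllib.request.Request:
    """Prepare a GET request asking for one serialization of the profile."""
    return urllib.request.Request(PROFILE_URL, headers={"Accept": media_type})


def fetch_profile(media_type: str = "text/turtle", timeout: float = 10.0) -> str:
    """Download the profile; urlopen follows redirects, like `curl -L`."""
    request = build_profile_request(media_type)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_profile("application/ld+json")` would request the JSON-LD serialization instead of Turtle.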
This profile defines the vocabulary terms needed to describe, as part of the metadata of software applications, what data is consumed (input) or produced (output) by an application. The profile is meant to be used with schema.org, codemeta and the software type profile.
Our goal is to introduce as little additional vocabulary as possible and only extend schema.org and codemeta where representation gaps are present.
The current schema.org and codemeta vocabularies do not provide a clear way to describe, for a schema:SoftwareApplication, the type of data that this software may consume or produce.
We introduce two simple and limited properties that allow this.
Being able to specify data input/output possibilities allows for more accurate software metadata descriptions.
Disclaimer: this work aims to create a profile that may be incorporated into codemeta or schema.org. The profile has persistent identifiers, but, if standardized, the classes and properties defined here may be absorbed into the above vocabularies.
We introduce the following properties:
consumesData
- The software takes the specified data as one of its inputs. The target object is typically interpreted as a template describing a class of data. For example, a Dataset in NetCDF format containing precipitation and time variables (in their respective units).
producesData
- The software produces the specified data as one of its outputs. The target object is typically interpreted as a template describing a class of data. For example, a text document in CSV format.
The domain of both these properties is schema:SoftwareApplication (or any of its subclasses) or schema:SoftwareSourceCode. The former is preferred as it is more specific and combines nicely with the usage of schema:targetProduct as proposed in https://w3id.org/software-types. If the domain is SoftwareSourceCode, then these properties are interpreted as less specific (read as: some undefined build target of the source code consumes/produces a certain type of data).
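The difference between the two domains can be sketched as a small Python helper that lists the declarations in a description and notes how specific each one is. This is a sketch only: `collect_io`, `as_list` and `node_type` are hypothetical names, and the code treats the description as plain compacted JSON rather than running a JSON-LD processor.

```python
from typing import Any, Dict, Iterator, List, Tuple

IO_PROPERTIES = ("consumesData", "producesData")


def as_list(value: Any) -> List[Any]:
    """JSON-LD allows a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def node_type(node: Dict[str, Any]) -> str:
    """Return the node's type, accepting both the "@type" and "type" spellings."""
    return str(node.get("@type", node.get("type", "")))


def collect_io(source_code: Dict[str, Any]) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (subject name, property, data type, specificity) tuples.

    Declarations on a targetProduct are tied to a concrete application;
    declarations directly on the SoftwareSourceCode only say that some
    undefined build target consumes or produces the data.
    """
    subjects = [(source_code, "some build target")]
    subjects += [(app, "this application") for app in as_list(source_code.get("targetProduct"))]
    for subject, specificity in subjects:
        for prop in IO_PROPERTIES:
            for data in as_list(subject.get(prop)):
                yield (subject.get("name", "?"), prop, node_type(data), specificity)


# Hypothetical description mixing both styles of declaration.
doc = {
    "@type": "SoftwareSourceCode",
    "name": "MySpeechRecognizer",
    "producesData": {"@type": "TextDigitalDocument"},
    "targetProduct": {
        "@type": "CommandLineApplication",
        "name": "My Speech Recognition Tool",
        "consumesData": {"@type": "AudioObject"},
    },
}
rows = list(collect_io(doc))
```

Here the `producesData` declaration on the source code is reported as belonging to "some build target", while the `consumesData` declaration on the target product is reported as belonging to "this application".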
The range of the properties is schema:CreativeWork (or any of its subclasses). This offers a high degree of flexibility and reuses as much as possible of the existing schema.org properties, e.g. for expressing content/encoding types, natural languages, etc. The object is typically to be interpreted as a template data type, describing a class of data that meets the constraints laid out by further properties on that data. However, if enough constraining properties are specified, it can also be used to refer to a single particular dataset.
We recommend use of the following schema.org subclasses of CreativeWork as object:
The example below describes software metadata for a fictitious speech recognition tool that takes an audio file and produces a textual transcription:
{
    "@context": [
        "https://raw.githubusercontent.com/codemeta/codemeta/2.0/codemeta.jsonld",
        "https://raw.githubusercontent.com/schemaorg/schemaorg/main/data/releases/13.0/schemaorgcontext.jsonld",
        "https://w3id.org/software-types",
        "https://w3id.org/software-iodata"
    ],
    "@type": "SoftwareSourceCode",
    "name": "MySpeechRecognizer",
    "codeRepository": "https://github.com/someuser/MySpeechRecognizer",
    ...,
    "targetProduct": [
        {
            "@type": "CommandLineApplication",
            "executableName": "transcribe",
            "name": "My Speech Recognition Tool",
            "runtimePlatform": "Linux",
            "consumesData": {
                "@type": "AudioObject",
                "encodingFormat": "audio/mp3",
                "inLanguage": {
                    "@id": "https://iso639-3.sil.org/code/eng",
                    "@type": "Language",
                    "name": "English",
                    "identifier": "eng"
                }
            },
            "producesData": {
                "@type": "TextDigitalDocument",
                "encodingFormat": "text/plain",
                "inLanguage": {
                    "@id": "https://iso639-3.sil.org/code/eng",
                    "@type": "Language",
                    "name": "English",
                    "identifier": "eng"
                }
            }
        }
    ]
}
The consumesData and producesData properties are deliberately limited and do not provide a full specification of the input/output of software. Both properties merely describe what kind of data is consumed or produced as one of the inputs or outputs of a software tool, with no guarantees and no explicit relations between any of them. If multiple objects are specified, it is not defined whether this is a union or an intersection. The constraints on the data classes themselves (in the above example: encodingFormat, inLanguage), however, should be interpreted as an intersection.
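One way to read these semantics is sketched below in Python. The helper names are hypothetical, and the comparison is deliberately simplistic: it uses exact equality on types and formats, reduces nested nodes to their identifier, and ignores subclass relations between schema.org types.

```python
from typing import Any, Dict, List


def comparable(value: Any) -> Any:
    """Reduce a nested node such as inLanguage to its identifier for comparison."""
    if isinstance(value, dict):
        return value.get("@id") or value.get("identifier") or value
    return value


def matches_template(template: Dict[str, Any], candidate: Dict[str, Any]) -> bool:
    """True only if the candidate satisfies every constraint in the template.

    The constraints on one data object are read as an intersection, so a
    single mismatch is enough to reject the candidate.
    """
    return all(
        comparable(candidate.get(key)) == comparable(expected)
        for key, expected in template.items()
    )


def match_each(templates: List[Dict[str, Any]], candidate: Dict[str, Any]) -> List[bool]:
    """Check the candidate against each declared object separately.

    Union versus intersection across objects is left undefined, so the
    per-template results are returned without being combined.
    """
    return [matches_template(template, candidate) for template in templates]


# Template taken from the consumesData declaration in the example above.
template = {
    "@type": "AudioObject",
    "encodingFormat": "audio/mp3",
    "inLanguage": {"@id": "https://iso639-3.sil.org/code/eng"},
}
# Hypothetical descriptions of two concrete files.
english_mp3 = {
    "@type": "AudioObject",
    "encodingFormat": "audio/mp3",
    "inLanguage": {"@id": "https://iso639-3.sil.org/code/eng", "name": "English"},
}
english_wav = dict(english_mp3, encodingFormat="audio/wav")
```

With these values, `matches_template(template, english_mp3)` holds, while `english_wav` is rejected because one of the constraints (the encoding format) does not match. A consumer of the metadata remains free to decide how to combine the results of `match_each`.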
Our aim here is just to give software metadata the ability to express some information on what input and output a particular piece of software can accept or produce, so that users reading the metadata can for example make an informed decision whether certain software is useful for their purposes given the data they have or expect to produce.
We do not express how this data is passed to or obtained from the software, nor what data should or should not be passed/obtained together, nor what the exact relation between the input and output data is. All that would be up to other more complex initiatives like OpenAPI or HydraCG. We merely enrich the metadata description so that data expectations can be communicated to the end-user.
Support for our software_iodata extension to codemeta/schema.org is being implemented in the following software:
- codemetapy - Python library and command-line tool for converting to codemeta and creating/manipulating existing codemeta descriptions.
- codemeta-harvester - Automatically harvests and converts software metadata to codemeta
Many vocabularies exist to describe software or its constituent parts, e.g. the software description ontology, the description of a project vocabulary, hydra (for API description) and the common workflow language (for describing the inputs and outputs of software components), among others. Our proposed profile does not aim to redefine any term related to software, but proposes a lightweight extension that can be easily incorporated into schema.org or codemeta.
You can consult the following projects as examples that make use of this profile:
Furthermore, the CLARIAH Tools Portal is built on the aforementioned implementations and may offer further examples of codemeta that also incorporates this software application type profile.
This work was indirectly and partially funded through the CLARIAH-PLUS project.
This work has been supported by the Madrid Government (Comunidad de Madrid-Spain) under the Multiannual Agreement with Universidad Politécnica de Madrid in the line Support for R&D projects for Beatriz Galindo researchers, in the context of the V PRICIT (Regional Programme of Research and Technological Innovation)