encodingFormat should be distinct from contentType (e.g. to describe data.csv.gz)
I am attempting to document a Dataset in which the DataDownload objects are compressed CSVs, e.g. .csv.gz files. What is the correct schema.org annotation in this case?
DataDownload includes the property encodingFormat (which is already inherited from CreativeWork, although admittedly the DataDownload type allows for multiple formats of the same data).
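For concreteness, here is a minimal sketch of the kind of markup being asked about, built as a Python dict and serialized to JSON-LD. The dataset name and the https://example.org/ URL are hypothetical. encodingFormat is set to text/csv on the assumption that it holds a MIME media type, which is exactly where the gzip compression gets lost.

```python
import json

# Hypothetical Dataset with a single compressed-CSV distribution.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example dataset",  # hypothetical name
    "distribution": [
        {
            "@type": "DataDownload",
            "contentUrl": "https://example.org/data.csv.gz",  # hypothetical URL
            # MIME type of the serialized content; nothing here records
            # that the file is gzip-compressed.
            "encodingFormat": "text/csv",
        }
    ],
}

print(json.dumps(dataset, indent=2))
```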
I believe the typical definition of "encoding" here would be the compression algorithm, as in the HTTP Content-Encoding header defined by RFC 2616. This is at odds with the schema.org definition of encodingFormat, which seems to refer instead to the content type (i.e. text/csv in this case), as evidenced by the suggestion to use a MIME media type (which, as I understand it, describes the underlying type, not the compression).
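Python's standard-library mimetypes module draws the same line: guess_type() returns a (type, encoding) pair, reporting the compression separately from the media type. A small sketch; the .csv mapping is registered explicitly on a private MimeTypes instance so the result does not depend on the host's MIME database.

```python
import mimetypes

# Private instance with the .csv mapping pinned, so platform MIME tables
# (e.g. a system mime.types file) cannot change the result.
mt = mimetypes.MimeTypes()
mt.add_type("text/csv", ".csv")

# guess_type() strips a recognized compression suffix (.gz) first and
# reports it as the "encoding", then looks up the remaining extension.
media_type, encoding = mt.guess_type("data.csv.gz")
print(media_type, encoding)  # text/csv gzip
```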
I suppose I could set schema:encodingFormat to something like application/csv+gzip in this case, but that seems to be a non-standard way of representing this information. Thoughts and advice much appreciated.
(Note that this is related to schemaorg/schemaorg#1155, in which fileFormat appears to have been deprecated or collapsed into encodingFormat. This seems to have cost us the ability to distinguish between how content is serialized (CSV, TSV, XML, JSON, etc.) and how it is encoded (e.g. compression, as per RFC 2616 section 14.11).)
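In the RFC 2616 model cited above, the two axes live in separate headers: Content-Type names the media type, and Content-Encoding names the coding applied on top of it. A minimal sketch using Python's gzip module; the headers dict is a plain illustration, not any particular HTTP library's API, and the CSV payload is made up.

```python
import gzip

csv_bytes = b"id,value\n1,3.5\n2,4.0\n"  # made-up sample payload

# A gzip-compressed CSV body: the media type stays text/csv, and the
# compression is declared in a separate header.
headers = {
    "Content-Type": "text/csv",
    "Content-Encoding": "gzip",
}
body = gzip.compress(csv_bytes)

# A client removes the content coding and is left with text/csv data.
if headers.get("Content-Encoding") == "gzip":
    decoded = gzip.decompress(body)
else:
    decoded = body

print(headers["Content-Type"], decoded == csv_bytes)
```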
See issue #7 for the context of the move from the main Schema.org issue tracker to this repository.
Quick note: schema:contentType is not an expected property of DataDownload.
See also ESIPFed/science-on-schema.org#131 and ESIPFed/science-on-schema.org#132.