datasets
Datasets is a JSON-LD @context that you can use to publish a visual dataset of images for use as input to computer vision classification and localization training pipelines.
Here is a concrete example:
{
  "@context": "https://code.sgo.to/datasets",
  "@type": "Dataset",
  "name": "Dog breeds",
  "description": "Different breeds of dogs",
  "classes": [{
    "@type": "Class",
    "name": "Chihuahua",
    "images": [
      "http://code.sgo.to/dogs/images/n02085620-Chihuahua/n02085620_10074.jpg",
      "http://code.sgo.to/dogs/images/n02085620-Chihuahua/n02085620_10621.jpg"
    ]
  }, {
    "@type": "Class",
    "name": "Maltese",
    "images": [
      "http://code.sgo.to/dogs/images/n02085936-Maltese_dog/n02085936_10073.jpg",
      "http://code.sgo.to/dogs/images/n02085936-Maltese_dog/n02085936_10148.jpg"
    ]
  }]
}
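A minimal sketch of how a training pipeline might consume a document like the one above, assuming it has already been fetched and is available as a string. The helper name `iter_labeled_images` is hypothetical, and the sketch reads the document as plain JSON rather than running a JSON-LD processor.

```python
import json

# A trimmed copy of the example document, inlined so the sketch runs offline.
DOCUMENT = """
{
  "@context": "https://code.sgo.to/datasets",
  "@type": "Dataset",
  "name": "Dog breeds",
  "classes": [
    {"@type": "Class", "name": "Chihuahua", "images": [
      "http://code.sgo.to/dogs/images/n02085620-Chihuahua/n02085620_10074.jpg"
    ]},
    {"@type": "Class", "name": "Maltese", "images": [
      "http://code.sgo.to/dogs/images/n02085936-Maltese_dog/n02085936_10073.jpg"
    ]}
  ]
}
"""


def iter_labeled_images(dataset):
    """Yield (class name, image URL) pairs from a parsed Dataset (hypothetical helper)."""
    for cls in dataset.get("classes", []):
        for image in cls.get("images", []):
            yield cls["name"], image


dataset = json.loads(DOCUMENT)
pairs = list(iter_labeled_images(dataset))
for label, url in pairs:
    print(label, url)
```

Each pair gives a classification label and the location of one training image, which is typically all a classification pipeline needs before downloading the image bytes.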
Here is the schema:
Dataset
Property | Type | Description |
---|---|---|
name | String | The name of the dataset |
description | String | A short description of the dataset |
url | URL | Where this dataset can be found |
download | URL | A link to an archived version of this dataset |
citation | Bib[] | The citation requirements for using this dataset |
release | String | The release number |
createdDate | Date | The date the dataset was created |
publishedDate | Date | The date the dataset was published |
modifiedDate | Date | The date the dataset was last modified |
classes | Class[] or URL[] | An array of classes in this dataset |
Class
Property | Type | Description |
---|---|---|
name | String | The name of the class |
description | String | A short description of the class |
images | Image[] or URL[] | An array of images in this class |
Image
Property | Type | Description |
---|---|---|
name | String | The name of the image |
url | URL | The URL of the image file |
size | Size | The size of the image |
boxes | Box[] | An array of bounding boxes where the class appears in the image |
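The concrete example lists images as plain URL strings, while the Image type carries a url property along with size and boxes. A consumer may want to accept both forms. The sketch below does that; treating a bare string as shorthand for an Image is an assumption of this sketch, the helper name `normalize_image` is hypothetical, and the size and box values are illustrative.

```python
def normalize_image(entry):
    """Return an Image-shaped dict for either a URL string or an Image object.

    Accepting both forms is an assumption of this sketch, not a rule stated
    by the schema.
    """
    if isinstance(entry, str):
        return {"url": entry, "boxes": []}
    image = dict(entry)  # shallow copy so the caller's object is not mutated
    image.setdefault("boxes", [])
    return image


# A bare URL, as used in the concrete example.
short = normalize_image(
    "http://code.sgo.to/dogs/images/n02085620-Chihuahua/n02085620_10074.jpg"
)

# A full Image object with illustrative size and box values.
full = normalize_image({
    "url": "http://code.sgo.to/dogs/images/n02085620-Chihuahua/n02085620_10621.jpg",
    "size": {"width": 500, "height": 375},
    "boxes": [{"left": 10, "right": 200, "top": 20, "bottom": 300}],
})
```

After normalization, downstream code can always read `image["url"]` and `image["boxes"]` without checking which form the publisher used.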
Size
Property | Type | Description |
---|---|---|
width | Number | The width (in pixels) of the image |
height | Number | The height (in pixels) of the image |
Box
Property | Type | Description |
---|---|---|
left | Number | The leftmost limit of the bounding box (in pixels) |
right | Number | The rightmost limit of the bounding box (in pixels) |
top | Number | The top limit of the bounding box (in pixels) |
bottom | Number | The bottom limit of the bounding box (in pixels) |
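A short sketch of how a localization pipeline might sanity-check a Box against the Size of its image. It assumes the origin is the top-left corner of the image (so left is at most right and top is at most bottom), which the schema does not state. The helper names and sample values are hypothetical.

```python
def box_dimensions(box):
    """Return (width, height) of a Box in pixels, assuming right >= left and bottom >= top."""
    return box["right"] - box["left"], box["bottom"] - box["top"]


def box_fits(box, size):
    """Check that a Box is well ordered and lies within an image of the given Size."""
    return (
        0 <= box["left"] <= box["right"] <= size["width"]
        and 0 <= box["top"] <= box["bottom"] <= size["height"]
    )


size = {"width": 500, "height": 375}                        # illustrative values
box = {"left": 10, "right": 200, "top": 20, "bottom": 300}  # illustrative values

print(box_dimensions(box))  # (190, 280)
print(box_fits(box, size))  # True
```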
Bib
A bib description.
String
A text string.
Date
A date value in ISO 8601 date format.
Number
A numeric value.
URL
A URL.
Archive
To ease the distribution of image binaries, which tend to be large (on the order of 1 GB), Datasets can be published inside archive formats (.tar.gz, .zip, etc.). A Dataset has a download property that points to a URL you can use to download everything programmatically.
By convention, an archived Dataset contains an index.jsonld file in the root directory of the archive, which serves as the entry point to any other linked data inside the archive (which can be referenced through relative links).
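The convention can be sketched with Python's standard library: open the archive, read index.jsonld from its root, and parse it. The sketch assumes a .tar.gz archive that is already in memory; in practice the bytes would come from the URL in the download property. The helper names, the demo archive, and the relative image path are hypothetical.

```python
import io
import json
import tarfile


def read_index(archive_bytes):
    """Parse index.jsonld from the root of a .tar.gz archive held in memory."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as archive:
        # extractfile raises KeyError if the archive has no such member.
        member = archive.extractfile("index.jsonld")
        if member is None:
            raise ValueError("index.jsonld is not a regular file")
        return json.load(member)


def build_demo_archive():
    """Build a tiny in-memory .tar.gz with an index.jsonld at its root (demo data only)."""
    index = {
        "@context": "https://code.sgo.to/datasets",
        "@type": "Dataset",
        "name": "Dog breeds",
        "classes": [{
            "@type": "Class",
            "name": "Chihuahua",
            "images": ["images/chihuahua_1.jpg"],  # hypothetical relative link
        }],
    }
    payload = json.dumps(index).encode("utf-8")
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo(name="index.jsonld")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


index = read_index(build_demo_archive())
print(index["name"])  # Dog breeds
```

Relative links found in the index, such as the image path in the demo data, would then be resolved against the archive root. Archives whose entries are stored with a leading `./` may need the member name adjusted.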