API research for API integration in Data Retriever (GSoC '21)
Aakash3101 opened this issue
APIs researched for Data Retriever API integration
1. Stash API
Dryad Homepage: https://datadryad.org/stash
Stash API: https://datadryad.org/api/v2/docs/#/
Most datasets relate to the biological sciences. Datasets can be searched at https://datadryad.org/search
Every dataset has a DOI string that uniquely identifies it. The DOI appears in the URL of the dataset's landing page. For example: https://datadryad.org/stash/dataset/doi:10.5061/dryad.f4qrfj6t3
Here the DOI string is: doi:10.5061/dryad.f4qrfj6t3
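A minimal Python sketch of pulling the DOI out of a dataset landing-page URL, assuming every landing page follows the /stash/dataset/<doi> pattern in the example URL. The function name doi_from_dataset_url is hypothetical and not part of Retriever.

```python
from typing import Optional
from urllib.parse import urlparse


def doi_from_dataset_url(url: str) -> Optional[str]:
    """Return the 'doi:...' part of a Dryad landing-page URL, or None if absent."""
    # e.g. path == "/stash/dataset/doi:10.5061/dryad.f4qrfj6t3"
    path = urlparse(url).path
    marker = "/dataset/"
    index = path.find(marker)
    if index == -1:
        return None
    doi = path[index + len(marker):]
    # Only accept values that look like a DOI string
    return doi if doi.startswith("doi:") else None


print(doi_from_dataset_url("https://datadryad.org/stash/dataset/doi:10.5061/dryad.f4qrfj6t3"))
# doi:10.5061/dryad.f4qrfj6t3
```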
The Stash API returns information about a dataset given its DOI string, and it also provides an endpoint for downloading the dataset.
GET /datasets/{doi}/download downloads the dataset. The DOI must be URL-encoded (percent-escaped) before it is placed in the request path.
For example, a DOI like doi:10.1000/18238577 must be escaped as doi%3A10.1000%2F18238577
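A sketch of the escaping step in Python, using urllib.parse.quote with safe="" so that both ":" and "/" are percent-encoded. The base URL https://datadryad.org/api/v2 is an assumption inferred from the docs link above, and the metadata endpoint GET /datasets/{doi} should be checked against the Stash API docs; the helper names are hypothetical.

```python
from urllib.parse import quote

# Assumption: API base path inferred from https://datadryad.org/api/v2/docs/
STASH_API_BASE = "https://datadryad.org/api/v2"


def escape_doi(doi: str) -> str:
    # safe="" escapes "/" too, so the whole DOI fits in a single path segment
    return quote(doi, safe="")


def stash_urls(doi: str) -> dict:
    """Build the (assumed) metadata and download URLs for one dataset."""
    escaped = escape_doi(doi)
    return {
        "metadata": f"{STASH_API_BASE}/datasets/{escaped}",
        "download": f"{STASH_API_BASE}/datasets/{escaped}/download",
    }


print(escape_doi("doi:10.1000/18238577"))  # doi%3A10.1000%2F18238577
```

The resulting download URL could then be handed to whatever download routine Retriever uses; no request is made here.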
All datasets are downloaded as zip files. Some datasets also include .txt files that describe the other files in the dataset.
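Since every download is a zip archive, a minimal sketch with the standard zipfile module could extract it and list the .txt members that may describe the data. Treating every .txt file as a description is an assumption, and both function names are hypothetical.

```python
import zipfile
from typing import List


def description_files(zip_path: str) -> List[str]:
    """Return names of .txt members, which in some downloads describe the other files."""
    with zipfile.ZipFile(zip_path) as archive:
        return [name for name in archive.namelist() if name.lower().endswith(".txt")]


def extract_archive(zip_path: str, dest_dir: str) -> List[str]:
    """Extract every member into dest_dir and return the member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```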
2. Mendeley Data API
Homepage: https://data.mendeley.com/
Mendeley API: https://data.mendeley.com/api/docs/
This API also identifies datasets by their DOI string. However, using it would be unnecessary, because the dataset files can already be downloaded with Retriever's download_file() function.
3. CDS API
Homepage: https://cds.climate.copernicus.eu/cdsapp#!/home
CDS API: https://cds.climate.copernicus.eu/api-how-to
CDS stands for Climate Data Store.
This platform hosts only spatial data, and its files are in GRIB, GRIB2, and NetCDF-4 formats, which Retriever does not currently support.
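To flag such files before attempting to process them, one option is to check their leading "magic" bytes. This is a hypothetical helper, not something Retriever provides: GRIB files (editions 1 and 2) start with b"GRIB" and store the edition number in the 8th byte, NetCDF-4 files are HDF5 containers that usually start with the HDF5 signature, and classic NetCDF starts with b"CDF".

```python
from typing import Optional

GRIB_MAGIC = b"GRIB"                # GRIB edition 1 and 2
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"   # NetCDF-4 files are HDF5 containers
CLASSIC_NETCDF_MAGIC = b"CDF"       # classic / 64-bit-offset NetCDF


def sniff_format(header: bytes) -> Optional[str]:
    """Guess the format from the first 8 bytes of a file; None if unrecognised."""
    if header.startswith(GRIB_MAGIC):
        if len(header) >= 8:
            return f"GRIB{header[7]}"  # 8th byte holds the GRIB edition number
        return "GRIB"
    if header.startswith(HDF5_MAGIC):
        return "NetCDF-4/HDF5"
    if header.startswith(CLASSIC_NETCDF_MAGIC):
        return "NetCDF (classic)"
    return None


def sniff_file(path: str) -> Optional[str]:
    with open(path, "rb") as handle:
        return sniff_format(handle.read(8))
```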