weecology/retriever

API research for API integration in Data Retriever (GSoC '21)

Aakash3101 opened this issue · 0 comments

APIs researched for Data Retriever API integration

1. Stash API

Dryad Homepage: https://datadryad.org/stash
Stash API: https://datadryad.org/api/v2/docs/#/

Datasets are mostly related to the biological sciences. The dataset search page is at: https://datadryad.org/search

Every dataset is assigned a DOI string that uniquely identifies it. The DOI string appears in the URL of the dataset's landing page. For example: https://datadryad.org/stash/dataset/doi:10.5061/dryad.f4qrfj6t3
Here the DOI string is: doi:10.5061/dryad.f4qrfj6t3
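
A minimal sketch of pulling the DOI string out of a landing page URL like the one above. The function name `doi_from_landing_url` is hypothetical, and the code assumes every landing page follows the `/stash/dataset/<doi>` pattern shown in the example.

```python
from urllib.parse import unquote, urlparse

# Assumption: dataset landing pages always live under this path prefix.
DATASET_PREFIX = "/stash/dataset/"


def doi_from_landing_url(url: str) -> str:
    """Return the DOI string embedded in a Dryad dataset landing page URL."""
    # Decode any percent-escapes so the DOI comes back in its plain form.
    path = unquote(urlparse(url).path)
    if not path.startswith(DATASET_PREFIX):
        raise ValueError(f"not a dataset landing page URL: {url}")
    return path[len(DATASET_PREFIX):]


if __name__ == "__main__":
    print(doi_from_landing_url(
        "https://datadryad.org/stash/dataset/doi:10.5061/dryad.f4qrfj6t3"
    ))
```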

The Stash API returns information about a dataset given its DOI string. It also provides an endpoint for downloading the dataset.

GET /datasets/{doi}/download is used to download the dataset. The DOI string must be URL-encoded (percent-escaped) before it is placed in the request path.
For example, the DOI doi:10.1000/18238577 should be escaped as doi%3A10.1000%2F18238577
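
A minimal sketch of the escaping step and of building the download URL. The helper names are hypothetical, and `API_BASE` is an assumption inferred from the docs URL above; check it against the API documentation before relying on it.

```python
from urllib.parse import quote

# Assumption: the API root is the docs URL without the trailing "/docs/".
API_BASE = "https://datadryad.org/api/v2"


def encode_doi(doi: str) -> str:
    """Percent-escape a DOI so it can be used as a single URL path segment."""
    # safe="" makes quote() escape "/" as well, which it leaves alone by default.
    return quote(doi, safe="")


def download_url(doi: str) -> str:
    """Build the URL for GET /datasets/{doi}/download."""
    return f"{API_BASE}/datasets/{encode_doi(doi)}/download"


if __name__ == "__main__":
    print(encode_doi("doi:10.1000/18238577"))
    print(download_url("doi:10.5061/dryad.f4qrfj6t3"))
```

Passing `safe=""` matters here: with the default `safe="/"`, the slash inside the DOI would be left unescaped and the server would read it as a path separator.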

All datasets are downloaded as zip files. Some datasets include .txt files that describe the files in the dataset.
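
A rough sketch of how a downloaded archive could be inspected to separate the descriptive .txt files from the rest. The function name `split_archive_members` is hypothetical, and treating every .txt member as a description file is an assumption; some datasets may ship data as .txt too.

```python
import io
import zipfile


def split_archive_members(zip_bytes: bytes) -> tuple[list[str], list[str]]:
    """Return (text_files, other_files) for the members of a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Skip directory entries, which end with "/".
        names = [n for n in archive.namelist() if not n.endswith("/")]
    # Assumption: .txt members are descriptions rather than data.
    text_files = [n for n in names if n.lower().endswith(".txt")]
    other_files = [n for n in names if not n.lower().endswith(".txt")]
    return text_files, other_files


if __name__ == "__main__":
    # Build a small in-memory archive to stand in for a real download.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("README.txt", "Column descriptions go here.")
        archive.writestr("observations.csv", "site,count\nA,3\n")
    print(split_archive_members(buffer.getvalue()))
```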

2. Mendeley Data API

Homepage: https://data.mendeley.com/
Mendeley API: https://data.mendeley.com/api/docs/

This API also identifies datasets by their DOI string. Integrating it would be unnecessary, however, because the files of a dataset can already be downloaded using the download_file() function in Retriever.

3. CDS API

Homepage: https://cds.climate.copernicus.eu/cdsapp#!/home
CDS API: https://cds.climate.copernicus.eu/api-how-to

CDS stands for Climate Data Store.

This platform hosts spatial data only. However, the files are in the GRIB, GRIB2, and NetCDF-4 formats, which Retriever does not currently support.