Handle dryad URLs
Currently, because of the way the Dryad data URL is constructed, it doesn't work with our function. check_version ends up looking for nonsensical results because it keeps chunking the URL and eventually searches for anything that matches 1. I've changed the breaking point to nchar(pid) > 5 (instead of 0) to account for this to some extent (4163fb9).
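To illustrate the failure mode, here is a minimal sketch of a chunking loop. It is not the actual check_version code: candidate_pids, min_chars, and the delimiter set are hypothetical, assumed only for this example. It shows how a threshold of 0 lets the loop run all the way down to the trailing 1 of ?sequence=1, while a threshold of 5 stops before that.

```r
# Hypothetical helper, NOT the package's actual implementation.
# It strips one leading chunk per iteration and records each remaining
# string as a candidate identifier to search for.
candidate_pids <- function(url, min_chars = 5) {
  pid <- url
  candidates <- character(0)
  # Stop once the remaining chunk is too short to be a meaningful identifier.
  while (nchar(pid) > min_chars) {
    candidates <- c(candidates, pid)
    # Drop everything up to and including the next run of delimiters.
    shorter <- sub("^[^/=?]*[/=?]+", "", pid)
    if (identical(shorter, pid)) break  # no delimiter left to split on
    pid <- shorter
  }
  candidates
}

dryad_url <- "https://datadryad.org/bitstream/handle/10255/dryad.181477/experiement1.txt?sequence=1"

# With the old threshold the last candidate is the bare "1".
tail(candidate_pids(dryad_url, min_chars = 0), 1)

# With the new threshold the loop stops at "sequence=1".
tail(candidate_pids(dryad_url, min_chars = 5), 1)
```

The higher threshold only avoids the worst case. Chunks such as sequence=1 are still searched, which is why this is a partial fix.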
Not sure what the logic of Dryad URLs is, so more investigation is needed!
download_d1_data("https://datadryad.org/bitstream/handle/10255/dryad.181477/experiement1.txt?sequence=1", ".")
For some related issues on the structure of Dryad identifiers in DataONE, see https://redmine.dataone.org/issues/7896
@gothub sorry for the confusion. The idea is that scientists could also go to each data repository and get the URL from there. The KNB case, check_version("https://knb.ecoinformatics.org/knb/d1/mn/v2/object/msleckman.40.1"), seems to conform to what we discussed, but we should also handle PASTA URLs such as check_version("https://pasta.lternet.edu/package/data/eml/edi/195/2/51abf1c7a36a33a2a8bb05ccbf8c81c6").
The Dryad URL comes from this package, https://datadryad.org/resource/doi:10.5061/dryad.7ns4pk2, for the dataset experiment_1.txt. It seems that https://datadryad.org/bitstream/handle/10255/dryad.181477/experiement1.txt (without the ?sequence=1 query) will also resolve, and if I search for dryad.181477 on their repo I find the corresponding data package, so that is most likely their internal identifier?
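If dryad.181477 really is Dryad's internal identifier, it could be pulled out of a bitstream URL with a regular expression. The sketch below is only a guess at the URL layout based on the single example in this thread; extract_dryad_handle is a hypothetical name, and the pattern may not hold for other Dryad URLs.

```r
# Hypothetical helper: extracts the "dryad.<digits>" segment from a Dryad
# bitstream URL. The pattern is inferred from one example URL only.
extract_dryad_handle <- function(url) {
  pattern <- "^https?://datadryad\\.org/bitstream/handle/[0-9]+/(dryad\\.[0-9]+)/.*$"
  # Return NA when the URL does not follow the assumed layout.
  if (!grepl(pattern, url)) return(NA_character_)
  sub(pattern, "\\1", url)
}

extract_dryad_handle(
  "https://datadryad.org/bitstream/handle/10255/dryad.181477/experiement1.txt?sequence=1"
)

# The DOI landing page does not match the bitstream layout, so this gives NA.
extract_dryad_handle("https://datadryad.org/resource/doi:10.5061/dryad.7ns4pk2")
```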
Side note: when I search on DataONE for this DOI (10.5061/dryad.7ns4pk2) I get 5 hits, most likely related to the problem Matt mentioned, but if I search for the Dryad dataset identifier (dryad.181477) I get 0 hits.
So we might have to understand the URL logic behind Dryad if we want to support it.
Here is the corresponding DataONE URL for the above Dryad id: https://cn.dataone.org/cn/v2/resolve/https://doi.org/10.5061/dryad.7ns4pk2/1/bitstream
@gothub following our discussion, I think it would make the function more efficient to add a rule that prioritizes DataONE URLs and falls back to the current system if that fails.
That said, this does not solve the mapping problem between Dryad URLs and the corresponding DataONE ones.
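A rough sketch of what such a rule could look like. The names pid_from_url and fallback are hypothetical, and the pattern assumes that DataONE URLs always contain a /v<number>/object/ or /v<number>/resolve/ segment, which is only what the KNB and cn.dataone.org examples in this thread show.

```r
# Hypothetical sketch: try the DataONE URL layout first, then hand the URL
# to the existing chunking logic (passed in as `fallback`).
pid_from_url <- function(url, fallback) {
  # Assumed layout: <host>/<path>/v<N>/(object|resolve)/<pid>
  pattern <- "^https?://[^/]+/.*?/v[0-9]+/(object|resolve)/(.+)$"
  if (grepl(pattern, url, perl = TRUE)) {
    pid <- sub(pattern, "\\2", url, perl = TRUE)
    # Identifiers may be percent-encoded in the URL.
    return(utils::URLdecode(pid))
  }
  fallback(url)
}

# Stand-in for the current chunking approach, for illustration only.
current_system <- function(url) NA_character_

pid_from_url("https://knb.ecoinformatics.org/knb/d1/mn/v2/object/msleckman.40.1",
             fallback = current_system)
```

Repository-specific URLs (Dryad, PASTA) would not match the pattern and would still go through the fallback.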