Add metadata related to fetched URIs
jmfernandez opened this issue · 1 comments
Right now, WfExS does not keep a correspondence between URLs and downloaded files, because the filenames are hashes generated from the URL. However, there are several scenarios where additional upstream metadata is available, and future cases where a single URL corresponds to a collection of files. For example, an ENCODE Experiment id or an EGA dataset id corresponds to more than one file, each possibly with its own download URL.
So, there should be an intermediate metadata layer where these correspondences and the upstream metadata are kept. After this change, the name of each cached file should be the sha256 of its content, and each URI should translate to a JSON file, named after the hash of the URI, containing the correspondences to the cached files and their origins.
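To make the proposal concrete, here is a minimal sketch of such a layer in Python. It is not WfExS code: the function names (`store_fetched`, `resolve`), the JSON field names, the `.json` suffix, and the choice of sha256 for hashing the URI are all assumptions made for illustration, and the URIs in the demo are made up.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Any, List, Mapping, Optional


def sha256_hex(data: bytes) -> str:
    """Return the hexadecimal sha256 digest of some bytes."""
    return hashlib.sha256(data).hexdigest()


def store_fetched(
    cache_dir: Path,
    uri: str,
    files: Mapping[str, bytes],
    upstream_metadata: Optional[Mapping[str, Any]] = None,
) -> Path:
    """Store fetched contents and write the JSON record for `uri`.

    `files` maps each origin URL to the bytes downloaded from it, so a
    single URI can correspond to several cached files.
    (Hypothetical helper, not part of WfExS.)
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    entries = []
    for origin, content in files.items():
        digest = sha256_hex(content)
        # Cached files are named after the sha256 of their content.
        (cache_dir / digest).write_bytes(content)
        entries.append({"origin": origin, "sha256": digest})

    record = {
        "uri": uri,
        "files": entries,
        "metadata": dict(upstream_metadata or {}),
    }
    # Assumption: the URI is hashed with sha256 too; the issue only says "hash".
    record_path = cache_dir / (sha256_hex(uri.encode("utf-8")) + ".json")
    record_path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return record_path


def resolve(cache_dir: Path, uri: str) -> List[Path]:
    """Translate a URI into the paths of its cached files."""
    record_path = cache_dir / (sha256_hex(uri.encode("utf-8")) + ".json")
    record = json.loads(record_path.read_text(encoding="utf-8"))
    return [cache_dir / entry["sha256"] for entry in record["files"]]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        # Made-up dataset id that expands to two files.
        store_fetched(
            cache,
            "example:dataset-1",
            {
                "https://example.org/a.txt": b"first file",
                "https://example.org/b.txt": b"second file",
            },
            upstream_metadata={"title": "Example dataset"},
        )
        for path in resolve(cache, "example:dataset-1"):
            print(path.name)
```

One consequence of this layout is that identical content fetched from different URIs is stored only once, while each URI keeps its own record of origins and upstream metadata.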
Last but not least, upstream metadata should be gathered and preserved in the execution provenance.