biocompute-objects/galaxy

external_data_endpoints


Populate external_data_endpoints in execution_domain

'external_data_endpoints': [],

Defined in https://github.com/biocompute-objects/BCO_Specification/blob/1.4.0/docs/execution-domain.md#254-external-data-endpoints-external_data_endpoints

2.5.4 External Data Endpoints "external_data_endpoints"
An optional multi-value field listing the minimal necessary domain-specific external data source access required to successfully run the script that produces the BCO. The values under this field present the requirements for network protocol endpoints used by a pipeline’s scripts, or other software.

The key url defines an endpoint to be accessed. If the path of the URL is / then any resource at the given domain may be accessed, while if the path is more specific, then only resources whose path prefix matches may be accessed.

The key name should describe the service that is accessed.

"external_data_endpoints": [
    {"url": "protocol://domain:port/application/path", "name": "generic name"},
    {"url": "ftp://data.example.com:21/", "name": "access to ftp server"},
    {"url": "http://eutils.ncbi.nlm.nih.gov/entrez/eutils", "name": "access to e-utils web service"}
]
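
As a rough illustration of the path-prefix rule quoted above, here is a minimal Python sketch that checks whether a request URL is covered by a list of declared endpoints. The function name `is_allowed`, the `DEFAULT_PORTS` table, and the choice to match prefixes on whole path segments are assumptions made for this example; the specification text does not spell out those details.

```python
from urllib.parse import urlsplit

# Assumption: treat a missing port as the scheme's usual default so that
# "ftp://host/" and "ftp://host:21/" compare as the same origin.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def _origin(parts):
    """Return (scheme, host, port) for a parsed URL, filling in default ports."""
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def is_allowed(request_url, endpoints):
    """Return True if request_url is covered by any declared endpoint.

    A path of "/" allows any resource on that origin; a more specific path
    only allows resources underneath it (matched on whole path segments,
    which is an interpretation, not something the spec states).
    """
    req = urlsplit(request_url)
    for endpoint in endpoints:
        ep = urlsplit(endpoint["url"])
        if _origin(req) != _origin(ep):
            continue
        prefix = ep.path or "/"
        if prefix == "/":
            return True
        prefix = prefix.rstrip("/")
        if req.path == prefix or req.path.startswith(prefix + "/"):
            return True
    return False


endpoints = [
    {"url": "ftp://data.example.com:21/", "name": "access to ftp server"},
    {"url": "http://eutils.ncbi.nlm.nih.gov/entrez/eutils",
     "name": "access to e-utils web service"},
]

print(is_allowed("ftp://data.example.com/pub/file.txt", endpoints))
print(is_allowed("http://eutils.ncbi.nlm.nih.gov/other", endpoints))
```

The placeholder entry `protocol://domain:port/application/path` is left out of the sample list because its port is not numeric and would not parse as a real URL.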

From: galaxyproject#10361 (comment)

I would suggest adding something like external_service="<service_url>" to the tool XML language and then annotating tool parameters that reference an external entity with it.
So for a tool that downloads accessions, this could be something like:
<param name="accession" value="SRR12345678" external_service_url="https://trace.ncbi.nlm.nih.gov/Traces/sra/?run="/>
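
To make the suggestion concrete, here is a minimal Python sketch of how such annotations might be collected into `external_data_endpoints`. Note that `external_service_url` is only the attribute proposed in this thread, not an existing Galaxy tool XML attribute, and the function name `endpoints_from_tool_xml`, the example tool id, and the generated `name` text are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical tool definition using the attribute proposed in this thread.
EXAMPLE_TOOL = """
<tool id="example_downloader">
  <inputs>
    <param name="accession" value="SRR12345678"
           external_service_url="https://trace.ncbi.nlm.nih.gov/Traces/sra/?run="/>
  </inputs>
</tool>
"""


def endpoints_from_tool_xml(tool_xml):
    """Collect one endpoint entry per distinct annotated <param> URL."""
    root = ET.fromstring(tool_xml.strip())
    endpoints = []
    seen = set()
    for param in root.iter("param"):
        url = param.get("external_service_url")  # proposed attribute, an assumption
        if url and url not in seen:
            seen.add(url)
            endpoints.append({
                "url": url,
                # Placeholder description; a real tool would supply a better name.
                "name": f"external service for parameter '{param.get('name')}'",
            })
    return endpoints


print(endpoints_from_tool_xml(EXAMPLE_TOOL))
```

With this approach each tool would still need to be annotated individually, but the extraction step itself could live in one place and run over any tool's XML.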

@mvdbeek is this something that would have to be implemented on every tool or something that could be added globally and then extracted... Could you provide a little more info?