pegasus-isi/pegasus

[PM-1933] support for private-token to curl invocations

Closed this issue · 11 comments

some workflows may need to retrieve data from gitlab instances that use http private tokens

https://docs.gitlab.com/ee/api/personal_access_tokens.html

for example, chess workflows need to download input data from a gitlab repo, which can be done via

curl "https://example.com/edd%2fdata.tar/raw?ref=main" --header "Private-Token: $1" -o data.tar

Reporter: @vahi
Resolution: Fixed
Watchers:
@rynge
@mayani
@vahi

Author: @rynge

How do you feel about this being a header implementation? For example, in the credentials.conf file, you could have:

[example.com]
header1 = Private-Token: $1
header2 = SomeKey: SomeValue

which would later become:

curl "https://example.com/edd%2fdata.tar/raw?ref=main" --header "Private-Token: $1" --header "SomeKey: SomeValue" ...

Author: @vahi

yes i think that should be good. i imagine it will allow users to specify other tokens such as oauth bearer tokens

Author: @mayani

Maybe

[example.com]
header.<key> = <value>

would be better, so we can add other inputs if needed

cookie.<key> = <value>
query-arg.<key> = <value>
form-arg.<key> = <value>

Author: @vahi

for the http stuff, the section header in the credentials file should be the URL prefix and not the hostname

[http://download.pegasus.isi.edu]
headerkey = headervalue
header.key =
cookie.<key> = <value>
query-arg.<key> = <value>
form-arg.<key> = <value>

One thing to be aware of is that a "." in an ini file key gets escaped by another "." when the file is read via the apache commons configuration library in java
https://commons.apache.org/proper/commons-configuration/userguide/howto_hierarchical.html#Escaping_special_characters

https://issues.apache.org/jira/browse/CONFIGURATION-597

The planner does not need to check the keys, so this is fine there.
@rynge you should check whether the python config parser module behaves the same way.
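
A quick way to check the question above, assuming the Python side reads the file with the standard library configparser (the thread does not confirm which parser pegasus-transfer uses). The section name and token value are placeholders. In this sketch dots in keys come through unescaped, but the default parser lowercases option names:

```python
import configparser

# Hypothetical credentials.conf content; the token value is a placeholder.
SAMPLE = """
[http://download.pegasus.isi.edu]
header.Private-Token = not-a-real-token
"""

# interpolation=None so that "%" in a token value is not treated specially
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(SAMPLE)

section = parser["http://download.pegasus.isi.edu"]
keys = list(section.keys())
print(keys)  # ['header.private-token']
```

HTTP header names are case-insensitive, so the lowercasing should be harmless for headers. If the original case matters for some other key type, setting `parser.optionxform = str` before reading keeps keys as written.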

Author: @vahi

as part of #2048 i have already put in logic for the planner to look at the url prefix and inspect contents of the cred file to see if it gets associated for a http transfer or not.
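
For illustration, the URL-prefix association could look roughly like this on the Python side. The planner logic from #2048 is not shown in this thread, so `find_section` and its longest-prefix rule are assumptions, not the actual implementation:

```python
import configparser

# Hypothetical credentials file with two URL-prefix sections.
SAMPLE = """
[https://example.com]
header.Private-Token = token-a

[https://example.com/private]
header.Private-Token = token-b
"""


def find_section(parser, url):
    """Return the longest section name that is a prefix of url, or None."""
    matches = [name for name in parser.sections() if url.startswith(name)]
    return max(matches, key=len) if matches else None


parser = configparser.ConfigParser(interpolation=None)
parser.read_string(SAMPLE)

print(find_section(parser, "https://example.com/private/data.tar"))
print(find_section(parser, "https://other.org/data.tar"))
```

Picking the longest match lets a more specific prefix override a general one for the same host.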

Author: @rynge

I added header.* support to pegasus-transfer. There are still some question marks around cookie and argument handling - if we need this we should target it for 5.1
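
A minimal sketch of how header.* entries might be turned into wget arguments. This is not the pegasus-transfer code; `header_args` is a hypothetical helper, and the only wget feature relied on is the `--header=` option that appears in the logged commands in this thread:

```python
def header_args(options):
    """Turn header.<name> entries into wget --header arguments."""
    prefix = "header."
    args = []
    for key, value in options.items():
        if key.startswith(prefix):
            name = key[len(prefix):]
            args.append(f"--header={name}: {value}")
    return args


# Placeholder values; keys without the header. prefix are ignored.
options = {"header.private-token": "not-a-real-token", "username": "someone"}
cmd = ["/usr/bin/wget", "-nv"] + header_args(options)
print(cmd)
```

Building the command as a list (for example for `subprocess.run`) avoids shell quoting problems with the space and colon in the header value.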

Author: @rynge

@vahi I removed the url decoding - can you give it another try?

Author: @vahi

ok, though i don't have a working token right now. but we can still distinguish between not authorized and 404.

Author: @vahi

i think removing the decode is the right thing

i see now the following error reported by p-analyzer
/usr/bin/wget -nv --no-cookies --no-check-certificate --timeout=300 --tries=1 --header='private-token: xxxxK' -O '/xxx/work/./data.tar' 'https://gitlab01.classe.cornell.edu/xxx/files/edd%2fdata.tar/raw?ref=main'
2024-01-12 13:55:37,218 INFO: Authorization failed.

compared to earlier
/usr/bin/wget -nv --no-cookies --no-check-certificate --timeout=300 --tries=1 --header='private-token: xxxx' -O './run0002/./data.tar' 'https://gitlab01.classe.cornell.edu/xxx/data.tar/raw?ref=main'
2024-01-10 17:51:16,650 INFO: https://gitlab01.classe.cornell.edu/xxx/data.tar/raw?ref=main:
2024-01-10 17:51:16 ERROR 404: Not Found.
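
The difference between the two invocations can be reproduced with the standard library. The exact decoding call that was removed from pegasus-transfer is not shown in this thread, so `urllib.parse.unquote` is used here as a stand-in:

```python
from urllib.parse import unquote

url = "https://gitlab01.classe.cornell.edu/xxx/files/edd%2fdata.tar/raw?ref=main"

# Decoding turns the escaped %2f into a real path separator, so the
# server is asked for a different path than the one intended.
decoded = unquote(url)
print(decoded)
# https://gitlab01.classe.cornell.edu/xxx/files/edd/data.tar/raw?ref=main
```

With the %2f left intact, the file path stays a single URL segment, which is consistent with the request now reaching the authorization check instead of returning 404.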

Author: @vahi

it works for http. not sure whether not having the decode in pegasus-transfer will break other protocols.
i think it should not.

Author: @vahi

verified token support