pangeo-forge/pangeo-forge-recipes

Could/Should our caching step support scp?

jbusecke opened this issue · 4 comments

I am currently facing a lot of requests to ingest data from various servers/HPC centers. For example, in leap-stc/data-management#126 (comment), none of http, ftp, or globus is accessible.

I think this is pushing the concept of pgf a bit, but enabling this would unlock a LOT of interesting datasets for ingestion.

I guess my naive way to think about this is that we could store ssh login creds in the feedstock secrets, then transfer the data to the local worker storage, and move it to a cloud bucket?

Any thoughts or concerns, @rabernat @moradology @ranchodeluxe @cisaacstern?

fsspec has an SFTP implementation: https://github.com/fsspec/sshfs. I'm also a big fan of asyncssh (which backs it).
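
For what it's worth, here is a rough sketch of the "pull to worker storage, then push to a bucket" flow from the original post. It is not how pangeo-forge-recipes caches anything today; `pull_via_local_staging` and `upload_to_directory` are hypothetical names, and the transfer itself is passed in as a callable so the same loop could be backed by sshfs, plain scp, or anything else. The commented-out wiring at the bottom assumes sshfs's `SSHFileSystem` and made-up secret names (`SSH_HOST`, `SSH_USER`, `SSH_KEY_PATH`).

```python
import os
import shutil
import tempfile
from typing import Callable, Iterable, List, Optional


def pull_via_local_staging(
    remote_paths: Iterable[str],
    fetch: Callable[[str, str], None],
    upload: Callable[[str], str],
    scratch_root: Optional[str] = None,
) -> List[str]:
    """Pull each remote file to local scratch, then hand it to `upload`.

    `fetch(remote, local)` downloads one file; `upload(local)` stores it
    somewhere durable and returns the stored location.
    """
    uploaded = []
    with tempfile.TemporaryDirectory(dir=scratch_root) as scratch:
        for remote in remote_paths:
            local = os.path.join(scratch, os.path.basename(remote))
            fetch(remote, local)
            uploaded.append(upload(local))
            os.remove(local)  # keep worker disk usage bounded to one file
    return uploaded


def upload_to_directory(dest_dir: str) -> Callable[[str], str]:
    """Local stand-in for a bucket upload: copy the file into `dest_dir`."""

    def _upload(local: str) -> str:
        target = os.path.join(dest_dir, os.path.basename(local))
        shutil.copyfile(local, target)
        return target

    return _upload


# Possible wiring with sshfs (assumption, not tested here; env var names
# are hypothetical placeholders for feedstock secrets):
#
#   from sshfs import SSHFileSystem
#   fs = SSHFileSystem(
#       os.environ["SSH_HOST"],
#       username=os.environ["SSH_USER"],
#       client_keys=[os.environ["SSH_KEY_PATH"]],
#   )
#   pull_via_local_staging(paths, fs.get_file, upload_to_directory("/tmp/out"))
```

This keeps everything a pull from the worker's side, and only one file at a time sits on local disk. It does nothing about servers that are only reachable from inside a private network.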

That is super useful! Thank you @yuvipanda.
So the situation above is actually more complex, since that server is only available from within the Columbia network or via VPN. I'm wondering if there is an elegant way to navigate this?

What would the direction of data flow be? Something 'pushing' into 'pangeo forge' or pangeo forge 'pulling'? Right now everything is really a pull.

I would like to keep it as a pull if possible.