Implementing multiple backends by re-using snakemake.remote or pyfilesystem2
Avsecz opened this issue · 3 comments
Would it be possible to wrap the classes implementing snakemake.remote.AbstractRemoteObject in a lazydata.remote.RemoteStorage class?
This would make it possible to support the following remote storage providers in one go (https://snakemake.readthedocs.io/en/stable/snakefiles/remote_files.html):
- Amazon Simple Storage Service (AWS S3): snakemake.remote.S3
- Google Cloud Storage (GS): snakemake.remote.GS
- File transfer over SSH (SFTP): snakemake.remote.SFTP
- Read-only web (HTTP[S]): snakemake.remote.HTTP
- File transfer protocol (FTP): snakemake.remote.FTP
- Dropbox: snakemake.remote.dropbox
- XRootD: snakemake.remote.XRootD
- GenBank / NCBI Entrez: snakemake.remote.NCBI
- WebDAV: snakemake.remote.webdav
- GFAL: snakemake.remote.gfal
- GridFTP: snakemake.remote.gridftp
- iRODS: snakemake.remote.iRODS
- EGA: snakemake.remote.EGA
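To make the "in one go" part concrete, here is a minimal sketch of how a backend name could be mapped to the matching snakemake module. The names SNAKEMAKE_BACKENDS and load_provider_class are hypothetical and not part of lazydata or snakemake. The sketch assumes each snakemake.remote module exposes a RemoteProvider class, as in the snakemake remote-files documentation linked above, and it imports the module lazily so the optional dependencies of unused backends are never loaded.

```python
import importlib

# Backend name -> snakemake.remote module, taken from the list above.
# The lowercase keys are an assumption made for this sketch.
SNAKEMAKE_BACKENDS = {
    "s3": "snakemake.remote.S3",
    "gs": "snakemake.remote.GS",
    "sftp": "snakemake.remote.SFTP",
    "http": "snakemake.remote.HTTP",
    "ftp": "snakemake.remote.FTP",
    "dropbox": "snakemake.remote.dropbox",
    "xrootd": "snakemake.remote.XRootD",
    "ncbi": "snakemake.remote.NCBI",
    "webdav": "snakemake.remote.webdav",
    "gfal": "snakemake.remote.gfal",
    "gridftp": "snakemake.remote.gridftp",
    "irods": "snakemake.remote.iRODS",
    "ega": "snakemake.remote.EGA",
}


def load_provider_class(backend, importer=importlib.import_module):
    """Return the RemoteProvider class of the requested snakemake backend.

    `importer` is injectable so the lookup can be tested without
    snakemake installed.
    """
    key = backend.lower()
    if key not in SNAKEMAKE_BACKENDS:
        raise ValueError(f"Unknown snakemake backend: {backend!r}")
    # Import only the module that was asked for.
    module = importer(SNAKEMAKE_BACKENDS[key])
    return module.RemoteProvider
```

With snakemake installed, load_provider_class("s3") would return the S3 provider class, which the wrapper could then instantiate with the user's credentials.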
Pyfilesystem2
Another alternative would be to write a wrapper around pyfilesystem2 (https://github.com/PyFilesystem/pyfilesystem2). It supports the following filesystems (https://www.pyfilesystem.org/page/index-of-filesystems/):
Builtin
- FTPFS File Transfer Protocol.
- ...
Official
Filesystems in the PyFilesystem organisation on GitHub.
- S3FS Amazon S3 Filesystem.
- WebDavFS WebDav Filesystem.
Third Party
- fs.archive Enhanced archive filesystems.
- fs.dropboxfs Dropbox Filesystem.
- fs-gcsfs Google Cloud Storage Filesystem.
- fs.googledrivefs Google Drive Filesystem.
- fs.onedrivefs Microsoft OneDrive Filesystem.
- fs.smbfs A filesystem running over the SMB protocol.
- fs.sshfs A filesystem running over the SSH protocol.
- fs.youtube A filesystem for accessing YouTube Videos and Playlists.
- fs.dlna A filesystem for accessing DLNA Servers.
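A rough sketch of what the pyfilesystem2 wrapper could look like. The class name PyFilesystemStorage and its upload/download method names are hypothetical; they are not lazydata's actual RemoteStorage interface. The only pyfilesystem2 calls used are fs.open_fs() and the FS.upload() / FS.download() methods, which copy between a remote path and an open binary file object.

```python
class PyFilesystemStorage:
    """Hypothetical adapter around a pyfilesystem2 FS object."""

    def __init__(self, remote_fs):
        # Any object with pyfilesystem2's upload()/download() methods.
        self.remote_fs = remote_fs

    @classmethod
    def from_url(cls, fs_url):
        # Imported lazily so pyfilesystem2 is only required when used.
        import fs
        return cls(fs.open_fs(fs_url))

    def upload(self, local_path, remote_path):
        # FS.upload(path, file) reads from an open binary file.
        with open(local_path, "rb") as local_file:
            self.remote_fs.upload(remote_path, local_file)

    def download(self, remote_path, local_path):
        # FS.download(path, file) writes into an open binary file.
        with open(local_path, "wb") as local_file:
            self.remote_fs.download(remote_path, local_file)
```

Which backends from_url() can open depends on which pyfilesystem2 extension packages are installed. The sketch also assumes the parent directory of remote_path already exists on the remote side.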
Good idea, this would be great!
Can you explain how to implement something like this? For example: where to put the class, how to name it, which methods to implement, and what the potential caveats are?
Some thoughts:
- I would probably keep the interface of the remote.RemoteStorage class unchanged.
- I would create a new class remote.SnakeMakeRemoteStorage that inherits from remote.RemoteStorage and takes at least two parameters: a snakemake backend name plus any other necessary parameters (e.g. the access keys). I probably wouldn't want to reimplement the S3 and other existing backends.
- In remote.RemoteStorage.get_from_url() and remote.RemoteStorage.get_from_config(), make sure the remote storage backend is correctly parsed from the lazydata.yml config file and the correct child class SnakeMakeRemoteStorage is instantiated.
- In cli.commands.config I would allow for configuration of any necessary additional access keys.
- In cli.commands.add_remote I would allow for any of the snakemake backends to be specified.
And that should be it. Some unit tests would be welcome as well :)
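For illustration, the class layout described above might look roughly like this. It is a sketch under stated assumptions, not lazydata's actual code: RemoteStorage here is a simplified stand-in for remote.RemoteStorage, the SCHEME_TO_BACKEND table and the keyword-argument credentials are made up for the example, and the snakemake provider is only imported on first use.

```python
import importlib
from urllib.parse import urlparse

# Hypothetical URL scheme -> snakemake.remote module suffix.
SCHEME_TO_BACKEND = {
    "s3": "S3",
    "gs": "GS",
    "sftp": "SFTP",
    "http": "HTTP",
    "https": "HTTP",
    "ftp": "FTP",
}


class RemoteStorage:
    """Simplified stand-in for lazydata's remote.RemoteStorage."""

    def __init__(self, remote_url):
        self.remote_url = remote_url

    @staticmethod
    def get_from_url(remote_url, **credentials):
        # Pick the child class from the URL scheme.
        scheme = urlparse(remote_url).scheme
        if scheme in SCHEME_TO_BACKEND:
            backend = SCHEME_TO_BACKEND[scheme]
            return SnakeMakeRemoteStorage(remote_url, backend, **credentials)
        raise RuntimeError(f"Remote storage URL {remote_url!r} is not supported")


class SnakeMakeRemoteStorage(RemoteStorage):
    """Takes a snakemake backend name plus any other parameters."""

    def __init__(self, remote_url, backend, **provider_kwargs):
        super().__init__(remote_url)
        self.backend = backend
        self.provider_kwargs = provider_kwargs
        self._provider = None

    @property
    def provider(self):
        # Lazily build the snakemake RemoteProvider. Assumes the module
        # snakemake.remote.<backend> exposes a RemoteProvider class.
        if self._provider is None:
            module = importlib.import_module("snakemake.remote." + self.backend)
            self._provider = module.RemoteProvider(**self.provider_kwargs)
        return self._provider
```

A call such as RemoteStorage.get_from_url("s3://bucket/data", access_key_id="...") would then return a SnakeMakeRemoteStorage with backend "S3". The actual upload and download methods would delegate to the provider and are left out here, since they depend on the existing RemoteStorage interface.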