FredHutch/motuz

rclone traverses into .snapshot folders

Closed this issue · 6 comments

I noticed that Motuz does not have rclone exclude .snapshot folders when copying. rclone should always be launched using:

rclone --exclude ".snapshot" 

https://github.com/FredHutch/motuz/blob/master/src/backend/api/utils/rclone_connection.py

this is related to #330

First, I found a problem with my commit. I had this as one of the elements of the command list in the copy method of src/backend/api/utils/rclone_connection.py:

 '--exclude "\.snapshot/"',

It needed to be:

 '--exclude',  '\.snapshot/',
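To illustrate why the two forms differ: when a command is passed as a list, each element reaches rclone as exactly one argument, so the flag and its pattern must be separate elements. This is only a minimal sketch; `build_copy_command` and `EXCLUDE_PATTERN` are hypothetical names and not the actual Motuz code.

```python
import shlex

# Pattern as quoted in this thread. A raw string keeps the backslash
# without relying on Python passing through an unknown escape sequence.
EXCLUDE_PATTERN = r"\.snapshot/"


def build_copy_command(src, dst):
    """Hypothetical helper: build the argument list for an rclone copy.

    Each list element becomes one argument, so the flag and its value
    are two separate elements.
    """
    return ["rclone", "--exclude", EXCLUDE_PATTERN, "copyto", src, dst]


# The original single-element form would arrive at rclone as ONE argument
# that contains a space and literal double quotes.
broken_element = r'--exclude "\.snapshot/"'

# A shell would have split it into two arguments, which is what the
# corrected list spells out by hand.
shell_view = shlex.split(broken_element)
```

Here `shell_view` has two elements, the flag and the pattern, while `broken_element` stays a single string when placed in a command list.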

I fixed that but there is still a problem and you can reproduce it just with rclone:

$  rclone --exclude \.snapshot copyto src/foo dst/foo
2020/09/18 16:07:37 Can't limit to single files when using filters: src/foo

This is very annoying. It means that any time the source of a copy is a single file, you can't use the --exclude filter.
So we need some logic in Motuz that figures out whether src points to a single file. This is trivial for POSIX file systems, but may not be trivial for the various types of object storage, and it means running an additional operation before the copy. Basically, we need the equivalent of an abstract superclass IsSingleFile, with concrete implementations for each cloud connection type (plus the local file system). It takes the name of an item in a file system or object storage system and returns True or False depending on whether it is a single file or a directory (or, if it is a symlink, whether it links to a single file or a directory). Do we run this function in the web app container or in Celery?
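A rough sketch of that idea, covering only the local file system case. The method name `check` and the class `LocalIsSingleFile` are made up for illustration; the object storage implementations are left out because each would need its own backend-specific lookup.

```python
import os
from abc import ABC, abstractmethod


class IsSingleFile(ABC):
    """Abstract check: does a path name a single file rather than a directory?"""

    @abstractmethod
    def check(self, path: str) -> bool:
        """Return True if path is a single file, False otherwise."""


class LocalIsSingleFile(IsSingleFile):
    """Hypothetical implementation for POSIX / local file systems."""

    def check(self, path: str) -> bool:
        # os.path.isfile follows symlinks, so a symlink that points to a
        # regular file counts as a single file and a symlink that points
        # to a directory does not.
        return os.path.isfile(path)
```

A caller would pick the implementation that matches the connection type and run `check(src)` before deciding whether to pass `--exclude` to rclone.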

At least I think this is what we have to do. @aicioara or @dirkpetersen, you may have an easier solution.
I think I will assign this to @aicioara now as it is no longer as simple as I thought it was.

My "simple fix" has been reverted in the master and prod branches.

Instead of adding intelligence to Motuz, I think the best solution would be to add intelligence to rclone itself. Investigating...

However, this looks like an existing open issue on rclone: rclone/rclone#2425

OK, I think we need to go back to the previously proposed plan. The snapshot exclusion should only happen on POSIX, and we ignore it on object storage. This allows us to put in custom per-file exclusion on POSIX.
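One possible reading of this plan, sketched under assumptions: the exclude filter is added only when the source is a local (POSIX) directory, and skipped for single files and for object storage sources. The function name `snapshot_exclude_args` and the `src_is_local` flag are hypothetical and not part of Motuz.

```python
import os

# Pattern as quoted earlier in this thread.
EXCLUDE_PATTERN = r"\.snapshot/"


def snapshot_exclude_args(src, src_is_local):
    """Hypothetical helper: return the extra rclone arguments for a copy.

    src_is_local is assumed to be known from the connection type.
    """
    if not src_is_local:
        # Object storage: no .snapshot exclusion is applied at all.
        return []
    if not os.path.isdir(src):
        # Single file (or missing path): rclone rejects filters here,
        # so no filter is passed.
        return []
    return ["--exclude", EXCLUDE_PATTERN]
```

The returned list would be spliced into the rclone command list, so a single-file copy never carries a filter.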

@dirkpetersen I don't quite understand what you are proposing. Can you explain in a little more detail? How do you put in a custom per-file exclusion on POSIX?