rclone traverses into .snapshot folders
Closed this issue · 6 comments
I noticed that Motuz does not have rclone exclude .snapshot folders when copying. rclone should always be launched using
rclone --exclude ".snapshot"
https://github.com/FredHutch/motuz/blob/master/src/backend/api/utils/rclone_connection.py
So first I found a problem with my commit. I had this as one of the elements of the command list in the copy method of src/backend/api/utils/rclone_connection.py:
'--exclude "\.snapshot/"',
It needed to be:
'--exclude', '\.snapshot/',
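The difference matters because each element of the command list reaches rclone as exactly one argument, assuming the list is passed to the process without going through a shell (the thread does not show how Motuz launches the command, so that part is an assumption). A minimal sketch, where build_copy_command is a hypothetical helper and not the actual Motuz code:

```python
def build_copy_command(src, dst):
    """Return an rclone argument list that excludes .snapshot directories.

    Hypothetical helper for illustration; not the actual Motuz code.
    """
    return [
        "rclone",
        # Flag and pattern are separate list elements, so rclone receives
        # two arguments: "--exclude" and the pattern itself.
        "--exclude", r"\.snapshot/",
        "copyto", src, dst,
    ]


# The broken variant fused the flag, a space, and literal quotes into ONE
# argument, so rclone does not see a flag followed by a pattern.
broken_element = '--exclude "\\.snapshot/"'

command = build_copy_command("src/foo", "dst/foo")
print(command)
```

Note that this only fixes how the arguments are passed; it does not avoid the single-file limitation described next.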
I fixed that, but there is still a problem, and you can reproduce it with rclone alone:
$ rclone --exclude \.snapshot copyto src/foo dst/foo
2020/09/18 16:07:37 Can't limit to single files when using filters: src/foo
This is very annoying. It means that any time the source of a copy is a single file, you can't use the --exclude filter.
So we need some intelligence in Motuz that figures out whether src points to a single file. This is trivial for POSIX file systems, but may not be trivial for the various types of object storage, and it means we have to run an additional operation before the copy. Basically, we need the equivalent of an abstract superclass IsSingleFile with concrete implementations for each cloud connection type (plus the local file system). It takes the name of an item in a file system or object storage system and returns True if the item is a single file and False if it is a directory (for a symlink, the answer depends on whether it links to a single file or a directory). Do we run this function in the web app container or in Celery?
At least I think this is what we have to do. @aicioara or @dirkpetersen, you may have an easier solution.
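The IsSingleFile idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class and function names (IsSingleFile, LocalIsSingleFile, exclude_args) are hypothetical, only the local file system case is implemented, and the per-cloud implementations are left out because they depend on each storage backend.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class IsSingleFile(ABC):
    """Hypothetical interface: does a path name a single file?"""

    @abstractmethod
    def is_single_file(self, path):
        ...


class LocalIsSingleFile(IsSingleFile):
    """Implementation for the local (POSIX) file system."""

    def is_single_file(self, path):
        # os.path.isfile follows symlinks, so a link to a regular file
        # counts as a single file and a link to a directory does not.
        return os.path.isfile(path)


def exclude_args(checker, src):
    """Return the --exclude arguments only when src is not a single file."""
    if checker.is_single_file(src):
        # rclone rejects filters when the source is a single file.
        return []
    return ["--exclude", r"\.snapshot/"]


# Small demonstration with a temporary directory containing one file.
with tempfile.TemporaryDirectory() as tmp:
    single = os.path.join(tmp, "foo")
    open(single, "w").close()
    checker = LocalIsSingleFile()
    args_for_file = exclude_args(checker, single)
    args_for_dir = exclude_args(checker, tmp)

print(args_for_file, args_for_dir)
```

The returned list would be spliced into the rclone command list, so a single-file copy simply runs without the filter.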
I think I will assign this to @aicioara now as it is no longer as simple as I thought it was.
My "simple fix" has been reverted in the master and prod branches.
Instead of adding intelligence to Motuz, I think the best solution would be to add intelligence to rclone itself. Investigating...
However, this looks like an existing open issue on rclone: rclone/rclone#2425
OK, I think we need to go back to the previously proposed plan. The snapshot exclusion should only happen on POSIX, and we ignore it on object storage. This allows us to put in a custom per-file exclusion on POSIX.
@dirkpetersen I don't quite understand what you are proposing, can you explain in a little more detail? How do you put in a custom per file exclusion on posix?