Add possibility to ignore deleted files in certain locations
Wastus opened this issue · 26 comments
I'd love to see a feature where I can specify files/folders (or a regular expression) that should not count toward the deleted file count, but that are still synced (so I can't use the SnapRAID exclude feature).
I know this also means that the script will have to parse the output of the diff instead of just reading the summary.
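Parsing the diff instead of the summary could look roughly like the sketch below. It assumes `snapraid diff` prints one change per line in the form `<keyword> <path>` (keywords such as `add`, `remove`, `update`, `move`, `copy`); the exact path format may differ on your system, so check your own output. `count_deletions` and the `ignore` regex are hypothetical names, not part of the script. Summary lines such as `3 removed` start with a number, so they are not counted.

```python
import re
from typing import Iterable, List, Optional, Tuple


def count_deletions(
    diff_lines: Iterable[str], ignore: Optional[str] = None
) -> Tuple[int, List[str]]:
    """Count 'remove' entries in (assumed) `snapraid diff` output.

    Paths matching the `ignore` regex are still synced; they are only
    left out of the count and returned separately so they can be logged.
    Assumes each change line has the form '<keyword> <path>'.
    """
    pattern = re.compile(ignore) if ignore else None
    counted = 0
    ignored: List[str] = []
    for line in diff_lines:
        keyword, _, path = line.strip().partition(" ")
        if keyword != "remove":
            continue  # add/update/move/copy and summary lines are skipped
        if pattern is not None and pattern.search(path):
            ignored.append(path)
        else:
            counted += 1
    return counted, ignored


# Hypothetical diff output: a rolling backup plus one real deletion.
sample = [
    "add backup/2024-05-02/blk_0001",
    "remove backup/2024-04-30/blk_0001",
    "remove backup/2024-04-30/blk_0002",
    "remove photos/holiday.jpg",
]
count, skipped = count_deletions(sample, ignore=r"^backup/")
print(count)    # 1
print(skipped)  # ['backup/2024-04-30/blk_0001', 'backup/2024-04-30/blk_0002']
```

Returning the ignored paths alongside the count makes it easy to log them, so you can still see what was skipped.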
Why would I need something like this?
I have a rolling backup running onto my SnapRAID array: certain parts get deleted after a day or two and new things pop up. This makes a mess of the deleted file count; you never know how high it will be, and honestly I don't really care. Of course, if the backup were excluded from counting, someone could delete my whole backup and the sync would just continue. But I'd rather take that risk than set the delete threshold to 1000 and miss that I deleted the wrong folder in some after-work delirium.
I have thought about this in the past, but this script relies on Snapraid's own output. Altering that data is very complex, since we're dealing with a raw log.
I'm not a real coder; my abilities are very limited.
I have a rolling backup running onto my SnapRAID, certain parts will get deleted after a day or two and new things will pop up.
I feel you. I'm testing a backup solution called UrBackup, from my Windows clients to my server. It's very cool, but it likes to cycle files around for maintenance, producing lots of deleted and moved files.
Let's work around this issue:
- You could entirely disable the deleted file detection, or enable the alert but still sync (that's what I do).
- You could use the Added/Deleted Threshold (ADD_DEL_THRESHOLD) feature, which allows a sync that would otherwise violate the delete threshold if the ratio of added to deleted files is greater than the value set.
- I could add an alternative method, inspired by UrBackup itself: generate a random file in a folder that the user chooses, store its SHA256, and check the file before every sync. If it has changed or been deleted, stop the sync; otherwise, proceed.
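The canary-file idea from the last point can be sketched in a few lines. This is only an illustration of the proposal, not something the script implements; the file name `.sync-canary` and all function names are hypothetical.

```python
import hashlib
import secrets
import tempfile
from pathlib import Path

CANARY_NAME = ".sync-canary"  # hypothetical file name


def sha256_of(path: Path) -> str:
    """Hex SHA256 of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def create_canary(folder: Path) -> str:
    """Write 4 KiB of random bytes into the folder and return their hash.

    The returned digest should be stored outside the watched folder
    (e.g. next to the script's config) so it survives a deletion.
    """
    canary = folder / CANARY_NAME
    canary.write_bytes(secrets.token_bytes(4096))
    return sha256_of(canary)


def canary_intact(folder: Path, expected: str) -> bool:
    """True if the canary still exists and still hashes to `expected`."""
    canary = folder / CANARY_NAME
    return canary.is_file() and sha256_of(canary) == expected


with tempfile.TemporaryDirectory() as tmp:
    watched = Path(tmp)
    stored = create_canary(watched)
    print(canary_intact(watched, stored))  # True -> proceed with sync
    (watched / CANARY_NAME).unlink()
    print(canary_intact(watched, stored))  # False -> stop the sync
```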
I've created a quick draft of how this could be implemented; it's not thoroughly tested yet.
So far I have failed to produce a "copy" event in the Snapraid diff output, so I can't build that count with the new method, and it stays the same as before. Equal files are (luckily) not listed, so that count stays the same as well.
Side note: I'm using Kopia. So far it's blazingly fast, allowing almost real-time backups of my whole system drive containing 440 GB (I excluded Windows and Programs).
Thank you for the PR! Please keep testing it and let me know. I would like to make a release shortly, so this will eventually be in a subsequent release.
Kopia is on my radar, but I haven't tested it yet. It looks like a good replacement, since it manages the backup in blocks rather than individual files.
Are you simply using an SMB share from your client to store backups, or have you deployed the Kopia Server? (There's almost no documentation about it.)
But what is the actual issue with Kopia and Snapraid? Does it remove a lot of files when cycling through backups?
I'm running Kopia server in a docker image, which saves data to the Snapraid pool.
I've also found the instructions on using Kopia as a server lacking; only a forum post explained a bit more, and it is still cumbersome compared to the streamlined experience I've had with most other docker images. You need to execute some commands directly in the running container, and you need to adjust the start command after the first run.
I'd say it is currently better suited to running as a normal service.
I still have to fine-tune the Snapraid config; last night the sync failed because Kopia was doing some work in its cache and log files (I didn't expect that while no client was active). But the main issue for the script is that blocks get deleted on backup rotation (it keeps the last x snapshots, x daily snapshots, etc.), and this easily triggers my deleted threshold.
That sounds a bit like Borg backup, which I use together with the add/del threshold to essentially ignore the blocks deleted by Borg, since roughly the same number are added back with each backup.
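A rough sketch of why the add/del ratio tolerates rotating backups but still catches a mass deletion. This approximates the behaviour described in the thread (sync allowed past the delete threshold when added/deleted exceeds the configured value); the function name, the `<=` comparison, and treating a ratio of 0 as "disabled" are assumptions, so check the script's source for the exact logic.

```python
def sync_allowed(
    added: int, deleted: int, del_threshold: int, add_del_ratio: float
) -> bool:
    """Decide whether a sync may run despite deletions (hypothetical logic).

    - With no deletions, or at most `del_threshold` of them, always sync.
    - Above the threshold, sync only if added / deleted > add_del_ratio.
    - A ratio of 0 or less is treated here as "feature disabled".
    """
    if deleted == 0 or deleted <= del_threshold:
        return True
    if add_del_ratio <= 0:
        return False
    return added / deleted > add_del_ratio


# Backup rotation: many blocks removed, roughly as many written back.
print(sync_allowed(added=1150, deleted=1200, del_threshold=500, add_del_ratio=0.8))  # True
# Accidental mass deletion: almost nothing comes back.
print(sync_allowed(added=3, deleted=1200, del_threshold=500, add_del_ratio=0.8))  # False
```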
@Wastus since you run Kopia Server as a container (that's what I'd like to do), you can pause/stop it before running the script and resume it when done. I do this with other containers and it's a great feature!
Otherwise, you could use the custom hook to stop and start the service.
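What a custom hook boils down to can be sketched as follows, assuming the standard `docker pause` / `docker unpause` CLI is available. The helper names and the container name `kopia` are hypothetical; the script's built-in container feature is preferable where it fits. The `try`/`finally` makes sure containers are resumed even if the sync fails.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[List[str]], None]


def _docker(args: List[str]) -> None:
    """Run a docker CLI subcommand, raising if it fails."""
    subprocess.run(["docker", *args], check=True)


def with_paused_containers(
    containers: Sequence[str], job: Callable[[], None], docker: Runner = _docker
) -> None:
    """Pause containers, run `job`, and always unpause them afterwards."""
    paused: List[str] = []
    try:
        for name in containers:
            docker(["pause", name])
            paused.append(name)
        job()
    finally:
        # Resume in reverse order, even if the job raised.
        for name in reversed(paused):
            docker(["unpause", name])


# Hypothetical usage:
# with_paused_containers(
#     ["kopia"], lambda: subprocess.run(["snapraid", "sync"], check=True)
# )
```

The `docker` parameter only exists so the sequence can be tested without a Docker daemon.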
Could you please link the forum post? It would help me get started on the server part.
But the main issue for the script is, that blocks get deleted on the backup rotation (it keeps the last x snapshots, x daily snapshots, etc.) and this triggers my deleted threshold easily.
Alright, now I get it. You could still try the ADD/DEL ratio feature.
@tehniemer Borg in my opinion is the best backup service, period. I use it to backup my whole NAS to an offsite NAS.
But it has a major caveat: there's no Windows client, and Windows File History is garbage.
But yes, Kopia uses a similar approach, which in my opinion is really efficient (at least on the Borg side; I have limited experience with Kopia).
@auanasgheps
To save you from scraping through that post, here is my approach. Replace the MyXyz placeholders with your own values. I'm not sure everything is necessary; the local path certainly isn't, but I needed it to back up my phones' rsynced files (there is no Kopia for Android yet).
version: '3.7'
services:
  kopia:
    image: kopia/kopia:latest
    hostname: datavault
    restart: unless-stopped
    ports:
      - 51515:51515
    environment:
      KOPIA_PASSWORD: MyVeryGoodSecret
      TZ: Europe/Berlin
    volumes:
      - /MyPersistentPath/config:/app/config
      - /MyPersistentPath/cache:/app/cache
      - /MyPersistentPath/logs:/app/logs
      - /MyPersistentPath/backup:/app/backup
      - /MyOtherPathIWantToLocalBackup:/mnt/MyLocalBackupPath
    entrypoint: ["/bin/kopia", "server", "start", "--tls-cert-file","/app/config/my.cert", "--tls-key-file","/app/config/my.key", "--address=0.0.0.0:51515", "--override-username=MyUser@MyServer", "--server-username=MyUser@MyServer", "--server-password=MyServerSecretIsGood"]
For the first run, you need a different entrypoint to generate the certificate (note the "--tls-generate-cert" flag). Don't keep it for later runs, as it creates a new certificate every time:
entrypoint: ["/bin/kopia", "server", "start", "--tls-generate-cert", "--tls-cert-file","/app/config/my.cert", "--tls-key-file","/app/config/my.key", "--address=0.0.0.0:51515", "--override-username=MyUser@MyServer", "--server-username=MyUser@MyServer", "--server-password=MyServerSecretIsGood"]
Have a look at the output of the first start; there should be something along the lines of:
SERVER CERT SHA256: 48537cce585fed39fb26c639eb8ef38143592ba4b4e7677a84a31916398d40f7
You need this fingerprint to set up your backups from remote devices.
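If you missed that line, the fingerprint can probably be recomputed from the certificate file. This sketch assumes Kopia's `SERVER CERT SHA256` value is the lowercase hex SHA256 of the certificate's DER encoding (the same value `openssl x509 -noout -fingerprint -sha256` prints, minus the colons) and that the file holds a single certificate; compare the result with your own log before relying on it. The helper names are hypothetical.

```python
import hashlib
import ssl
from pathlib import Path


def cert_sha256(pem_text: str) -> str:
    """Lowercase hex SHA256 of a PEM certificate's DER bytes."""
    der = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    return hashlib.sha256(der).hexdigest()


def cert_file_sha256(path: str) -> str:
    """Read a single-certificate PEM file and return its SHA256 fingerprint."""
    return cert_sha256(Path(path).read_text())


# Hypothetical usage, with the host path mapped to /app/config in the compose file:
# print(cert_file_sha256("/MyPersistentPath/config/my.cert"))
```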
For adding users (the ones you see in KopiaUI), the normal documentation is quite helpful. I just log into the container using Portainer, but that boils down to opening an interactive bash shell in the container with docker and executing the commands there.
Thank you @Wastus your advice worked! I was able to create a working Kopia server!
I've added the docker containers to the script, and that is also working quite well so far; thanks for that feature.
I'll add an optional output of the deleted files, because currently I can't really tell whether it's working as intended in all cases.
@Wastus what do you mean by this?
You mean that you're pausing/stopping containers using the built-in feature?
What about the optional output?
I just want to understand the whole situation.
Would this solve the following "problem":
I've got an extensive media library with .nfo files that contain the metadata. Those files get updated frequently, which I know about, but they still trigger the diff check.
Excluding *.nfo would solve this, wouldn't it?
Yes, but you would exclude those files or locations in your snapraid.conf file, not in this script. See section 7 of the manual.
But with that approach they wouldn't get synced at all. I've read about a workaround that uses two .conf files and renames them right before a sync/diff.
That's why I thought this script might offer a more elegant solution.
Actually, the addition I wrote is exactly for those use cases: you have files that you know change a lot but should still be synced, yet you still want to stop a sync when only 20 other files changed.
Currently there is no way to choose which counts the pattern is excluded from, so it applies to deletions and updates alike.
That might not be fine-grained enough control for you, @Br33ce.
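A sketch of what "applies to deletions and updates alike" means in practice: one ignore pattern removes matching paths from every per-type count before the thresholds are checked, so 20 unrelated updates can still stop the sync. The `(kind, path)` input, the regex pattern, the function names, and the two thresholds are assumptions loosely modelled on the script's delete/update limits, not its actual code.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple


def tally(changes: Iterable[Tuple[str, str]], ignore: Optional[str] = None) -> Counter:
    """Count (kind, path) changes per kind, skipping paths matching `ignore`.

    One pattern filters every change kind, so ignored files drop out of
    both the remove and the update counts.
    """
    pattern = re.compile(ignore) if ignore else None
    counts: Counter = Counter()
    for kind, path in changes:
        if pattern is not None and pattern.search(path):
            continue
        counts[kind] += 1
    return counts


def should_sync(counts: Counter, del_threshold: int, up_threshold: int) -> bool:
    """Allow the sync only if non-ignored removes and updates stay within limits."""
    return counts["remove"] <= del_threshold and counts["update"] <= up_threshold


# 300 rewritten .nfo files, 20 other updated documents and one deleted video.
changes = [("update", f"Movies/Film {i}/movie.nfo") for i in range(300)]
changes += [("update", f"Docs/report{i}.odt") for i in range(20)]
changes.append(("remove", "Movies/Film 7/movie.mkv"))

counts = tally(changes, ignore=r"\.nfo$")
print(counts["update"], counts["remove"])  # 20 1
print(should_sync(counts, del_threshold=5, up_threshold=10))  # False
```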
Hmm, if I can include just the pattern "*nfo", that's fine-grained enough, because those .nfo files will be updated and deleted (and in terms of snapraid I don't care). So I'll wait for the PR then.
I originally misinterpreted the intent of this discussion; now I understand it and am very intrigued.
I like the idea of excluding files from the total calculations, but it requires some work. We can consider working on the PR in the future.
At the moment I'm focused on wrapping up the next release.
Sounds good! The next release is more important than this, because this is only QoL.
Implemented in the dev branch. Apologies that this took a while. If you want, you can test it and provide some feedback.
Thank you @tehniemer. I can't test it right now, but I checked the config file. One question, to make sure I understand the logic correctly:
If I put ".nfo" as a pattern, will it ignore all files that contain that exact pattern (essentially all files ending in that extension)?
It should; your best bet would be to test a few examples using the link in that section of the config file.
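Whether ".nfo" behaves as expected depends on how the patterns are matched. Assuming they are treated as regular expressions searched anywhere in the path (an assumption; check the notes in the config file), the unescaped `.` matches any character and the pattern is not anchored, so it catches more than files ending in .nfo. Escaping the dot and anchoring at the end narrows it down. The paths below are made up for illustration.

```python
import re
from typing import List

paths = [
    "Movies/Film (2020)/movie.nfo",
    "Movies/Film (2020)/movie.mkv",
    "Docs/info.txt",
    "Shows/Show/tvshow.nfo.bak",
]


def matches(pattern: str) -> List[str]:
    """Return the paths a search-anywhere regex would ignore."""
    return [p for p in paths if re.search(pattern, p)]


# '.' matches any character, so "info" contains a match for ".nfo".
print(matches(".nfo"))    # ['Movies/Film (2020)/movie.nfo', 'Docs/info.txt', 'Shows/Show/tvshow.nfo.bak']
# Escaped dot plus end anchor: only real .nfo files.
print(matches(r"\.nfo$"))  # ['Movies/Film (2020)/movie.nfo']
```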