helmuthb/rsnapshot-docker

Does not backup some important folders (e.g. /home and /var)

kevindd992002 opened this issue · 31 comments

So I'm trying to set up and use your container, and here's how I have it set up in my docker-compose.yml file:

rsnapshot:
    image: helmuthb/rsnapshot
    container_name: rsnapshot
    volumes:
      - /mnt/storage/backup:/backup
      - /:/data
      - /home/kevin/appdata/rsnapshot/backup.cfg:/backup.cfg
    restart: unless-stopped
    environment:
      - PUID=0
      - PGID=${PGID}
      - TZ=Asia/Manila

A couple of questions/concerns:

  1. Does the PUID environment variable need to be set to 0 (root) so that the container has full permissions on the whole local filesystem?
  2. I tried running "docker exec -it rsnapshot rsnapshot daily" and it backed up most of the folders in my local root filesystem, but the /home and /var folders have no contents at all. Why is this? I checked the rsnapshot.conf file inside the container and it had these:

config_version  1.2
snapshot_root   /backup/
no_create_root  1
cmd_cp          /bin/cp
cmd_rm          /bin/rm
cmd_rsync       /usr/bin/rsync
cmd_ssh         /usr/bin/ssh
ssh_args        -i /ssh-id -o StrictHostKeychecking=no
verbose         1
lockfile        /var/run/rsnapshot.pid
backup          /data   localhost/      one_fs=1
retain  daily   3
retain  weekly  3
retain  monthly 3
retain  yearly  3

So it looks like everything is set up correctly. I even checked the contents of /data from inside the container and everything was there.

Never mind, I solved it by specifying one_fs=0 and excluding /mnt altogether. Now my rsnapshot.conf looks like this:

config_version  1.2
snapshot_root   /backup/
no_create_root  1
cmd_cp          /bin/cp
cmd_rm          /bin/rm
cmd_rsync       /usr/bin/rsync
cmd_ssh         /usr/bin/ssh
ssh_args        -i /ssh-id -o StrictHostKeychecking=no
verbose         1
lockfile        /var/run/rsnapshot.pid
backup          /rootfs epsilon/        exclude=/rootfs/mnt
retain  daily   3
retain  weekly  3
retain  monthly 3
retain  yearly  3

And my yml file like this:

rsnapshot:
    image: helmuthb/rsnapshot
    container_name: rsnapshot
    volumes:
      - /mnt/storage/backup:/backup
      - /:/rootfs
      - /home/kevin/appdata/rsnapshot/backup.cfg:/backup.cfg
    restart: unless-stopped
    environment:
      - PUID=${PUID}
      - PGID=${PGID}
      - TZ=Asia/Manila
      - BACKUP_NAME=epsilon
      - BACKUP_SOURCE=/rootfs
      - BACKUP_OPTS=one_fs=0
      - BACKUP_OPTS=exclude=/rootfs/mnt

My problem now, though, is that when I run docker exec -it rsnapshot rsnapshot daily from the host, it never finishes, but when I check the target folder for the backup, everything seems to be backed up properly. Your container also isn't set up to do logging, so I don't know what to check to troubleshoot the issue. Can you please help?

@helmuthb, do you offer support for your containers? Do you monitor this github?

Hi @kevindd992002, thanks for your report. Considering one_fs is important, and it should probably be mentioned in the README.
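
For context, rsnapshot's one_fs=1 makes rsync stay on a single filesystem, so directories that are mount points of other filesystems come out empty in the snapshot. A rough way to list such directories is sketched below. It assumes a stat that supports `-c %d` (GNU coreutils or BusyBox), and the helper names are made up for illustration. Whether /home and /var were separate mounts on this host is not confirmed in the thread.

```shell
#!/bin/sh
# Print the device ID of the filesystem that holds a path.
# Assumption: stat supports `-c %d` (GNU coreutils or BusyBox).
fs_id() {
    stat -c %d "$1"
}

# Succeed when both paths live on the same filesystem.
same_fs() {
    [ "$(fs_id "$1")" = "$(fs_id "$2")" ]
}

# List the top-level directories under a root that sit on another
# filesystem. With one_fs=1 these would be empty in the snapshot.
other_fs_dirs() {
    root=$1
    prefix=${root%/}          # avoid a double slash when root is "/"
    for dir in "$prefix"/*/; do
        [ -d "$dir" ] || continue
        same_fs "$root" "$dir" || printf '%s\n' "${dir%/}"
    done
}

other_fs_dirs "${1:-/}"
```

Inside the container the script could be pointed at the mounted source instead of /, for example /data or /rootfs as used in this thread.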

This Docker image is for a "continuous" container. It does not stop and will do backups daily / weekly / monthly as you specify. Therefore you won't see the container finish ever.

When you say "it does not stop", does that mean that after it backs up all data it simply waits for the next schedule based on what I specify? When I initially tested this (when I hadn't set one_fs=0 yet), running docker exec -it rsnapshot rsnapshot daily finished as expected (went back to the prompt). I still believe that this should finish if there are no problems in the process, don't you think?

It's just that when I checked the backed-up files, specifically the files inside the Docker folder, they seem to nest infinitely, and it bogs down the system. Why is this happening? If it matters, my destination fs is a mergerFS filesystem and that fs is under SnapRAID.

Yes, the expected behavior is that the container does not stop. This is when you do not provide a command to run in the container - then it will run the script entry.sh, which prepares the config file and the crontab and then starts the cron daemon:

...
# Dynamic parts - depending on the retain settings
# This will also create the crontab
...
# start cron - we should be done!
/usr/sbin/crond -f

If you start the image with a command to run - as you did in your sample - it will not run entry.sh or the cron daemon but rather run your command.

Now about the nesting. Maybe you are also backing up the running containers, including the rsnapshot container, which has everything mounted - closing the recursion loop. I would therefore try this:

BACKUP_OPTS=one_fs=0 exclude=/rootfs/mnt exclude=/rootfs/var/lib/docker

NB: I think you will have to specify a single BACKUP_OPTS line with all the desired options.

Right, I understand that but what I'm saying is that when I run a command inside the container as in invoking docker exec -it rsnapshot rsnapshot daily from the host, it should stop. It's probably not stopping because of the looping thing and yes it's a good idea to exclude the docker folder. I'll try that.

Now, how do I set a schedule for this? Which crontab should I be editing?

OK, thanks Kevin, this was my misunderstanding then. Yes, then it was not stopping because it was most probably recursively backing up the backup (indirectly - as mounted into the container). There are other folders you might want to exclude from backup, e.g. /var/run and /var/tmp and /tmp, and also cache files usually in /var/cache - always as /rootfs/... in your case.

Regarding crontab: the entry.sh file will create a crontab automatically, based on the backups you selected to keep (default: daily, weekly, monthly & yearly). This assumes you don't care too much about the exact times at which backups are done. If you want to modify this, I assume the easiest approach would be to create a new Docker image that overrides the entry.sh file.

No worries. Thanks for those additional exclusions. I tried to specify all options of BACKUP_OPTS in one line and it was not working. It errors out with something about space being used instead of tabs. Any ideas why?

Ohh I see what you mean. So as soon as I run the container, everything is already running automatically, correct?

Sorry for these mistakes. I checked - according to the rsnapshot documentation, multiple options should be separated by commas, not spaces. My bad... Can you try this please:

BACKUP_OPTS=one_fs=0,exclude=/rootfs/mnt,exclude=/rootfs/var/lib/docker
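
As an illustration of the format involved: rsnapshot.conf separates the fields of a line with tabs, while the options in the last field are joined with commas. The sketch below shows how such a backup line might be assembled. The helper name is hypothetical and is not taken from entry.sh.

```shell
#!/bin/sh
# Hypothetical helper, not taken from entry.sh: build one rsnapshot
# "backup" line. Fields are separated by tabs, options by commas.
backup_line() {
    source_dir=$1
    name=$2
    opts=$3
    printf 'backup\t%s\t%s/\t%s\n' "$source_dir" "$name" "$opts"
}

backup_line /rootfs epsilon \
    'one_fs=0,exclude=/rootfs/mnt,exclude=/rootfs/var/lib/docker'
```

Running rsnapshot configtest inside the container is one way to check whether the resulting file parses.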

Edit: Yes, once you run the container all should be running automatically.

Ok, that worked! I'm running a daily snapshot now and I hope it finishes as expected. Are the exclusions you gave not important to back up at all? Are there any other folders that you think aren't important?

It worked and finished beautifully, thanks :) Now for the schedule, is the timing always "random"?

Hi, great to hear this!
Regarding the schedule, I meant I chose a schedule more or less randomly 🙂
See entry.sh - this creates the schedule:

  • XX:00 for the hourly backup
  • 00:50 for the daily backup
  • Sunday, 11:50 for the weekly backup
  • 1st of month, 12:50 for the monthly backup
  • 1st January, 13:50 for the yearly backup

The hourly backup (or the daily one, if that is the most frequent) may take some time, but the others usually only shift backups around and should not take long (less than one hour), so all backups can run without causing lock conflicts.
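
For readers less used to cron, the schedule listed above can be written in crontab syntax roughly as follows. This is only a sketch: the exact lines that entry.sh generates may differ, and the command column assumes rsnapshot is on the PATH.

```shell
#!/bin/sh
# Sketch only: the schedule from the list above in crontab syntax
# (minute hour day-of-month month day-of-week command).
# The exact lines generated by entry.sh may differ.
print_schedule() {
    cat <<'EOF'
0 * * * * rsnapshot hourly
50 0 * * * rsnapshot daily
50 11 * * 0 rsnapshot weekly
50 12 1 * * rsnapshot monthly
50 13 1 1 * rsnapshot yearly
EOF
}

print_schedule
```

Each of the less frequent jobs starts at minute 50, leaving ten minutes before the next hourly run at minute 0.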

Ok, I just checked the entry.sh and saw those. Would it be a good idea to maybe include environment variables in the container so that we can change this schedule as desired?

I don't do hourly backups, so only daily, monthly, and yearly backups like what's set as default. But I want the backup time to always be early in the morning when everyone is sleeping and not using my NAS. Or does the time really matter? If anyone is using the NAS and the rsnapshot backup kicks in, say at 00:50 for the daily backup, will it lock anything?

I thought you just need one initial backup and the succeeding ones are all differentials?

Any further help here?

Help please?

> Would it be a good idea to maybe include environment variables in the container so that we can change this schedule as desired?

That's a great idea. I made this change in a branch and after a successful test I'll merge it to master.

> I don't do hourly backups, so only daily, monthly, and yearly backups like what's set as default. But I want the backup time to always be early in the morning when everyone is sleeping and not using my NAS. Or does the time really matter? If anyone is using the NAS and the rsnapshot backup kicks in, say at 00:50 for the daily backup, will it lock anything?

Usually under Linux you should not experience any locking from the backup.
So you would expect that users won't be affected.
However, if a file changes while being backed up, the backup would not be consistent. Doing the backups when everyone is sleeping is therefore a good idea.

> I thought you just need one initial backup and the succeeding ones are all differentials?

It's exactly like that. Of course it depends on the use case: if you back up a busy server, the differential would still take substantial time; if you back up your home folder, the differential would most probably be minimal and rather quick.

This is great! Thank you very much for the support.

Hello again. Can you confirm if this works for BACKUP_OPTS?

BACKUP_OPTS=exclude1,
exclude2,
etc.

It's very hard to read everything if it's just on one line.

Yes, this makes it quite difficult to read. However I think that's the way rsnapshot behaves. I'll double check.

Right, I was just thinking if there's a way to make environment variables more readable in the docker-compose.yml file while still passing it as a single string to the container itself.
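
One possible approach, sketched here under assumptions: keep one option per line where it is easy to read, and join the lines with commas before the container is started. The helper name is made up for illustration, and the option values are the ones used earlier in this thread.

```shell
#!/bin/sh
# Join the non-empty, non-comment lines from stdin into one
# comma-separated string.
join_opts() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' | paste -s -d ',' -
}

BACKUP_OPTS=$(join_opts <<'EOF'
# one option per line for readability
one_fs=0
exclude=/rootfs/mnt
exclude=/rootfs/var/lib/docker
EOF
)
export BACKUP_OPTS
printf '%s\n' "$BACKUP_OPTS"
```

With the variable exported, the compose file could reference it as BACKUP_OPTS=${BACKUP_OPTS}, the same way PUID and PGID are already passed in.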

Also, instead of the default /backup for the snapshot_root, is there a way for me to use a remote rsync destination server? I have a remote Synology NAS that I can use, and right now I mount the Synology shared folder through NFS so that I can map that NFS mount to a volume inside the container, but this is not a good solution.

You already have the BACKUP_SOURCE env variable that supports remote source file systems so I don't think it will hurt if you add a BACKUP_DESTINATION variable for remote destination file systems, right?

My assumption was that the user will have to map a folder into the container anyway, as the container cannot access the host file system without permission.
Or do you mean to specify an NFS location and let the Docker container mount it itself? This might add some complexity (e.g. the nfs-common package) to the container, but of course it would make this use case easier. Then again, I'm not sure whether using NFS together with rsync is optimal - see e.g. https://unix.stackexchange.com/a/450666

I see. What then would be the best way to back up a remote filesystem? I thought rsnapshot is designed for this? My use case is that I have two sites connected via VPN and two NAS boxes (one located at each site) that I want to back up with this container. Let's call them NAS1 and NAS2. Both of them use your container to back up their local filesystems to their own local drives. However, I want NAS1 to be backed up to remote storage at the remote site too.

Or would using Samba be better and just mount that in the host and map the mount as a folder into the container?

@helmuthb do you still have any ideas?

Never mind. I used your rsnapshot Docker container on the destination NAS and made the remote system the source by letting your container connect through SSH.

Is your container configured to retain hardlinks for every rotation of rsnapshot?

Hi!

Using SSH is definitely a better approach than Samba; rsync (and therefore rsnapshot) works well with it.

Hardlinks: by default, rsnapshot uses hardlinks for the rotations, so a file that has not changed does not take up additional disk space.
However, by default rsnapshot will not recreate hardlinks as they exist on the source. You could add this to the configuration; please note, however, that this can significantly slow down backups.
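
To illustrate the space saving between rotations, here is a small sketch that uses plain `ln` in a temporary directory rather than rsnapshot itself. The directory names only mimic rsnapshot's daily.0 / daily.1 layout, and the helper name is made up. The `-ef` test is widely supported (bash, dash, BusyBox) but is not strictly POSIX.

```shell
#!/bin/sh
# Illustration with plain `ln`, not rsnapshot itself: two snapshot
# directories that share one unchanged file through a hard link.
make_demo() {
    root=$1
    mkdir "$root/daily.0" "$root/daily.1"
    printf 'unchanged content\n' > "$root/daily.1/file.txt"
    # A hard link adds a second name for the same data; nothing is copied.
    ln "$root/daily.1/file.txt" "$root/daily.0/file.txt"
}

demo_root=$(mktemp -d)
make_demo "$demo_root"

# `-ef` is true when both names refer to the same inode.
if [ "$demo_root/daily.0/file.txt" -ef "$demo_root/daily.1/file.txt" ]; then
    echo "same inode: stored once on disk"
fi

rm -rf "$demo_root"
```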

From the rsnapshot man page:

rsync_short_args -a
List of short arguments to pass to rsync. If not specified, "-a" is the default. Please note that these must be all next to each other. For example, "-az" is valid, while "-a -z" is not.
"-a" is rsync's "archive mode" which tells it to copy as much of the filesystem metadata as it can for each file. This specifically does not include information about hard links, as that would greatly increase rsync's memory usage and slow it down. If you need to preserve hard links in your backups, then add "H" to this.

I see. I was more concerned about saving storage space for unmodified files, and yes, I did confirm that it is doing that across the different snapshot versions.

As for the hardlinks that exist on the source, if it doesn't preserve them does that mean it copies them instead?