Detect and handle stale locks

Question

Opened this issue 8 years ago · 6 comments

As a system administrator, I want to be able to trust my backup mechanism to run even if it failed once, without having to check it manually every time.

Observed behavior

Backup script runs
Script gets killed for whatever reason and is unable to remove its lock
Subsequent backups fail with "lock held by XXXX"
No notification is sent to the Slack channel even though it is configured

Expected behavior

The stale lock is detected because the registered PID is no longer running
The stale lock is removed and the backup process continues

OR

Lock is detected
Deadlock (PID no longer running) is detected
Slack channel gets notification about deadlock

Logs

# cat /var/log/duplicity/duplicity-2016-05-15_01-12.txt
--------    START DUPLICITY-BACKUP SCRIPT for docker01   --------

Attempting to acquire lock /var/log/duplicity/backup.lock
lock failed, could not acquire /var/log/duplicity/backup.lock
lock held by 3124
# ps aux | grep 3124
root      7661  0.0  0.0  11712   668 pts/3    S+   02:10   0:00 grep --color=auto 3124
#

Answer 1 · 2016-05-15T10:59:21.000Z

You're right, this would be a very useful enhancement. Not sure I will be able to look into implementing it soon. Anyone feel free to propose a pull request before I do 😉

Answer 2 · 2016-05-17T08:50:49.000Z

Regarding the second expected behavior, the script does send an email but does not use the other notification methods (e.g. Slack). @zertrin, maybe we should simply add send_notification next to email_logfile at https://github.com/zertrin/duplicity-backup/blob/b92d60f028dffb94dc3aff2cd674dce4d5a9f48c/duplicity-backup.sh#L436?
Actually, there are 10 occurrences of exit in the script; maybe they should be replaced by some notification-sending function (at least if the configuration was correct enough to set it up)?
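
A rough sketch of what such a notification-sending exit function could look like. The name exit_with_notification is made up for illustration, and the two stubs only stand in for the script's real email_logfile and send_notification so the snippet runs on its own:

```shell
#!/usr/bin/env bash
# Stand-in stubs so this sketch is self-contained; the real script
# defines email_logfile and send_notification itself.
email_logfile()     { echo "email_logfile called"; }
send_notification() { echo "send_notification called"; }

# Hypothetical helper (not part of duplicity-backup.sh): notify through
# every channel, then exit with the given code (defaults to 1).
exit_with_notification() {
    local code="${1:-1}"
    email_logfile
    send_notification
    exit "${code}"
}
```

Each bare exit in an error path could then become a call such as exit_with_notification 1, assuming the notification settings have already been validated at that point.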

Answer 3 · 2016-05-17T09:43:24.000Z

I fully agree. I'll look into this ~~soon~~ sometime since that's easier.

Answer 4 · 2016-08-31T20:00:27.000Z

@zertrin

I did just what @jarondl suggested above and nothing more. I have two enhancements in mind:

Figure out a way for the notification to carry a message that identifies the error (in this case, the stale lock)
Handling the stale lock would be nice, as suggested by @Luzifer

I left those two for later. However, regarding item 1, I haven't figured out the best way to do this. I think it may require refactoring send_notification to accept an optional parameter. Any thoughts?
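
One possible shape for that refactoring, as a minimal sketch. NOTIFICATION_DEFAULT_MESSAGE and deliver_notification are illustrative names, not taken from the script, and the delivery step only prints instead of calling Slack:

```shell
#!/usr/bin/env bash
# Hypothetical default text used when the caller passes no message.
NOTIFICATION_DEFAULT_MESSAGE="duplicity-backup finished"

# Placeholder for the real delivery code (Slack or other carriers);
# in this sketch it only prints the message.
deliver_notification() {
    printf 'NOTIFY: %s\n' "$1"
}

# Sketch of send_notification with an optional first argument.
# Existing calls without arguments keep working unchanged.
send_notification() {
    local message="${1:-${NOTIFICATION_DEFAULT_MESSAGE}}"
    deliver_notification "${message}"
}
```

A caller in the lock-failure path could then use something like send_notification "stale lock held by PID 3124", while every existing call without arguments would behave as before.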

Answer 5 · 2016-09-23T07:58:49.000Z

How do you deal with rebooting the server you're backing up? Each time that I do, it's halfway through the last backup causing it to never start back up since the lockfile still exists.

Answer 6 · 2016-09-23T09:58:33.000Z

It doesn't happen to me, since my backup doesn't last that long and I never reboot around the time my backup is running.

Locking mechanisms are hard to get right and can be annoying. I still haven't found the time to implement a solution, but I welcome contributions that aim at doing locking "the right way" (probably with a PID check somewhere).
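
For anyone picking this up, a minimal sketch of what a PID check could look like. It assumes the lock file contains the owner's PID on its first line, which may not match how the script stores it. The function names lock_is_stale and acquire_lock are made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch only: assumes the lock file holds the owner's PID on line 1.

# Return 0 if the lock file exists and its recorded PID is not running.
lock_is_stale() {
    local lockfile="$1" pid=""
    [ -f "${lockfile}" ] || return 1        # no lock file, nothing is stale
    read -r pid < "${lockfile}" || true     # tolerate a missing final newline
    case "${pid}" in
        ''|*[!0-9]*) return 1 ;;            # not a plain PID: do not guess
    esac
    # kill -0 delivers no signal; it only checks that the PID can be signalled.
    # Caveat: for a non-root user it also fails on other users' processes.
    if kill -0 "${pid}" 2>/dev/null; then
        return 1                            # owner is still alive
    fi
    return 0
}

# Remove a stale lock if there is one, then try to take the lock.
acquire_lock() {
    local lockfile="$1"
    if lock_is_stale "${lockfile}"; then
        echo "removing stale lock ${lockfile}" >&2
        rm -f "${lockfile}"
    fi
    # With noclobber the redirection fails if the file already exists,
    # so the lock file is created atomically.
    if ( set -o noclobber; echo "$$" > "${lockfile}" ) 2>/dev/null; then
        return 0
    fi
    return 1
}
```

This is not race-free: two processes could both decide the lock is stale at the same moment. A PID check can also be fooled after a reboot if an unrelated process happens to receive the recorded PID.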