Back up your LDAP data in a Git repository
On Debian and derivatives, ldap-git-backup is available as package since Debian 8 Jessie.
Call the following command with root permissions to install it from the official APT repositories:
apt-get install ldap-git-backup
wget https://github.com/downloads/elmar/ldap-git-backup/ldap-git-backup.tar.gz
tar xzf ldap-git-backup.tar.gz
cd ldap-git-backup-*
./configure
make
make check
sudo make install
If you are using OpenLDAP the script
ldap-git-backup
will dump your current LDAP database into /var/backups/ldap
and check it into Git. Do this regularly (e.g. via cron) and you will have a versioned history of backups.
How do you back up your LDAP data? Typically, you run a cron job once per day to dump all LDAP entries into a LDIF file then compress this file and keep a few generations. With OpenLDAP the command to use is slapcat
. Since LDAP usually contains a lot of similar entries the compression is quite good and you can keep a few backups with space requirements comparable to the LDAP database itself. When disaster strikes you take the most recent backup that you know not to contain the problem and restore the data from there.
This is all quite resonable, but we can do better. LDAP is used for data that are read more often than written. As a result the differences between two consecutive backups are relatively small. If we store only the deltas we can save quite some space. Git will do all the management of the delta and keep a record of the history. To make this approach practical we split up the LDIF into individual entries and track the entries as separate files.
The script ldap-git-backup
is a convenient way to automate this process. Each time it is run it will call the LDIF command given by the option --ldif-cmd
or /usr/sbin/slapcat
if none is given. The output of the LDIF command is split up into individual files, one per entry. The file names are a combination of the creation time of the entry and a hash of the DN. With this naming the individual LDIF entries will keep their names between backups and still avoid name clashes. The unlikely case of a name clash will be automatically handled by adding a simple count to the file.
The backup location will be /var/backups/ldap
or an alternative directory given by the --backup-dir
option. This directory will also contain the Git repository. The directory and the Git repository will be created if needed when the first backup is made.
The simplest backup strategy would just call ldap-git-backup
once per day via cron. Pick a quiet time for the LDAP directory and add a command like the following to your crontab (e.g., crontab -e
or in /etc/cron.d/ldap-git-backup
):
0 5 * * * /usr/sbin/ldap-git-backup --commit-msg 'backup by daily cron'
Alternatively or in addition you can also trigger a backup whenever some relevant event like adding an LDAP entry or changing a password occurs. The details depend on your setup. Ultimately, you want to call ldap-git-backup
with some helpful --commit-msg
.
Unless you are starting a new LDAP directory chances are you already have some backups. With OpenLDAP you would typically use the above mentioned slapcat
command and compress the result. You can use ldif-git-backup
to add these older backups to the Git repository.
Here is some example code that may serve as a start:
#!/bin/bash
LDIF_DIR=$1
BKUP_DIR=$2
function usage () {
echo "usage: $0 <LDIF_DIR> <BACKUP_DIR>"
exit 1
}
[[ -d $LDIF_DIR ]] || usage
[[ -d $BKUP_DIR ]] || usage
for filepath in `ls -1tr $LDIF_DIR/TODO/*.gz` ; do
filename=`basename $filepath`
echo $filename
ldap-git-backup \
--ldif-cmd "zcat $filepath" \
--backup-dir $BKUP_DIR \
--commit-msg $filename \
--commit-date $filepath
mv $filepath DONE/$filename
done
To use this script you put all compressed old LDIF file in a subdirectory TODO
of some directory where you keep the backups. You must also create another directory DONE
where the processed backups will be moved into. Furthermore, the script assumes that the backup files still have their original modification time of when the backup was performed. This is used to sort the files and to set the commit date.
If your setup is different you have to adjust the script or write your own. Alternatively, you can decide to start with a clean slate and keep the old backups unchanged for as long as your backup policy requires.
If you need to restore some data you will find the individual LDIF files checked out in the working tree of the backup directory (/var/backups/ldap
by default). You can use these files directly if you just need the latest version of one or a few LDAP entries. You can use the standard commands meant to deal with LDIF files. You may also consider writing some Perl/Python/Ruby scripts depending on the restore you want to make.
If you want to restore the entire LDAP database it might by enough to concatenate all LDIF files and add them to an new, empty LDAP directory. The file names contain the create timestamp as their first part. Any entry that depends on another entry needs to have a timestamp that is later than that of the entry it depends on. If the data has accumulated naturally, this is usually the case. However, if an entry now depends on another entry that was only created later you may need to reorder the LDIF files or you add them in several passes and keep a list of entries that didn't add themselves properly in an earlier pass.
If you want to do more complicated restores it is advisable that you clone the Git repository first. You need to clone only the commits that are relevant for the restore. A typical session might look like this (cloning the last 10 backups and checking out 7 commits earlier):
mkdir /tmp/ldap-restore
cd /tmp/ldap-restore
git clone --depth 10 /var/backups/ldap
git checkout -b restore master~7
...
You may be looking for the cause of a problem where you suspect some changes in an LDAP entry to be the reason. You can use the Git repository to see all the changes done to the LDAP directory (in intervals given by your backup policy). You may do something like:
cd /var/backups/ldap
git log --stat
git show <commit-ID>
You may also see the complete diff history with git log -p
. Individual entries retain their name as long as the create timestamp and DN do not change. Therefore, you can investigate the history of an individual entry with git log -p filename
.
With OpenLDAP the command slapcat
is meant to produce a snapshot LDIF of an active OpenLDAP server. Before OpenLDAP 2.4.36 (August 2013) this worked most of the time but occasionally we experienced truncated output without any indication of a problem. slapcat
would exit with a zero exit code and the resulting LDIF dump is valid but too short. This was a bug in OpenLDAP (ITS#7503 and ITS#6365) and has been fixed with the release of OpenLDAP 2.4.36. In Debian and derivatives the Debian bug #673038 has been fixed since the package version 2.4.31-1+nmu1 and Debian 7 Wheezy.
To make the backups a bit more robust we can trade speed with reliability and call the --ldif-cmd
(i.e., slapcat
) multiple times to verify the backup. safe-ldif
does just this and writes the last dump as soon as two consecutive outputs result in the same number of LDIF stanzas. This accommodates LDAP directories where attributes may be updated relatively frequently but the number of entries changes only rarely.
Since reliable backups are important safe-ldif
is the default.
Over time various people contributed with discussions, ideas, code, and bug reports.
- Axel Beckert abe@debian.org (co-maintainer)
- Florian Ernst <florian_ernst@gmx.net>
- Bart Martens bartm@debian.org
- Hans Spaans hans.spaans@nexit.nl
- Matthew Richardson