/backurne

Backup Ceph's RBD on Ceph, with Proxmox integration

Primary language: Python. License: GNU General Public License v2.0 (GPL-2.0).

backurne

backurne is a handy tool for backing up RBD images on RBD.
Yep! What is better, for backing up a Ceph cluster, than another Ceph cluster?

It does not do much by itself, though; it orchestrates and relies heavily on other tools.
It integrates tightly with Proxmox, but it can back up "plain" (or "raw RBD") clusters as well.

Supported features

  • Snapshot-based backups, with no agent strictly required on the VM.
  • Backup inspection and restoration via the command line interface as well as the REST API.
  • Efficient support for multiple retention policies (in terms of both storage and network bandwidth), dynamically configurable per host (Proxmox-only) via the REST API.
  • Auto cleanup: deletions are never triggered by a human, thus no human mistakes.
  • Compression and encryption "on the wire" for enhanced efficiency and security.
  • Peaceful coexistence with other snapshots (made via the Proxmox web interface or otherwise).
  • Multiple cluster support, with mixed types ("proxmox" and "plain").
  • A couple of backups can be stored on the live clusters, for faster recovery.
  • Optional fsfreeze support (Proxmox-only) via the Qemu guest agent.
  • Backup deactivation via Proxmox's web interface.
  • External custom processing via hooks.
  • LVM support: LVs inside backups are detected and mapped (when possible) for further exploration. See below.
  • vmware support: VMFS volumes are detected and supported. Each vmdk is also mapped and mounted. See below.
  • Microsoft dynamic disk support: each logical disk will be mapped and mounted. See below.
  • VM tracking, for those who use a single Proxmox cluster with multiple Ceph backends.

Encryption and compression at rest are also seamlessly supported via Bluestore OSDs (see https://ceph.com/community/new-luminous-bluestore/)

Required packages

Core: python (>=3.7), python3-dateutil, python3-termcolor, python3-prettytable, python3-requests, python3-proxmoxer, python3-psutil, python3-anytree (from https://github.com/c0fec0de/anytree, .deb for buster attached for convenience), zstd for compression
For mapping (optional): kpartx, rbd-nbd (Mimic or later), lvm2, vmfs-tools, vmfs6-tools, ldmtool
For the REST API: python3-flask, python3-flask-autoindex
For bash autocompletion: jq
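On Debian, the dependencies above can be pulled in roughly as follows (package names are taken from the list above; python3-anytree comes from the linked repository or the attached .deb, so it is not included here):

```shell
# Core dependencies (Debian Buster)
apt-get install python3 python3-dateutil python3-termcolor \
    python3-prettytable python3-requests python3-proxmoxer \
    python3-psutil zstd
# Optional: mapping support (LVM, vmware, Microsoft dynamic disks)
apt-get install kpartx rbd-nbd lvm2 vmfs-tools vmfs6-tools ldmtool
# Optional: REST API and bash autocompletion
apt-get install python3-flask python3-flask-autoindex jq
```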

Installation

  • Check out the Authentication parts.
  • Clone the source, edit the configuration
  • Setup a Ceph cluster, used to store the backups
  • Profit ?

Configuration

See custom.conf.sample

Authentication, and where should I run what

backurne interacts with the backup cluster via the rbd command line. It must have the required configuration at /etc/ceph/ceph.conf and the needed keyring.
It is assumed that backurne will be run on a Ceph node (perhaps a monitor), but this is not strictly required (note, however, that these communications are neither encrypted nor compressed).
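A quick way to check that the node running backurne can reach the backup cluster with its configuration and keyring in place (the pool name `backup` here is just an example, not something backurne mandates):

```shell
# Assumes /etc/ceph/ceph.conf and the keyring are installed
rbd ls backup    # list images in a (hypothetical) 'backup' pool
ceph -s          # overall cluster health, if running on a Ceph node
```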

backurne connects to Proxmox clusters via their HTTP API. No data is exchanged over this link; it is purely used for "control" (listing VMs, listing disks, fetching information, etc.).

backurne connects to every "live" Ceph cluster via SSH. For each cluster, it connects to a single node, always the same one, defined in Proxmox (and/or overridden via the configuration).
SSH authentication and authorization are not handled by backurne in any way.
It is up to you to configure SSH: accept or ignore host keys, place your public key on the required hosts, etc.
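A typical manual setup might look like this (the host name `ceph-live-node` is a placeholder for the node defined in Proxmox or in your configuration):

```shell
# Accept the live node's host key, then install our public key there
ssh-keyscan ceph-live-node >> ~/.ssh/known_hosts
ssh-copy-id root@ceph-live-node
# This should now succeed without any prompt
ssh root@ceph-live-node true
```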

Command line interface

See cli.md

REST API

See api.md

Used technology

  • RBD is the core technology used by backurne: it provides snapshot export, import, diff, mapping, etc.
  • ssh is used to transfer snapshots between the live clusters and the backup cluster. RBD can be manipulated over plain TCP/IP, but without encryption or compression, so that approach was not kept.
  • xxhash (or another hash, see the configuration) is used to check consistency between snapshots.
  • rbd-nbd is used to map a specific backup and inspect its content.
  • kpartx, qemu-img, qemu-nbd, vmfs-tools and vmfs6-tools are used for vmware exploration; ldmtool is used to map Microsoft dynamic disks.

vmware support

The assumption is that the rbd image you back up is a single datastore. It contains multiple vmdk files, each of them a VM disk.
Datastores use a specific filesystem: VMFS. Several versions exist as of today. You will need vmfs-tools to mount VMFS up to version 5; for version 6 support, vmfs6-tools is required.
When backurne detects a VMFS, it will try each version until one succeeds. If no vmfs*-tools is available, the block device is left as is.
Once a VMFS device is mounted, each vmdk found inside is mapped and mounted, recursively. In theory, you could have a VMFS containing a VM disk (vmdk) which is itself a datastore containing more vmdk files... This behavior is not tested, though.
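The same probing can be done by hand with the fuse tools (the device and mount point below are examples):

```shell
# Try VMFS up to v5 first, then VMFS6, as backurne roughly does internally
mkdir -p /mnt/datastore
vmfs-fuse /dev/nbd0p1 /mnt/datastore \
  || vmfs6-fuse /dev/nbd0p1 /mnt/datastore
ls /mnt/datastore    # the datastore's vmdk files
```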

LVM support

The same device may be seen at multiple layers by the device-mapper code.
To activate some LVs, especially if they live inside a vmdk (see vmware support), you will need to tell LVM to allow such behavior.
By default, LVM refuses to activate LVs whose PVs show up multiple times.
To allow this, edit /etc/lvm/lvm.conf and set allow_changes_with_duplicate_pvs to 1.
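The relevant fragment of /etc/lvm/lvm.conf (the setting lives in the devices section):

```
devices {
    # Allow activating LVs even when their PVs are seen more than once,
    # e.g. both as the raw backup device and inside a mapped vmdk
    allow_changes_with_duplicate_pvs = 1
}
```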

Microsoft LDM support

Microsoft dynamic disks are supported. You will need ldmtool to map them.
A single dynamic disk, as well as a dynamic disk spread across multiple block devices (inside a VMFS for instance), is supported.
However, mapping multiple unrelated dynamic disks is not supported. For instance, if you map a backup A and an unrelated backup B,
and both of them contain dynamic disks, the behavior is undefined.
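Manually, mapping a dynamic disk with ldmtool might look like this (the nbd devices are placeholders for whatever your backup was mapped to):

```shell
# Detect the LDM disk group spread across the given devices
ldmtool scan /dev/nbd0 /dev/nbd1
# Create device-mapper mappings for every logical volume found
ldmtool create all
# The resulting volumes appear under /dev/mapper, ready to mount
ls /dev/mapper/ldm_vol_*
```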

"Bare-metal" restore

Restoring a complete image is out of backurne's scope.
If you are using Proxmox, you may first need to restore the configuration in /etc/pve/.
Either way, once you know the target rbd image name, you will have to:

  • find the desired backup image, using backurne ls
  • find the desired backup snapshot, using backurne ls <image>
  • export and import the image, using rbd export <image> --snap <snap> - | ssh <ceph-host> rbd import - <dest-image>
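Put together, a restore session might look like this (image, snapshot and host names are placeholders to fill in from the `backurne ls` output):

```shell
backurne ls                      # find the desired backup image
backurne ls backup-vm-disk-1     # list that image's snapshots
# Stream the chosen snapshot back to the live cluster
rbd export backup-vm-disk-1 --snap some-snapshot - \
  | ssh ceph-live-node rbd import - rbd/vm-disk-1
```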

Graph and reporting


An ugly Grafana dashboard is provided in graph/grafana-backurne.json; the data is stored in an InfluxDB database.
It provides two pieces of information:

  • the number of backups currently running, using data from telegraf (both the script and the config can be found in graph/telegraf/*).
  • the duration of each backup

Merge requests or ideas for improvement are most welcome here.

Note

On Proxmox, LXC is not yet supported. Only Qemu so far :/

The project is developed mainly for Debian Buster and Proxmox, and is used here on those technologies.
The "plain" feature, as well as running backurne on other operating systems, is less tested and may be less bug-proof.
Bug reports, merge requests and feature requests are welcome: some things are not implemented simply because I do not need them, not because they cannot be done or because I do not want to code them.