btrfs-diff-go

Analyze differences between two BTRFS snapshots (like GNU diff for directories).

It is a single Go package of about 1000 lines of code (excluding blank lines and comments), plus a main program (binary) of 190 lines.

USAGE

This is the output of btrfs-diff --help:


btrfs-diff-go - Analyse the differences between two related btrfs subvolumes.

USAGE

	btrfs-diff-go [OPTIONS] PARENT CHILD
		Analyse the difference between btrfs PARENT and CHILD.

	btrfs-diff-go [OPTIONS] -f|--file STREAM
		Analyse the differences from a STREAM file (output from 'btrfs send').

	btrfs-diff-go [ -h | --help ]
		Display help.

ARGUMENTS

	PARENT
		A btrfs subvolume that is the parent of the CHILD one.

	CHILD
		A btrfs subvolume that is the child of the PARENT one.

OPTIONS

	-h | --help
		Display help.

	-i | --info
		Be verbose.

	-d | --debug
		Be more verbose.

	-f | --file STREAM
		Use a STREAM file to get the btrfs operations.
		This stream file must have been generated by the command
		'btrfs send' (with or without the option --no-data).

	-t[changed] | --with-times[=changed]
		By default time modifications are ignored. With that option
		they will be taken into account. They are labelled as 'times'
		but if you also specify '=changed' they will be labelled
		'changed'.

	-p[changed] | --with-perms[=changed]
		By default permission modifications are ignored. With that option
		they will be taken into account. They are labelled as 'perms'
		but if you also specify '=changed' they will be labelled
		'changed'.

	-o[changed] | --with-own[=changed]
		By default ownership modifications are ignored. With that option
		they will be taken into account. They are labelled as 'own'
		but if you also specify '=changed' they will be labelled
		'changed'.

	-a[changed] | --with-attr[=changed]
		By default attribute modifications are ignored. With that option
		they will be taken into account. They are labelled as 'attr'
		but if you also specify '=changed' they will be labelled
		'changed'.

EXAMPLES

	Get the differences between two snapshots.
	$ btrfs-diff-go /backup/btrfs-sp/rootfs/2020-12-25_22h00m00.shutdown.safe \
		/backup/btrfs-sp/rootfs/2019-12-25_21h00m00.shutdown.safe

AUTHORS

	Originally written by: David Buckley
	Extended, fixed, and maintained by: Michael Bideau

REPORTING BUGS
	Report bugs to: <https://github.com/mbideau/btrfs-diff-go/issues>

COPYRIGHT

	Copyright © 2020-2021 Michael Bideau.
	License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
	This is free software: you are free to change and redistribute it.
	There is NO WARRANTY, to the extent permitted by law.

	Info: original license chosen by David Buckley was MIT, but it allows sublicensing, so I
	      chose to sublicense it to GPLv3+ to ensure code sharing

SEE ALSO

	Home page: <https://github.com/mbideau/btrfs-diff-go>

Installation

First, install the required dependencies (example for Debian / Ubuntu):

~> sudo apt install golang libbtrfs-dev

Using the go install command

Use the convenient go install:

~> go install github.com/mbideau/btrfs-diff-go

That will create a binary named btrfs-diff-go in $GOPATH/bin.

Building from the sources

Clone the repository, run the build, then install it:

~> git clone -q https://github.com/mbideau/btrfs-diff-go.git
~> cd btrfs-diff-go
~> go build -v
~> go install

Copy the binary to /usr/local/bin to make it available system-wide.

You can rename it to btrfs-diff if you don't care about the implementation language.

~> [ "$GOPATH" != '' ] || GOPATH="$HOME/go"
~> sudo cp "$GOPATH/bin/btrfs-diff-go" /usr/local/bin/btrfs-diff
~> sudo chmod +x /usr/local/bin/btrfs-diff

Fast diff between BTRFS snapshots

Why? Rationale

The great advantage of a COW filesystem with snapshotting, like BTRFS, is that producing the differences between two snapshots is almost instantaneous.

For example, you can get the differences between snap1 and snap2 with the following command:

~> sudo btrfs send --quiet --no-data -p snap1 snap2 | LC_ALL=C btrfs receive --quiet --dump > /tmp/btrfs.dump

Note that this dump is not really human-readable. Moreover, it contains operations, not differences, so it is not exactly what we are looking for. For example, it might contain information about transient objects, and multiple lines of unintuitive operations to reproduce a single file state.
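To make the "operations, not differences" point concrete, here is a minimal Go sketch that splits dump lines into an operation, a path and optional key=value attributes. It assumes each line is laid out as whitespace-separated fields in that order, which is my reading of the dump format and not something this README specifies. The type dumpOp, the function parseDumpLine and the sample lines are hypothetical, and escaping is ignored, so paths containing spaces would be parsed incorrectly.

```go
package main

import (
	"fmt"
	"strings"
)

// dumpOp is one line of a text dump: an operation, its path, and any
// trailing key=value attributes (for example dest=... on a rename).
// The type is hypothetical and only exists for this illustration.
type dumpOp struct {
	Name  string
	Path  string
	Attrs map[string]string
}

// parseDumpLine splits a dump line on whitespace. The field layout is an
// assumption, and escaping is ignored, so a path containing spaces would
// be parsed incorrectly.
func parseDumpLine(line string) (dumpOp, bool) {
	fields := strings.Fields(line)
	if len(fields) < 2 {
		return dumpOp{}, false
	}
	op := dumpOp{Name: fields[0], Path: fields[1], Attrs: map[string]string{}}
	for _, f := range fields[2:] {
		kv := strings.SplitN(f, "=", 2)
		if len(kv) == 2 {
			op.Attrs[kv[0]] = kv[1]
		}
	}
	return op, true
}

func main() {
	// Made-up sample lines, not captured from a real run.
	sample := []string{
		"mkfile          ./snap2/o257-8-0",
		"rename          ./snap2/o257-8-0                dest=./snap2/notes.txt",
		"unlink          ./snap2/old.log",
	}
	for _, line := range sample {
		if op, ok := parseDumpLine(line); ok {
			fmt.Printf("%-8s %-20s %v\n", op.Name, op.Path, op.Attrs)
		}
	}
}
```

In this sample, a single new file (notes.txt) takes two operations on a temporary object name, which is exactly the kind of noise a diff format should hide.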

I wanted a differences file format like the one you get when doing diff -rq or git status --short, in short: a human-friendly one.

I looked at the prior art (see below), but nothing was satisfying enough, so I rolled my own diff utility (which produces the stream with btrfs send and then parses it).

Prior art analysis

At the time of writing this (i.e.: Dec. 2020), I have found 2 projects matching btrfs diff on GitHub and 0 on GitLab.

  • btrfs-send-go [GO]
    The one that this project has extended, fixed and improved.
    The original author's version is raw and has minor bugs, but does exactly the job.
    It is also not translatable (as-is).

  • btrfs-snapshots-diff [Python 2]
    It has a lot of issues (with links, but not only), and it is written in Python 2, which is now deprecated.
    No go.

  • btrfs-snapshots-diff [Python 3]
    A fork of the previous one, with a lot of issues fixed and in Python 3.
    Because it is written in Python, running it in an initramfs (which I do) would mean including the Python binary and the required dependencies. Too much for what I want.
    Maybe I could compile it with Cython, but I am not (yet) comfortable with that.

There is also the snapper utility that compares BTRFS snapshots, but it does so by mounting both snapshots and doing a "standard" diff on them (if my understanding is correct).

Finally, I have found a lot of small Python scripts doing a BTRFS diff, but they were using a hacky way to do it (based on the find-new method), without being able to catch deletions.
They were better than nothing prior to btrfs send and btrfs receive, but they have been obsolete since. Hence, I skipped all of those.

So, I almost found what I wanted, after patching/fixing btrfs-send-go but I was not confident enough to trust it, and it still lacked the translation layer.

This is why I first decided to roll my own script, in POSIX shell, that has all the features I was looking for. See btrfs-diff-sh.

But although I was happy with it, it was a little too slow, so I decided to go back to the Go version (pun intended), fix its bugs and improve its user experience.

Here it is. Way faster than the shell version. Not measured yet (it's a feeling).

Features list

Cool features implemented:

  • can produce the raw diff from two snapshots or parse a raw stream file
  • produces an output close to the diff -rq utility and git status --short
  • fast
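For the raw stream file input, here is a minimal Go sketch of reading the start of a btrfs send stream. It assumes the layout as I understand it from the kernel's send stream format: a 13-byte "btrfs-stream" magic (with its trailing NUL), a little-endian 32-bit version, then a 10-byte header (length, command number, CRC) before each command. The names parseStreamHeader, parseCmdHeader and cmdHeader are hypothetical and not taken from this project's code, and the sample bytes are hand-built.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// streamMagic is the signature expected at the start of a send stream,
// including its trailing NUL byte (13 bytes in total).
const streamMagic = "btrfs-stream\x00"

// cmdHeader holds the fixed-size header assumed to precede each command:
// payload length, command number and checksum.
type cmdHeader struct {
	Len uint32
	Cmd uint16
	CRC uint32
}

// parseStreamHeader checks the magic and returns the stream version and
// the remaining bytes.
func parseStreamHeader(data []byte) (version uint32, rest []byte, err error) {
	if len(data) < len(streamMagic)+4 {
		return 0, nil, errors.New("stream too short for header")
	}
	if string(data[:len(streamMagic)]) != streamMagic {
		return 0, nil, errors.New("missing btrfs-stream magic")
	}
	version = binary.LittleEndian.Uint32(data[len(streamMagic):])
	return version, data[len(streamMagic)+4:], nil
}

// parseCmdHeader decodes one 10-byte command header. The caller is then
// expected to consume Len bytes of attributes before the next header.
func parseCmdHeader(data []byte) (h cmdHeader, rest []byte, err error) {
	if len(data) < 10 {
		return h, nil, errors.New("stream too short for command header")
	}
	h.Len = binary.LittleEndian.Uint32(data[0:4])
	h.Cmd = binary.LittleEndian.Uint16(data[4:6])
	h.CRC = binary.LittleEndian.Uint32(data[6:10])
	return h, data[10:], nil
}

func main() {
	// Hand-built sample: magic, version 1, then one empty command whose
	// number (21) and zero CRC are arbitrary values for the illustration.
	sample := []byte(streamMagic)
	sample = append(sample, 1, 0, 0, 0)
	sample = append(sample, 0, 0, 0, 0, 21, 0, 0, 0, 0, 0)

	version, rest, err := parseStreamHeader(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	h, _, err := parseCmdHeader(rest)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("stream version %d, first command %d with %d payload bytes\n",
		version, h.Cmd, h.Len)
}
```

The sketch does not verify the CRC or decode the attributes; a real parser would need both.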

Limits / flaws

It does the job, but has some limits.

It was not tested on huge dumps, so it might perform poorly or reveal major bugs.

Due to the BTRFS implementation, some files appear as changed when they are not (according to the diff utility). I have absolutely no idea why BTRFS is acting like this… If someone can help me figure this out, I'll be glad.

Feedback wanted, PR/MR welcome ❤️

If you have any questions or want to share a case that is not covered, I will be glad to answer and to accept changes through Pull Requests.

Developing

Make your changes, then, in the source directory, run:

~> go build -v

Algorithm explained

The program follows this process:

  • produce a BTRFS stream (with the btrfs send syscall) or get one from a file (CLI argument)
  • parse this stream in binary mode
  • extract commands and their parameters (should match the lines of btrfs receive --dump)
    • those commands are mapped to operations (i.e.: command 'delete' => operation 'delete')
    • commands are associated with paths, mostly only one, and two for the rename operation
    • for each command's path, we re-create the file tree with an object called 'node'
    • we maintain two trees: one for new files, one for old files
    • a new file can have an original one, and an old file can have a new version
  • after having processed all the commands, we flatten and analyze the trees
  • for each old file we produce the resulting change, then the same for each new file
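The last steps above can be sketched in Go as follows. This is a deliberately simplified illustration, not the program's actual code: it folds operations into one label per path using a flat map instead of the two node trees, and the type op, the function netChanges, the operation names and the labels are all assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// op is a simplified operation: a name, a path, and a destination that
// is only used by rename. Hypothetical type for this sketch.
type op struct {
	Name string
	Path string
	Dest string
}

// netChanges folds a list of operations into one label per path
// ("added", "changed" or "deleted"), dropping transient objects.
func netChanges(ops []op) map[string]string {
	changes := map[string]string{}
	for _, o := range ops {
		switch o.Name {
		case "mkfile", "mkdir":
			changes[o.Path] = "added"
		case "write", "truncate":
			// A file created in this stream stays "added".
			if changes[o.Path] != "added" {
				changes[o.Path] = "changed"
			}
		case "unlink", "rmdir":
			if changes[o.Path] == "added" {
				// Created and removed within the stream: no net change.
				delete(changes, o.Path)
			} else {
				changes[o.Path] = "deleted"
			}
		case "rename":
			if changes[o.Path] == "added" {
				// The source was a transient name unknown to the parent.
				delete(changes, o.Path)
			} else {
				changes[o.Path] = "deleted"
			}
			if changes[o.Dest] == "deleted" {
				// An old file replaced by a new version.
				changes[o.Dest] = "changed"
			} else {
				changes[o.Dest] = "added"
			}
		}
	}
	return changes
}

func main() {
	// Made-up operations, including a transient object name.
	ops := []op{
		{Name: "mkfile", Path: "o257-8-0"},
		{Name: "rename", Path: "o257-8-0", Dest: "docs/new.txt"},
		{Name: "write", Path: "etc/hosts"},
		{Name: "unlink", Path: "tmp/old.log"},
	}
	changes := netChanges(ops)
	paths := make([]string, 0, len(changes))
	for p := range changes {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	for _, p := range paths {
		fmt.Printf("%-8s %s\n", changes[p], p)
	}
}
```

A flat map is enough for this toy input, but it cannot express directory renames affecting children, which is one reason a tree of nodes is a better fit for the real program.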

Testing

To be sure that the program is working in your environment, or that you have not broken anything while developing, run the tests with the following command:

~> sh test.sh

It is very raw, but it already tests a lot of cases.

TODO

In order of priority:

  • create a screencast to show off the program
  • make the program translatable
  • create a GitHub action to automatically insert the help into the README from the execution of the command
  • create a GitHub action to generate a table of links to the sections at the top of the README
  • create an alpha release once the program has received enough testing (and possibly runs in real-life conditions)

Authors and contributors

Authors

Originally written by: David Buckley

Extended, fixed, and maintained by: Michael Bideau

Contributors

With a lot of thanks to:

  • Mek101: very good crash-tester 😉

Copyright and License GPLv3

Copyright © 2020-2021 Michael Bideau [France]

All the btrfs-diff-go source code (every file but README.md, CODE_OF_CONDUCT.md and LICENSE) is licensed under the GPLv3+ license.

btrfs-diff-go is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

btrfs-diff-go is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with btrfs-diff-go. If not, see https://www.gnu.org/licenses/.

Code of conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

About this document

License: CC BY-SA 4.0

Copyright © 2020-2021 Michael Bideau, France
This document is licensed under a Creative Commons Attribution 4.0 International License.

Author: Michael Bideau

Michael Bideau, France

Made with: Formiko and Vim, plus some helpers/linters

I started with formiko, then used vim with linters to help catching mistakes and badly written sentences: