gkiefer/backup2l

Portability -- Worth Pursuing or Discard?

Closed this issue · 1 comments

Greetings Gundolf,

I have been a big fan of backup2l for years. I use it on many of the Linux servers that I have administered over the past decade.

A few years ago, I was looking at some minor tweaks to contribute that would make backup2l work with MacOS and Red Hat / CentOS. I forked the repo and started tweaking and I realized that there were some basic portability issues that would best be addressed by abstracting away the name of the command-line tool (e.g., gsed instead of sed or gawk instead of awk) and moving some repeated code into new shell functions.

At some point I realized that I had added a lot of code to make the setup explicit, and also rewritten many lines to abstract away the command names. In other words, I had started to rewrite backup2l, retaining your style and code, mostly, but making it more portable to other *nix systems.

I also realized that if/when I finished, if I submitted a Pull Request of this magnitude, you would probably have a good laugh. It would be a lot of changes to review and the potential that something gets badly broken, no matter how carefully it is tested, is always there.

So, my question is whether I should continue in this direction and when finished and tested, submit a PR, or scrap this fork. It is not complete or tested, but it should be obvious from the diffs where I'm going. If you have a chance, take a look at the comparison : master...ferthalangur:dev_rbj-on-1.6

Thanks,

rob

Thank you, Rob, for your comments and your efforts to contribute to backup2l.

Actually, portability has been a design goal from the very beginning of the project (17 years ago), and since those early years, the tool has successfully been used (for what I know) on Windows (Cygwin), MacOS and different Linux distros.

I will be happy to integrate patches with improvements. However, to maintain the design goals - simplicity and robustness - we need to stick to (perhaps strict) rules:

  1. Any patch must have a clearly stated reason (Which bug does it fix? Which exact feature does it implement for which use case? Is that beneficial for all users?).

  2. Patch making the code longer must have a good reason. Each new line of code is potential sources for bugs or portability issues.

With your patch I basically see the following issues:

  1. It makes the code longer.

  2. Your patch introduces a lot of abstraction at the tool name level, but on the other hand counteracts the existing abstraction at functionality level, which is important for portability. For example, in the original code, all settings related to portability are concentrated in a few lines in one place

    backup2l/backup2l

    Lines 32 to 64 in 461f446

    # The following variable defines the commonly required tools. Additional tools
    # may be required by special functions, e. g. md5sum by the check functions.
    # Tool requirements are check in the do_* and show_* functions.
    COMMON_TOOLS="date find grep gzip gunzip sed awk mount umount"
    # The following variables define the format of *.list files. Modify at your own risk.
    FORMAT="%8s %TD %.8TT %04U.%04G %04m %p" # format for *.list files
    FILTER_NAME="sed 's#^\([-:. 0-9]*/\)\{3\}#/#'"
    # sed command for extracting names from .list, .new, ... files
    # (removes everything up to the 3rd "/"; two are contained in the time field)
    FILTER_CHOWN="sed 's#^\( *[^ ]\+\)\{3\} *0\{0,3\}\([0-9]\+\)\.0\{0,3\}\([0-9]\+\) \+\([0-9]*\)\{4\} \+/\(.*\)\$#\2:\3 \"\5\"#'"
    # sed command for extracting ownerships and names from .list files
    FILTER_CHMOD="sed 's#^\( *[^ ]\+\)\{4\} *\([0-9]\{4\}\) \+/\(.*\)\$#\2 \"\3\"#'"
    # sed command for extracting permissions and names from .list files
    FILTER_UNIFY_NAME="sed 's#\\\\[0-7]\{3\}#?#g; s#[^a-zA-Z0-9_ .$%:~/=+\#\-]#?#g'"
    # replaces special and escaped characters by '?';
    # only used when checking TOCs of fresh backup archives in order to avoid false alarms
    # The following alternative (donated by Christian Ludwig <Ludwig_C@gmx.de>) is slightly more precise,
    # but requires perl to be installed:
    #FILTER_UNIFY_NAME="perl -n -e 's{\\\\(\d{3})}{chr(oct(\$1))}eg; print;'"
    # On systems without GNU sed >= 3.0.2 such as Mac OS X, try the following alternatives...
    # ... settings (donated by Joe Auricchio <avarame@ml1.net>).
    #FILTER_CHOWN="perl -pe 's#^( *[^ ]+){3} *0{0,3}([0-9]+).0{0,3}([0-9]+) +([0-9]*){4} +/(.*)\$#\$2.\$3 \"\$5\"#'"
    #FILTER_CHMOD="perl -pe 's#^( *[^ ]+){4} *([0-9]{4}) +/(.*)\$#\$2 \"\$3\"#'"
    #COMMON_TOOLS="$COMMON_TOOLS perl"
    - with one line per setting. Your patch replaces, for example, the definition FILTER_UNIFY_NAME by three related definitions in lines 71, 77 and 214 (which is not complete, I assume, since the present code cannot work if line 214 is not executed).

  3. New functions like setup_tool (), setup_awk (), or optimize_if_perl_available () perform tasks that should be done statically, but not inside the main backup2l script.

A good contribution to the project could perhaps be a separate script like "backup2l-configure", which generates the 5-7 lines of configuration data (basically, the "FILTER_" macros) in accordance with the system.