The purpose of this repository is to facilitate audit-proof archiving of files with bash scripts. "Audit-proof" in this case means that you can prove
- that the files in the archive weren't manipulated,
- who made the archive, and
- the exact time when the files were archived.
The scripts use well-established Open Standards and reliable Free Software tools to accomplish this:
- SHA256 and SHA512 checksums: if a single file changes, its checksum no longer matches
- GnuPG for OpenPGP signatures: if a checksum is changed, the OpenPGP signature breaks (as does the time stamp)
- OpenSSL with RFC 3161 compliant time stamping authority (TSA) servers: if the signature is changed, the time stamp becomes invalid
Hence, the only way to get an apa archive to pass all tests after changing a file is to update the checksums, signature and time stamp, which inevitably changes the archive's date. In other words, it is no longer possible to change the past.
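The first link of that chain is easy to try by hand. The following is a minimal sketch, not the code used by the scripts: it writes checksum lists for a throwaway directory, changes one file, and checks again. The directory, the file names and the names SHA256SUMS and SHA512SUMS are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: demonstrates tamper detection with checksum lists.
# All paths and file names here are hypothetical.

workdir="$(mktemp -d)"
cd "$workdir" || exit 1

printf 'invoice 2018-001\n' > invoice.txt
printf 'contract draft\n'   > contract.txt

# Record both checksum types for every file.
sha256sum invoice.txt contract.txt > SHA256SUMS
sha512sum invoice.txt contract.txt > SHA512SUMS

# Untouched files verify cleanly.
if sha256sum -c SHA256SUMS >/dev/null 2>&1 &&
   sha512sum -c SHA512SUMS >/dev/null 2>&1; then
  before="ok"
else
  before="failed"
fi

# Manipulate a single file, then check again.
printf 'invoice 2018-001 (edited)\n' > invoice.txt
if sha256sum -c SHA256SUMS >/dev/null 2>&1; then
  after="ok"
else
  after="failed"
fi

echo "before change: $before"
echo "after change:  $after"
```

To hide the change, someone would have to rewrite the checksum lists as well, and that is what the signature and the time stamp are there to catch.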
I designed my own archive format to ensure I'm also able to verify the integrity of those archives by script. I use the file extension *.apa.txz for it, which already tells you that technically it's a tar archive with XZ compression. An apa archive always includes another uncompressed tarball called data.tar, which contains the actual content.
It also always includes a file called checksums, which lists the SHA256 and SHA512 checksums of each file in data.tar and is clearsigned with GnuPG (meaning the signature is part of the file itself). This means that the first two proofs, the manipulation check and the authentication of the archivist, are always possible. The structure of the checksums file is very similar to the InRelease files of Debian package repositories.
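To make the layout more tangible, here is a rough sketch that builds a tarball with the same two members. It is an illustration under several assumptions, not the code from the scripts: the input files are made up, tar's -J option stands in for pxz, and the checksums file is plain sha256sum/sha512sum output that is neither signed nor laid out like an InRelease file.

```shell
#!/usr/bin/env bash
# Sketch only: builds a tarball with the layout described above.
# File names other than data.tar and checksums are hypothetical, and the
# checksums file here is plain sha256sum/sha512sum output, not the signed
# file the real scripts produce.

src="$(mktemp -d)"      # documents to archive
stage="$(mktemp -d)"    # staging area for the outer archive
out="$(mktemp -d)/example.apa.txz"

printf 'first document\n'  > "$src/a.txt"
printf 'second document\n' > "$src/b.txt"

# 1. Inner, uncompressed tarball with the actual content.
tar -C "$src" -cf "$stage/data.tar" a.txt b.txt

# 2. Checksums for each file that went into data.tar.
(
  cd "$src" || exit 1
  sha256sum a.txt b.txt
  sha512sum a.txt b.txt
) > "$stage/checksums"

# 3. In a real run the checksums file would now be clearsigned, e.g. with
#    "gpg2 --clearsign"; that step needs a private key and is skipped here.

# 4. Outer tar archive with XZ compression (requires xz to be installed).
tar -C "$stage" -cJf "$out" checksums data.tar

# List the members of the finished archive.
members="$(tar -tJf "$out")"
echo "$members"
```

Keeping data.tar uncompressed inside the compressed outer archive means the content is only compressed once, while checksums stays a small, separately readable member.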
If requested during archive generation, the archive also contains one or more files ending in *.tsr, which are the time stamps for the checksums file.
They enable you to prove the exact time when the archive was generated, and thereby the time when you were in possession of the archived documents.
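The RFC 3161 cycle behind such a *.tsr file can be sketched with OpenSSL and curl as follows. This is a hedged illustration, not the implementation in archive_timestamp.sh: the function names, the TSA URL and the CA file are placeholders, and only the offline query step is actually run.

```shell
#!/usr/bin/env bash
# Sketch only: the RFC 3161 request/verify cycle with OpenSSL.
# TSA_URL, the CA file and all function names are placeholders,
# not values taken from the scripts.

TSA_URL="https://tsa.example.org/tsr"   # hypothetical TSA endpoint

# Create a time stamp query for a file. The query contains only a hash
# of the file, never the file itself.
make_query() {
  local file="$1"
  openssl ts -query -data "$file" -sha512 -cert -out "$file.tsq"
}

# Send the query to the TSA and store the reply (needs network access).
request_stamp() {
  local file="$1"
  curl -sS -H 'Content-Type: application/timestamp-query' \
       --data-binary "@$file.tsq" "$TSA_URL" -o "$file.tsr"
}

# Check a reply against the original file and the TSA's CA certificate.
verify_stamp() {
  local file="$1" cafile="$2"
  openssl ts -verify -data "$file" -in "$file.tsr" -CAfile "$cafile"
}

# Offline demonstration: only the query step is run here.
demo="$(mktemp -d)/checksums"
printf 'example checksums content\n' > "$demo"
make_query "$demo"
openssl ts -query -in "$demo.tsq" -text
```

Because only the hash leaves your machine, the TSA never sees the archived documents. Depending on the TSA, verification may need additional certificates beyond a single CA file.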
This cryptographic chain makes it possible to reliably archive an arbitrary number of documents with a single digital signature and time stamp. That's a feature because many TSA services limit the number of time stamps you can request or charge for each one.
The scripts rely on a collection of tools that must already be installed and available in your PATH. If a needed tool is missing, the respective script should fail with an error explaining what is missing.
Make sure you have these installed:
- tar
- pxz
- curl
- openssl
- gpg2
- sha256sum
- sha512sum
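If you want to check your system up front, a small helper like the one below does the job. It is only a sketch of the fail-early idea; the function name is hypothetical and not taken from _archive_functions.sh.

```shell
#!/usr/bin/env bash
# Sketch only: fail early with a clear message if a tool is missing.
# The function name is hypothetical, not taken from _archive_functions.sh.

require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "error: required tool '$tool' not found in PATH" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Typical use at the top of a script:
#   require_tools tar pxz curl openssl gpg2 sha256sum sha512sum || exit 1
```

The loop reports every missing tool before returning, so you see the full list in one run instead of fixing one tool at a time.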
You also need a private OpenPGP key for generating signatures and public OpenPGP keys for archives you want to verify.
There are three callable bash scripts: archive_auditproof.sh, archive_timestamp.sh and archive_verify.sh. A fourth bash script file called _archive_functions.sh is a collection of bash functions that are used in the scripts and is not supposed to be used by itself; think of it more as a library.
archive_auditproof.sh and archive_timestamp.sh create their own configuration file in ~/.config/bash_scripts_$USER/ so that you can change useful default settings without touching the scripts.
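The general pattern behind such a self-created configuration file looks roughly like this. Treat it as a sketch of the idea only: the file name, the variable names and the default values are invented for this example and do not reflect what the scripts actually write.

```shell
#!/usr/bin/env bash
# Sketch only: create a config file with defaults on first run, then source it.
# The file name, variable names and values are hypothetical.

load_config() {
  local config_dir="$1"
  local config_file="$config_dir/example.conf"

  if [ ! -f "$config_file" ]; then
    mkdir -p "$config_dir"
    cat > "$config_file" <<'EOF'
# Defaults written on first run; edit to taste.
HASH_TYPES="sha256 sha512"
TSA_URL="https://tsa.example.org/tsr"
EOF
  fi

  # Read the settings into the current shell.
  . "$config_file"
}

# The scripts use ~/.config/bash_scripts_$USER/; a temporary directory
# stands in for it here so the sketch does not touch your home directory.
load_config "$(mktemp -d)/bash_scripts_demo"
echo "hash types: $HASH_TYPES"
```

Since an existing file is never overwritten, your edits survive later runs.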
All scripts also show a usage message when you call them without any parameters, which should be detailed enough to explain what to do.
archive_auditproof.sh generates archives from all files in a given directory, including hashes and an OpenPGP signature. If you want, you can also request time stamps from one or more of the configured TSA servers.
archive_timestamp.sh can both generate a digital time stamp for a given file and validate a given time stamp. You can use it separately from archive_auditproof.sh.
Finally, archive_verify.sh examines a given apa archive and verifies that its checksums, OpenPGP signature and time stamps are valid.
I consider these scripts to be in beta status. Feel free to check them out, but I cannot guarantee that they behave as you expect. However, I would appreciate feedback and bug reports so that I can improve them and make them useful for everyone.
One feature I would like to add is a central SQLite database to register all archives and make them searchable for specific files. The database should be able to store different types of content, beginning with files and e-mails. The latter would have more metadata to search for, like subject or sender.
Also, to comply with GDPR requirements, it must be possible to purge single files from an archive without breaking signatures or time stamps (likely by adding new signatures and time stamps but keeping the old ones for reference).
To ask for help, report bugs, suggest feature improvements, or discuss the global development of the package, please use the issue tracker on GitHub.
Please note that all development happens in the develop branch. Pull requests against the master branch will be rejected, as it is reserved for the current stable release.
Copyright 2018 Meik Michalke meik.michalke@hhu.de
apa is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
apa is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with apa. If not, see http://www.gnu.org/licenses/.