The bagit-python library can be found here: https://github.com/LibraryOfCongress/bagit-python
I have copied bagit.py from that repository into this project. To see an example of using bagit-python, I have created a couple of scripts you can run. In a terminal, run each of the following commands:
# Create a folder called bag-in-place. We will bag it in the next step.
./prepare-test.sh
# Take a look inside the bag-in-place folder now to see what it contains.
# This script runs bagit-python on the bag-in-place folder and zips it.
./bag.sh
Take a look inside the bag-in-place folder to see what bagit-python did.
Have a look inside bag.sh to see the details of what it does.
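bag.sh itself is not reproduced here, but the following standard-library-only Python sketch shows roughly what bagging a folder in place involves: the existing files are moved into a data/ subfolder, a manifest of checksums and a bagit.txt declaration are written next to it, and the result is zipped. It is a simplified illustration, not bagit-python's code: the function names bag_in_place and file_digest are hypothetical, it writes a single sha256 manifest, and it skips files bagit-python also produces, such as bag-info.txt and the tag manifests.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Hex digest of a file, read in chunks so large files are not loaded at once."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def bag_in_place(bag_dir: str, algorithm: str = "sha256") -> Path:
    """Hypothetical sketch: turn bag_dir into a minimal bag, then zip it.

    Returns the path of the zip archive, created next to bag_dir.
    """
    root = Path(bag_dir)

    # Move the existing payload into data/. A temporary folder is used first
    # in case the payload itself contains an entry called "data".
    tmp = root / ".payload-tmp"
    tmp.mkdir()
    for entry in list(root.iterdir()):
        if entry != tmp:
            shutil.move(str(entry), str(tmp / entry.name))
    tmp.rename(root / "data")
    data = root / "data"

    # One manifest line per payload file: "<digest>  data/<relative path>".
    lines = []
    for path in sorted(p for p in data.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        lines.append(f"{file_digest(path, algorithm)}  {rel}\n")
    (root / f"manifest-{algorithm}.txt").write_text("".join(lines), encoding="utf-8")

    # The bag declaration (version number as in RFC 8493).
    (root / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )

    # Zip the whole folder; make_archive appends ".zip" to the base name.
    archive = shutil.make_archive(
        str(root), "zip", root_dir=str(root.parent), base_dir=root.name
    )
    return Path(archive)
```

Run against a fresh copy of the test folder, bag_in_place("bag-in-place") would leave the original files under bag-in-place/data and create bag-in-place.zip alongside the folder.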
The following command runs a Python script in this folder called validate-bag.py:
python validate-bag.py
If the bag-in-place folder is there and has not been modified since bagging, this should print a message reporting that the bag is valid. Try modifying the README.adoc file inside bag-in-place/data and run the validate script again. It should now tell you that the bag is not valid.
Have a look inside validate-bag.py to see what it does. It’s a very simple Python script that uses bagit-python to validate a bag.
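Editing README.adoc makes the bag invalid because validation recomputes each payload file's checksum and compares it with the manifest. The sketch below shows just that core check using only the standard library. It is a hypothetical illustration rather than what validate-bag.py or bagit-python actually run (bagit-python's own validation does more than this), and the function names manifest_problems and is_valid are made up for the example.

```python
import hashlib
from pathlib import Path
from typing import List


def manifest_problems(bag_dir: str, algorithm: str = "sha256") -> List[str]:
    """Return the problems found; an empty list means the payload checks out."""
    root = Path(bag_dir)
    problems = []
    listed = set()

    # Every manifest entry must exist and its checksum must still match.
    manifest = root / f"manifest-{algorithm}.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel = line.split(None, 1)  # "<digest>  data/<path>"
        listed.add(rel)
        path = root / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
            continue
        actual = hashlib.new(algorithm, path.read_bytes()).hexdigest()
        if actual != expected.lower():
            problems.append(f"checksum mismatch: {rel}")

    # Payload files present on disk but absent from the manifest.
    for path in sorted((root / "data").rglob("*")):
        rel = path.relative_to(root).as_posix()
        if path.is_file() and rel not in listed:
            problems.append(f"not in manifest: {rel}")
    return problems


def is_valid(bag_dir: str, algorithm: str = "sha256") -> bool:
    return not manifest_problems(bag_dir, algorithm)
```

Returning a list of problems rather than a bare boolean makes it easy to print why a bag failed, which is the kind of message you see after editing README.adoc.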
With bagit-python, a BagIt profile isn't used to generate bag-info.txt. However, you can declare which profile the bag conforms to through its command-line arguments. For example:
--bagit-profile-identifier 'https://cdn.jsdelivr.net/gh/ResearchObject/bagit-ro@0.2.20160422/profile.json'
You can take a look at that JSON file in a web browser. The idea with profiles is that a bag can declare which profile it conforms to: that URL is recorded in the bag's bag-info.txt as the profile identifier. As part of a workflow, you can then validate whether the bag actually meets the profile.
There is a Python BagIt profiles validator here: https://github.com/bagit-profiles/bagit-profiles-validator
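The validator checks a bag against everything its profile declares. As a much smaller illustration of the idea, the sketch below parses bag-info.txt and checks it against the Bag-Info rules of a profile that has already been loaded into a Python dict (for example with json.load). The keys BagIt-Profile-Info, BagIt-Profile-Identifier and Bag-Info (with "required" and "values") are assumed to follow the BagIt Profiles specification; this is not the validator's code, the function names are hypothetical, and it ignores the other parts of a profile such as required manifests or serialization.

```python
from pathlib import Path
from typing import Dict, List


def parse_tag_file(path: Path) -> Dict[str, List[str]]:
    """Parse 'Label: Value' lines; indented lines continue the previous value."""
    tags: Dict[str, List[str]] = {}
    last_label = None
    for line in path.read_text(encoding="utf-8").splitlines():
        if line[:1] in (" ", "\t") and last_label is not None:
            tags[last_label][-1] += " " + line.strip()
        elif ":" in line:
            label, value = line.split(":", 1)  # split at the first colon only
            last_label = label.strip()
            tags.setdefault(last_label, []).append(value.strip())
    return tags


def profile_problems(bag_info: Dict[str, List[str]], profile: dict) -> List[str]:
    """Check bag-info tags against the 'Bag-Info' rules of a (simplified) profile."""
    # Labels are compared case-insensitively, to be lenient about capitalisation.
    info = {label.lower(): values for label, values in bag_info.items()}
    problems = []

    wanted_id = profile["BagIt-Profile-Info"]["BagIt-Profile-Identifier"]
    if wanted_id not in info.get("bagit-profile-identifier", []):
        problems.append("bag does not declare this profile")

    for label, rule in profile.get("Bag-Info", {}).items():
        values = info.get(label.lower(), [])
        if rule.get("required") and not values:
            problems.append(f"missing required tag: {label}")
        allowed = rule.get("values")
        if allowed:
            for value in values:
                if value not in allowed:
                    problems.append(f"{label}: {value!r} is not an allowed value")
    return problems
```

Splitting each line at the first colon only matters here, because the profile identifier is itself a URL containing a colon.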
Using bagit.py on the command line, you get a variety of options, for example to define the metadata that will be saved in bag-info.txt. In the example above, I set --contact-name to "Yvette". Similarly, you can set the organization name and address, change the hash algorithm, and more. The full usage is:
usage: bagit.py [-h] [--processes PROCESSES] [--log LOG] [--quiet]
                [--validate] [--fast] [--completeness-only]
                [--sha3_256] [--sha3_224] [--shake_256] [--sha1]
                [--shake_128] [--sha3_512] [--sha512] [--sha224]
                [--sha384] [--md5] [--sha256] [--blake2s] [--blake2b]
                [--sha3_384]
                [--source-organization SOURCE_ORGANIZATION]
                [--organization-address ORGANIZATION_ADDRESS]
                [--contact-name CONTACT_NAME]
                [--contact-phone CONTACT_PHONE]
                [--contact-email CONTACT_EMAIL]
                [--external-description EXTERNAL_DESCRIPTION]
                [--external-identifier EXTERNAL_IDENTIFIER]
                [--bag-size BAG_SIZE]
                [--bag-group-identifier BAG_GROUP_IDENTIFIER]
                [--bag-count BAG_COUNT]
                [--internal-sender-identifier INTERNAL_SENDER_IDENTIFIER]
                [--internal-sender-description INTERNAL_SENDER_DESCRIPTION]
                [--bagit-profile-identifier BAGIT_PROFILE_IDENTIFIER]
                directory [directory ...]
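To show how options like these relate to bag-info.txt, the following is a hypothetical argparse sketch that re-creates a handful of them and renders the metadata lines they would contribute. It is not bagit.py's argument handling: the option-to-label mapping, the prog name and the helper functions build_parser, bag_info_text and chosen_algorithm are assumptions for illustration, and bagit-python adds some fields of its own when it writes bag-info.txt.

```python
import argparse
from datetime import date
from typing import Dict

# A few metadata options from the usage text, mapped to the bag-info.txt
# labels they are assumed to produce (hypothetical mapping).
OPTION_LABELS: Dict[str, str] = {
    "source_organization": "Source-Organization",
    "organization_address": "Organization-Address",
    "contact_name": "Contact-Name",
    "contact_email": "Contact-Email",
    "bagit_profile_identifier": "BagIt-Profile-Identifier",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="bag-info-sketch")
    for dest in OPTION_LABELS:
        parser.add_argument("--" + dest.replace("_", "-"), dest=dest)
    # One example of a hash-algorithm switch; sha256 is used otherwise.
    parser.add_argument("--sha512", action="store_true")
    parser.add_argument("directory")
    return parser


def chosen_algorithm(args: argparse.Namespace) -> str:
    return "sha512" if args.sha512 else "sha256"


def bag_info_text(args: argparse.Namespace, today: date) -> str:
    """Render only the options that were actually given, after a bagging date."""
    lines = [f"Bagging-Date: {today.isoformat()}"]
    for dest, label in OPTION_LABELS.items():
        value = getattr(args, dest)
        if value is not None:
            lines.append(f"{label}: {value}")
    return "\n".join(lines) + "\n"
```

For example, parsing --contact-name Yvette bag-in-place and calling bag_info_text would give a Contact-Name: Yvette line after the Bagging-Date line; options that were not given are simply left out.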