Version 1.1.4
Updated June 21, 2017
BagIt and bagbatch.py
Requirements and setup
– BagIt
– Java
– – – Setting up the JAVA_HOME environment variable
– – – – – – – Windows 7
– – – – – – – Mac
– Python
– – – Installation Details: Windows 7
– – – Installation Details: Mac
– Opening Command Prompt and Terminal
– – – Windows 7
– – – Mac
Using bagbatch.py
– Overview
– Usage
– Detailed Instructions
Setting up a new version of BagIt or changing the installation path
– Overview
– Detailed Instructions
Troubleshooting
– Java
– Python
– BagIt
– BagBatch
BagIt is a Java-based program that creates archival bags of files following the BagIt standard developed by the Library of Congress. It is used to store a directory's master copies, web copies, and metadata, or to verify that files have been transferred from one computer to another properly. By default, BagIt only processes one directory at a time. bagbatch.py was written to process an entire parent directory with folders ready for bagging on both Mac and Windows.
- Windows 7 or Mac OS
- BagIt
- Java
- Python 2.7
Follow the instructions below to set up the required programs. For additional information, check out Carlpedia - BagIt, which has links to tutorials and examples.
If you have an older or newer version of Windows or Mac, the directions below may not apply to your operating system.
BagIt can be downloaded from the Library of Congress on SourceForge. Adapted from the readme file for BagIt 4.9.0 (see the link above for the most recent information):
Update 5-9-2018. Please note that the Library of Congress no longer uses BagIt 4 and has moved to a new version of BagIt that is not compatible with our BagBatch script. In winter 2019 we will begin updating BagBatch to be compatible with the new version of BagIt; in the meantime, BagIt 4.4 is still available at https://sourceforge.net/projects/loc-xferutils/ under the name "Library of Congress - Transfer Tools."
- Unzip bagit-java-master. From this new folder, unzip bagit-x.x.x-src.zip. In the resulting 'bagit-x.x.x/bin' subdirectory, you will find bag.sh (Mac) and bag.bat (Windows) scripts that are used to invoke the command line interface.
Java must be installed. To check if Java is installed, type java -version in Terminal (Mac) or Command Prompt (Windows). If Java is not installed, download it from java.com. Although the BagIt specifications recommend Java 6, bagbatch.py has also been tested with Java 7. If you run into any Java issues, check out the troubleshooting section.
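The same check can be scripted. The sketch below is only an illustration and is not part of bagbatch.py; the function name java_version_output is hypothetical. It runs java -version and returns whatever Java prints, or None when the command cannot be found.

```python
import subprocess


def java_version_output(java_cmd="java"):
    """Return the text printed by `<java_cmd> -version`, or None if it cannot be run.

    Hypothetical helper for illustration; not taken from bagbatch.py.
    """
    try:
        # `java -version` writes its report to stderr, so fold stderr into stdout.
        proc = subprocess.Popen(
            [java_cmd, "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
        )
    except OSError:
        # Raised when the command is not found on the PATH.
        return None
    output, _ = proc.communicate()
    return output.decode("utf-8", "replace").strip()


report = java_version_output()
print(report if report is not None else "Java was not found on the PATH")
```

If the message about Java not being found appears, Java is either not installed or not reachable from Terminal/Command Prompt.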
The Java runtime environment (JAVA_HOME) must be set up correctly for BagIt to run. Follow the steps for your operating system below.
- Click Start/Control Panel. Enter into the search bar 'environment variable' and select Edit environment variables for your account. Or, to do it manually: Start/Control Panel/System and Security/System/Advanced system settings/Environment variables.
- In the Environment Variables window, under System variables, check that a variable named JAVA_HOME exists. If so, the variable is already set up. If not, continue with the next step.
- Keeping those windows open, click Start/Control Panel and search for 'Java'. If nothing is found, Java has not been installed correctly. In the Java Control Panel, select the Java tab and then click View. Under Path, copy all but the \bin\javaw.exe part at the end of the path. It should look similar to C:\Program Files\Java\jre7.
- Back in the Environment Variables window, create a new system variable with the name JAVA_HOME and with the copied path as the value. Click OK.
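To confirm that the variable took effect, open a new Command Prompt and run a short check such as the sketch below. It is an illustration only, and check_java_home is a hypothetical name. It assumes that a valid JAVA_HOME contains a bin folder holding the Java launcher (java.exe on Windows).

```python
import os


def check_java_home(environ=None):
    """Return (ok, message) describing whether JAVA_HOME points at a Java install.

    Hypothetical helper; assumes the launcher lives in <JAVA_HOME>/bin.
    """
    environ = os.environ if environ is None else environ
    java_home = environ.get("JAVA_HOME")
    if not java_home:
        return False, "JAVA_HOME is not set"
    bin_dir = os.path.join(java_home, "bin")
    # Look for the launcher under either its Windows or its Mac/Unix name.
    for name in ("java.exe", "java"):
        if os.path.isfile(os.path.join(bin_dir, name)):
            return True, "JAVA_HOME looks valid: " + java_home
    return False, "No Java launcher found under " + bin_dir


ok, message = check_java_home()
print(message)
```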
BagIt must already be installed.
- Open the terminal by going to Applications/Utilities/Terminal.
- Type nano ~/bagit/bin/bag, which will open a text editor.
- Edit the line #JAVA_HOME= to equal the direct path to the version of Java you are running. For example, the line may look like: #JAVA_HOME=/usr/lib/jvm/java-6-sun
- Edit the line MAXMEM=512m and change it to MAXMEM=2048m
- Hit Control O and Return to save your changes and Control X to exit.
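For anyone who prefers to script the memory change instead of using nano, the following sketch shows the text substitution involved. It is a hypothetical illustration: set_maxmem is not part of BagIt or bagbatch.py, and it assumes the script contains a line beginning with MAXMEM= as described in the steps above.

```python
import re


def set_maxmem(script_text, new_value="2048m"):
    """Return script_text with the value on its MAXMEM= line replaced."""
    # Match a line that starts with MAXMEM= and replace whatever follows it.
    return re.sub(r"(?m)^MAXMEM=.*$", "MAXMEM=" + new_value, script_text)


# Example on an in-memory sample rather than the real script file.
sample = "#JAVA_HOME=\nMAXMEM=512m\n"
print(set_maxmem(sample))
```

To apply it to the real file, read the script's contents, pass them through set_maxmem, and write the result back after making a backup copy.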
Python 2.7 must be installed to run bagbatch.py. Mac OS X Mavericks and Mountain Lion have Python 2.7 already installed.
Enter python into Terminal or Command Prompt to see if Python is set up for terminal/command line use. If not, make sure Python is installed (if it isn't, download version 2.7 from python.org). See the Python 2.7 documentation for setting up the interpreter.
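Once the interpreter starts, a few lines like the following confirm which version is actually running. This is a small sketch, and is_python_27 is a hypothetical helper name.

```python
import sys


def is_python_27(version_info=None):
    """Return True when the given (or current) interpreter version is 2.7.x."""
    version_info = sys.version_info if version_info is None else version_info
    return tuple(version_info[:2]) == (2, 7)


print("Running Python %d.%d.%d" % tuple(sys.version_info[:3]))
print("OK for bagbatch.py" if is_python_27() else "bagbatch.py expects Python 2.7")
```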
Python must be able to run from Command Prompt or Terminal. The Python environment variable must be created, similar to how the JAVA_HOME variable was set up.
- Find the installation folder for Python 2.7, which is usually C:\Python27.
- Click Start/Control Panel. Enter into the search bar 'environment variable' and select Edit environment variables for your account. Or, to do it manually: Start/Control Panel/System and Security/System/Advanced system settings/Environment variables.
- In the Environment Variables window, under System variables, select the Path variable and click Edit.
- To the end of the value field, add a semicolon if need be and add the location of the Python installation. If this is C:\Python27, add C:\Python27;C:\Python27\Scripts.
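Before editing the Path variable, it can help to check whether the Python folder is already listed. The sketch below is illustrative only, and path_contains is a hypothetical helper. It splits a Path value on semicolons and compares entries the way Windows does, ignoring case and trailing backslashes.

```python
import ntpath


def path_contains(path_value, directory):
    """Return True if `directory` is one of the entries in a Windows Path value."""
    def normalize(entry):
        # Windows paths ignore case and tolerate trailing backslashes.
        return ntpath.normcase(ntpath.normpath(entry.strip()))

    wanted = normalize(directory)
    entries = [e for e in path_value.split(";") if e.strip()]
    return any(normalize(e) == wanted for e in entries)


# Example Path value; on a real machine, pass os.environ["PATH"] instead.
example = r"C:\Windows\system32;C:\Python27;C:\Python27\Scripts"
print(path_contains(example, r"C:\Python27"))
print(path_contains(example, r"C:\Python26"))
```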
Start/All Programs/Accessories/Command Prompt
To see the contents of the current directory, type dir and press Enter. This lists the files and directories (folders) in the current directory. To navigate to one of those directories, enter cd FolderName, using quotes around the folder name if it contains spaces, such as cd "Folder Name". To navigate backwards, type cd .. and press Enter. More tutorials.
Applications/Utilities/Terminal
To see the contents of the current directory, type ls and press Enter. This lists the files and directories (folders) in the current directory. To navigate to one of those directories, enter cd FolderName, using quotes around the folder name if it contains spaces, such as cd "Folder Name". To navigate backwards, type cd .. and press Enter. More tutorials.
- Using Terminal or Command Prompt, navigate to the directory containing bagbatch.py.
- Enter python bagbatch.py <command> or python bagbatch.py <command> <dir>, where <dir> is the parent directory containing the subdirectories to be bagged.
Commands are as follows:
- baginplace: Creates a bag-in-place. The source must be a directory on a filesystem and may already have a data directory.
- update: Updates the manifests and (if it exists) the bag-info.txt for a bag.
- verifyvalid: Verifies the validity of a bag.
Input: Parent directory with subdirectories to bag, verify, or update.
Output: Parent directory with bagged subdirectories.
Dependency: BAGIT_INST_PATH.txt
The first time bagbatch.py is run, it will create BAGIT_INST_PATH.txt and prompt for the path of the BagIt installation. Within this folder, there will be a folder named 'bin' containing bag.bat or bag.sh. This path may be "/Applications/bagit-X.X" or "C:\Program Files\bagit-X.X.X". The path will be saved in BAGIT_INST_PATH.txt. Each time bagbatch.py is run, it will validate this location.
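The save-and-reuse behaviour described above can be pictured with a sketch like the one below. It is not the code from bagbatch.py: load_install_path and its prompt parameter are hypothetical, and the real script may store or validate the path differently.

```python
import os


def load_install_path(path_file, prompt):
    """Return the saved BagIt path, asking via `prompt` and saving it when absent.

    `prompt` is any function that takes a message and returns the user's
    answer, for example raw_input on Python 2.7.
    """
    if os.path.isfile(path_file):
        with open(path_file) as handle:
            saved = handle.read().strip()
        if saved:
            return saved
    # Nothing saved yet: ask once, then remember the answer for later runs.
    answer = prompt("Path to the BagIt installation: ").strip()
    with open(path_file, "w") as handle:
        handle.write(answer)
    return answer
```

Passing the prompt function in as a parameter keeps the sketch testable without a keyboard; on Python 2.7 it would be called as load_install_path("BAGIT_INST_PATH.txt", raw_input).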
Possible Terminal/Command Prompt inputs:
- python bagbatch.py <command> <directory>
- python bagbatch.py <command>: Prompts for the directory to run the command on
- python bagbatch.py <directory>: Automatically uses the baginplace command
- python bagbatch.py: Prints the usage instructions
- python bagbatch.py help or python bagbatch.py version: Prints the usage instructions
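One way to picture how these inputs are told apart is the sketch below. It is a hypothetical reconstruction for illustration, not the parsing code in bagbatch.py; parse_args and COMMANDS are made-up names, and it assumes that a single argument which is not a known command is treated as the directory.

```python
COMMANDS = ("baginplace", "update", "verifyvalid")


def parse_args(args):
    """Map the arguments after `bagbatch.py` to (action, command, directory)."""
    if not args or args[0] in ("help", "version"):
        return ("usage", None, None)
    if args[0] in COMMANDS:
        # A missing directory (None) means the user should be prompted for it.
        directory = args[1] if len(args) > 1 else None
        return ("run", args[0], directory)
    # A lone directory falls back to the default command.
    return ("run", "baginplace", args[0])


print(parse_args(["verifyvalid", "/path/to/parent"]))
print(parse_args([]))
```

In a real script the list would come from sys.argv[1:].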
- Open Terminal or Command Prompt and navigate to the directory containing bagbatch.py. For example, if it's in a folder on the desktop, enter "cd Desktop/folder".
- Enter python bagbatch.py <command> using an accepted command. Press Enter and the current status will be printed to the screen. The program will prompt for the folder containing the directories to run this command on. To skip this prompt, you can instead enter python bagbatch.py <command> <dir>, where <dir> is the directory that contains all the subdirectories that need to be bagged. Type the path in or drag and drop the folder from Finder/Explorer. Press Enter. If no command is entered, baginplace will be used.
- If this is the first time bagbatch.py is run, you will be prompted for the BagIt installation directory. This folder name will look like bagit-X.X.X, where X.X.X is the version number. Within this folder, there must be another folder named 'bin' containing the files bag.sh (Mac) and bag.bat (Windows).
- Depending on the size of the files, it may take a while to bag. The program is done once "Bags complete" is printed.
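Conceptually, the batch step amounts to running the BagIt script once per subdirectory of the parent folder. The sketch below only builds the command lines and does not run anything. It is not taken from bagbatch.py: build_commands is a hypothetical name, and the bag <command> <directory> calling form is an assumption about the BagIt 4.x command line.

```python
import os


def build_commands(bag_script, command, parent_dir):
    """Return one argument list per subdirectory of parent_dir, in sorted order."""
    commands = []
    for name in sorted(os.listdir(parent_dir)):
        target = os.path.join(parent_dir, name)
        if os.path.isdir(target):  # skip loose files in the parent directory
            commands.append([bag_script, command, target])
    return commands


# Preview what would be run for the current directory.
for args in build_commands("bag", "baginplace", "."):
    print(" ".join(args))
```

Each argument list could then be handed to subprocess.call, one subdirectory at a time.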
The BagIt installation path used by bagbatch.py must be manually deleted or changed.
- Option 1: Delete BAGIT_INST_PATH.txt, located in the same folder as bagbatch.py, and run bagbatch.py. It will prompt for the location of the new installation.
- Option 2: Manually edit BAGIT_INST_PATH.txt to point to the location of the new installation.
BAGIT_INST_PATH.txt contains the path to BagIt's bag installation. This path will look similar to "C:/Program Files/bagit-VERSION/bin" or "/Program Files/ Archival/bagit-4.9.0/bin".
- Do NOT use ".../src/assembly/bin". Use ".../bagit-x.x.x-bin/bagit-x.x.x/bin" instead.
Each time bagbatch.py runs, it will validate this path, checking that the directory exists and that it contains bag.sh (Mac) or bag.bat (Windows). If the directory does not exist or does not contain bag.sh or bag.bat, it will prompt for the correct path.
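A validation of this kind can be sketched in a few lines. This is an illustration rather than the check bagbatch.py actually performs; has_bag_script is a hypothetical name, and it assumes the saved path is the bin directory itself.

```python
import os


def has_bag_script(bin_dir):
    """Return True if bin_dir exists and contains bag.sh or bag.bat."""
    if not os.path.isdir(bin_dir):
        return False
    return any(
        os.path.isfile(os.path.join(bin_dir, name))
        for name in ("bag.sh", "bag.bat")
    )


# A directory that does not exist fails the check.
print(has_bag_script(os.path.join("no", "such", "bin")))
```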
If a new version of BagIt is installed and the old version is not removed, BAGIT_INST_PATH.txt must be manually updated. Either delete the file, run bagbatch.py, and enter the new installation directory when prompted, or edit the file so that it contains the path to the new installation.
If you are getting errors in Terminal or Command Prompt such as
Error occurred during initialization of VM
Could not reserve enough space for object heap
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
First, try closing any open programs except Terminal/Command Prompt. If that doesn't work, restart your computer and try running bagbatch.py again. If the problem persists, follow these directions (Mac, Windows) to increase the amount of memory Java can access for BagIt.
- Is Java (version 6 or 7) installed? BagIt recommends Java 6, and BagBatch has been tested with Java 7.
- Is Java correctly set up for Terminal or Command Prompt usage? Enter java -version in Terminal or Command Prompt to check the version.
- Is the JAVA_HOME environment variable set up correctly? Follow these setup instructions.
- If that does not work, try uninstalling and reinstalling Java, following Java's download instructions.
- Is Python 2.7 installed?
- Is Python correctly set up for Terminal or Command Prompt usage? Enter python in Terminal or Command Prompt to check the version.
- For Windows, check to see if the Python environment variable is set up correctly.
- If that does not work, try uninstalling and reinstalling Python 2.7, following Python's download instructions.
Download BagIt 4.x.x from the Library of Congress on GitHub.
- Is BagIt downloaded?
- Which version of BagIt? BagBatch has been tested with BagIt 4.4 and BagIt 4.9.0.
- Check that the downloaded folder for BagIt contains the folder bin. In bin, there must be a file called bag.sh (Mac) or bag.bat (Windows).
- Check your version of BagBatch by entering python bagbatch.py.