/bagit-java

Java library and command line to support BagIt. A project of the Library of Congress. Note: project members may work on both official Library of Congress projects and non-LC projects.

Primary LanguageJavaOtherNOASSERTION

BagIt Library (BIL)

Build Status Travis-CI Build Status (Linux) Appveyor Build Status (Windows) CircleCI
Metrics Coverage Status Github Latest Release Downloads
Documentation License javadoc.io

Description

The BAGIT LIBRARY is a software library intended to support the creation, manipulation, and validation of bags. Its current version is 0.97. It is version aware with the earliest supported version being 0.93.

Requirements

  • Java 8
  • gradle (for development only)

Support

  1. The Digital Curation Google Group (https://groups.google.com/d/forum/digital-curation) is an open discussion list that reaches many of the contributors to and users of this open-source project
  2. If you have found a bug please create a new issue on the issues page
  3. To contact a developer at the Library of Congress please email repo-dev@listserv.loc.gov
  4. If you would like to contribute, please submit a pull request

Major differences between version 5 and 4.*

Command Line Interface

The 5.x versions do not include a command-line interface. Users who need a command-line utility can continue to use the latest 4.x release (download 4.12.3 or switch to an alternative implementation such as bagit-python or BagIt for Ruby.

Serialization

Starting with the 5.x versions bagit-java no longer supports directly serializing a bag to an archive file. The examples show how to implement a custom serializer for the zip and tar formats.

Fetching

The 5.x versions do not include a core fetch.txt implementation. If you need this functionality, the FetchHttpFileExample example demonstrates how you can implement this feature with your additional application and workflow requirements.

Internationalization

All logging and error messages have been put into a ResourceBundle. This allows for all the messages to be translated to multiple languages and automatically used during runtime. If you would like to contribute to translations please visit https://www.transifex.com/acdha/bagit-java/dashboard/

New Interfaces

The 5.x version is a complete rewrite of the bagit-java library which attempts to follow modern Java practices and will require some changes to existing code:

Examples of using the new bagit-java library

Create a bag from a folder using version 0.97
Path folder = Paths.get("FolderYouWantToBag");
StandardSupportedAlgorithms algorithm = StandardSupportedAlgorithms.MD5;
boolean includeHiddenFiles = false;
Bag bag = BagCreator.bagInPlace(folder, algorithm, includeHiddenFiles);
Read an existing bag (version 0.93 and higher)
Path rootDir = Paths.get("RootDirectoryOfExistingBag");
BagReader reader = new BagReader();
Bag bag = reader.read(rootDir);
Write a Bag object to disk
Path outputDir = Paths.get("WhereYouWantToWriteTheBagTo");
BagWriter.write(bag, outputDir); //where bag is a Bag object
Verify Complete
boolean ignoreHiddenFiles = true;
BagVerifier verifier = new BagVerifier();
verifier.isComplete(bag, ignoreHiddenFiles);
Verify Valid
boolean ignoreHiddenFiles = true;
BagVerifier verifier = new BagVerifier();
verifier.isValid(bag, ignoreHiddenFiles);
Quickly verify by payload-oxum
boolean ignoreHiddenFiles = true;

if(BagVerifier.canQuickVerify(bag)){
  BagVerifier.quicklyVerify(bag, ignoreHiddenFiles);
}
Add other checksum algorithms

You only need to implement 2 interfaces:

public class MyNewSupportedAlgorithm implements SupportedAlgorithm {
  @Override
  public String getMessageDigestName() {
    return "SHA3-256";
  }
  @Override
  public String getBagitName() {
    return "sha3256";
  }
}

public class MyNewNameMapping implements BagitAlgorithmNameToSupportedAlgorithmMapping {
  @Override
  public SupportedAlgorithm getMessageDigestName(String bagitAlgorithmName) {
    if("sha3256".equals(bagitAlgorithmName)){
      return new MyNewSupportedAlgorithm();
    }

    return StandardSupportedAlgorithms.valueOf(bagitAlgorithmName.toUpperCase());
  }
}

and then add the implemented BagitAlgorithmNameToSupportedAlgorithmMapping class to your BagReader or bagVerifier object before using their methods.

Check for potential problems

The BagIt format is extremely flexible and allows for some conditions which are technically allowed but should be avoided to minimize confusion and maximize portability. The BagLinter class allows you to easily check a bag for warnings:

Path rootDir = Paths.get("RootDirectoryOfExistingBag");
BagLinter linter = new BagLinter();
List<BagitWarning> warnings = linter.lintBag(rootDir, Collections.emptyList());

You can provide a list of specific warnings to ignore:

dependencycheckth rootDir = Paths.get("RootDirectoryOfExistingBag");
BagLinter linter = new BagLinter();
List<BagitWarning> warnings = linter.lintBag(rootDir, Arrays.asList(BagitWarning.OLD_BAGIT_VERSION);

Developing Bagit-Java

Bagit-Java uses Gradle for its build system. Check out the great documentation to learn more.

Running tests and code quality checks

Inside the bagit-java root directory, run gradle check.

Uploading to maven central
  1. Follow their guides
  2. http://central.sonatype.org/pages/releasing-the-deployment.html
  3. https://issues.sonatype.org/secure/Dashboard.jspa
  4. Once you have access, to create an office release and upload it you should specify the version by running gradle -Pversion=<VERSION> uploadArchives
  5. Don't forget to tag the repository!

Note if using with Eclipse

Simply run gradle eclipse and it will automatically create a eclipse project for you that you can import.

Roadmap for this library

  • Further refine reading and writing of bags version 0.93-0.97
  • Fix bugs/issues reported with new library (on going)
  • Translate to various languages (on going)