keeps/commons-ip

Issue with Incorrect Checksum for representation METS in SIP Creation

JohannesKarlsen99 opened this issue · 2 comments

When creating a Submission Information Package (SIP) with the commons-ip library, I ran into unexpected behavior regarding checksums. Although I explicitly set the checksum algorithm to MD5 using the setChecksum method from the IP.java class, the resulting SIP contains a SHA-256 checksum for the representation METS.xml, while the schema file entries in the same fileSec use MD5. I'm not sure whether this is intentional or a bug in commons-ip.

Example fileSec:

    <fileSec ID="uuid-C88FF056-B090-4244-BD3D-1CD98734D26D">
        <fileGrp ID="uuid-1B616075-4F38-42C0-AB06-7D50B3707320" USE="Schemas">
            <file ID="ID-D16EFBD6-65FF-4172-BBC9-CB0ABC7600AA" MIMETYPE="application/octet-stream" SIZE="2038" CREATED="2024-02-05T09:46:30.234+01:00" CHECKSUM="EB72EF8AB5B1C93801DFACBFE6AA8E27" CHECKSUMTYPE="MD5">
                <FLocat xlink:type="simple" xlink:href="schemas/DILCISExtensionMETS.xsd" LOCTYPE="URL"/>
            </file>
            <file ID="ID-23206CA9-C9B1-4A08-86B5-1C76CC0D1AF7" MIMETYPE="application/octet-stream" SIZE="499" CREATED="2024-02-05T09:46:30.241+01:00" CHECKSUM="83DA1FF6F35ADEECE3CCCFB5E2E9F83A" CHECKSUMTYPE="MD5">
                <FLocat xlink:type="simple" xlink:href="schemas/DILCISExtensionSIPMETS.xsd" LOCTYPE="URL"/>
            </file>
            <file ID="ID-86C50098-24A4-48FB-BFFF-E331EFB61DE8" MIMETYPE="application/octet-stream" SIZE="137125" CREATED="2024-02-05T09:46:30.247+01:00" CHECKSUM="0504DEDC1251E87D7E85F9FF2DBADC0D" CHECKSUMTYPE="MD5">
                <FLocat xlink:type="simple" xlink:href="schemas/mets1_12.xsd" LOCTYPE="URL"/>
            </file>
            <file ID="ID-99663FA8-1243-4EE4-BD3D-A058B5E4500A" MIMETYPE="application/octet-stream" SIZE="3180" CREATED="2024-02-05T09:46:30.252+01:00" CHECKSUM="6BDC7F9459A502964F889D70A335CECE" CHECKSUMTYPE="MD5">
                <FLocat xlink:type="simple" xlink:href="schemas/xlink.xsd" LOCTYPE="URL"/>
            </file>
        </fileGrp>
        <fileGrp ID="uuid-F11C5D3F-FF82-44D2-992D-D799C16F8803" USE="Representations/originals-001">
            <file ID="ID-94569F65-F870-4293-B668-B2155A262AA6" MIMETYPE="application/xml" SIZE="1199" CREATED="2024-02-05T09:47:30.792+01:00" CHECKSUM="0EF2DA26742DFD642192896A7FDC92C0267D23964848F25C26F0261035860550" CHECKSUMTYPE="SHA-256">
                <FLocat xlink:type="simple" xlink:href="representations/originals-001/METS.xml" LOCTYPE="URL"/>
            </file>
        </fileGrp>
    </fileSec>
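To spot this kind of mismatch without reading the METS by hand, a small check along the following lines can help. This is only a sketch and not part of commons-ip: the class `FileSecCheck` and the method `mismatches` are hypothetical names, and it assumes the fileSec sits inside a METS root that declares the METS and xlink namespaces (the fragment above omits those declarations).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of commons-ip.
class FileSecCheck {
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    // Returns the xlink:href of every <file> whose CHECKSUMTYPE differs from expectedType.
    static List<String> mismatches(String metsXml, String expectedType) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(metsXml.getBytes(StandardCharsets.UTF_8)));

            List<String> result = new ArrayList<>();
            // "*" matches any namespace, so this works with or without a mets: prefix.
            NodeList files = doc.getElementsByTagNameNS("*", "file");
            for (int i = 0; i < files.getLength(); i++) {
                Element file = (Element) files.item(i);
                if (!expectedType.equals(file.getAttribute("CHECKSUMTYPE"))) {
                    NodeList locs = file.getElementsByTagNameNS("*", "FLocat");
                    result.add(locs.getLength() > 0
                        ? ((Element) locs.item(0)).getAttributeNS(XLINK_NS, "href")
                        : "(no FLocat)");
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse METS", e);
        }
    }

    public static void main(String[] args) {
        // Reduced sample modelled on the fileSec above, wrapped in a root with namespace declarations.
        String xml =
            "<mets xmlns=\"http://www.loc.gov/METS/\" xmlns:xlink=\"" + XLINK_NS + "\">"
            + "<fileSec><fileGrp USE=\"Schemas\">"
            + "<file CHECKSUMTYPE=\"MD5\"><FLocat xlink:href=\"schemas/xlink.xsd\"/></file>"
            + "</fileGrp><fileGrp USE=\"Representations/originals-001\">"
            + "<file CHECKSUMTYPE=\"SHA-256\">"
            + "<FLocat xlink:href=\"representations/originals-001/METS.xml\"/></file>"
            + "</fileGrp></fileSec></mets>";
        System.out.println(mismatches(xml, "MD5"));
    }
}
```

Run against the root METS.xml of a generated SIP with the configured algorithm as `expectedType`, any entry it reports was written with a different algorithm.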

There seem to be a few places where the CHECKSUM_ALGORITHM constant (which defaults to SHA-256) is used instead of the configured algorithm. Those places should read the algorithm from the SIP instance instead.
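The pattern described here can be illustrated in isolation. The sketch below does not use the commons-ip code: `ChecksumSketch`, `Pkg`, `DEFAULT_ALGORITHM` and both checksum methods are hypothetical stand-ins, chosen only to show how a hard-coded default silently overrides a configured algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration; none of these names come from commons-ip.
class ChecksumSketch {
    // Stand-in for a library-wide default constant.
    static final String DEFAULT_ALGORITHM = "SHA-256";

    // Stand-in for a package object that carries the user's configured algorithm.
    static final class Pkg {
        private String checksumAlgorithm = DEFAULT_ALGORITHM;
        void setChecksumAlgorithm(String algorithm) { this.checksumAlgorithm = algorithm; }
        String getChecksumAlgorithm() { return checksumAlgorithm; }
    }

    // Computes an upper-case hex digest, matching the style of the CHECKSUM attributes.
    static String digestHex(byte[] data, String algorithm) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02X", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown algorithm: " + algorithm, e);
        }
    }

    // Problematic pattern: the configured value on the package is ignored.
    static String checksumIgnoringConfig(Pkg pkg, byte[] data) {
        return digestHex(data, DEFAULT_ALGORITHM);
    }

    // Suggested pattern: the algorithm is read from the package instance.
    static String checksumFromConfig(Pkg pkg, byte[] data) {
        return digestHex(data, pkg.getChecksumAlgorithm());
    }

    public static void main(String[] args) {
        Pkg pkg = new Pkg();
        pkg.setChecksumAlgorithm("MD5");
        byte[] data = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println("ignoring config: " + checksumIgnoringConfig(pkg, data).length() + " hex chars");
        System.out.println("from config:     " + checksumFromConfig(pkg, data).length() + " hex chars");
    }
}
```

With MD5 configured, the first method still yields a 64-character SHA-256 digest while the second yields a 32-character MD5 digest, which mirrors the mixed CHECKSUMTYPE values in the fileSec above.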

We need this functionality for our work and have therefore started making changes for internal use. Perhaps you can benefit from the changes in our fork? We are happy to contribute a pull request, but we do not yet have a full grasp of the entire code base. This is a patch against version 2.5.0.

https://github.com/keeps/commons-ip/compare/2.5.0...NationalLibraryOfNorway:commons-ip:2.5.0-checksum-patch?expand=1