RockefellerArchiveCenter/fornax

Restructure Package service is difficult to clean up if it has not been completed successfully

Closed · 1 comment

Is your feature request related to a problem? Please describe.

If the Restructure Package service does not finish processing a SIP (for example, because Apache was restarted while it was in process), setting the SIP's process status back to SIP.EXTRACTED and restarting the service causes the SIP to fail.

This is because the service starts with BagIt validation and only then moves files around and adds directories. If BagIt validation against the original bag-info file runs after files or directories have changed, it fails.
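To illustrate the failure mode, here is a minimal sketch using only the Python standard library. It is a simplified stand-in for BagIt validation (a checksum manifest keyed by path), not the actual validation code used by the service, and the helper names `write_manifest`, `is_valid`, and `demo` are made up for this example. Real BagIt validation checks more than this, but the path-sensitivity is the same idea.

```python
import hashlib
import tempfile
from pathlib import Path


def write_manifest(bag_dir: Path) -> None:
    """Record a SHA-256 checksum for every payload file under data/."""
    lines = []
    for path in sorted((bag_dir / "data").rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            lines.append(f"{digest}  {path.relative_to(bag_dir).as_posix()}")
    (bag_dir / "manifest-sha256.txt").write_text("\n".join(lines) + "\n")


def is_valid(bag_dir: Path) -> bool:
    """Check that each manifest entry exists at its recorded path with its recorded checksum."""
    for line in (bag_dir / "manifest-sha256.txt").read_text().splitlines():
        digest, rel_path = line.split("  ", 1)
        target = bag_dir / rel_path
        if not target.is_file():
            return False
        if hashlib.sha256(target.read_bytes()).hexdigest() != digest:
            return False
    return True


def demo():
    """Return (valid before restructuring, valid after a partial restructure)."""
    with tempfile.TemporaryDirectory() as tmp:
        bag_dir = Path(tmp) / "sip"
        (bag_dir / "data").mkdir(parents=True)
        (bag_dir / "data" / "file.txt").write_text("payload")
        write_manifest(bag_dir)
        before = is_valid(bag_dir)

        # Simulate a restructure that was interrupted after moving a file
        # into a newly added directory (hypothetical layout).
        (bag_dir / "data" / "objects").mkdir()
        (bag_dir / "data" / "file.txt").rename(bag_dir / "data" / "objects" / "file.txt")
        after = is_valid(bag_dir)
        return before, after
```

The file contents never change here, but the manifest still points at the old path, so the second check fails. That is roughly what happens when the service is restarted against a partially restructured SIP.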

Describe the solution you'd like

I think there are (at least) two possible solutions here:

  1. Move BagIt validation to the end of the Extract Package service
  2. Add a cleanup method to move files back to how they were originally

I'm not sure the second solution would work, because moving files around likely changes information that BagIt validation depends on, so validation would still fail. The main downside of the first solution is that we might just be moving the flakiness to the Extract Package service, but I think cleanup in that service would be easier (if it's necessary at all).

OK, I think I agree with trying to proceed with the first approach. I'm going to look at this more closely to see where we might run into issues if we just run the first part of that routine again and validate once we're done.

Regardless, validating at the start of that routine doesn't make much sense, so I may add a validation check to the preceding service (ExtractPackageRoutine).
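A rough sketch of what the first approach could look like, assuming validation moves to the end of extraction and the restructure step no longer validates. Everything here is hypothetical: the `Sip` class, the string status constants, and the `run_extract` / `run_restructure` functions are placeholders for illustration, not the actual fornax models or routines.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical status values; the real constants live on the SIP model.
CREATED = "created"
EXTRACTED = "extracted"
INVALID = "invalid"
RESTRUCTURED = "restructured"


@dataclass
class Sip:
    bag_path: str
    process_status: str = CREATED


def run_extract(sip: Sip, extract: Callable[[str], None],
                validate: Callable[[str], bool]) -> Sip:
    """Extract the package, then validate while the bag is still untouched."""
    extract(sip.bag_path)
    sip.process_status = EXTRACTED if validate(sip.bag_path) else INVALID
    return sip


def run_restructure(sip: Sip, restructure: Callable[[str], None]) -> Sip:
    """Restructure without validating, so an interrupted run can be retried."""
    if sip.process_status != EXTRACTED:
        raise ValueError(f"unexpected status: {sip.process_status}")
    restructure(sip.bag_path)
    # Only advance the status once restructuring has fully completed.
    sip.process_status = RESTRUCTURED
    return sip
```

With this shape, an interruption during restructuring leaves the status at the extracted value, and rerunning the restructure step does not trip over validation. It does assume the restructure step itself is safe to run twice on a partially moved bag, which is the part that still needs a closer look.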