RockefellerArchiveCenter/fornax

Split up SIPAssembler service

Closed this issue · 0 comments

Is your feature request related to a problem? Please describe.

The SIPAssembler does many things, so it is resource-intensive and has many points of failure. This is a particular problem with large bags, which are more likely to cause issues and are harder to troubleshoot.

Describe the solution you'd like

Split up the SIPAssembler into at least three services. The first should only extract the tarfile. The second should validate the bag and create the structure and metadata files. The third should move the files to the destination path.
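
As a rough illustration of the split, here is a minimal sketch using only the Python standard library. None of this is existing fornax code: the function names, the `metadata` directory name, and the stand-in validation check (presence of `bagit.txt` and `data/`) are assumptions for illustration. A real second step would presumably run full bag validation and write the actual structure and metadata files.

```python
import shutil
import tarfile
from pathlib import Path


def extract_bag(tar_path: Path, work_dir: Path) -> Path:
    """Step 1 (hypothetical): extract the tarfile and nothing else.

    Assumes work_dir is an empty directory dedicated to this bag and that
    the archive contains a single top-level bag directory.
    """
    work_dir.mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter where this Python supports it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(tar_path) as tf:
        tf.extractall(work_dir, **kwargs)
    children = [p for p in work_dir.iterdir() if p.is_dir()]
    if len(children) != 1:
        raise ValueError(f"expected one top-level directory, found {len(children)}")
    return children[0]


def validate_and_structure(bag_dir: Path) -> Path:
    """Step 2 (hypothetical): validate the bag, then create structure.

    The check below is only a stand-in for real bag validation.
    """
    if not (bag_dir / "bagit.txt").is_file() or not (bag_dir / "data").is_dir():
        raise ValueError(f"{bag_dir} does not look like a bag")
    # Placeholder for creating the structure and metadata files.
    (bag_dir / "metadata").mkdir(exist_ok=True)
    return bag_dir


def move_to_destination(bag_dir: Path, dest_dir: Path) -> Path:
    """Step 3 (hypothetical): move the assembled files to the destination."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / bag_dir.name
    shutil.move(str(bag_dir), str(target))
    return target
```

Because each step takes a path and returns a path, each one could be run, retried, and monitored separately, so a failure while extracting a large bag would be reported by the extraction step alone.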

Additional context

I think it makes sense to isolate validating the bag from extracting the tarfile, since this is the point at which large bags are causing issues.