Simple Data Conservancy Package Ingest Service
This package ingest service is intended to transfer the contents of file archives (i.e. “packages”) into an LDP linked data repository such as Fedora. It includes:
- A core library in Java for ingesting packages of various formats
- A simple HTTP API
- An API-X extension for exposing deposit endpoints on repository containers
Premise
An archive contains custodial content (i.e. packaged files), and possibly additional packaging-specific metadata. A profile defines how these are distinguished. For example, it can be presumed that all content of a simple zip or tar file is custodial content. BagIt defines custodial content as all files underneath a /data directory, and specifies additional “tag files” which may describe the circumstances of creating a bag (its author, date, etc.), checksums for files, and so on.
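As an illustration of how a profile might separate custodial content from packaging metadata, the following Java sketch classifies BagIt entry paths. It is not the service's actual implementation: the class name BagItEntries and its methods are hypothetical, and paths are assumed to be relative to the bag's root directory.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the package ingest library.
final class BagItEntries {

    private static final String PAYLOAD_PREFIX = "data/";

    /**
     * Returns true if the path (relative to the bag root, by assumption)
     * names a file underneath the data directory, i.e. custodial content.
     * Anything else (bagit.txt, manifests, bag-info.txt) is treated as a tag file.
     */
    static boolean isCustodial(String pathInBag) {
        // Tolerate a leading slash such as "/data/file.txt".
        String p = pathInBag.startsWith("/") ? pathInBag.substring(1) : pathInBag;
        // The bare directory entry "data/" is not itself a file.
        return p.startsWith(PAYLOAD_PREFIX) && p.length() > PAYLOAD_PREFIX.length();
    }

    /** Keeps only the custodial paths, preserving their order. */
    static List<String> custodialOnly(List<String> paths) {
        return paths.stream()
                .filter(BagItEntries::isCustodial)
                .collect(Collectors.toList());
    }
}
```

For a simple zip or tar profile, the equivalent check would accept every file entry, since all content is presumed custodial.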
The package ingest service creates a repository resource (an LDPR) from each file in the custodial content of a package.
Additional processing rules may apply for each supported profile; these may enhance the contents of LDPRs (e.g. add metadata) or create additional LDPRs. For example, if the package relates its resources in an LDP containment or membership hierarchy, the packaging profile may provide a way to encode this information when it is not otherwise present within the resources in the package.
The original package may be discarded, or may be kept as part of an audit trail, used for authorization, etc. based upon policy. At minimum, the package ingest service will provide a log of all events that occurred during ingest.
If ingesting a package succeeds, further interaction with the newly created resources may be performed as usual via Fedora’s LDP-based API.
Goals
- Accommodate arbitrarily large packages with stream-oriented processing
- Allow the use of simple command-line tools to deposit and verify success/failure (e.g. curl, grep, etc.)
- Accommodate backend workflows and policies
- Support synchronous and asynchronous paradigms in exposed APIs
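The first goal, stream-oriented processing, can be sketched in Java with the standard java.util.zip.ZipInputStream, which yields entries one at a time so that the package never has to be held in memory or unpacked to disk. This is a minimal sketch under assumptions, not the core library's code: StreamingZipWalk and EntryHandler are hypothetical names, and only the zip format is shown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical sketch, not part of the package ingest library.
final class StreamingZipWalk {

    /** Callback invoked once per file; it must not close the stream it is given. */
    interface EntryHandler {
        void handle(String name, InputStream content) throws IOException;
    }

    /**
     * Reads the package sequentially and hands each file entry to the handler.
     * The content stream passed to the handler ends at the end of the current entry.
     * Returns the number of file entries seen.
     */
    static int walk(InputStream packageStream, EntryHandler handler) throws IOException {
        int files = 0;
        try (ZipInputStream zip = new ZipInputStream(packageStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    handler.handle(entry.getName(), zip);
                    files++;
                }
                // Skip whatever the handler left unread before moving on.
                zip.closeEntry();
            }
        }
        return files;
    }
}
```

In an ingest setting, the handler would be the place where each file is turned into a repository resource; here it is left abstract.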
Workflow
- Produce a package, for example by:
  - Zipping up a file system
  - Exporting from a repository
  - Generating resources by some local process (e.g. a desktop GUI, laboratory instrument, etc.)
- Choose a container in the repository to deposit into (an LDPC, identified by its URI).
  - No specific discovery mechanism is defined; it is presumed that a client can inspect repository resources and pick one to deposit into, or is given a URI for this purpose.
- Submit the package to the container.
  - A new member resource will be created, and the contents of the package will be placed into it.
- Follow the deposit results.
  - An event stream reports processing as it happens, and indicates success or failure.
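The submit step could look roughly like the following Java sketch, which uses the standard java.net.http client (Java 11 or later). Everything specific here is an assumption rather than something this document confirms: that a deposit is an HTTP POST, that the package is sent as the request body, and that a media type such as application/zip identifies the packaging. The class name DepositRequests and the endpoint URI passed to it are hypothetical.

```java
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.function.Supplier;

// Hypothetical sketch; the service's real request shape may differ.
final class DepositRequests {

    /**
     * Builds a POST request that streams the package as the request body.
     * depositEndpoint: assumed URI of the deposit endpoint for the chosen container.
     * mediaType: assumed Content-Type describing the package, e.g. "application/zip".
     * packageBody: supplies the package bytes lazily, so large packages are not buffered.
     */
    static HttpRequest buildDeposit(URI depositEndpoint, String mediaType,
                                    Supplier<InputStream> packageBody) {
        return HttpRequest.newBuilder(depositEndpoint)
                .header("Content-Type", mediaType)
                .POST(HttpRequest.BodyPublishers.ofInputStream(packageBody))
                .build();
    }
}
```

Sending such a request with HttpClient.send(request, HttpResponse.BodyHandlers.ofLines()) would let a client read the response line by line as it arrives, which is one way to follow an event stream; whether the service reports events in the response body is likewise an assumption.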
Quick start
A docker-compose file is provided as a quick way to get the package ingest extension running in Docker for demonstration or evaluation purposes. It runs the API-X, Fedora, and package ingest extension Docker images. See package-ingest docker for a description of the package ingest Docker image and how it is configured.
- Install docker and docker-compose. See the API-X demo instructions for how to install and verify docker and docker-compose.
- Edit the .env file to set any environment variables you want (e.g. to change the defaults). This is optional, except for users of docker-machine. Docker-machine users have to edit the APIX_BASEURI variable and change the host from localhost to the IP address of their docker-machine instance.
- Start the services via docker-compose up -d. Use docker-compose down to stop all containers and destroy all data, or docker-compose stop merely to stop the containers.