PASS Deposit Services
Deposit Services are responsible for the transfer of custodial content and metadata from end users to repositories. End users transfer custody of their content to PASS by performing a submission through the HTML user interface, and Deposit Services subsequently transfers the custody of content to downstream repositories.
Deposit Services is deployed as "back-end" infrastructure. It has no user-facing elements. In particular, Deposit Services is unaware of the internal/external duality of resource URIs. This means that when looking at URIs in Deposit Services' logging output, some adjustment may be necessary for a developer or systems operator to retrieve the resource from their location in the network topology.
Configuration
The primary mechanism for configuring Deposit Services is through environment variables. This aligns with the patterns used in development and production infrastructure which rely on Docker and its approach to runtime configuration.
Production Configuration Variables
Environment Variable | Default Value | Description |
---|---|---|
ACTIVEMQ_BROKER_URI | null | the publicly-supported (i.e. official PASS) variable for configuring the JMS broker URL; used to compose SPRING_ACTIVEMQ_BROKER_URL |
DSPACE_HOST | localhost | the IP address or host name of the server running the SWORD protocol version 2 endpoint |
DSPACE_PORT | 8181 | the TCP port exposing the SWORD protocol version 2 endpoint |
ES_HOST | localhost | the IP address or host name of the Elastic Search index |
ES_PORT | 9200 | the TCP port running the Elastic Search HTTP API |
FCREPO_HOST | localhost | the IP address or host name of the Fedora repository |
FCREPO_JMS_PORT | 61616 | the TCP port for the STOMP protocol |
FCREPO_PORT | 8080 | the TCP port running the Fedora HTTP REST API |
FTP_HOST | localhost | the IP address or host name of the NIH FTP server |
FTP_PORT | 21 | the TCP control port of the NIH FTP server |
PASS_DEPOSIT_HTTP_AGENT | pass-deposit/x.y.z | the value of the User-Agent header supplied on Deposit Services' HTTP requests |
PASS_DEPOSIT_JOBS_CONCURRENCY | 2 | the number of Quartz jobs that may be run concurrently |
PASS_DEPOSIT_JOBS_DEFAULT_INTERVAL_MS | 600000 | the interval, in milliseconds, at which Quartz launches jobs |
PASS_DEPOSIT_JOBS_DISABLED | undefined | set this environment variable to true to disable all Quartz jobs. By default this environment variable is undefined for the production runtime. |
PASS_DEPOSIT_QUEUE_SUBMISSION_NAME | submission | the name of the JMS queue that has messages pertaining to Submission resources (used by the JmsSubmissionProcessor) |
PASS_DEPOSIT_QUEUE_DEPOSIT_NAME | deposit | the name of the JMS queue that has messages pertaining to Deposit resources (used by the JmsDepositProcessor) |
PASS_DEPOSIT_REPOSITORY_CONFIGURATION | classpath:/repositories.json | points to a JSON file containing the configuration for the transport of custodial content to remote repositories. Values must be Spring Resource URIs. See below for customizing the repository configuration values. |
PASS_DEPOSIT_TRANSPORT_SWORDV2_SLEEP_TIME_MS | 10000 | the number of milliseconds to wait between depositing a package using SWORD, and checking the SWORD statement for the deposit state |
PASS_DEPOSIT_WORKERS_CONCURRENCY | 4 | the number of Deposit Worker threads that can simultaneously run |
PASS_ELASTICSEARCH_LIMIT | 100 | the maximum number of results returned in a single search response |
PASS_ELASTICSEARCH_URL | http://${es.host:localhost}:${es.port:9200}/pass | the URL used to communicate with the Elastic Search API. Normally this variable does not need to be changed (see note below) |
PASS_FEDORA_BASEURL | http://${fcrepo.host:localhost}:${fcrepo.port:8080}/fcrepo/rest/ | the URL used to communicate with the Fedora REST API. Normally this variable does not need to be changed (see note below) |
PASS_FEDORA_PASSWORD | moo | the password used for Basic HTTP authentication to the Fedora REST API |
PASS_FEDORA_USER | fedoraAdmin | the username used for Basic HTTP authentication to the Fedora REST API |
SPRING_ACTIVEMQ_BROKER_URL | ${activemq.broker.uri:tcp://${fcrepo.host:localhost}:${fcrepo.jms.port:61616}} | the internal variable for configuring the URI of the JMS broker |
SPRING_ACTIVEMQ_PASSWORD | null | the password to use when authenticating to the broker |
SPRING_ACTIVEMQ_USER | null | the user name to use when authenticating to the broker |
SPRING_JMS_LISTENER_CONCURRENCY | 4 | the number of JMS messages that can be processed simultaneously by each JMS queue |
If the Fedora repository is deployed under a webapp context other than /fcrepo, or if https ought to be used instead of http, the environment variable PASS_FEDORA_BASEURL must be set to the base of the Fedora REST API (e.g. PASS_FEDORA_BASEURL=https://fcrepo:8080/rest)
If the Elastic Search index is deployed under a URL other than /pass, or if https ought to be used instead of http, the environment variable PASS_ELASTICSEARCH_URL must be set to the base of the Elastic Search HTTP API (e.g. PASS_ELASTICSEARCH_URL=https://localhost:9200/index)
Repositories Configuration
The Repository configuration contains the parameters used for connecting and depositing custodial material to downstream repositories. The format of the configuration file is JSON, defining multiple downstream repositories in a single file.
Each repository configuration has a top-level key that is used to identify a particular configuration. Importantly, each top-level key must map to a Repository resource within the PASS repository. This implies that the top-level keys in repositories.json are not arbitrary. In fact, the top-level key must be one of:
- the value of a Repository.repositoryKey field (of a Repository resource in the PASS repository)
- the full URI of a Repository resource in the PASS repository
- a portion of the URI path of a Repository resource in the PASS repository
Given a Repository with a repositoryKey of my-repo and a URI of https://pass.my.edu/fcrepo/rest/repositories/77/cc/80/64/77cc8064-a918-4823-968d-2b17386db76d, any of the following top-level keys are acceptable:
- my-repo
- https://pass.my.edu/fcrepo/rest/repositories/77/cc/80/64/77cc8064-a918-4823-968d-2b17386db76d
- /repositories/77/cc/80/64/77cc8064-a918-4823-968d-2b17386db76d
- 77cc8064-a918-4823-968d-2b17386db76d
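The three kinds of acceptable key can be sketched as a small matching routine. This is only an illustration: RepositoryKeySketch and its matches method are hypothetical names, and treating "a portion of the URI path" as a plain substring test is an assumption, not a description of how Deposit Services resolves keys internally.

```java
import java.net.URI;

/** Hypothetical helper: decides whether a repositories.json key identifies a Repository. */
final class RepositoryKeySketch {

    static boolean matches(String configKey, String repositoryKey, String repositoryUri) {
        if (configKey == null || configKey.isEmpty()) {
            return false;
        }
        // 1. the value of Repository.repositoryKey
        if (configKey.equals(repositoryKey)) {
            return true;
        }
        // 2. the full URI of the Repository resource
        if (configKey.equals(repositoryUri)) {
            return true;
        }
        // 3. a portion of the URI path (assumed here to mean "substring of the path")
        String path = URI.create(repositoryUri).getPath();
        return path != null && path.contains(configKey);
    }

    public static void main(String[] args) {
        String uri = "https://pass.my.edu/fcrepo/rest/repositories/77/cc/80/64/"
                + "77cc8064-a918-4823-968d-2b17386db76d";
        String[] keys = {
            "my-repo",
            uri,
            "/repositories/77/cc/80/64/77cc8064-a918-4823-968d-2b17386db76d",
            "77cc8064-a918-4823-968d-2b17386db76d",
            "other-repo"
        };
        for (String key : keys) {
            System.out.println(key + " -> " + matches(key, "my-repo", uri));
        }
    }
}
```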
Deposit Services comes with a default repository configuration, but a production environment will want to override the default. Defaults are overridden by creating a copy of the default configuration, editing it to suit, and setting PASS_DEPOSIT_REPOSITORY_CONFIGURATION to point to the new location.
The value of PASS_DEPOSIT_REPOSITORY_CONFIGURATION must be a form of Spring Resource URI.
The default configuration is replicated below:
{
  "JScholarship": {
    "deposit-config": {
      "processing": {
        "beanName" : "org.dataconservancy.pass.deposit.messaging.status.DefaultDepositStatusProcessor"
      },
      "mapping": {
        "http://dspace.org/state/archived": "accepted",
        "http://dspace.org/state/withdrawn": "rejected",
        "default-mapping": "submitted"
      }
    },
    "assembler": {
      "specification": "http://purl.org/net/sword/package/METSDSpaceSIP"
    },
    "transport-config": {
      "auth-realms": [
        {
          "mech": "basic",
          "username": "user",
          "password": "pass",
          "url": "https://jscholarship.library.jhu.edu/"
        },
        {
          "mech": "basic",
          "username": "user",
          "password": "pass",
          "url": "https://dspace-prod.mse.jhu.edu:8080/"
        },
        {
          "mech": "basic",
          "username": "dspace-admin@oapass.org",
          "password": "foobar",
          "url": "http://${dspace.host}:${dspace.port}/swordv2"
        }
      ],
      "protocol-binding": {
        "protocol": "SWORDv2",
        "username": "dspace-admin@oapass.org",
        "password": "foobar",
        "server-fqdn": "${dspace.host}",
        "server-port": "${dspace.port}",
        "service-doc": "http://${dspace.host}:${dspace.port}/swordv2/servicedocument",
        "default-collection": "http://${dspace.host}:${dspace.port}/swordv2/collection/123456789/2",
        "on-behalf-of": null,
        "deposit-receipt": true,
        "user-agent": "pass-deposit/x.y.z"
      }
    }
  },
  "PubMed Central": {
    "deposit-config": {
      "processing": {
      },
      "mapping": {
        "INFO": "accepted",
        "ERROR": "rejected",
        "WARN": "rejected",
        "default-mapping": "submitted"
      }
    },
    "assembler": {
      "specification": "nihms-native-2017-07"
    },
    "transport-config": {
      "protocol-binding": {
        "protocol": "ftp",
        "username": "nihmsftpuser",
        "password": "nihmsftppass",
        "server-fqdn": "${ftp.host}",
        "server-port": "${ftp.port}",
        "data-type": "binary",
        "transfer-mode": "stream",
        "use-pasv": true,
        "default-directory": "/logs/upload/%s"
      }
    }
  }
}
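The mapping blocks in the configuration above appear to translate a state reported by the downstream repository into a deposit status, with default-mapping used for anything not listed. The following is a minimal sketch of that lookup under this assumption; StatusMappingSketch, jscholarshipMapping and map are hypothetical names, and the fallback behavior is inferred from the key name rather than taken from the Deposit Services source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of how a "mapping" block might be consulted. */
final class StatusMappingSketch {

    static final String DEFAULT_KEY = "default-mapping";

    /** The "mapping" block of the JScholarship entry, copied from the default configuration. */
    static Map<String, String> jscholarshipMapping() {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("http://dspace.org/state/archived", "accepted");
        mapping.put("http://dspace.org/state/withdrawn", "rejected");
        mapping.put(DEFAULT_KEY, "submitted");
        return mapping;
    }

    /** Returns the mapped status, falling back to "default-mapping" (assumed behavior). */
    static String map(Map<String, String> mapping, String remoteState) {
        String mapped = mapping.get(remoteState);
        return mapped != null ? mapped : mapping.get(DEFAULT_KEY);
    }

    public static void main(String[] args) {
        Map<String, String> mapping = jscholarshipMapping();
        System.out.println(map(mapping, "http://dspace.org/state/archived"));
        System.out.println(map(mapping, "http://example.org/state/unknown"));
    }
}
```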
Customizing Repository Configuration Elements
The default repository configuration will not be suitable for production. A production deployment needs to provide updated authentication credentials and ensure the correct value for the default SWORD collection URL - default-collection. Each transport-config section should be reviewed for correctness, paying special attention to protocol-binding and auth-realms blocks: update username and password elements, and ensure correct values for URLs.
Values may be parameterized by any property or environment variable.
To create your own configuration, copy and paste the default configuration into an empty file and modify the JSON as described above. The configuration must be referenced by the pass.deposit.repository.configuration property, or its environment equivalent PASS_DEPOSIT_REPOSITORY_CONFIGURATION. Allowed values are any Spring Resource path (e.g. classpath:/, classpath*:, file:, http://, https://). For example, if your configuration is stored as a file in /etc/deposit-services.json, then you would set the environment variable PASS_DEPOSIT_REPOSITORY_CONFIGURATION=file:/etc/deposit-services.json prior to starting Deposit Services. Likewise, if you kept the configuration accessible at a URL, you could use PASS_DEPOSIT_REPOSITORY_CONFIGURATION=http://example.org/deposit-services.json.
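Throughout this document a property such as pass.deposit.repository.configuration is paired with an "environment equivalent". The pairing follows Spring Boot's documented convention for environment variables: replace periods with underscores, remove dashes, and convert to upper case. A minimal sketch of that convention (EnvNameSketch and toEnvName are hypothetical names, not part of Deposit Services or Spring):

```java
import java.util.Locale;

/** Hypothetical helper showing how a property name maps to its environment variable name. */
final class EnvNameSketch {

    static String toEnvName(String propertyName) {
        return propertyName
                .replace('.', '_')          // periods become underscores
                .replace("-", "")           // dashes are removed
                .toUpperCase(Locale.ROOT);  // everything is upper-cased
    }

    public static void main(String[] args) {
        System.out.println(toEnvName("pass.deposit.repository.configuration"));
        System.out.println(toEnvName("spring.jms.listener.concurrency"));
    }
}
```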
Failure Handling
A "failed" Deposit
or Submission
has Deposit.DepositStatus = FAILED
or Submission.AggregatedDepositStatus = FAILED
. When a resource has been marked FAILED
, Deposit Services will ignore any messages relating to the resource when in listen
mode (see below for more information on modes). Intervention (automated or manual) is required to update the failed resource.
A resource will be considered as failed when errors occur during the processing of Submission
and Deposit
resources. Some errors may be caused by transient network issues, or a server being rebooted, but for now Deposit Services does not contain any logic for retrying when there are low-level communication errors with an endpoint.
Submission resources are failed when:
- The Deposit Services model for the Submission cannot be built
- There are no files attached to the Submission
- Any file attached to the Submission is missing a location URI (the URI used to retrieve the bytes of the file)
- An error occurs saving the state of the Submission in the repository (arguably a transient error, but DS does not perform any retries when there are errors communicating with the repository)
See SubmissionProcessor
for details. Right now, when a Submission
is failed, manual intervention is required. Deposit Services does not provide any support for dealing with failed submissions. It is likely the end-user will need to re-create the submission in the user interface, and resubmit it.
Deposit resources are failed when:
- An error occurs building a package
- An error occurs streaming a package to a Repository (arguably transient)
- An error occurs polling (arguably transient, but DS does not perform retries) or parsing the status of a Deposit
- An error occurs saving the state of a Deposit in the repository (again, arguably transient, but DS doesn't perform retries when there are errors communicating with the repository)
See DepositTask for details. Deposits often fail for transient or correctable reasons: a server being down, an interruption in network communication, or invalid credentials for the downstream repository are just a few examples. Manual intervention is required to remediate failed deposits, but Deposit Services provides support for this case (see the retry mode documented below).
Build and Deployment
Deposit Services' primary artifact is a single self-executing jar. The behavior, or "mode" of the deposit services application is directed by command line arguments and influenced by environment variables. In the PASS infrastructure, the Deposit Services self-executing jar is deployed inside of a simple Docker container.
Deposit Services can be built by running:
mvn clean install
The main Deposit Services deployment artifact is found in deposit-messaging/target/deposit-messaging-<version>.jar
. It is this jarfile that is included in the Docker image for Deposit Services, and posted on the GitHub Release page.
Supported modes
The mode is a required command-line argument which directs the deposit services application to take a specific action.
Listen
Listen mode is the "primary" mode, if you will, of Deposit Services. In listen
mode Deposit Services responds to JMS messages from the Fedora repository by creating and transferring packages of custodial content to remote repositories.
Listen mode is invoked by starting Deposit services with listen
as the single command-line argument:
$ java -jar deposit-services.jar listen
Deposit Services will connect to a JMS broker specified by the SPRING_ACTIVEMQ_BROKER_URL
environment variable (optionally authenticating if SPRING_ACTIVEMQ_USER
and SPRING_ACTIVEMQ_PASSWORD
are present), and wait for the Fedora repository to be available as specified by FCREPO_HOST
and FCREPO_PORT
. Notably, listen
mode does not use the index.
If the Fedora repository is deployed under a webapp context other than /fcrepo, the environment variable PASS_FEDORA_BASEURL must be set to the base of the Fedora REST API (e.g. PASS_FEDORA_BASEURL=http://fcrepo:8080/fcrepo/rest)
After successfully connecting to the JMS broker and the Fedora repository, deposit services will listen and respond to JMS messages relating to the submission and deposit of material to the Fedora repository. Incoming Submission
resources created by end-users of the UI will be processed:
- custodial content packaged
- packages sent to destination repositories
- confirmation of custody transfer
- recording the identities of content in destination repositories
Incoming Deposit
resources will be used to update the overall success or failure of a Submission
.
Retry
Retry mode is used to retry a Deposit
that has failed. Retry mode is invoked by starting Deposit services with retry
as the first command-line argument, with an optional --uris
argument, accepting a space-separated list of Deposit
URIs to retry. If no --uris
argument is present, the index is searched for all Deposit
resources that have failed, and those are the deposits that are re-tried.
To retry all failed deposits:
$ java -jar deposit-services.jar retry
To retry specific deposits:
$ java -jar deposit-services.jar retry --uri=http://192.168.99.100:8080/fcrepo/rest/deposits/8e/af/ac/a9/8eafaca9-1f24-413a-bf1e-fbbd673ba45b --uri=http://192.168.99.100:8080/fcrepo/rest/deposits/4a/cb/04/bb/4acb04bb-4f79-40ef-8ff9-e105261aa7fb
Refresh
Refresh mode is used to re-process a Deposit in the SUBMITTED
state that needs its deposit status refreshed. When refresh
is invoked, the optional --uris
argument is used to identify the Deposit
resources to refresh. Otherwise a search of the index is performed for all Deposit
resources in the SUBMITTED
state.
Refreshing a Deposit
means that its deposit status reference will be retrieved, parsed, and processed. The status returned from the reference will be stored on the Deposit
, and the status of the corresponding RepositoryCopy
will be updated as well. If the Deposit
status is updated to ACCEPTED
, the RepositoryCopy
will be updated to COMPLETE
. If the Deposit
status is updated to REJECTED
, the RepositoryCopy
will be updated to REJECTED
as well.
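The refresh rule described above can be sketched as a small mapping from the refreshed Deposit status to the RepositoryCopy status. The enums below are local stand-ins for the PASS model classes, and leaving the RepositoryCopy untouched for any other status is an assumption made for the sketch.

```java
import java.util.Optional;

/** Hypothetical illustration of the Deposit-to-RepositoryCopy status rule used by refresh. */
final class RefreshSketch {

    // Local stand-ins for the PASS model enums.
    enum DepositStatus { SUBMITTED, ACCEPTED, REJECTED, FAILED }
    enum CopyStatus { IN_PROGRESS, COMPLETE, REJECTED }

    /** Returns the new RepositoryCopy status, or empty when the copy is left as-is (assumed). */
    static Optional<CopyStatus> copyStatusFor(DepositStatus refreshed) {
        switch (refreshed) {
            case ACCEPTED:
                return Optional.of(CopyStatus.COMPLETE);
            case REJECTED:
                return Optional.of(CopyStatus.REJECTED);
            default:
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        for (DepositStatus status : DepositStatus.values()) {
            System.out.println(status + " -> " + copyStatusFor(status));
        }
    }
}
```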
To refresh all deposits in the SUBMITTED
state:
$ java -jar deposit-services.jar refresh
To refresh specific deposits:
$ java -jar deposit-services.jar refresh --uri=http://192.168.99.100:8080/fcrepo/rest/deposits/8e/af/ac/a9/8eafaca9-1f24-413a-bf1e-fbbd673ba45b --uri=http://192.168.99.100:8080/fcrepo/rest/deposits/4a/cb/04/bb/4acb04bb-4f79-40ef-8ff9-e105261aa7fb
Future modes
Modes to be supported by future releases of Deposit Services.
Report
TODO
Developers
Deposit Services is implemented using Spring Boot, which heavily relies on Spring-based annotations and conventions to create and populate a Spring ApplicationContext
, arguably the most important object managed by the Spring runtime. Unfortunately, if you aren't familiar with Spring or its conventions, it can make the code harder to understand.
The entrypoint into the deposit services is the DepositApp
, which accepts command line parameters that set the "mode" of the deposit services runtime. Spring beans are created entirely in Java code by the DepositConfig
and JmsConfig
classes.
Runners
ListenerRunner
The listen argument will invoke the ListenerRunner, which waits for the Fedora repository to become available; if it does not become available, the application shuts down. Two JMS listeners are started that listen to the submission and deposit queues. The submission queue provides messages relating to Submission resources, and the deposit queue provides messages relating to Deposit resources. Deposit Services does not listen or act on messages for other types of repository resources.
The PASS_FEDORA_USER and PASS_FEDORA_PASSWORD environment variables define the username and password used to perform HTTP Basic authentication to the Fedora HTTP REST API (i.e. PASS_FEDORA_BASEURL).
FailedDepositRunner
The retry
argument invokes the FailedDepositRunner
which will re-submit failed Deposit
resources to the task queue for processing. URIs for specific Deposits may be specified, otherwise the index is searched for failed Deposits, and each one will be re-tried.
SubmittedUpdateRunner
The refresh
argument invokes the SubmittedUpdateRunner
which will attempt to re-process a Deposit
's status reference. URIs for specific Deposits may be specified, otherwise the index is searched for SUBMITTED
Deposits, and each one will be refreshed.
Message flow and concurrency
Each JMS listener (one each for the deposit
and submission
queues) can process messages concurrently. The number of messages each listener can process concurrently is set by the property spring.jms.listener.concurrency
(or its environment equivalent: SPRING_JMS_LISTENER_CONCURRENCY
).
The submission queue is processed by the JmsSubmissionProcessor, which resolves the Submission resource represented in the message, and hands off processing to the SubmissionProcessor. The SubmissionProcessor builds a DepositSubmission, which is the Deposit Services' analog of a Submission containing all of the metadata and custodial content associated with a Submission. After building the DepositSubmission, the processor creates a DepositTask and hands off the actual packaging and transfer of submission content to the deposit worker thread pool. Importantly, the SubmissionProcessor updates the Submission resource in the repository as being in progress.
There is a thread pool of so-called "deposit workers" that perform the actual packaging and transport of custodial content to downstream repositories. The size of the worker pool is determined by the property pass.deposit.workers.concurrency (or its environment equivalent: PASS_DEPOSIT_WORKERS_CONCURRENCY). The deposit worker pool accepts instances of DepositTask, which contains the primary logic for packaging, streaming, and verifying the transfer of content from the PASS repository to downstream repositories. The DepositTask will determine whether the transfer of custodial content has succeeded, failed, or is indeterminable (i.e. an async deposit process that has not yet concluded). The status of the Deposit resource associated with the Submission will be updated accordingly.
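The hand-off to the deposit worker pool can be pictured with a fixed-size executor. This is a sketch only: WorkerPoolSketch, workerConcurrency and depositAll are hypothetical names, the task body is a placeholder for the real DepositTask, and Deposit Services itself wires its pool through Spring rather than by reading the environment directly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration of a bounded pool of deposit workers. */
final class WorkerPoolSketch {

    static final int DEFAULT_CONCURRENCY = 4; // default from the configuration table

    /** Reads the pool size from an environment-like map, falling back to the default. */
    static int workerConcurrency(Map<String, String> env) {
        String value = env.get("PASS_DEPOSIT_WORKERS_CONCURRENCY");
        return value == null ? DEFAULT_CONCURRENCY : Integer.parseInt(value.trim());
    }

    /** Submits one placeholder task per URI; at most 'workers' tasks run at the same time. */
    static List<String> depositAll(List<String> submissionUris, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String uri : submissionUris) {
                // Stand-in for a DepositTask: package, stream, and report the outcome.
                Callable<String> task = () -> "SUBMITTED " + uri;
                pending.add(pool.submit(task));
            }
            List<String> outcomes = new ArrayList<>();
            for (Future<String> future : pending) {
                outcomes.add(future.get());
            }
            return outcomes;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int workers = workerConcurrency(System.getenv());
        System.out.println(depositAll(List.of("submission-1", "submission-2"), workers));
    }
}
```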
Common Abstractions and Patterns
Failure Handling
Certain Spring sub-systems like Spring MVC, or Spring Messaging, support the notion of a "global" ErrorHandler. Deposit services provides an implementation, DepositServicesErrorHandler. It is used to catch exceptions thrown by the JmsDepositProcessor and JmsSubmissionProcessor, and is adapted as a Thread.UncaughtExceptionHandler and as a RejectedExecutionHandler.
Deposit services provides a DepositServicesRuntimeException
(DSRE
for short), which has a field PassEntity resource
. If the DepositServicesErrorHandler
catches a DSRE
with a non-null
resource, the error handler will test the type of the resource, mark it as failed, and save it in the repository.
The take-home point is: Deposit
and Submission
resources will be marked as failed if a DepositServicesRuntimeException
is thrown from one of the JMS processors, or from the DepositTask
. As a developer, if an exceptional condition does not warrant a failure, then do not throw DepositServicesRuntimeException
. Instead, consider logging a warning or throwing a DSRE
with a null
resource. Likewise, to fail a resource, all you need to do is throw a DSRE
with a non-null
resource. The DepositServicesErrorHandler
will do the rest.
Finally, one last word. Because the state of a resource can be modified at any time by any actor in the PASS infrastructure, the DepositServicesErrorHandler
encapsulates the act of saving the failed state of a resource within a CRI
. A pre-condition for updating the resource is that it must not be in a terminal state. For example, if the error handler is updating the state from SUBMITTED
to FAILED
, but another actor has modified the state of the resource to REJECTED
in the interim, the pre-condition will fail. It makes no sense to modify the state of a resource after it is in its terminal state. The take-home point is: the DepositServicesErrorHandler
will not mark a resource as failed if it is in a terminal state.
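The two take-home points above can be combined into a short sketch: a resource is marked as failed only when the exception carries a non-null resource and that resource is not already in a terminal state. All types here (ErrorHandlerSketch, Status, Resource, Dsre) are simplified, hypothetical stand-ins; the real handler saves the state to the repository inside a CRI instead of mutating an in-memory object.

```java
/** Hypothetical, in-memory illustration of the error handler's decision logic. */
final class ErrorHandlerSketch {

    enum Status {
        SUBMITTED(false), FAILED(false), ACCEPTED(true), REJECTED(true);

        final boolean terminal;

        Status(boolean terminal) {
            this.terminal = terminal;
        }
    }

    /** Stand-in for a Deposit or Submission. */
    static final class Resource {
        final String uri;
        Status status;

        Resource(String uri, Status status) {
            this.uri = uri;
            this.status = status;
        }
    }

    /** Stand-in for DepositServicesRuntimeException: carries an optional resource. */
    static final class Dsre extends RuntimeException {
        final Resource resource;

        Dsre(String message, Resource resource) {
            super(message);
            this.resource = resource;
        }
    }

    /** Returns true only if the resource was marked FAILED. */
    static boolean handleError(Throwable t) {
        if (!(t instanceof Dsre)) {
            return false; // not a DSRE: nothing to fail
        }
        Resource resource = ((Dsre) t).resource;
        if (resource == null) {
            return false; // DSRE without a resource: nothing to fail
        }
        synchronized (resource) {
            if (resource.status.terminal) {
                return false; // pre-condition: never modify a terminal resource
            }
            resource.status = Status.FAILED;
            return true;
        }
    }

    public static void main(String[] args) {
        Resource deposit = new Resource("deposit-1", Status.SUBMITTED);
        System.out.println(handleError(new Dsre("transport error", deposit)) + " " + deposit.status);
    }
}
```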
CriticalRepositoryInteraction
A central, yet awkwardly-named, abstraction is CriticalRepositoryInteraction
. This interface is used to prevent interleaved updates of individual repository resources by different threads. A CriticalRepositoryInteraction
(CRI
for short) isolates the execution of a Function
on a specific repository resource, and provides the boilerplate (i.e. template) for retrieving and updating the state of the resource. There are four main components to CriticalRepositoryInteraction
: the repository resource itself, a pre-condition, post-condition, and the critical update (i.e. the Function
to be executed). The only implementation of CRI
is the class CriticalPath
, and the particulars of that implementation are discussed below.
- First, CriticalPath obtains a lock over the string form of the URI of the resource being updated. This ensures that any other threads executing a CRI for the same resource in the same JVM must wait their turn before executing their critical update of the resource.
  This occurs more often than one might think, as Deposit Services receives many messages for the same resource almost "all at once" when a submission occurs. The thread model for Spring and the Deposit Workers would be rife with conflicts unless something like the CRI was uniformly adopted in Deposit Services.
- Second, the resource is read from the repository.
- Third, the pre-condition Predicate is executed over the resource. If the pre-condition fails, the entire CriticalPath is failed, and returns.
- Fourth, the critical Function is executed, assured that the resource at the time it was retrieved in step 2 meets the pre-condition applied in step 3. It is assumed that the Function modifies the state of the resource. The Function may return the updated state of the resource, or it may return an entirely different object (remember the Function is parameterized by two types; while it must accept a PassEntity, it does not have to return a PassEntity).
- Fifth, after updating the state of the resource in step 4, an attempt is made to store and re-read the updated resource in the repository. In this step, an UpdateConflictException may occur, because some other process outside of the JVM may have modified the resource after step 2 but before step 5. If UpdateConflictException is caught, it is the responsibility of the ConflictHandler to resolve the conflict. Otherwise, the update is successful, and processing of the resource by the CriticalPath continues.
- Finally, the post-condition BiPredicate is executed. It accepts the resource as updated and read by step 5, and the object returned by the critical update in step 4. This determines the logical success or failure of the CriticalPath. Steps 1 through 5 may have executed without error, but the post-condition has final say of the overall success of the CriticalPath.
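The steps above can be sketched against an in-memory map standing in for the repository. This is a simplified illustration, not the CriticalPath implementation: CriticalPathSketch and its nested types are hypothetical, the version counter is an assumed substitute for the repository's conflict detection, and a detected conflict simply fails the interaction where the real code would delegate to a ConflictHandler.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiPredicate;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical, in-memory stand-in for CriticalPath. */
final class CriticalPathSketch {

    /** Simplified resource: a status plus a version used to detect conflicting writes. */
    static final class Resource {
        final String uri;
        String status;
        int version;

        Resource(String uri, String status, int version) {
            this.uri = uri;
            this.status = status;
            this.version = version;
        }

        Resource copy() {
            return new Resource(uri, status, version);
        }
    }

    static final class Result<R> {
        final boolean success;
        final Resource resource;
        final R value;

        Result(boolean success, Resource resource, R value) {
            this.success = success;
            this.resource = resource;
            this.value = value;
        }
    }

    private final Map<String, Resource> repository = new ConcurrentHashMap<>();
    private final Map<String, Object> locks = new ConcurrentHashMap<>();

    void put(String uri, String status) {
        repository.put(uri, new Resource(uri, status, 0));
    }

    String statusOf(String uri) {
        return repository.get(uri).status;
    }

    <R> Result<R> performCritical(String uri, Predicate<Resource> pre,
                                  BiPredicate<Resource, R> post, Function<Resource, R> critical) {
        // Step 1: one lock per resource URI, shared by all threads in this JVM.
        synchronized (locks.computeIfAbsent(uri, k -> new Object())) {
            // Step 2: read the resource (a copy, so stored state is untouched until step 5).
            Resource stored = repository.get(uri);
            if (stored == null) {
                return new Result<>(false, null, null);
            }
            Resource working = stored.copy();
            // Step 3: the pre-condition gates the critical update.
            if (!pre.test(working)) {
                return new Result<>(false, working, null);
            }
            // Step 4: the critical update may modify the resource and return any object.
            R value = critical.apply(working);
            // Step 5: store and re-read; a changed version means someone else wrote first.
            Resource current = repository.get(uri);
            if (current == null || current.version != working.version) {
                return new Result<>(false, current, value);
            }
            working.version++;
            repository.put(uri, working.copy());
            Resource reRead = repository.get(uri).copy();
            // Step 6: the post-condition has the final say.
            return new Result<>(post.test(reRead, value), reRead, value);
        }
    }

    public static void main(String[] args) {
        CriticalPathSketch cp = new CriticalPathSketch();
        cp.put("urn:deposit:1", "DIRTY");
        Result<Boolean> result = cp.performCritical("urn:deposit:1",
                r -> "DIRTY".equals(r.status),
                (r, sent) -> sent && "SUBMITTED".equals(r.status),
                r -> { r.status = "SUBMITTED"; return true; });
        System.out.println(result.success + " " + cp.statusOf("urn:deposit:1"));
    }
}
```

Running the same interaction a second time fails at step 3, because the stored status is no longer the one the pre-condition accepts.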
CriticalRepositoryInteraction Example
Here is a real example of a CRI in action, used when packaging and depositing custodial content to a downstream repository.
The pre-condition ensures that we are operating on Deposit resources acceptable for processing. The critical update creates a package and streams it to the remote repository, and obtains a TransportResponse. The status of the Deposit resource is modified, and the TransportResponse is returned by the critical update. Finally, the post-condition uses the state of the Deposit resource and the TransportResponse to evaluate the success of the critical update.
Behind the scenes, the CriticalPath ensures that the state of the Deposit is properly stored in the repository, and that any conflicts are handled.
After the CriticalPath executes, its CriticalResult can be examined for success or failure.
CriticalResult<TransportResponse, Deposit> result = critical.performCritical(dc.deposit().getId(), Deposit.class,
    /*
     * Pre-condition: only "dirty" deposits can be processed by {@code DepositTask}
     */
    (deposit) -> {
        boolean accept = dirtyDepositPolicy.accept(deposit.getDepositStatus());
        if (!accept) {
            LOG.debug(">>>> Update precondition failed for {}", deposit.getId());
        }
        return accept;
    },
    /*
     * Post-condition: determines *physical* success of the Deposit: were the bytes of the package successfully received?
     * Note: uses the TransportResponse as well as the resource state to determine the success of this CRI
     */
    (deposit, tr) -> {
        boolean success = deposit.getDepositStatus() == SUBMITTED;
        if (!success) {
            LOG.debug(">>>> Update postcondition failed for {} - expected status '{}' but actual status " +
                    "is '{}'", deposit.getId(), SUBMITTED, deposit.getDepositStatus());
        }
        success &= tr.success();
        if (!success) {
            LOG.debug(">>>> Update postcondition failed for {} - transport of package to endpoint " +
                    "failed: {}", deposit.getId(), tr.error().getMessage(), tr.error());
        }
        return success;
    },
    /*
     * Critical update: Assemble and stream a package of content to the repository endpoint, update status to SUBMITTED
     * Note: this Function accepts a Deposit resource, but returns a TransportResponse. Both are used by the
     * post-condition to determine the success of the CRI
     */
    (deposit) -> {
        Packager packager = dc.packager();
        PackageStream packageStream = packager.getAssembler().assemble(dc.depositSubmission());
        Map<String, String> packagerConfig = packager.getConfiguration();
        try (TransportSession transport = packager.getTransport().open(packagerConfig)) {
            TransportResponse tr = transport.send(packageStream, packagerConfig);
            deposit.setDepositStatus(SUBMITTED);
            return tr;
        } catch (Exception e) {
            throw new RuntimeException("Error closing transport session for deposit " +
                    dc.deposit().getId() + ": " + e.getMessage(), e);
        }
    });
Status
Deposit services primarily acts on three types of resources: Submission
, Deposit
, and RepositoryCopy
. Each of these resources carries a status. Managing and reacting to the values of resource status is a large part of what Deposit services does.
Abstractly, Deposit services considers the value of any status to be intermediate, or terminal.
It isn't clear, yet, whether this abstract notion of intermediate and terminal need to be shared amongst components of PASS. If so, then certain classes and interfaces in the Deposit Services code base should be extracted out into a shared component.
The semantics of terminal state are that the resource has been through a workflow of some kind, and has reached the end of that workflow. Because the workflow has reached a terminus, no additional state is expected to be placed on the resource, and no existing state of the resource is expected to change.
The semantics of intermediate state are that the resource is currently in a workflow of some kind, and has yet to reach the end of that workflow. Because the workflow has not reached a terminus, the resource is expected to be modified at any time, until the terminal state is achieved.
A general pattern within Deposit services is that resources with terminal status are explicitly accounted for (this is largely enforced by policies which are documented elsewhere), and are considered "read-only".
Submission Status
Submission status is enumerated in the AggregatedDepositStatus
class. Deposit services considers the following values:
- NOT_STARTED (intermediate): Incoming Submissions from the UI must have this status value
- IN_PROGRESS (intermediate): Deposit services places the Submission in an IN_PROGRESS state right away. When a thread observes a Submission in this state, it assumes that another thread is processing this resource.
- FAILED (intermediate): Occurs when a non-recoverable error happens when processing the Submission
- ACCEPTED (terminal): Deposit services places the Submission into this state when all of its Deposits have been ACCEPTED
- REJECTED (terminal): Deposit services places the Submission into this state when all of its Deposits have been REJECTED
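The two terminal rules above lend themselves to a short sketch of how a Submission's aggregated status could be derived from its Deposits. The enums are local stand-ins for the PASS model, and treating every other combination (including no Deposits at all) as IN_PROGRESS is an assumption made for illustration; the text above only defines the ACCEPTED and REJECTED cases.

```java
import java.util.List;

/** Hypothetical illustration of deriving a Submission status from its Deposits. */
final class AggregateStatusSketch {

    // Local stand-ins for the PASS model enums.
    enum DepositStatus { SUBMITTED, ACCEPTED, REJECTED, FAILED }
    enum AggregatedDepositStatus { NOT_STARTED, IN_PROGRESS, FAILED, ACCEPTED, REJECTED }

    static AggregatedDepositStatus aggregate(List<DepositStatus> deposits) {
        if (deposits.isEmpty()) {
            return AggregatedDepositStatus.IN_PROGRESS; // assumption: nothing to decide on yet
        }
        if (deposits.stream().allMatch(d -> d == DepositStatus.ACCEPTED)) {
            return AggregatedDepositStatus.ACCEPTED;    // all Deposits ACCEPTED
        }
        if (deposits.stream().allMatch(d -> d == DepositStatus.REJECTED)) {
            return AggregatedDepositStatus.REJECTED;    // all Deposits REJECTED
        }
        return AggregatedDepositStatus.IN_PROGRESS;     // assumption: mixed or unfinished
    }

    public static void main(String[] args) {
        System.out.println(aggregate(List.of(DepositStatus.ACCEPTED, DepositStatus.ACCEPTED)));
        System.out.println(aggregate(List.of(DepositStatus.ACCEPTED, DepositStatus.SUBMITTED)));
    }
}
```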
Deposit Status
Deposit status is enumerated in the DepositStatus
class. Deposit services considers the following values:
- SUBMITTED (intermediate): the custodial content of the Submission has been successfully transferred to the Deposit's Repository
- ACCEPTED (terminal): the custodial content of the Submission has been accessioned by the Deposit's Repository (i.e. custody of the Submission has successfully been transferred to the downstream Repository)
- REJECTED (terminal): the custodial content of the Submission has been rejected by the Deposit's Repository (i.e. the downstream Repository has refused to accept custody of the Submission content)
- FAILED (intermediate): the transfer of custodial content to the Repository failed, or there was some other error updating the status of the Deposit
RepositoryCopy Status
RepositoryCopy status is enumerated in the CopyStatus
class. Deposit services considers the following values:
- COMPLETE (terminal): a copy of the custodial content is available in the Repository at this location
- IN_PROGRESS (intermediate): a copy of the custodial content is expected to be available in the Repository at this location. The custodial content should not be expected to exist until the Deposit status is ACCEPTED
- REJECTED (terminal): the copy should be considered to be invalid. Even if the custodial content is made available at the location indicated by the RepositoryCopy, it should not be mistaken for a successful transfer of custody.
RepositoryCopy status is subservient to the Deposit status. They will always be congruent. For example, a RepositoryCopy cannot be COMPLETE
if the Deposit is REJECTED
. If a Deposit is REJECTED
, then the RepositoryCopy must also be REJECTED
.
Common Permutations
There are some common permutations of these statuses that will be observed:
- ACCEPTED Submissions will only have Deposits that are ACCEPTED. Each Deposit will have a COMPLETE RepositoryCopy.
- REJECTED Submissions will only have Deposits that are REJECTED. REJECTED Deposits will not have any RepositoryCopy at all.
- IN_PROGRESS Submissions may have zero or more Deposits in any state.
- FAILED Submissions should have zero Deposits.
- ACCEPTED Deposits should have a COMPLETE RepositoryCopy.
- REJECTED Deposits will have a REJECTED RepositoryCopy
- SUBMITTED Deposits will have an IN_PROGRESS RepositoryCopy
- FAILED Deposits will have no RepositoryCopy