/server

Back-end services and application for the DOECode web app.

Primary language: Java. License: BSD 3-Clause "New" or "Revised" (BSD-3-Clause).

NOTICE:

This repository has been archived and is no longer actively maintained. Please visit DOE CODE for more information, or for any questions, please contact doecoderepositories@osti.gov.

DOE CODE Web Application

Consists of the back-end services and JAX-RS API calls for DOE CODE, to be accessed by the front-end or presentation layer. The application targets a non-EE Java container such as Tomcat, using JPA for the persistence layer and JAX-RS (Jersey implementation) for the web API.

Notes on building the application

The DOE CODE services API project is built using the Maven build system, configured via the pom.xml in the project base folder. To share configuration between this and other supporting applications, environment-specific options and settings are kept in a shared resources folder separate from each application.

This folder is assumed to be in the developer's "home" folder, under the path "shared-resources/doecode/". This folder should contain a standard Java properties file containing the desired configuration elements, customized to your specific environment(s) as needed. One might have a file for development, testing, and production, or as many other disparate configurations as dictated by your own systems.

The configuration elements are detailed below and should appear in these properties files with your own values. During the Maven build, the properties are included according to the "environment" variable, which defaults to "development". Therefore, unless overridden, the $HOME/shared-resources/doecode/development.properties file should contain your default environment settings.

The "environment" value may be changed by your own private profiles, or by passing the command line switch -Denvironment=desired-value to Maven at build time.

For example, to activate and load "test.properties" from your shared-resources folder:

mvn -Denvironment=test package
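The naming convention described above can be sketched in a few lines of Java. This is only an illustration of how the "environment" value maps to a file path; it is not the project's build logic, which is handled by Maven. The class and method names (EnvironmentConfig, resolve, load) are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper: mirrors the documented naming convention only.
class EnvironmentConfig {

    /** Resolves {home}/shared-resources/doecode/{environment}.properties. */
    static Path resolve(String home, String environment) {
        // Fall back to "development" when no environment is given,
        // matching the documented default.
        String env = (environment == null || environment.trim().isEmpty())
                ? "development"
                : environment.trim();
        return Paths.get(home, "shared-resources", "doecode", env + ".properties");
    }

    /** Loads a standard Java properties file. */
    static Properties load(Path file) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Reads -Denvironment=... if it was passed to the JVM.
        Path file = resolve(System.getProperty("user.home"),
                            System.getProperty("environment"));
        System.out.println("Configuration file: " + file);
        if (Files.exists(file)) {
            System.out.println("Keys found: " + load(file).stringPropertyNames());
        }
    }
}
```

Running the class with -Denvironment=test would point it at test.properties instead of development.properties.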

Configuration Elements used

The application will run on most Java servlet containers and has been tested specifically on Jetty and Tomcat. This assumes a persistence store is already up and running.

These properties are loaded from the "environment.properties" configuration file and should correspond to environmental settings per your setup.

| Parameter | Definition |
| --- | --- |
| database.driver | the JDBC database driver to use |
| database.url | the JDBC URL to access |
| database.user | the database user (with create/alter schema permission) |
| database.password | the user's password |
| database.generation | JPA database table command; usually "create-or-extend-tables" or "none" |
| serviceapi.host | base URL for validation services |
| publishing.host | base URL for submitting final metadata to OSTI (via /submit API) |
| datacite.user | (optional) DataCite user account name for registering DOIs |
| datacite.password | (optional) DataCite account password for DOI registration |
| datacite.baseurl | (optional) DataCite base URL prefix to use for DOI registration |
| datacite.prefix | (optional) DataCite registration DOI prefix value |
| datacite.url | (optional) DataCite MDS URL for sending metadata |
| index.url | (optional) URL to indexing service (e.g., SOLR, see below) |
| search.url | (optional) base URL to searching service (SOLR, see below) |
| index.removal.url | (optional) URL to indexing service for index removal (e.g., SOLR, see below) |
| site.url | base URL of the client front-end services |
| email.host | SMTP host name for sending confirmation emails |
| email.from | the address to use for sending above emails |
| email.notification | (optional) the address to use for sending notification emails when projects are submitted/announced |
| email.state.notification | (optional) the address to use for sending notification emails when projects have a state change for: deleted/hidden/unhidden |
| github.user | (optional) GitHub user account name for using GitHub API without access limitations |
| github.apikey | (optional) the GitHub user's API key |
| file.uploads | the server path used for saving uploaded files |
| file.containers | the server path used for saving uploaded container images |
| file.containers.approved | the server path used for storing approved uploaded container images |
| archiver.url | (optional) base URL for DOE CODE Archiver API if using it for archiving |
| project.manager.name | Display name for use in Project Manager emails. |
| project.manager.email | (optional) Email address for BCC use when sending Project Manager emails. |
| account.reactivation.email | (optional) Email address for CC use when sending Account Reset emails. Comma delimit for multiple addresses. |

If optional parameters, such as the DataCite settings, are left blank, those features will not apply.
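As a rough illustration of the "blank means disabled" rule, the following Java sketch checks whether a group of optional properties is filled in. It assumes, without confirmation from the source, that a feature needs all of its related keys; the class and method names (OptionalFeatures, isSet, dataCiteEnabled) are hypothetical and not part of the DOE CODE code base.

```java
import java.util.Properties;

// Hypothetical helper illustrating how blank optional settings could be detected.
class OptionalFeatures {

    /** True when the key is present and not blank. */
    static boolean isSet(Properties props, String key) {
        String value = props.getProperty(key);
        return value != null && !value.trim().isEmpty();
    }

    /** True only when every listed key is present and not blank. */
    static boolean allSet(Properties props, String... keys) {
        for (String key : keys) {
            if (!isSet(props, key)) {
                return false;
            }
        }
        return true;
    }

    /** Assumption: DataCite registration needs all five datacite.* values. */
    static boolean dataCiteEnabled(Properties props) {
        return allSet(props, "datacite.user", "datacite.password",
                "datacite.baseurl", "datacite.prefix", "datacite.url");
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("datacite.user", "");   // left blank on purpose
        props.setProperty("index.url", "http://localhost:8983/solr/doecode/update/json/docs?softCommit=true");
        System.out.println("DataCite enabled: " + dataCiteEnabled(props));
        System.out.println("Indexing enabled: " + isSet(props, "index.url"));
    }
}
```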

Execute the back-end via:

mvn jetty:run

or

mvn tomcat:run

as you prefer. Services by default will be available on localhost port 8080.

Note that the log4j configuration assumes Tomcat as the basis for its log file location; when running under Jetty, include a command line switch to override it:

mvn -P {your-profile} -Dcatalina.base=$HOME -Denvironment={your-environment} jetty:run

to have logs in $HOME/logs/doecode.log via log4j default configuration.

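For embedded Derby, the value of ${database.driver} is org.apache.derby.jdbc.EmbeddedDriver; the settings.xml samples below instead use org.apache.derby.jdbc.ClientDriver for a networked Derby server. You may wish to keep a dedicated configuration file in your shared-resources folder to simplify this build-and-run process.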

API services

GET /services/metadata/{ID}

Retrieves a specified Metadata by its unique ID value, in JSON format. Optionally, you may supply a URL parameter "format=yaml" to obtain the output in YAML format.

GET /services/metadata/autopopulate?repo={URL}

Calls the Connector services to attempt to scrape/auto-populate metadata information if possible by deriving the appropriate repository from the URL. Empty JSON is returned if the determination cannot be made or the project does not exist or is otherwise inaccessible. You may specify an additional URL parameter of "format=yaml" to obtain the output in YAML format.
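The two GET endpoints above can be addressed as in the following Java sketch, which only builds the request URIs. It assumes the services run at http://localhost:8080 (the default mentioned earlier) and that the ID is numeric; the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

// Hypothetical URI builder for the two GET endpoints described above.
class MetadataUris {

    /** URI for GET /services/metadata/{ID}, optionally requesting YAML. */
    static URI byId(String baseUrl, long id, boolean yaml) {
        return URI.create(baseUrl + "/services/metadata/" + id
                + (yaml ? "?format=yaml" : ""));
    }

    /** URI for GET /services/metadata/autopopulate?repo={URL}. */
    static URI autopopulate(String baseUrl, String repoUrl, boolean yaml) {
        // The repository URL must be percent-encoded to travel as a query value.
        return URI.create(baseUrl + "/services/metadata/autopopulate?repo="
                + encode(repoUrl) + (yaml ? "&format=yaml" : ""));
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required of every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String base = "http://localhost:8080"; // assumed local deployment
        System.out.println(byId(base, 123, false));
        System.out.println(byId(base, 123, true));
        System.out.println(autopopulate(base, "https://github.com/doecode/server", false));
    }
}
```

The resulting URIs can be passed to any HTTP client; an empty JSON response from the autopopulate call means the repository could not be determined or accessed.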

POST /services/metadata

Stores the given metadata JSON in the DOE CODE persistence layer with an incomplete or pending status. If the operation succeeds, the resulting record is returned as the JSON object "metadata", including any generated unique IDs. The record is placed in the "Saved" workflow.

POST /services/metadata/yaml

Takes in JSON format metadata information, and returns that information in the YAML format. Does not persist any data.

POST /services/metadata/publish

Stores the metadata in the DOE CODE persistence layer in the "Published" workflow. JSON is returned as with the "Saved" service above, and the record is marked as available to the DOE CODE search output services. If DataCite information has been configured, this step also attempts to register any DOI entered and to update the metadata with DataCite.

POST /services/metadata/submit

Posts the metadata to OSTI, attempts to register a DOI if possible, and persists the information in DOE CODE. If workflow validations pass, the JSON is returned with the appropriate unique identifier and DOI values in the JSON object "metadata". The record is placed in the "Published" state.
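The three POST endpoints for saving, publishing, and submitting differ only in their path, which the following Java sketch makes explicit. It is a sketch under assumptions: the base URL, the "application/json" content type, and the class and method names are not taken from the source, and any authentication a deployment may require is not shown.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical client helper for the save/publish/submit endpoints.
class MetadataPost {

    enum Action {
        SAVE("/services/metadata"),
        PUBLISH("/services/metadata/publish"),
        SUBMIT("/services/metadata/submit");

        final String path;

        Action(String path) {
            this.path = path;
        }
    }

    /** Prepares a POST request without opening the network connection. */
    static HttpURLConnection prepare(String baseUrl, Action action) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(baseUrl + action.path).openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            // Assumption: the services expect and return JSON.
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setRequestProperty("Accept", "application/json");
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sends the JSON body and returns the HTTP status code. */
    static int send(HttpURLConnection conn, String json) throws IOException {
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) {
        // Only prepares the request; call send(...) against a running server.
        HttpURLConnection conn = prepare("http://localhost:8080", Action.SAVE);
        System.out.println(conn.getRequestMethod() + " " + conn.getURL());
    }
}
```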

POST /services/validation

Send JSON detailing a set of award number values or DOIs to validate.

{
    "values": ["10.5072/2134", "10.5072/238923", ...],
    "validations": ["DOI"]
}

Each value is checked, and a JSON "errors" array is returned. Each entry in the array corresponds by position to an item in the passed-in "values". If an entry is blank, that value may be assumed valid; otherwise, the entry contains an error message.

{
    "errors": ["10.5072/2134 is not a valid DOI.", "", ...]
}
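Because the "errors" array lines up with "values" by position, a client can pair the two to find the failing entries. The Java sketch below assumes the JSON has already been parsed into lists by whatever JSON library the client uses; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that pairs submitted values with the returned errors.
class ValidationResult {

    /** Returns value -> error message for every non-blank "errors" entry. */
    static Map<String, String> failures(List<String> values, List<String> errors) {
        if (values.size() != errors.size()) {
            throw new IllegalArgumentException(
                    "values and errors must have the same length");
        }
        Map<String, String> failed = new LinkedHashMap<>();
        for (int i = 0; i < values.size(); i++) {
            String message = errors.get(i);
            // A blank entry means the value at the same position is valid.
            if (message != null && !message.trim().isEmpty()) {
                failed.put(values.get(i), message);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("10.5072/2134", "10.5072/238923");
        List<String> errors = Arrays.asList("10.5072/2134 is not a valid DOI.", "");
        failures(values, errors).forEach(
                (value, message) -> System.out.println(value + " -> " + message));
    }
}
```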

Configuring settings.xml

Database parameters are provided through the ~/.m2/settings.xml file. The following is a sample using the full (non-embedded) Derby Database:

<?xml version="1.0" encoding="UTF-8"?>
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
    <profiles>
        <profile>
            <id>doecode</id>
            <properties>
                <github.apikey>your-key-goes-here</github.apikey>
                <github.user>username</github.user>
                <!-- The following line configures the URL to the full Derby database that is running on the network. -->
                <database.url>jdbc:derby://localhost:1527/DOECode;create=true</database.url>
                <database.driver>org.apache.derby.jdbc.ClientDriver</database.driver>
                <database.user></database.user>
                <database.password></database.password>
                <database.dialect>org.hibernate.dialect.DerbyDialect</database.dialect>
                <database.schema>doecode</database.schema>
            </properties>
        </profile>
    </profiles>
</settings>

Creating a Derby Database in Eclipse

It is often useful to have a simple database for testing that is separate from your institution's fully deployed database. The following steps outline how to create such a database in Eclipse.

  1. Install Eclipse Data Platform from the Help->Install New Software Menu if you do not already have it. The full list of update sites is available at http://www.eclipse.org/datatools/downloads.php.
  2. Install Apache Derby (either by downloading it manually or installing it via a package manager).
  3. In Eclipse, open the "Database Development" perspective.
  4. Follow the Eclipse Documentation to create a Derby Connector, create a connection profile, and connect to Derby.

In step 4, be sure to select "Derby Client Driver" instead of "Derby Embedded Driver." DOE CODE is not currently configured to work with the Embedded driver.

Running on AWS

The DOE CODE server works well on AWS. On the default RHEL 7 instance, the server can be run with a Derby database for storage using the following rough steps:

  1. Create the instance. Make sure your security group is configured to let the necessary ports through (normally 8080).
  2. SSH into the instance using your key. Issue the following commands to download and install prerequisites including Java, Git, and Derby:
     sudo yum install git java-1.8.0* wget
     wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
     wget ftp://mirror.reverse.net/pub/apache/maven/maven-3/3.5.0/binaries/apache-maven-3.5.0-bin.tar.gz
     wget http://mirror.stjschools.org/public/apache//db/derby/db-derby-10.13.1.1/db-derby-10.13.1.1-bin.tar.gz
     tar -xzvf apache-maven-3.5.0-bin.tar.gz
     sudo mkdir /opt/Apache
     sudo cp db-derby-10.13.1.1-bin.tar.gz /opt/Apache/
     cd /opt/Apache/
     sudo tar -xzvf db-derby-10.13.1.1-bin.tar.gz
  3. Checkout the server code:
     git clone https://github.com/doecode/server
  4. Use an editor to add the following line to your .bashrc file:
     export DERBY_INSTALL=/opt/Apache/db-derby-10.13.1.1-bin
  5. Start Derby:
     sudo /opt/Apache/db-derby-10.13.1.1-bin/bin/startNetworkServer &
  6. Edit your local Maven settings file to point it to Derby using the following content:
     <?xml version="1.0" encoding="UTF-8"?>
     <settings xmlns="http://maven.apache.org/SETTINGS/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
         <profiles>
             <profile>
                 <id>doecode</id>
                 <properties>
                     <github.apikey>your-key-goes-here</github.apikey>
                     <github.user>username</github.user>
                     <database.url>jdbc:derby://localhost:1527/DOECode;create=true</database.url>
                     <database.driver>org.apache.derby.jdbc.ClientDriver</database.driver>
                     <database.user>toby</database.user>
                     <database.password>keith</database.password>
                     <database.dialect>org.hibernate.dialect.DerbyDialect</database.dialect>
                     <database.schema>doecode</database.schema>
                 </properties>
             </profile>
         </profiles>
     </settings>
  7. Start the server in test mode:
     cd ~/server
     ~/apache-maven-3.5.0/bin/mvn -P doecode jetty:run

SOLR for Searching and Indexing (Dissemination)

If configured in the deployment profile, Apache SOLR may be used as the indexing and searching service for DOE CODE. To set up the SOLR distribution package (version 6.6.0) as a stand-alone service:

  1. Download the SOLR package from Apache.
  2. Unpack to desired location on application server. Change to unpacked folder (e.g., solr-6.6.0) under the distribution.
  3. Start the standalone SOLR service on the desired port:
     $ bin/solr start -p {port}
  4. Create a new SOLR core, for example purposes named "doecode":
     $ bin/solr create -c doecode -p {port}
  5. Install customized schema.xml and solrconfig.xml files provided in the repository to replace the default values.
  6. Reload the SOLR core to pick up these changes, via curl:
     $ curl http://localhost:{port}/solr/admin/cores?action=RELOAD\&core=doecode

SOLR should now be ready to use with the back-end. Configure ${index.url}, ${search.url}, and ${index.removal.url} appropriately and redeploy or restart the back-end services. Any records POSTed to the /publish and /submit endpoints should then be indexed automatically by the SOLR server.

${index.url} is usually of the form:

http://localhost:{port}/solr/doecode/update/json/docs?softCommit=true

in order to take advantage of SOLR's near-real-time index updates.

${search.url} should be configured to

http://localhost:{port}/solr/doecode/query 

in order to get JSON results back in expected formats for the dissemination/searching service.

${index.removal.url} should be configured to

http://localhost:{port}/solr/doecode/update?commit=true 

in order to send JSON index removal requests.
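The three URL patterns above share a common base, so they can be derived from the SOLR host, port, and core name. The Java sketch below does this for illustration; the class and method names are hypothetical, and port 8983 in the example is only a sample value for {port}.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that derives the three SOLR-related property values.
class SolrUrls {

    /** Builds index.url, search.url, and index.removal.url for one core. */
    static Map<String, String> forCore(String host, int port, String core) {
        String base = "http://" + host + ":" + port + "/solr/" + core;
        Map<String, String> urls = new LinkedHashMap<>();
        // softCommit=true makes updates visible in near real time.
        urls.put("index.url", base + "/update/json/docs?softCommit=true");
        urls.put("search.url", base + "/query");
        urls.put("index.removal.url", base + "/update?commit=true");
        return urls;
    }

    public static void main(String[] args) {
        // Prints lines that could be pasted into a properties file.
        forCore("localhost", 8983, "doecode").forEach(
                (key, value) -> System.out.println(key + "=" + value));
    }
}
```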

These values assume that the DOE CODE back-end is deployed on the same server as the SOLR standalone service. If not, alter the URL host names and ports appropriately. In order to terminate the SOLR standalone server, issue the command:

$ bin/solr stop

in the SOLR distribution folder.