Java implementation of the latest shorthand nomenclature.

Primary language: Java. License: Apache License 2.0 (Apache-2.0).

jgoslin Parser, Validator and Normalizer for Shorthand Lipid Nomenclatures

This project is a parser, validator and normalizer implementation for shorthand lipid nomenclatures, based on the Grammar of Succinct Lipid Nomenclatures (Goslin) project.

Goslin defines multiple grammars compatible with ANTLRv4 for different sources of shorthand lipid nomenclature. Parsers generated from these grammars provide immediate feedback on whether a lipid shorthand notation string complies with a particular grammar.

jGoslin uses the Goslin grammars and the generated parser to support the following general tasks:

  1. Facilitate the parsing of shorthand lipid name dialects.

  2. Provide a structural representation of the shorthand lipid after parsing.

  3. Use the structural representation to generate normalized names.

The Maven site with JavaDoc is available here.

The web-based application is available at https://apps.lifs-tools.org/goslin.

Building the project and generating client code from the command line

jGoslin uses Java 17, a Long Term Support (LTS) release. To build the client code and run the unit tests, execute the following command from a terminal:

./mvnw install

This generates the necessary domain-specific code for Java.

Running a validation with the command-line interface

The cli sub-project provides a command line interface for parsing lipid names, either directly from the command line or from a file with one lipid name per line.

After building the project with ./mvnw install as described above, the cli/target folder will contain the jgoslin-cli-<version>-bin.zip file. Alternatively, you can download the latest CLI zip file from JFrog: search for the latest jgoslin-cli-<VERSION>-bin.zip artifact and click to download.

In order to run the validator, unzip that file, change into the unzipped folder and run

java -jar jgoslin-cli-<VERSION>.jar

to see the available options.

To parse a single lipid name from the command line, run

java -jar jgoslin-cli-<VERSION>.jar -n "Cer(d31:1/20:1)"

To parse multiple lipid names from a file via the command line, run

java -jar jgoslin-cli-<VERSION>.jar -f examples/lipidNames.txt
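
If you want to feed the same kind of input file to the API instead of the CLI, the file first has to be turned into a list of names. The following is a minimal sketch using only the Java standard library. The class name LipidNameFile and the method readNames are hypothetical, and trimming whitespace and skipping blank lines are assumptions made here, not documented behavior of the jgoslin CLI.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of jgoslin.
final class LipidNameFile {

    /**
     * Reads a text file with one lipid name per line.
     * Assumption: surrounding whitespace is trimmed and blank lines are skipped.
     */
    static List<String> readNames(Path file) throws IOException {
        // Files.lines must be closed to release the underlying file handle.
        try (Stream<String> lines = Files.lines(file)) {
            return lines.map(String::trim)
                        .filter(line -> !line.isEmpty())
                        .collect(Collectors.toList());
        }
    }
}
```

Each returned name can then be passed to one of the parsers described in the API section of this document.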

To use a specific grammar, instead of trying all, run

java -jar jgoslin-cli-<VERSION>.jar -f examples/lipidNames.txt -g GOSLIN
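
The idea of "trying all grammars" versus fixing one can be sketched in plain Java as a first-match loop over named parse functions. This is an illustration only: GrammarFallback, Match and parseWithFirstMatching are hypothetical names, and how the CLI actually orders or combines grammars is not stated in this document. The sketch uses a record, which requires Java 16 or later.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper, not part of jgoslin.
final class GrammarFallback {

    /** The grammar that accepted the name, plus the name it produced. */
    record Match(String grammar, String normalizedName) {}

    /**
     * Tries each parse function in the map's iteration order and returns the
     * first successful result. A parse function signals failure by returning
     * Optional.empty().
     */
    static Optional<Match> parseWithFirstMatching(
            String name, Map<String, Function<String, Optional<String>>> parsers) {
        for (var entry : parsers.entrySet()) {
            Optional<String> result = entry.getValue().apply(name);
            if (result.isPresent()) {
                return Optional.of(new Match(entry.getKey(), result.get()));
            }
        }
        return Optional.empty();
    }
}
```

Passing a map with a single entry corresponds to selecting one grammar. With jgoslin, each function could wrap a parser call made with the exception-suppressing flag set to false, which returns null on failure as described in the API section, and convert that null into Optional.empty(). Use a LinkedHashMap if the order in which grammars are tried matters.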

To write output to the tab-separated output file 'goslin-out.tsv', run

java -jar jgoslin-cli-<VERSION>.jar -f examples/lipidNames.txt -o

Running the Web Application for Validation

The goslin web application is available at: https://apps.lifs-tools.org/goslin

Building the Docker Image

In order to build a Docker image of the command line interface application, run

./mvnw -Pdocker install

from your command line (mvnw.bat on Windows). This will build and tag a Docker image lifs/jgoslin-cli with a corresponding version and make it available to your local Docker installation. To show the coordinates of the image, call

docker image ls | grep "jgoslin-cli"

Running the Docker Image

If you have not done so, please build the Docker image of the validator CLI or pull it from Docker Hub (see previous sections). Then, run the following command, replacing <VERSION> with the current version (e.g. 1.0.0) and <YOUR_DATA_DIR> with the local directory containing your lipid name files:

docker run -v <YOUR_DATA_DIR>:/home/data:rw lifs/jgoslin-cli:<VERSION>

This only invokes the default entrypoint of the container, which is a shell script wrapper calling the jgoslin-cli JAR. The wrapper passes all arguments to the validator, so any argument you would normally pass works in the same way (please replace <YOUR_FILE> with the actual file's name in <YOUR_DATA_DIR>):

docker run -v <YOUR_DATA_DIR>:/home/data:rw lifs/jgoslin-cli:<VERSION> -f <YOUR_FILE>

You can also run the docker container without the -f <YOUR_FILE> argument to see a list of possible arguments.

Using the project code releases from Maven Central

jgoslin is available from the Maven Central repository.

To use the parser libraries (reading and validation) in your own Maven projects, use the following dependency:

<dependency>
    <groupId>org.lifs-tools</groupId>
    <artifactId>jgoslin-parsers</artifactId>
    <version>${jgoslin.version}</version>
</dependency>

where jgoslin.version is the version of jgoslin you wish to use, e.g. for a release version:

<properties>
  <jgoslin.version>2.0.0</jgoslin.version>
</properties>

as defined in the properties section of your pom file.

Using the project code releases via JFrog

The library release artifacts are available from JFrog. If you want to use them, add the following lines to your own Maven pom file:

<profile>
  <id>lifs-repos</id>
  <repositories>
   <repository>
       <snapshots>
           <enabled>false</enabled>
       </snapshots>
       <id>lifs-libs-release</id>
       <name>lifs-libs-release</name>
       <url>https://lifstools.jfrog.io/artifactory/lifs-libs-release</url>
   </repository>
  </repositories>
</profile>

To compile jgoslin against the LIFS JFrog repository, please add the following entry to your ~/.m2/settings.xml file:

<activeProfiles>
  <activeProfile>lifs-repos</activeProfile>
</activeProfiles>

or use the -Plifs-repos command line switch when running Maven to enable the LIFS JFrog maven repositories for parent pom and artifact resolution.

To use the parser libraries (reading and validation) in your own Maven projects, use the following dependency:

<dependency>
  <groupId>org.lifs-tools</groupId>
  <artifactId>jgoslin-parsers</artifactId>
  <version>${jgoslin.version}</version>
</dependency>

where jgoslin.version is the version of jgoslin you wish to use, e.g. for a release version:

<properties>
  <jgoslin.version>2.0.0</jgoslin.version>
</properties>

as defined in the properties section of your pom file.

Using the API programmatically

Reading a Shorthand Lipid Name with a given Parser

The following snippet shows how to parse a shorthand lipid name with the different parsers:

import org.lifstools.jgoslin.domain.*; // contains domain objects like LipidAdduct, LipidSpecies ...
import org.lifstools.jgoslin.parser.*; // contains the parser implementations
...
String ref = "Cer(d18:1/20:2)";
try {
    // use the SwissLipids parser
    SwissLipidsParser slParser = new SwissLipidsParser();
    // multiple event handlers can be used with one parser, e.g. for parallel processing
    SwissLipidsParserEventHandler slHandler = slParser.newEventHandler();
    LipidAdduct sllipid = slParser.parse(ref, slHandler);
    // print the lipid name at its native level to the console
    System.out.println(sllipid.getLipidString());
} catch (LipidException le) {
    // catch this for any parsing or semantic issues with a lipid
    le.printStackTrace();
}
// Alternatively, use the other parsers. Wrap the following calls in try/catch blocks as well,
// as shown for the SwissLipids parser above.
// use the LipidMAPS parser
LipidMapsParser lmParser = new LipidMapsParser();
LipidMapsParserEventHandler lmHandler = lmParser.newEventHandler();
LipidAdduct lmlipid = lmParser.parse(ref, lmHandler);
// use the shorthand notation parser GOSLIN
GoslinParser goslinParser = new GoslinParser();
GoslinParserEventHandler goslinHandler = goslinParser.newEventHandler();
LipidAdduct golipid = goslinParser.parse(ref, goslinHandler);
// use the updated shorthand notation of 2020
ShorthandParser shorthandParser = new ShorthandParser();
ShorthandParserEventHandler shorthandHandler = shorthandParser.newEventHandler();
// Calling parse with the optional argument false suppresses exceptions.
// If errors are encountered, the returned LipidAdduct is null, so check it before use.
LipidAdduct shlipid = shorthandParser.parse(ref, shorthandHandler, false);

To retrieve a parsed lipid name on a higher hierarchy of lipid level, simply define the level when requesting the lipid name:

System.out.println(sllipid.getLipidString(LipidLevel.CATEGORY));
System.out.println(sllipid.getLipidString(LipidLevel.CLASS));
System.out.println(sllipid.getLipidString(LipidLevel.SPECIES));
System.out.println(sllipid.getLipidString(LipidLevel.MOLECULAR_SPECIES));
System.out.println(sllipid.getLipidString(LipidLevel.SN_POSITION));
System.out.println(sllipid.getLipidString(LipidLevel.STRUCTURE_DEFINED));
System.out.println(sllipid.getLipidString(LipidLevel.FULL_STRUCTURE));
System.out.println(sllipid.getLipidString(LipidLevel.COMPLETE_STRUCTURE));

This functionality allows easy computation of aggregate statistics of lipids at the lipid class, category or any other level. Requesting a lipid name at a lower (more detailed) level than the one provided by the parsed name will raise an org.lifstools.jgoslin.domain.ConstraintViolationException.
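
As an illustration of such an aggregate statistic, the following minimal sketch counts lipid names per group using only the Java standard library. LipidLevelCounts and countBy are hypothetical names. The function that maps a name to its higher-level name is passed in as a parameter, so with jgoslin it could parse the name and call getLipidString(LipidLevel.CLASS); the usage shown in the comment uses a simple string-based stand-in instead.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper, not part of jgoslin.
final class LipidLevelCounts {

    /**
     * Counts how many names fall into each group.
     * toLevelName maps a lipid name to its name at the chosen level,
     * e.g. the lipid class. Results are sorted by group name.
     */
    static Map<String, Long> countBy(List<String> names,
                                     Function<String, String> toLevelName) {
        return names.stream()
                .collect(Collectors.groupingBy(
                        toLevelName, TreeMap::new, Collectors.counting()));
    }
}

// Usage with a stand-in classifier (text before the first parenthesis):
// LipidLevelCounts.countBy(
//         List.of("Cer(d18:1/20:2)", "Cer(d31:1/20:1)", "PC(16:0/18:1)"),
//         name -> name.substring(0, name.indexOf('(')));
```

Keeping the level lookup behind a Function keeps the counting logic independent of the parser in use and of the level chosen.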

For more examples of how the API works, please consult the tests, especially in the parsers module.

References

This project is the Java implementation for Goslin. If you use Goslin or any of the specific implementations in your work, we kindly ask you to cite the original publication:

If you are using any of the new features of Goslin 2.0, please cite the following, updated Goslin 2.0 publication: