JULIE Sentence Boundary Detector (JSBD)
JSBD is an ML-based sentence splitter. Because it can be retrained on supported training material, it is neither language- nor domain-dependent.
JSBD is based on a slightly modified version of the machine learning toolkit MALLET (Version 2.0.x). The necessary libraries are included in the executable JAR (see below) and accessible via the JULIE Nexus artifact manager.
To run JSBD, execute the self-executing JAR "jsbd-<version>.jar". This will show the available modes.
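As a minimal sketch of launching the JAR from a shell: the helper name `run_jsbd` is hypothetical and not part of JSBD, and the path to the JAR is something you supply yourself. It only wraps the standard `java -jar` invocation and checks that the file exists first.

```shell
# Hypothetical helper (not part of JSBD): run the self-executing JAR at the given path.
# Any further arguments are passed through unchanged; with none, JSBD lists its modes.
run_jsbd() {
    jar="$1"
    if [ ! -f "$jar" ]; then
        echo "JAR not found: $jar" >&2
        return 1
    fi
    shift
    java -jar "$jar" "$@"
}
```

Called with only the path to your local "jsbd-<version>.jar", this should behave the same as running the JAR directly.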
For further information please refer to the documentation, JSBD-x.pdf.
All components offered on the GitHub page are also available as Maven artifacts from our Nexus repository. To make use of it, add the following repository to your pom.xml:
<repositories>
<repository>
<id>julie-nexus</id>
<name>JULIELab Public Repository</name>
<url>https://www.coling.uni-jena.de/nexus/content/groups/public-julie-components/</url>
</repository>
</repositories>
To access the repository, you need to make the HTTPS certificate of our Nexus server known to your Maven installation. To do this, follow these steps:
First, extract the certificate "www.coling.uni-jena.de" from your browser (e.g. in Firefox, open the Preferences/Advanced menu, show the certificates, and store the correct one).
Then execute this command to import the certificate, and enter a password to protect it:
keytool -v -alias mavensrv -import -file -keystore /trust.jks
Finally, add the following to your .bashrc/.bash_profile to use this keystore:
export MAVEN_OPTS="-Djavax.net.ssl.trustStore=/trust.jks"
We suggest using /Users/<USERNAME>/.m2 as the location of the keystore. Then, you have access to all publicly available JULIE components. Please refer to the pom.xml files in the respective GitHub repositories for the current Maven coordinates.
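The keystore setup could be sketched for a shell profile roughly as follows. The function name `julie_maven_opts` is hypothetical, and the sketch assumes the keystore was saved as `trust.jks` in the `.m2` directory of your home directory (`$HOME/.m2`, which corresponds to /Users/<USERNAME>/.m2 on macOS); adjust the directory if you stored it elsewhere.

```shell
# Hypothetical helper: build the JVM option that points Maven at the keystore.
# Assumption: the keystore file is named trust.jks and lives in the directory
# given as the first argument (default: $HOME/.m2).
julie_maven_opts() {
    keystore_dir="${1:-$HOME/.m2}"
    printf '%s\n' "-Djavax.net.ssl.trustStore=${keystore_dir}/trust.jks"
}

# Export the option so that subsequent mvn invocations pick it up.
MAVEN_OPTS="$(julie_maven_opts)"
export MAVEN_OPTS
```

Building the value in a function keeps the keystore location in one place, so it can be changed without editing the option string itself.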