The JULES Sentence Boundary Detector: a CRF-based component for punctuation disambiguation that has been integrated as a UIMA component but can still be used on its own.

JULIE Sentence Boundary Detector (JSBD)

Introduction

JSBD is an ML-based sentence splitter. Because it can be retrained on suitable training material, it is neither language-dependent nor domain-dependent.
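To illustrate what treating sentence splitting as a learning problem can look like, here is a minimal sketch in Java. It only collects a few surface observations per whitespace-separated token, of the kind a sequence labeller such as a CRF could use to decide whether a punctuation mark ends a sentence. The class name `BoundaryCandidates`, the `Candidate` type and the three features are hypothetical and chosen for illustration; they are not taken from JSBD, whose actual feature set is described in its documentation.

```java
import java.util.ArrayList;
import java.util.List;

public class BoundaryCandidates {

    /** Hypothetical per-token observations; NOT JSBD's real feature set. */
    public static final class Candidate {
        public final String token;
        public final boolean endsWithPunctuation;
        public final boolean nextStartsUppercase;
        public final boolean shortToken;

        Candidate(String token, boolean endsWithPunctuation,
                  boolean nextStartsUppercase, boolean shortToken) {
            this.token = token;
            this.endsWithPunctuation = endsWithPunctuation;
            this.nextStartsUppercase = nextStartsUppercase;
            this.shortToken = shortToken;
        }

        @Override
        public String toString() {
            return token + " punct=" + endsWithPunctuation
                    + " nextUpper=" + nextStartsUppercase
                    + " short=" + shortToken;
        }
    }

    /** Splits on whitespace and records simple surface features for each token. */
    public static List<Candidate> extract(String text) {
        List<Candidate> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            String tok = tokens[i];
            char last = tok.charAt(tok.length() - 1);
            // A token ending in '.', '!' or '?' is a possible sentence end.
            boolean punct = last == '.' || last == '!' || last == '?';
            // An uppercase successor hints at a new sentence, but also follows titles like "Dr.".
            boolean nextUpper = i + 1 < tokens.length
                    && Character.isUpperCase(tokens[i + 1].charAt(0));
            // Short tokens are more likely to be abbreviations.
            boolean shortTok = tok.length() <= 4;
            out.add(new Candidate(tok, punct, nextUpper, shortTok));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Candidate c : extract("Dr. Smith measured 3.5 mg. The result was clear.")) {
            System.out.println(c);
        }
    }
}
```

In this example both "Dr." and "mg." end with a period and are followed by an uppercase token, yet only the second one closes a sentence. Resolving such cases from labelled examples rather than hand-written rules is what makes a retrainable splitter adaptable to new languages and domains.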

Dependencies

JSBD is based on a slightly modified version of the machine learning toolkit MALLET (version 2.0.x). The necessary libraries are included in the executable JAR (see below) and are also accessible via the JULIE Nexus artifact manager.

Usage

To run JSBD, execute the self-executing JAR "jsbd-<version>.jar". Started this way, it prints the available modes.

Documentation

For further information please refer to the documentation, JSBD-x.pdf.

Technical Notes

All components offered on the GitHub page are also available as Maven artifacts from our Nexus repository. To make use of it, add the following repository to your pom.xml:

<repositories>
   <repository>
      <id>julie-nexus</id>
      <name>JULIELab Public Repository</name>
      <url>https://www.coling.uni-jena.de/nexus/content/groups/public-julie-components/</url>
   </repository>
</repositories>

To access the repository, you need to make the HTTPS certificate of our Nexus server known to your Maven installation. To do this, please follow these steps:

First, extract the certificate "www.coling.uni-jena.de" from your browser (e.g. in Firefox, open the Preferences/Advanced menu, show the certificates and store the correct one).

Then execute the following command to import the certificate, and enter a password to protect the keystore. The -file option takes the path of the extracted certificate, and -keystore the path of the keystore file:

keytool -v -alias mavensrv -import -file -keystore /trust.jks

Add the following line to your .bashrc or .bash_profile so that Maven uses this keystore:

export MAVEN_OPTS="-Djavax.net.ssl.trustStore=/trust.jks"

We suggest using /Users/<USERNAME>/.m2 as the location of the keystore. Then, you have access to all publicly available JULIE components. Please refer to the pom.xml files in the respective GitHub repositories for the current Maven coordinates.
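To confirm that the import worked before running Maven, the keystore can be inspected from Java. The following is a minimal sketch using the standard `java.security.KeyStore` API; the class name `TrustStoreCheck` and its method names are hypothetical, and the alias `mavensrv` is the one used in the keytool command above. It assumes that the keystore format written by your keytool version can be read through the JDK's default keystore type, which holds for current JDKs with default security settings.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.KeyStoreException;

public class TrustStoreCheck {

    /** Loads a keystore file that is protected by the given password. */
    public static KeyStore load(Path file, char[] password)
            throws IOException, GeneralSecurityException {
        // Assumption: the default keystore type can read the file keytool wrote.
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        try (InputStream in = Files.newInputStream(file)) {
            ks.load(in, password);
        }
        return ks;
    }

    /** Returns true if the alias refers to a trusted certificate entry. */
    public static boolean hasTrustedCertificate(KeyStore ks, String alias)
            throws KeyStoreException {
        return ks.isCertificateEntry(alias);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: TrustStoreCheck <keystore-file> <password>");
            return;
        }
        // Passing the password as an argument keeps the sketch short;
        // it is visible in the process list, so avoid this on shared machines.
        KeyStore ks = load(Paths.get(args[0]), args[1].toCharArray());
        boolean ok = hasTrustedCertificate(ks, "mavensrv");
        System.out.println(ok
                ? "Certificate entry 'mavensrv' found."
                : "No certificate entry 'mavensrv' in this keystore.");
    }
}
```

Run it with the path of your trust.jks file and the password chosen during the import. If the entry is missing, repeat the keytool step before debugging the Maven configuration.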