CompBase
Basic classes for working with parsed sentences, comparison information, NLP tools for basic processing and other useful classes.
Basic packages
- de.uni_stuttgart.ims.nlpbase.io: Reading and writing files with dependency parse trees in CoNLL format.
- de.uni_stuttgart.ims.nlpbase.nlp: Basic classes to represent a word, a sentence, part-of-speech tags, and predicate-argument information.
- de.uni_stuttgart.ims.nlpbase.tools: Sentence splitting, tokenization, and part-of-speech tagging. These are only wrappers around the tools provided by OpenNLP and Stanford CoreNLP.
- de.uni_stuttgart.ims.util: Assorted utilities, e.g., for reading/writing files, handling HashMaps, and mapping information to parse trees.
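As a rough illustration of what reading CoNLL data involves, the following Java sketch parses tab-separated token rows into sentences. It assumes the CoNLL-X column layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, ...); the variant that the io package actually reads may differ. ConllSketch, Token, parseLine, and parse are hypothetical names chosen for this example, not classes or methods from this code base.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of CompBase.
class ConllSketch {

    /** One token row; only a subset of the columns is kept. */
    static final class Token {
        final int id;        // 1-based position in the sentence
        final String form;   // surface word
        final String pos;    // fine-grained part-of-speech tag
        final int head;      // id of the syntactic head, 0 for the root
        final String deprel; // dependency relation to the head

        Token(int id, String form, String pos, int head, String deprel) {
            this.id = id;
            this.form = form;
            this.pos = pos;
            this.head = head;
            this.deprel = deprel;
        }
    }

    /** Parses one row, assuming the CoNLL-X column order. */
    static Token parseLine(String line) {
        String[] cols = line.split("\t");
        if (cols.length < 8) {
            throw new IllegalArgumentException("Expected at least 8 columns: " + line);
        }
        return new Token(Integer.parseInt(cols[0]), cols[1], cols[4],
                Integer.parseInt(cols[6]), cols[7]);
    }

    /** Splits file content into sentences; a blank line ends a sentence. */
    static List<List<Token>> parse(String content) {
        List<List<Token>> sentences = new ArrayList<>();
        List<Token> current = new ArrayList<>();
        for (String line : content.split("\r?\n")) {
            if (line.trim().isEmpty()) {
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(parseLine(line));
            }
        }
        if (!current.isEmpty()) {
            sentences.add(current); // last sentence without trailing blank line
        }
        return sentences;
    }
}
```

Keeping the head as a plain integer id makes it easy to rebuild the tree later by looking up each token's head within the same sentence.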
Stuff from other people
The libraries below are needed for the classes in de.uni_stuttgart.ims.nlpbase.tools, i.e., the sentence splitter, tokenizer, and POS tagger. You can always opt not to use these classes, or to use only one of them.
The code assumes that the model files are located in models/ and are named exactly as written below.
- Stanford CoreNLP, aka stanford-corenlp-3.2.0.jar. You will need the model file english-left3words-distsim.tagger for POSTaggerStanford.
- OpenNLP Tools, aka opennlp-tools-1.5.2-incubating.jar. You will need the model files en-sent.bin for SentenceSplitterOpenNLP and en-token.bin for TokenizerOpenNLP.
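Because the model file names are fixed, it can help to verify the models/ folder before using the wrapper classes. The following is a minimal sketch of such a check using only the Java standard library; ModelFileCheck and missingModels are hypothetical names for this example and are not part of this code base.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of CompBase.
class ModelFileCheck {

    // Model file names exactly as listed in this README.
    static final String[] EXPECTED = {
        "english-left3words-distsim.tagger", // POSTaggerStanford
        "en-sent.bin",                       // SentenceSplitterOpenNLP
        "en-token.bin"                       // TokenizerOpenNLP
    };

    /** Returns the names of expected model files not found in modelDir. */
    static List<String> missingModels(File modelDir) {
        List<String> missing = new ArrayList<>();
        for (String name : EXPECTED) {
            if (!new File(modelDir, name).isFile()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumes the program is started from the project root.
        List<String> missing = missingModels(new File("models"));
        if (missing.isEmpty()) {
            System.out.println("All model files found.");
        } else {
            System.out.println("Missing model files: " + missing);
        }
    }
}
```

If you only use one of the wrappers, only the corresponding model file needs to be present, so a missing entry for an unused tool can be ignored.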
Usage
These are only helper classes; there is no main method. Other projects depend on this code, though. You will probably not need all of the classes, so you can compile only those you need at the time.
Compile all classes (this assumes you have the two needed jar files in the folder lib and want to have the class files in bin):
mkdir bin
javac -cp bin -d bin src/de/uni_stuttgart/ims/nlpbase/nlp/*.java
javac -cp bin -d bin src/de/uni_stuttgart/ims/nlpbase/io/*.java
javac -cp bin:lib/stanford-corenlp-3.2.0.jar:lib/opennlp-tools-1.5.2-incubating.jar -d bin src/de/uni_stuttgart/ims/nlpbase/tools/*.java
javac -cp bin -d bin src/de/uni_stuttgart/ims/util/*.java
Licence
(c) Wiltrud Kessler
This code is distributed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported licence: http://creativecommons.org/licenses/by-nc-sa/3.0/