LumpSTS

Lump efforts on assessing Semantic Textual Similarity

This project includes code to process data, compute features, and learn models for the different STS tasks.

Set-up and installation

Download and install WikiTailor source
git clone https://github.com/cristinae/WikiTailor.git
mvn -DskipTests clean compile assembly:single install

If you need to annotate corpora:

Download and install MADAMIRA jar
License for downloading
mvn install:install-file -Dfile={$PATH}/MADAMIRA-release-20160516-2.1/MADAMIRA-release-20160516-2.1.jar -DgroupId=edu.columbia.ccls.madamira -DartifactId=MADAMIRA-release -Dversion=20160516-2.1 -Dpackaging=jar

If you need to work with BabelNet indices:

Download and install the BabelNet API and its dependencies
[API download](http://babelnet.org/data/3.7/BabelNet-API-3.7.zip)
unzip BabelNet-API-3.7.zip
mvn install:install-file -Dfile=lib/jltutils-2.2.jar -DgroupId=it.uniroma1.lcl.jlt -DartifactId=jltutils -Dversion=2.2 -Dpackaging=jar
unzip -p babelnet-api-3.7.jar META-INF/maven/it.uniroma1.lcl.babelnet/babelnet-api/pom.xml | grep -vP '<(scope|systemPath)>' >babelnet-api-3.7.pom
(on OS X, consider using Homebrew's ggrep, since the -P option requires GNU grep)
mvn install:install-file -Dfile=babelnet-api-3.7.jar -DpomFile=babelnet-api-3.7.pom
Download BabelNet indices and make the API aware of them
[Indices download](http://babelnet.org/login)
tar xjvf babelnet-3.7-index.tar.bz2
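
The `grep -vP` filter used above to build the POM relies on GNU grep's Perl-regex mode. Since the pattern only uses grouping and alternation, an extended-regex filter should behave the same and also works with greps that lack `-P`. The following is a minimal sketch under that assumption; the function name `strip_system_scope` is hypothetical and not part of this repository.

```shell
# Read a POM from stdin and drop the lines containing <scope> or <systemPath>.
# Uses -E (extended regex) instead of -P, so it does not require GNU grep.
strip_system_scope() {
    grep -vE '<(scope|systemPath)>'
}
```

It can replace the `grep -vP` stage of the pipeline, for example: `unzip -p babelnet-api-3.7.jar META-INF/maven/it.uniroma1.lcl.babelnet/babelnet-api/pom.xml | strip_system_scope > babelnet-api-3.7.pom`.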

In ./BabelNet-API-3.7/config/babelnet.var.properties include the path to the index:
babelnet.dir=/home/usr/BabelNet-3.7
In ./BabelNet-API-3.7/config/jlt.var.properties include the path to WordNet:
jlt.wordnetPrefix=/usr/local/share/wordnet
Move the ./BabelNet-API-3.7/config folder to your ${basedir}
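
The two property edits above can also be scripted. The following is a minimal sketch, assuming the properties files are plain `key=value` text; the helper name `set_prop` is hypothetical, and the paths shown are the example values from this README, not required locations.

```shell
# Set key=value in a properties file: any existing assignment of the key
# is dropped and the new one is appended at the end of the file.
set_prop() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp)
    # Keep every line that does not start with "key=" (literal match, no regex).
    awk -v k="$key" 'index($0, k "=") != 1' "$file" > "$tmp"
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Write back through redirection so the original file permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For example, `set_prop ./BabelNet-API-3.7/config/babelnet.var.properties babelnet.dir /home/usr/BabelNet-3.7` and `set_prop ./BabelNet-API-3.7/config/jlt.var.properties jlt.wordnetPrefix /usr/local/share/wordnet`, run before moving the config folder.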

If you need to use the machine learning module:

Download and install xgboost
git clone https://github.com/dmlc/xgboost.git
xgboost requires the dmlc-core and rabit packages. Download them into the corresponding xgboost folders and run make in both of them:
git clone https://github.com/dmlc/dmlc-core.git
git clone https://github.com/dmlc/rabit.git
Now you are ready to compile in the ./xgboost/jvm-packages folder:
mvn package
mvn install

Finally:

Download and install this repository
git clone https://github.com/albarron/LumpSTS.git
mvn clean dependency:copy-dependencies package

External resources

Download the IXA pipes for tokenisation and lemmatisation. They are used as external executables, so no installation is needed.
Download page
Include their path in the configuration file lumpSTS.ini
Use the Moses tokeniser included in the ./scripts folder
Include its path in the configuration file lumpSTS.ini

### External software and models
# IXA pipe
ixaTok=/home/cristinae/soft/processors/ixa/ixa-pipe-tok-1.8.5-exec.jar
ixaLem=/home/cristinae/soft/processors/ixa/ixa-pipe-pos-1.5.1-exec.jar
posEs=/home/cristinae/soft/processors/ixa/morph-models-1.5.0/es/es-pos-perceptron-autodict01-ancora-2.0.bin
lemEs=/home/cristinae/soft/processors/ixa/morph-models-1.5.0/es/es-lemma-perceptron-ancora-2.0.bin
posEn=/home/cristinae/soft/processors/ixa/morph-models-1.5.0/en/en-pos-perceptron-autodict01-conll09.bin
lemEn=/home/cristinae/soft/processors/ixa/morph-models-1.5.0/en/en-lemma-perceptron-conll09.bin

# Moses
mosesTok=/home/cristinae/soft/processors/moses/tokenizerNO2html.perl
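
Because the paths above are machine-specific, it may help to check them after editing lumpSTS.ini. The following is a minimal sketch, assuming every `key=value` entry in the file holds a filesystem path (true for the excerpt above, not verified for the whole file); the function name `check_ini_paths` is hypothetical.

```shell
# Report every key=value entry whose value is not an existing path.
# Returns non-zero if at least one path is missing.
check_ini_paths() {
    ini=$1
    missing=0
    # The "|| [ -n "$key" ]" part handles a last line without a trailing newline.
    while IFS='=' read -r key value || [ -n "$key" ]; do
        case $key in
            '' | '#'* | '['*) continue ;;   # blank lines, comments, section headers
        esac
        [ -n "$value" ] || continue         # skip lines without a value
        if [ ! -e "$value" ]; then
            printf 'missing: %s=%s\n' "$key" "$value"
            missing=1
        fi
    done < "$ini"
    return "$missing"
}
```

Calling `check_ini_paths lumpSTS.ini` prints one line per entry that does not exist on the current machine and prints nothing when all paths resolve.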
