This project includes code to process data, compute features, and learn models for the different Semantic Textual Similarity (STS) tasks.
- Download and install the WikiTailor source:
git clone https://github.com/cristinae/WikiTailor.git
cd WikiTailor
mvn -DskipTests clean compile assembly:single install
If you need to annotate corpora:
- Download and install the MADAMIRA jar (downloading it requires accepting its license). Replace `/path/to` with the folder where you unpacked MADAMIRA:
mvn install:install-file -Dfile=/path/to/MADAMIRA-release-20160516-2.1/MADAMIRA-release-20160516-2.1.jar -DgroupId=edu.columbia.ccls.madamira -DartifactId=MADAMIRA-release -Dversion=20160516-2.1 -Dpackaging=jar
If you need to work with BabelNet indices:
- Download and install the BabelNet API and its dependencies:
[API download](http://babelnet.org/data/3.7/BabelNet-API-3.7.zip)
unzip BabelNet-API-3.7.zip
mvn install:install-file -Dfile=lib/jltutils-2.2.jar -DgroupId=it.uniroma1.lcl.jlt -DartifactId=jltutils -Dversion=2.2 -Dpackaging=jar
unzip -p babelnet-api-3.7.jar META-INF/maven/it.uniroma1.lcl.babelnet/babelnet-api/pom.xml | grep -vP '<(scope|systemPath)>' >babelnet-api-3.7.pom
(`-P` is a GNU grep option; on macOS, consider Homebrew's GNU grep, installed as `ggrep`)
mvn install:install-file -Dfile=babelnet-api-3.7.jar -DpomFile=babelnet-api-3.7.pom
- Download the BabelNet indices and make the API aware of them:
[Indices download](http://babelnet.org/login)
tar xjvf babelnet-3.7-index.tar.bz2
- In `./BabelNet-API-3.7/config/babelnet.var.properties`, set the path to the index:
babelnet.dir=/home/usr/BabelNet-3.7
- In `./BabelNet-API-3.7/config/jlt.var.properties`, set the path to WordNet:
jlt.wordnetPrefix=/usr/local/share/wordnet
- Move the `./BabelNet-API-3.7/config` folder to your `${basedir}` (the project's root directory).
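A wrong path in either properties file usually only shows up later as an error from the BabelNet API, so it can help to check both settings up front. The following is a minimal sketch, not part of LumpSTS or the BabelNet API. The class name `BabelNetConfigCheck` is hypothetical, and the code assumes that the moved folder lives at `${basedir}/config`. It loads both files with `java.util.Properties` and reports whether `babelnet.dir` and `jlt.wordnetPrefix` name existing directories.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/**
 * Hypothetical helper, not part of LumpSTS or the BabelNet API: checks that the
 * BabelNet config files point at directories that actually exist.
 */
public class BabelNetConfigCheck {

    /** Loads a .properties file; lines starting with '#' or '!' are comments. */
    static Properties load(Path file) throws IOException {
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(file)) {
            props.load(reader);
        }
        return props;
    }

    /** Returns the trimmed value of key if it names an existing directory, else null. */
    static String existingDir(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            return null;
        }
        String dir = value.trim();
        return Files.isDirectory(Paths.get(dir)) ? dir : null;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the config folder has been moved to ${basedir}/config.
        Path configDir = Paths.get(args.length > 0 ? args[0] : "config");
        Properties babelnet = load(configDir.resolve("babelnet.var.properties"));
        Properties jlt = load(configDir.resolve("jlt.var.properties"));

        String index = existingDir(babelnet, "babelnet.dir");
        String wordnet = existingDir(jlt, "jlt.wordnetPrefix");
        System.out.println("babelnet.dir      : " + (index != null ? index : "missing or not a directory"));
        System.out.println("jlt.wordnetPrefix : " + (wordnet != null ? wordnet : "missing or not a directory"));
    }
}
```

Run it from `${basedir}`, or pass the config folder as the first argument.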
If you need to use the machine learning module:
- Download and install xgboost
git clone https://github.com/dmlc/xgboost.git
XGBoost requires the dmlc-core and rabit packages. Clone them into the corresponding folders inside `xgboost` and build both of them with `make`:
git clone https://github.com/dmlc/dmlc-core.git
git clone https://github.com/dmlc/rabit.git
Now you are ready to compile in the `./xgboost/jvm-packages` folder:
mvn package
mvn install
Finally:
- Download and install this repository
git clone https://github.com/albarron/LumpSTS.git
cd LumpSTS
mvn clean dependency:copy-dependencies package
- Download the IXA pipes for tokenisation and lemmatisation from their download page. They are run as external executables, so no installation is needed. Include their paths in the configuration file `lumpSTS.ini`.
- Use the Moses tokeniser included in the `./scripts` folder, and include its path in `lumpSTS.ini` as well.

The corresponding entries in `lumpSTS.ini` look like this (example paths):
### External software and models
# IXA pipe
ixaTok=/home/cristinae/soft/processors/ixa/ixa-pipe-tok-1.8.5-exec.jar
ixaLem=/home/cristinae/soft/processors/ixa/ixa-pipe-pos-1.5.1-exec.jar
posEs=/home/cristinae/soft/processors/ixa/morph-models-1.5.0/es/es-pos-perceptron-autodict01-ancora-2.0.bin
lemEs=/home/cristinae/soft/processors/ixa/morph-models-1.5.0/es/es-lemma-perceptron-ancora-2.0.bin
posEn=/home/cristinae/soft/processors/ixa/morph-models-1.5.0/en/en-pos-perceptron-autodict01-conll09.bin
lemEn=/home/cristinae/soft/processors/ixa/morph-models-1.5.0/en/en-lemma-perceptron-conll09.bin
# Moses
mosesTok=/home/cristinae/soft/processors/moses/tokenizerNO2html.perl
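`lumpSTS.ini` uses plain `key=value` lines with `#` comments, so it can be read with `java.util.Properties`. The sketch below shows how the entries above could be turned into command lines. It is hypothetical and is not LumpSTS's own code: the class `ExternalTools`, its methods and the `/opt/...` paths are illustrative only. The command-line arguments (`tok -l` for the IXA tokeniser, `tag -m ... -lm ...` for the IXA POS tagger/lemmatiser, `-l` for the Moses script) are assumptions based on the tools' documentation. Check them against each tool's help output before relying on them.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

/**
 * Hypothetical sketch, not LumpSTS's own code: reads tool paths from lumpSTS.ini
 * and builds command lines for the IXA pipes and the Moses tokeniser.
 * The CLI arguments are assumptions based on the tools' documentation.
 */
public class ExternalTools {

    private final Properties ini = new Properties();

    ExternalTools(Reader iniReader) {
        try {
            // In .properties syntax every line starting with '#' is a comment,
            // so headers such as "### External software and models" are skipped.
            ini.load(iniReader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the value for key, failing loudly if it is absent or blank. */
    private String require(String key) {
        String value = ini.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing key in lumpSTS.ini: " + key);
        }
        return value.trim();
    }

    /** Maps "en" to "En" and "es" to "Es" to match keys such as posEn or lemEs. */
    private static String keySuffix(String lang) {
        return Character.toUpperCase(lang.charAt(0)) + lang.substring(1);
    }

    /** IXA tokeniser; the text to tokenise is expected on standard input. */
    List<String> ixaTokCommand(String lang) {
        return Arrays.asList("java", "-jar", require("ixaTok"), "tok", "-l", lang);
    }

    /** IXA POS tagger and lemmatiser with the language-specific models. */
    List<String> ixaPosCommand(String lang) {
        String suffix = keySuffix(lang);
        return Arrays.asList("java", "-jar", require("ixaLem"), "tag",
                "-m", require("pos" + suffix), "-lm", require("lem" + suffix));
    }

    /** Moses tokeniser script, run through perl; reads standard input. */
    List<String> mosesTokCommand(String lang) {
        return Arrays.asList("perl", require("mosesTok"), "-l", lang);
    }

    public static void main(String[] args) {
        // Illustrative ini content with hypothetical paths.
        String sample = "### External software and models\n"
                + "# IXA pipe\n"
                + "ixaTok=/opt/ixa/ixa-pipe-tok-1.8.5-exec.jar\n"
                + "# Moses\n"
                + "mosesTok=/opt/moses/tokenizerNO2html.perl\n";
        ExternalTools tools = new ExternalTools(new StringReader(sample));
        System.out.println(String.join(" ", tools.ixaTokCommand("en")));
        System.out.println(String.join(" ", tools.mosesTokCommand("es")));
    }
}
```

The returned lists can be passed to `new ProcessBuilder(...)`, with the input text written to the process's standard input. Failing on a missing key keeps a misconfigured `lumpSTS.ini` from surfacing later as an obscure tool error.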