Lump participation at SemEval 2017 STS

LumpSTS

Lump efforts on assessing Semantic Textual Similarity

This project includes code to process data, compute features, and learn models for the different STS tasks.


Set-up and installation

  1. Download and install WikiTailor source
    git clone https://github.com/cristinae/WikiTailor.git
    mvn -DskipTests clean compile assembly:single install

If you need to annotate corpora:

  1. Download and install MADAMIRA jar
    License for downloading
    mvn install:install-file -Dfile={$PATH}/MADAMIRA-release-20160516-2.1/MADAMIRA-release-20160516-2.1.jar -DgroupId=edu.columbia.ccls.madamira -DartifactId=MADAMIRA-release -Dversion=20160516-2.1 -Dpackaging=jar

If you need to work with BabelNet indices:

  1. Download and install the BabelNet API and its dependencies
    [API download](http://babelnet.org/data/3.7/BabelNet-API-3.7.zip)
    unzip BabelNet-API-3.7.zip
    mvn install:install-file -Dfile=lib/jltutils-2.2.jar -DgroupId=it.uniroma1.lcl.jlt -DartifactId=jltutils -Dversion=2.2 -Dpackaging=jar
    unzip -p babelnet-api-3.7.jar META-INF/maven/it.uniroma1.lcl.babelnet/babelnet-api/pom.xml | grep -vP '<(scope|systemPath)>' >babelnet-api-3.7.pom
    (on macOS, the stock grep does not support -P; consider using Homebrew's ggrep instead)
    mvn install:install-file -Dfile=babelnet-api-3.7.jar -DpomFile=babelnet-api-3.7.pom

  2. Download BabelNet indices and make the API aware of them
    [Indices download](http://babelnet.org/login)
    tar xjvf babelnet-3.7-index.tar.bz2

  • In ./BabelNet-API-3.7/config/babelnet.var.properties include the path to the index:
    babelnet.dir=/home/usr/BabelNet-3.7
  • In ./BabelNet-API-3.7/config/jlt.var.properties include the path to WordNet:
    jlt.wordnetPrefix=/usr/local/share/wordnet
  • Move the ./BabelNet-API-3.7/config folder to your ${basedir}
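
The pom extraction in step 1 is the least obvious command in this section. A minimal Java sketch of what the `grep -vP '<(scope|systemPath)>'` filter does: it drops every line containing a `<scope>` or `<systemPath>` tag and keeps the rest, presumably so that Maven no longer looks for the dependency jars at a fixed local path. The class name `PomFilter` and the sample dependency entry are assumptions made for illustration; they are not copied from the BabelNet pom.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class PomFilter {
    // Same pattern as in the grep call: an opening <scope> or <systemPath> tag.
    private static final Pattern SYSTEM_DEP = Pattern.compile("<(scope|systemPath)>");

    /** Keeps every line that does NOT contain the pattern, like `grep -v`. */
    public static List<String> dropSystemScope(List<String> pomLines) {
        return pomLines.stream()
                .filter(line -> !SYSTEM_DEP.matcher(line).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical dependency entry, for illustration only.
        List<String> pom = Arrays.asList(
                "<dependency>",
                "  <groupId>it.uniroma1.lcl.jlt</groupId>",
                "  <artifactId>jltutils</artifactId>",
                "  <version>2.2</version>",
                "  <scope>system</scope>",
                "  <systemPath>${project.basedir}/lib/jltutils-2.2.jar</systemPath>",
                "</dependency>");
        // Prints the entry without the <scope> and <systemPath> lines.
        dropSystemScope(pom).forEach(System.out::println);
    }
}
```

With those two lines gone, the dependency would be resolved through its ordinary coordinates, which is why jltutils is installed into the local Maven repository first.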

If you need to use the machine learning module:

  1. Download and install xgboost
    git clone https://github.com/dmlc/xgboost.git
    xgboost requires the dmlc-core and rabit packages. Clone each of them into its corresponding folder inside ./xgboost and run make in both:
    git clone https://github.com/dmlc/dmlc-core.git
    git clone https://github.com/dmlc/rabit.git
    Then compile from the ./xgboost/jvm-packages folder:
    mvn package
    mvn install

Finally:

  1. Download and install this repository
    git clone https://github.com/albarron/LumpSTS.git
    mvn clean dependency:copy-dependencies package

External resources

  1. Download the IXA pipes for tokenisation and lemmatisation. They are used as external executables, so no installation is needed.
    Download page
    Include their path in the configuration file lumpSTS.ini

  2. Use the Moses tokeniser included in the ./scripts folder
    Include its path in the configuration file lumpSTS.ini

The corresponding entries in lumpSTS.ini look as follows (adapt the paths to your system):

### External software and models
# IXA pipe
ixaTok=/home/cristinae/soft/processors/ixa/ixa-pipe-tok-1.8.5-exec.jar
ixaLem=/home/cristinae/soft/processors/ixa/ixa-pipe-pos-1.5.1-exec.jar
posEs=/home/cristinae/soft/processors/ixa/morph-models-1.5.0/es/es-pos-perceptron-autodict01-ancora-2.0.bin
lemEs=/home/cristinae/soft/processors/ixa/morph-models-1.5.0/es/es-lemma-perceptron-ancora-2.0.bin
posEn=/home/cristinae/soft/processors/ixa/morph-models-1.5.0/en/en-pos-perceptron-autodict01-conll09.bin
lemEn=/home/cristinae/soft/processors/ixa/morph-models-1.5.0/en/en-lemma-perceptron-conll09.bin

# Moses
mosesTok=/home/cristinae/soft/processors/moses/tokenizerNO2html.perl
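
Since every value in this excerpt is a filesystem path, a quick check can catch typos before running the pipeline. The sketch below assumes that lumpSTS.ini contains only `key=value` lines and `#` comments, as in the excerpt, so that it can be read with `java.util.Properties`. The class name `IniPathCheck` is hypothetical and not part of LumpSTS.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

public class IniPathCheck {
    /**
     * Returns, in alphabetical order, the keys whose value does not
     * point to an existing file or directory.
     * Assumption: every value in the configuration text is a path.
     */
    public static List<String> missingPaths(String iniText) {
        Properties props = new Properties();
        try {
            // Lines starting with '#' are treated as comments by Properties.
            props.load(new StringReader(iniText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> missing = new ArrayList<>();
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            if (!Files.exists(Paths.get(props.getProperty(key)))) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Default file name taken from this README; pass another path as argument.
        String path = args.length > 0 ? args[0] : "lumpSTS.ini";
        String text = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        for (String key : missingPaths(text)) {
            System.out.println("missing: " + key);
        }
    }
}
```

After compiling, it can be run as `java IniPathCheck /path/to/lumpSTS.ini`; it prints one line per key whose path is not found and nothing if all paths exist.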