/project-mlp

a machine learning approach for processing mathematical language in scientific documents

Primary LanguageJava

Mathematical Language Processing

Run

  • compile the maven project
  • adapt the paths to your stratosphere environment in the file cluster-run.sh
  • setup the right values for the parameters of the ranking algorithm also in cluster-run.sh
  • execute the script

Notice

To start the processor, an additional model file is needed. Download the Stanford POS tagger from http://nlp.stanford.edu/software/tagger.shtml. Within this archive is a directory called pos-tagger-models/, containing a variaty of model files for a couple of languages.

If uncertain, the english-left3words-distsim.tagger model is a good starting point.

Tested with http://nlp.stanford.edu/software/stanford-postagger-2012-11-11.zip ... the most recent version http://nlp.stanford.edu/software/stanford-postagger-2014-01-04.zip is currently beeing tested.

Log

To trace was was done on the MLP server install stratosphere via debian package physikerwelt@mlp:~/stanford-postagger-2014-01-04/models$ cp english-left3words-distsim.tagger ~