Master Thesis Extractor

Work in progress!!

Master Thesis Extractor (MTE)

About

The Master Thesis Extractor (MTE) is a temporal information extraction system that extracts RDF from Wikipedia infoboxes.

It reuses the Mapping Extractor of the DBpedia Extraction Framework to extract values and HeidelTime to extract temporal expressions.

Datasets

Currently, MTE focuses on extracting a time-based dataset about companies. The latest release of the dataset can be downloaded from: http://tiny.cc/tmpcompany

Run

To run an extraction yourself, either download a binary release or build MTE from source.

MTE uses MongoDB to cache wiki articles and revisions. By default, MTE connects to the MongoDB instance listening on 127.0.0.1:27017. To override the default, set the environment variable MTE_MONGODB to a valid connection string.
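The default-and-override behaviour described above can be mirrored in the shell before starting MTE. This is a minimal sketch, not MTE's own code: it assumes that MTE accepts a standard mongodb:// URI as its connection string, and the alternative host in the comment is hypothetical.

```shell
# Use the documented default unless MTE_MONGODB is already set.
# Assumption: MTE accepts a standard mongodb:// URI as its connection string.
: "${MTE_MONGODB:=mongodb://127.0.0.1:27017}"
export MTE_MONGODB

# Show which instance MTE will be pointed at.
echo "MTE_MONGODB=$MTE_MONGODB"

# To use another instance, set the variable before running this snippet, e.g.
#   export MTE_MONGODB="mongodb://db.example.org:27017"   # hypothetical host
```

Exporting the variable in the same shell that later starts MTE makes the value visible to the application.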

Run the app using a binary release:

  1. Unzip the binary release file
  2. Execute 'bin/mte' (or 'bin/mte.bat' on Windows)
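The two steps above could be wrapped in a small helper for Unix-like systems. This is only a sketch: the function name run_mte_release and the directory argument are hypothetical, and the only path taken from this README is bin/mte.

```shell
# Hypothetical helper: start MTE from an already unzipped binary release.
# $1 is the directory the release was unpacked into (an assumption of this sketch).
run_mte_release() {
  release_dir="$1"
  launcher="$release_dir/bin/mte"

  # Fail early with a clear message if the launcher is not where we expect it.
  if [ ! -x "$launcher" ]; then
    echo "launcher not found or not executable: $launcher" >&2
    return 1
  fi

  # Start the launcher from inside the release directory, in a subshell
  # so the caller's working directory is left unchanged.
  (cd "$release_dir" && ./bin/mte)
}
```

After unzipping, it would be called with the unpacked directory, for example run_mte_release ./mte-release, where ./mte-release is a placeholder for wherever the archive was extracted.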

Run your own build:

  1. Follow the instructions in the Build section to compile your own build
  2. Execute 'sbt start' from the MTE project directory

For further details, see the Play documentation.

Build

MTE is a Play Framework application and therefore uses sbt as its build tool. For further information, please see the Play documentation.

MTE expects all of its dependencies to be available in an Apache Ivy or Apache Maven repository.

TODO: Provide details on where to download dependencies that are not on Maven Central.

Requirements: Java 8, sbt
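Before building, it may help to confirm that both required tools are on the PATH. The check below is a sketch with a hypothetical helper name (check_tool); it only tests that the commands exist and does not verify that the installed Java is version 8.

```shell
# Hypothetical helper: report whether a required tool is on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# java and sbt are the two requirements named above.
# "|| true" keeps the loop going so every missing tool is reported.
for tool in java sbt; do
  check_tool "$tool" || true
done
```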

License

The source code is published under the terms of the GNU General Public License, version 2.