Entity Linking for the masses

Semanticizer

The Semanticizer is a web service application for semantic linking created in 2012 by Daan Odijk at ILPS (University of Amsterdam).

The project has since received contributions from (in alphabetical order): Lars Buitinck, David Graus, Tom Kenter, Evert Lammerts, Edgar Meij, Daan Odijk, Anne Schuth and Isaac Sijaranamual.

The algorithms behind this web service were developed for, and are described in, an OAIR2013 publication on Feeding the Second Screen by Daan Odijk, Edgar Meij and Maarten de Rijke. Part of this research was inspired by earlier ILPS publications: Adding Semantics to Microblog Posts and Mapping Queries To The Linking Open Data Cloud. If you use this web service for your own research, please cite the OAIR2013 article or, alternatively, any of these articles.

The online documentation describes how to use the Semanticizer Web API. This REST-like web service returns JSON and is publicly available at http://semanticize.uva.nl/api/. An access key is currently not required.
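
As a rough illustration of calling the API from Python, the sketch below builds a request URL and parses the JSON response. The base URL is the one given above. The path layout (a language code appended to the base URL), the `text` query parameter, and the helper names `build_url` and `semanticize` are assumptions made for this sketch, so check the online documentation for the actual interface. The client is written for Python 3, independently of the Python version the server runs on.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL as given in this README.
API_BASE = "http://semanticize.uva.nl/api/"


def build_url(langcode, text, base=API_BASE):
    """Build a request URL for the given language code and input text.

    ASSUMPTION: the endpoint is <base>/<langcode> and takes the input
    in a "text" query parameter; verify this against the online docs.
    """
    return base.rstrip("/") + "/" + langcode + "?" + urlencode({"text": text})


def semanticize(langcode, text, base=API_BASE, timeout=10):
    """Send the request and return the decoded JSON response."""
    with urlopen(build_url(langcode, text, base), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call semanticize() to contact the service.
    print(build_url("en", "Amsterdam is the capital of the Netherlands"))
```

Keeping URL construction separate from the network call makes it easy to point the same client at a locally running server by passing a different `base`.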

The code is released under the LGPL license (see below). If you have any questions, contact Daan.

If you want to dive into the code, start at semanticizer/server/__main__.py.

Requirements

  1. The software has been tested with Python 2.7.3 on Mac OS X 10.8 and Linux (RedHat EL5, Debian jessie/sid and Ubuntu 12.04).

  2. The following Python modules need to be installed (using easy_install or pip):

    • nltk
    • python-Levenshtein
    • networkx
    • lxml
    • flask
    • redis (optional, see point 4)
    • scikit-learn (optional, see point 6)
    • scipy (optional, see point 6)
    • mock (optional, used by the tests)
  3. A summary of a Wikipedia dump is needed. For this, download the Wikipedia Miner CSV files.

  4. Copy one of the two config files in the conf folder to semanticizer.cfg in that folder and adapt it to your setup. You can either load all data into memory (use semanticizer.memory.cfg) or load it into Redis using the following steps:

    1. Copy semanticizer.redis.cfg into semanticizer.cfg.

    2. Set up a Redis server and make sure it is running.

    3. Load the data into Redis: python -m semanticizer.redisinsert english en enwiki-20110722. --langloc dutch nl nlwiki-20111104

  5. Run the server using python -m semanticizer.server.

  6. To work with the features, you need to install the scikit-learn and scipy packages. Before installing scipy, you need to have swig installed; see its INSTALL file for instructions (configure, make, make install). Note that working with features is still under active development and is therefore not fully documented or tested.
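
Steps 2 and 4 above can be checked and automated with a small helper. The following is a minimal sketch, not part of the Semanticizer code base: the function names `missing` and `select_config` are hypothetical, and the import names assume the usual mapping from package to module (python-Levenshtein imports as `Levenshtein`, scikit-learn as `sklearn`). It sticks to calls available in both Python 2.7 and Python 3.

```python
import importlib
import os
import shutil

# Package name -> importable module name (step 2).
REQUIRED = {
    "nltk": "nltk",
    "python-Levenshtein": "Levenshtein",
    "networkx": "networkx",
    "lxml": "lxml",
    "flask": "flask",
}
OPTIONAL = {
    "redis": "redis",
    "scikit-learn": "sklearn",
    "scipy": "scipy",
    "mock": "mock",
}


def missing(packages):
    """Return the sorted names of packages whose module cannot be imported."""
    absent = []
    for name, module in packages.items():
        try:
            importlib.import_module(module)
        except ImportError:
            absent.append(name)
    return sorted(absent)


def select_config(conf_dir, backend):
    """Copy semanticizer.<backend>.cfg to semanticizer.cfg (step 4).

    backend is "memory" or "redis"; the copied file still needs to be
    adapted by hand afterwards.
    """
    if backend not in ("memory", "redis"):
        raise ValueError("backend must be 'memory' or 'redis'")
    source = os.path.join(conf_dir, "semanticizer.%s.cfg" % backend)
    target = os.path.join(conf_dir, "semanticizer.cfg")
    shutil.copyfile(source, target)
    return target


if __name__ == "__main__":
    print("Missing required packages: %s" % (missing(REQUIRED) or "none"))
    print("Missing optional packages: %s" % (missing(OPTIONAL) or "none"))
```

Run from the repository root, `select_config("conf", "redis")` would perform step 4.1; anything reported as missing can then be installed with pip or easy_install.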

License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with this program. If not, see http://www.gnu.org/licenses/.