Semanticizer
The Semanticizer is a web service application for semantic linking created in 2012 by Daan Odijk at ILPS (University of Amsterdam).
The project has since received contributions from (in alphabetical order): Lars Buitinck, David Graus, Tom Kenter, Evert Lammerts, Edgar Meij, Daan Odijk, Anne Schuth and Isaac Sijaranamual.
The algorithms for this webservice were developed for and are described in an OAIR2013 publication on "Feeding the Second Screen" by Daan Odijk, Edgar Meij and Maarten de Rijke. Part of this research was inspired by earlier ILPS publications: "Adding Semantics to Microblog Posts" and "Mapping Queries To The Linking Open Data Cloud". If you use this webservice for your own research, please include a reference to the OAIR2013 article or, alternatively, to any of these articles.
The online documentation describes how to use the Semanticizer Web API. This REST-like web service returns JSON and is publicly available at: http://semanticize.uva.nl/api/. Currently, no access key is needed for the webservice.
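As an illustration of calling a REST-like JSON service such as this one, here is a minimal client-side sketch in Python 3. Only the base URL comes from this README. The language-code path segment, the `text` query parameter and the helper names `build_request_url` and `semanticize` are assumptions made for this example, so check the online documentation for the actual request format.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL as given in this README.
API_BASE = "http://semanticize.uva.nl/api/"


def build_request_url(text, lang="en", base=API_BASE):
    """Build a GET URL for the given text.

    ASSUMPTION: the language code is a path segment and the input is
    passed in a 'text' query parameter. Neither is confirmed by this README.
    """
    return base.rstrip("/") + "/" + lang + "?" + urlencode({"text": text})


def semanticize(text, lang="en", base=API_BASE, timeout=10):
    """Send the request and decode the JSON body (hypothetical helper)."""
    url = build_request_url(text, lang, base)
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the URL is kept separate from sending the request, so the request format can be inspected or adjusted without touching the network.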
The code is released under the LGPL license (see below). If you have any questions, contact Daan.
If you want to dive into the code, start at semanticizer/server/__main__.py.
Requirements
1. The software has been tested with Python 2.7.3 on Mac OS X 10.8 and Linux (RedHat EL5, Debian jessie/sid and Ubuntu 12.04).
2. The following Python modules need to be installed (using easy_install or pip):
   - nltk
   - python-Levenshtein
   - networkx
   - lxml
   - flask
   - redis (optional, see point 4)
   - scikit-learn (optional, see point 6)
   - scipy (optional, see point 6)
   - mock (optional, used by the tests)
3. A summary of a Wikipedia dump is needed. For this, download the Wikipedia Miner CSV files.
4. Copy one of the two config files in the conf folder to semanticizer.cfg in that folder and adapt it to your situation. You can either load all data into memory (use semanticizer.memory.cfg) or load it into Redis, using the following steps:
   - Copy semanticizer.redis.cfg to semanticizer.cfg.
   - Set up a Redis server and make sure it is running.
   - Load the data into Redis: python -m semanticizer.redisinsert english en enwiki-20110722 --langloc dutch nl nlwiki-20111104
5. Run the server using python -m semanticizer.server.
6. To work with the features, you need to install the scikit-learn and scipy packages. Before installing scipy, you need to have swig installed; see its INSTALL file for instructions (configure, make, make install). Note that working with features is still under active development and is therefore not fully documented or tested.
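The module requirements listed above can be checked with a short script before starting the server. This is a minimal sketch and not part of the Semanticizer code base: the helper name `missing_modules` is hypothetical, and the lists use import names rather than package names (python-Levenshtein is imported as `Levenshtein`, scikit-learn as `sklearn`).

```python
# Import names of the modules listed in the requirements.
REQUIRED = ["nltk", "Levenshtein", "networkx", "lxml", "flask"]
OPTIONAL = ["redis", "sklearn", "scipy", "mock"]


def missing_modules(names):
    """Return the subset of names that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Report missing modules; optional ones are only needed for
    # Redis storage, the features, or the tests.
    print("Missing required modules: %s" % missing_modules(REQUIRED))
    print("Missing optional modules: %s" % missing_modules(OPTIONAL))
```

An empty list on the first line means all required modules can be imported. Missing optional modules only matter if you use Redis, the features, or the tests.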
License
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this program. If not, see http://www.gnu.org/licenses/.