multimedia-textsearch

Authors: Dominik Messinger, Alexander Weigl and Ge Wu
License: gpl-v3

Description

University project, lecture Text Indexing. The idea is to search within audio/video for keywords by building an inverted index beforehand.

We introduce the concept of timed documents. A timed document contains the document's text sliced into blocks, each carrying time information. These documents are produced by preprocessing audio and video files and can be stored in an XML format. The inverted index is generated from these timed documents.
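The idea can be illustrated with a minimal sketch. It is not the project's actual code: the names `TimedIndexSketch`, `TimedBlock`, `Posting` and `addDocument`, the use of seconds for time information, and the naive tokenization are all assumptions made for this example. It only shows how an inverted index over timed blocks lets a keyword lookup return positions in time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch; class and method names are not taken from the project.
public class TimedIndexSketch {

    /** One block of a timed document: a text slice with start and end time (assumed to be seconds). */
    static final class TimedBlock {
        final double start;
        final double end;
        final String text;

        TimedBlock(double start, double end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    /** One index entry: the document and the time span in which a term occurs. */
    static final class Posting {
        final String documentId;
        final double start;
        final double end;

        Posting(String documentId, double start, double end) {
            this.documentId = documentId;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return documentId + "@" + start + "-" + end;
        }
    }

    /** Adds every distinct term of every block to the index (term -> postings). */
    static void addDocument(Map<String, List<Posting>> index,
                            String documentId, List<TimedBlock> blocks) {
        for (TimedBlock block : blocks) {
            // Naive tokenization (an assumption): lowercase, split on non-letter/non-digit runs.
            String[] tokens = block.text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
            // A set keeps one posting per term and block, even if the term repeats.
            Set<String> terms = new LinkedHashSet<>(Arrays.asList(tokens));
            for (String term : terms) {
                if (term.isEmpty()) {
                    continue; // split() yields an empty token when the text starts with a delimiter
                }
                index.computeIfAbsent(term, k -> new ArrayList<>())
                     .add(new Posting(documentId, block.start, block.end));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<Posting>> index = new TreeMap<>();
        addDocument(index, "lecture.mp4", Arrays.asList(
                new TimedBlock(0.0, 4.5, "Welcome to text indexing"),
                new TimedBlock(4.5, 9.0, "An inverted index maps terms to documents")));
        // Look up a keyword and print where in the media it occurs.
        System.out.println(index.get("inverted"));
    }
}
```

A lookup returns the time span of each matching block, so a player could jump to that position in the audio or video file.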

Dependencies

Java Dependencies:

* Apache Commons IO
* Apache Commons Lang
* JavaTuples
* jdom
* json-simple

External Dependencies:

* a working Tesseract installation (binaries for win32 are included)
* ffmpeg (binaries for win32 are included)