IRAPI - media search engine
IRAPI is a repository that holds all parts of a media search engine that was initially developed for the LinkedTV project.
Folder description
- "dashboard": contains a web application that displays detailed statistics for the Apache Solr index and allows editing the seed list.
- "nutch-plugin": contains a plugin for Apache Nutch 2. Its purpose is to extract media from web pages.
  - installation and usage: /wiki/Media-extractor-plugin---installation&usage
  - principle, developer's perspective: wiki/Media-extractor-plugin----developer-perspectiven
- "solr-example-conf/cores": example configuration for an Apache Solr index compatible with the data structure required by the media extractor (nutch-plugin).
- "search": contains a web application providing an endpoint for searching the indexed media data.
- "focusedcrawler, focusedcrawler client": application for focused on-demand video crawling (it wraps the online search of several news websites). Within IRAPI, the focused crawl is triggered by queries issued against the search web application.
Note: Although the project is customized for LinkedTV purposes, it can serve as inspiration or as a template for other related uses.
More information about the usage and installation of the individual applications is available on the related wiki pages or in each folder's README.
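
As a quick way to check that media documents have reached the Apache Solr index described under "solr-example-conf/cores", the index can be queried through Solr's standard `/select` request handler. The sketch below only builds such a request URL. The host, port, and core name (`media`) are assumptions for illustration and are not taken from this repository; replace them with the values of your own deployment.

```python
from urllib.parse import urlencode


def build_solr_select_url(base_url, core, query="*:*", rows=10):
    """Build a URL for Solr's standard /select request handler.

    base_url and core are deployment specific; the values used in the
    example call below are hypothetical.
    """
    # q = query string, rows = max number of results, wt = response format
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


# Hypothetical local Solr instance and core name; adjust to your setup.
url = build_solr_select_url("http://localhost:8983/solr", "media", rows=5)
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client. The match-all query `*:*` avoids depending on specific field names, which are defined by the example core configuration.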