Search

This project provides a search interface to the computas.no website.

Installation

brew install yarn
yarn
yarn start

Live Demo

A live demo of the system is available at http://cx.koren.im. It probably works best in Google Chrome, which supports the Web Speech API used for voice queries.

Description

The system essentially consists of three distinct parts: a crawler, an indexer, and a user interface.

Crawler

The crawler starts at computas.no and extracts the title, content, and links from each page, adding the internal links to the extraction work queue. Once all links are resolved and all bodies parsed, the crawler outputs a JSON site map that can be fed to the offline indexer and/or sent to the remote ElasticSearch instance. In addition to extracting the basic content, the crawler takes a screenshot of each page, which is later used as the preview in the search result listings. The crawler attempts to clean the main content of each page by stripping all HTML tags inside the <main> element of computas.no.
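
The crawl loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual crawler: `fetchPage` is a hypothetical async function that returns the HTML for a URL, the field names of the site map entries are assumptions, and the regex-based extraction is a simplification that a real HTML parser would replace. Screenshot capture is left out.

```javascript
// Sketch only: fetchPage and the site map field names are assumptions.

function extractTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : "";
}

// Keep only the text inside <main>, with tags stripped and whitespace collapsed.
function extractMainText(html) {
  const match = /<main[^>]*>([\s\S]*?)<\/main>/i.exec(html);
  if (!match) return "";
  return match[1].replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
}

// Collect all href targets as absolute URLs without fragments.
function extractLinks(html, baseUrl) {
  const links = [];
  const pattern = /<a\s[^>]*href\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    try {
      const url = new URL(match[1], baseUrl);
      url.hash = "";
      links.push(url.href);
    } catch {
      // Skip hrefs that cannot be resolved to a URL.
    }
  }
  return links;
}

// Breadth-first crawl: only links on the start URL's origin are queued.
async function crawl(startUrl, fetchPage) {
  const start = new URL(startUrl);
  const queue = [start.href];
  const seen = new Set(queue);
  const siteMap = [];

  while (queue.length > 0) {
    const url = queue.shift();
    const html = await fetchPage(url);
    const links = extractLinks(html, url);
    siteMap.push({
      url,
      title: extractTitle(html),
      content: extractMainText(html),
      links,
    });
    for (const link of links) {
      if (new URL(link).origin === start.origin && !seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return siteMap;
}
```

Passing the fetch function in as a parameter keeps the loop easy to try against canned pages before pointing it at the live site.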

Indexer

Items are indexed by a remote ElasticSearch instance.
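
One way the JSON site map could be handed to ElasticSearch is through its bulk API, which accepts newline-delimited JSON: one action line followed by one document line per item. The sketch below only builds that request body. The index name and the shape of the site map entries (`url`, `title`, `content`) are assumptions, and the project may well index its items differently.

```javascript
// Sketch only: index name and entry fields are assumptions.
function buildBulkBody(siteMap, indexName) {
  const lines = [];
  for (const page of siteMap) {
    // Action line: index this document, using the page URL as its id.
    lines.push(JSON.stringify({ index: { _index: indexName, _id: page.url } }));
    // Source line: the document itself.
    lines.push(JSON.stringify({
      url: page.url,
      title: page.title,
      content: page.content,
    }));
  }
  // The bulk API requires the body to end with a newline.
  return lines.length > 0 ? lines.join("\n") + "\n" : "";
}
```

The resulting string would be sent as the body of a POST request to the instance's `_bulk` endpoint with the `Content-Type: application/x-ndjson` header.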

User Interface

The interface is implemented as a regular web application enhanced with speech recognition, which offers an even easier way to express queries. The recognition is built using the Web Speech API implemented in modern browsers. To minimize the time taken to inspect the results, all results are presented in a carousel as screenshots of the resulting pages, each with a link to open the page.
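
As a rough illustration of how voice queries can be wired up with the Web Speech API, the sketch below configures a recognition object and forwards the recognized transcript to a callback. The function name `createVoiceSearch`, the `onQuery` callback, and the `en-US` language setting are assumptions made for this example. The constructor is passed in as a parameter because browsers expose it as either `SpeechRecognition` or `webkitSpeechRecognition`.

```javascript
// Sketch only: createVoiceSearch, onQuery, and the language are assumptions.
function createVoiceSearch(Recognition, onQuery) {
  const recognition = new Recognition();
  recognition.lang = "en-US";          // assumed query language
  recognition.interimResults = false;  // deliver final results only
  recognition.maxAlternatives = 1;     // keep the single best guess

  recognition.onresult = (event) => {
    // First result, first alternative: the most likely transcript.
    const transcript = event.results[0][0].transcript.trim();
    if (transcript) onQuery(transcript);
  };
  return recognition;
}

// Possible browser usage (runSearch and button are hypothetical):
// const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
// const voice = createVoiceSearch(Recognition, (query) => runSearch(query));
// button.addEventListener("click", () => voice.start());
```

Calling `start()` from a click handler ties the microphone prompt to an explicit user action.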