Paws on Elasticsearch

Elasticsearch is a good full-text search engine.

Wikipedia search is powered by Elasticsearch.
The Guardian joins access log data with social network data using Elasticsearch to show editors how the public is responding to articles.
Stack Overflow full-text search is powered by Elasticsearch. They use the More Like This query to find similar answers.
GitHub uses Elasticsearch to query 130 billion lines of code.

Prerequisites

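Docker with docker-compose, Python 2.7 with pip or easy_install, Node.js with the http-server package (npm install -g http-server) for serving the UI, and internet access.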

Get the code. git clone git@github.com:sulmanen/es-movies.git && cd es-movies
Fire up Elasticsearch. docker-compose up
Verify. curl http://localhost:9200
Install dependencies. pip install requests && pip install BeautifulSoup
Create index. ./et index create 0
Create alias. ./et index alias movies 0
Verify alias. curl http://localhost:9200/_aliases
Load data. python2.7 import-movies.py
Fire up the crappy UI. http-server
Navigate to http://localhost:8080/
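To check the alias from a script instead of reading the curl output by eye, here is a minimal sketch in Python 3 (standard library only, not part of the repository). It relies only on the shape of the GET /_aliases response, which maps each index name to an "aliases" object; the helper names fetch_json and indices_for_alias are hypothetical.

```python
import json
from urllib.request import urlopen

ES = "http://localhost:9200"  # the address used in the setup steps


def fetch_json(url):
    """GET a URL and parse the JSON body."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def indices_for_alias(aliases_response, alias):
    """Return the sorted names of the indices that carry `alias`.

    GET /_aliases answers with {index_name: {"aliases": {alias_name: {...}}}}.
    """
    return sorted(
        index
        for index, info in aliases_response.items()
        if alias in info.get("aliases", {})
    )
```

With the container running, indices_for_alias(fetch_json(ES + "/_aliases"), "movies") should list the single index the movies alias points at, which after the steps above is expected to be index 0.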

Exercises

We are using the UCI Movies dataset of over 10,000 films. The titles range from the late 1800s to 1999.

URI Search

Find all the Academy Award winners in the database. In the data, the code AA marks a film that won an Academy Award.
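A URI search puts the whole query in the URL: the q parameter takes Lucene query string syntax and size caps the number of hits returned (the default is 10). A minimal Python 3 sketch that builds such URLs; uri_search_url is a hypothetical helper, and the index name is the movies alias from the setup.

```python
from urllib.parse import urlencode

ES = "http://localhost:9200"  # the address used in the setup steps


def uri_search_url(index, q, size=10):
    """Build GET /<index>/_search?q=<query string>&size=<n>, URL-encoding q."""
    return "{}/{}/_search?{}".format(ES, index, urlencode({"q": q, "size": size}))


print(uri_search_url("movies", "AA", size=50))
# http://localhost:9200/movies/_search?q=AA&size=50
```

Paste the printed URL into curl or a browser. To restrict the match to one field, prefix the term with the field name (q=<field>:AA); which field holds the award codes depends on how import-movies.py stored them.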

Find the film Elmer Gantry in the raw data. Did it win an Academy Award?
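Each hit in a search response carries the document as it was sent to Elasticsearch in its _source field, next to _id and _score. A minimal Python 3 sketch (hypothetical helper names raw_documents and show) for reading those out of a parsed response:

```python
import json


def raw_documents(search_response):
    """Yield (_id, _score, _source) for every hit in a parsed _search response.

    _source holds the document exactly as it was indexed.
    """
    for hit in search_response["hits"]["hits"]:
        yield hit["_id"], hit.get("_score"), hit["_source"]


def show(search_response):
    """Pretty-print each hit's _source so its fields can be read directly."""
    for doc_id, score, source in raw_documents(search_response):
        print(doc_id, score)
        print(json.dumps(source, indent=2, sort_keys=True))
```

Quoting the title in the query string (q="Elmer Gantry", URL-encoded as q=%22Elmer+Gantry%22) turns it into a phrase search, so both words must appear next to each other.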

Boolean Query

Find all the Academy Award winners, excluding the films that were only nominated (AAN).
Try to filter for all the movies whose title contains the word 'Vampire'. How many are there? What happens to the score?
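A bool query combines clauses: must clauses have to match and add to the score, must_not clauses exclude documents, and filter clauses have to match but are evaluated in filter context, where they do not contribute to the score. A minimal Python 3 sketch that builds the request bodies; the field names awards and title are hypothetical stand-ins for whatever fields the import script created.

```python
import json


def match(field, text):
    """A full-text match clause on one field."""
    return {"match": {field: text}}


def bool_query(must=(), must_not=(), filters=()):
    """Build a search body with a bool query from lists of clauses."""
    clauses = {}
    if must:
        clauses["must"] = list(must)          # must match, adds to the score
    if must_not:
        clauses["must_not"] = list(must_not)  # excludes, no effect on the score
    if filters:
        clauses["filter"] = list(filters)     # must match, no effect on the score
    return {"query": {"bool": clauses}}


def total_hits(search_response):
    """Matching-document count from a parsed _search response."""
    total = search_response["hits"]["total"]
    # Elasticsearch 7+ returns {"value": n, "relation": ...} (a lower bound when
    # relation is "gte"); older versions return the plain number n.
    return total["value"] if isinstance(total, dict) else total


# HYPOTHETICAL field names: check the real ones with GET /movies.
winners = bool_query(must=[match("awards", "AA")], must_not=[match("awards", "AAN")])
vampires = bool_query(filters=[match("title", "Vampire")])

print(json.dumps(winners, indent=2))
```

POST either body to http://localhost:9200/movies/_search. Since the Vampire clause sits only under filter, it adds nothing to the score, so every hit comes back with the same score.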

Function Score Query

The best films come back in no particular order. Let's see if we can use a function score to order the results after the matches have been made. Perhaps field_value_factor or one of the decay functions can help us order our movies.
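A function_score query wraps another query and combines each hit's score with a scoring function. A minimal Python 3 sketch of the two options named above, both multiplying into the original score (boost_mode "multiply"). The field names awards and year are hypothetical, and both functions expect year to be indexed as a number: field_value_factor reads numeric field values and decay functions compute numeric distances.

```python
import json


def function_score(inner_query, **scoring_function):
    """Wrap a query so each hit's score is multiplied by one scoring function.

    Pass a single function as a keyword argument, e.g.
    field_value_factor={...} or gauss={...}.
    """
    body = {"query": inner_query, "boost_mode": "multiply"}
    body.update(scoring_function)
    return {"query": {"function_score": body}}


# HYPOTHETICAL field names: "awards" and "year".
winners = {"match": {"awards": "AA"}}

# Option 1: scale each score by a numeric field; log1p dampens large values.
by_value = function_score(
    winners,
    field_value_factor={"field": "year", "modifier": "log1p", "missing": 1},
)

# Option 2: gaussian decay, 1.0 at 1999 and down to 0.5 (the default decay)
# ten years away.
by_recency = function_score(
    winners,
    gauss={"year": {"origin": 1999, "scale": 10}},
)

print(json.dumps(by_recency, indent=2))
```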
Something isn't right. Let's look at what our index looks like: curl http://localhost:9200/movies. What's the problem?
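The response to that curl lists the index's mappings, which can be long. A minimal Python 3 sketch (hypothetical helper names) that flattens them into one type per field, so a field you wanted to score on that turns out to be mapped as string or text stands out. It handles both the Elasticsearch 7+ layout and the older one with mapping type names.

```python
def field_types(mapping, prefix=""):
    """Flatten nested "properties" blocks into {"dotted.field.path": type}."""
    types = {}
    for name, spec in mapping.get("properties", {}).items():
        path = prefix + name
        if "type" in spec:
            types[path] = spec["type"]
        # Object fields have their own "properties"; recurse into them.
        types.update(field_types(spec, path + "."))
    return types


def index_field_types(get_index_response):
    """Field types from a parsed GET /<index> response, across mapping styles."""
    result = {}
    for info in get_index_response.values():
        mappings = info.get("mappings", {})
        # Elasticsearch 7+ puts "properties" straight under "mappings"; older
        # versions nest it one level deeper, under each mapping type name.
        blocks = [mappings] if "properties" in mappings else list(mappings.values())
        for block in blocks:
            if isinstance(block, dict):
                result.update(field_types(block))
    return result
```

Multi-fields listed under "fields" are not included; the sketch only reports each field's top-level type.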

Creating an index mapping

Tuning relevance in Elasticsearch is a dance between the index and the query. Let's add some mappings! The mapping of an existing field can't be changed in place, so we will create a new index named 1 with the new mappings. There are some ready-made mappings, but is there something we should change in them to make the function score work?
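For comparison, a hypothetical create-index body in which year is explicitly numeric. This is a minimal Python 3 sketch, not the repository's ready-made mapping: the field names are assumptions, and it uses the typeless syntax of Elasticsearch 7+. Older versions nest "properties" under a mapping type name and use "string" with "index": "not_analyzed" instead of "keyword".

```python
import json

# HYPOTHETICAL field names; Elasticsearch 7+ typeless mapping syntax.
new_index_body = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "year": {"type": "integer"},   # numeric, so function_score can use it
            "genre": {"type": "keyword"},  # exact codes such as "Dram"
        }
    }
}

print(json.dumps(new_index_body, indent=2))
```

Sending this body with PUT http://localhost:9200/1 creates the index. With the default coerce setting, numeric strings such as "1960" are converted to integers when the documents are reindexed into it.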

Create index 1. ./et index create 1
Copy the documents from 0 into 1. ./et reindex 0 1
Move the movies alias from 0 to 1. ./et index alias movies 1 0

This way you can update the index in production without downtime, and roll back by pointing the alias at the old index again if the new one has a problem.
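We don't know exactly how ./et implements reindex and alias, so here is a minimal Python 3 sketch of the equivalent Elasticsearch REST calls (helper names hypothetical). The key point is that all actions in a single POST /_aliases request are applied atomically, so searches against movies never hit a moment when the alias points at no index. The _reindex API requires Elasticsearch 5 or later.

```python
import json
from urllib.request import Request, urlopen

ES = "http://localhost:9200"  # the address used in the setup steps


def reindex_body(source_index, dest_index):
    """Body for POST /_reindex: copy every document from one index to another."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def swap_alias_actions(alias, old_index, new_index):
    """Body for POST /_aliases that moves `alias` from old_index to new_index.

    Both actions are applied together, atomically.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def post(path, body):
    """POST a JSON body to Elasticsearch and return the parsed response."""
    req = Request(
        ES + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Going live would be post("/_reindex", reindex_body("0", "1")) followed by post("/_aliases", swap_alias_actions("movies", "0", "1")); rolling back is the same swap with the index names reversed.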

Find Academy Award winners in the drama category

'Dram' is the keyword for dramas. Can you find the drama Academy Award winners?
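One way to combine the two conditions, sketched in Python 3 under the same hypothetical field names as before (awards for the award codes, genre for the genre code): score on the award match and keep only dramas with a term filter. A term query does no analysis, so it only matches "Dram" exactly if genre is indexed as an exact value (keyword, or not_analyzed in older versions); on an analyzed text field a match clause would be needed instead.

```python
import json


# HYPOTHETICAL field names: "awards" for the award codes, "genre" for "Dram".
def drama_winners(awards_field="awards", genre_field="genre"):
    """Academy Award winners (AA, not only AAN) whose genre code is exactly "Dram"."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {awards_field: "AA"}}],
                "must_not": [{"match": {awards_field: "AAN"}}],
                # term does no analysis, so it needs an exact-value field.
                "filter": [{"term": {genre_field: "Dram"}}],
            }
        }
    }


print(json.dumps(drama_winners(), indent=2))
```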

It's a long way from V to Vampire

Once you start typing into the typeahead field, the experience isn't very satisfying. Let's create a typeahead index.
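The usual way to get from V to Vampire is to index every prefix of each word, which is what an edge n-gram token filter does. A minimal Python 3 sketch: edge_ngrams imitates in plain Python what a lowercase filter followed by an edge_ngram filter emits for one token, and typeahead_settings is a hypothetical analysis block (the names typeahead and typeahead_edge are made up) for the new index.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Lowercased prefixes of `token`, min_gram to max_gram characters long."""
    token = token.lower()
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


print(edge_ngrams("Vampire"))
# ['v', 'va', 'vam', 'vamp', 'vampi', 'vampir', 'vampire']

# HYPOTHETICAL filter and analyzer names.
typeahead_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "typeahead_edge": {"type": "edge_ngram", "min_gram": 1, "max_gram": 20}
            },
            "analyzer": {
                "typeahead": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "typeahead_edge"],
                }
            },
        }
    }
}
```

Apply the typeahead analyzer only at index time, for example a title field mapped with "analyzer": "typeahead" and "search_analyzer": "standard". Otherwise the text typed so far would itself be split into prefixes, and "vam" would also match everything starting with "v".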

Let's add language analyzers into the mix

They have inherent weaknesses, so let's add the original field alongside the analyzed one.

But what about stop words?

Bigrams for efficiently matching names

Exact phrase matching

Fuzzy query and minimum should match
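The steps above can be pictured together in one minimal Python 3 sketch. It is one possible combination, not the workshop's solution. Assumptions: a field called title, made-up names bigrams and bigram for the shingle filter and analyzer, and the typeless mapping syntax of Elasticsearch 7+. The title is analyzed with the built-in english analyzer (stemming, English stop words removed), with an unstemmed standard-analyzed copy and a bigram copy indexed alongside it as multi-fields. The query requires a typo-tolerant match on the stemmed field and uses the other copies to boost exact words, word pairs and exact phrases.

```python
import json

# HYPOTHETICAL names: the "title" field, the "bigrams" filter, the "bigram" analyzer.
create_index_body = {
    "settings": {
        "analysis": {
            "filter": {
                # Emit each pair of neighbouring words as one token, e.g. "elmer gantry".
                "bigrams": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 2,
                    "output_unigrams": False,
                }
            },
            "analyzer": {
                "bigram": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "bigrams"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "english",  # stems words and drops English stop words
                "fields": {
                    # Unstemmed copy of the same text, indexed next to the analyzed one.
                    "raw": {"type": "text", "analyzer": "standard"},
                    "bigrams": {"type": "text", "analyzer": "bigram"},
                },
            }
        }
    },
}


def title_query(text):
    """Typo-tolerant match on the stemmed field, boosted by exact and phrase hits."""
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "match": {
                            "title": {
                                "query": text,
                                "fuzziness": "AUTO",  # edit distance grows with term length
                                "minimum_should_match": "75%",
                            }
                        }
                    }
                ],
                "should": [
                    {"match": {"title.raw": text}},
                    {"match": {"title.bigrams": text}},
                    {"match_phrase": {"title.raw": text}},
                ],
            }
        }
    }


print(json.dumps(title_query("Elmer Gantry"), indent=2))
```

Because the should clauses are optional, they only reorder the hits the must clause already found: documents that also match the unstemmed words, the word pairs or the exact phrase rank higher.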