Beginner's Crash Course to Elastic Stack Series

Part 2: Understanding the relevance of your search with Elasticsearch and Kibana

Welcome to the Beginner's Crash Course to Elastic Stack!

This repo contains all resources shared during Part 2: Understanding the relevance of your search with Elasticsearch and Kibana.

By the end of this workshop, you will:

  • learn how Precision and Recall are used to measure how well the Elasticsearch search engine performs
  • understand how scoring is used to rank the relevance of your search results in Elasticsearch
  • master how to send search queries from Kibana to Elasticsearch to fine-tune the Precision or Recall of your search results

Resources

Beginner's Crash Course to Elastic Stack Table of Contents

This workshop is part of the Beginner's Crash Course to Elastic Stack series. Check out the table of contents to access all the workshops released in the series so far. The table will be updated as more workshops in the series are released!

Free Elastic Cloud Trial

Instructions on how to access Elasticsearch and Kibana on Elastic Cloud

Instructions for downloading Elasticsearch and Kibana

Video of the workshop

Mini Beginner's Crash Course to Elasticsearch & Kibana playlist

Do you prefer learning by watching shorter videos? Check out this playlist to watch short clips of the full-length Beginner's Crash Course workshops. The Part 2 workshop is broken down into episodes 7-10. Season 2 clips will be uploaded here in the future!

Presentation

News Headlines Dataset

Blog on understanding the relevance of your search with Elasticsearch and Kibana

Elastic America Virtual Chapter

Want to attend live workshops? Join the Elastic America Virtual Chapter to get the deets!

What's next?

Eager to continue learning after mastering the concepts from this workshop? Move on to Part 3: Running full text queries and combined queries with Elasticsearch and Kibana!

Search for information

There are two main ways to search in Elasticsearch:

  1. Queries
  2. Aggregations

Queries

Queries retrieve documents that match the criteria.

Retrieve information about documents in an index

Syntax:

GET enter_name_of_the_index_here/_search

Example:

GET news_headlines/_search

Expected response from Elasticsearch:

Elasticsearch displays the number of hits and, by default, a sample of 10 search results.


Get the exact total number of hits

To improve response speed on large datasets, Elasticsearch stops counting hits at 10,000 by default, so the reported total is capped at that value. If you want the exact total number of hits, use the following query.

Syntax:

GET enter_name_of_the_index_here/_search
{
  "track_total_hits": true
}

Example:

GET news_headlines/_search
{
  "track_total_hits": true
}

Expected response from Elasticsearch:

You will see that the total number of hits is now 200,853.
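
The difference between the capped and the exact count can be read from the response itself. The snippet below is a minimal sketch, assuming Elasticsearch 7.x or later, where `hits.total` is an object with `value` and `relation` fields. The helper name `describe_total` and the two response fragments are hand-written for illustration and are not output copied from a cluster.

```python
def describe_total(response: dict) -> str:
    """Summarize the hits.total object of a search response."""
    total = response["hits"]["total"]
    if total["relation"] == "eq":
        # "eq" means the reported value is the exact number of hits.
        return f"exactly {total['value']} hits"
    # "gte" means the real count is at least the reported value.
    return f"at least {total['value']} hits"


# Hand-written response fragments (assumed shapes, for illustration only).
capped = {"hits": {"total": {"value": 10000, "relation": "gte"}}}
exact = {"hits": {"total": {"value": 200853, "relation": "eq"}}}

print(describe_total(capped))  # at least 10000 hits
print(describe_total(exact))   # exactly 200853 hits
```

A `relation` of `gte` is the signal that the count was capped and that `track_total_hits` is needed for the exact figure.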


Search for data within a specific time range

Syntax:

GET enter_name_of_the_index_here/_search
{
  "query": {
    "Specify the type of query here": {
      "Enter name of the field here": {
        "gte": "Enter lowest value of the range here",
        "lte": "Enter highest value of the range here"
      }
    }
  }
}

Example:

GET news_headlines/_search
{
  "query": {
    "range": {
      "date": {
        "gte": "2015-06-20",
        "lte": "2015-09-22"
      }
    }
  }
}

Expected response from Elasticsearch:

It pulls up articles published from June 20, 2015 through September 22, 2015. In the workshop, one document from the result set was shown as an example.
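
To make the meaning of `gte` (greater than or equal to) and `lte` (less than or equal to) concrete, here is a small Python sketch that applies the same inclusive bounds to a handful of documents. The documents and the helper `in_range` are invented for illustration. This is a toy filter, not how Elasticsearch evaluates a range query internally.

```python
from datetime import date

# Toy documents; the headlines and dates are made up for illustration.
docs = [
    {"headline": "A", "date": date(2015, 6, 19)},
    {"headline": "B", "date": date(2015, 6, 20)},
    {"headline": "C", "date": date(2015, 8, 1)},
    {"headline": "D", "date": date(2015, 9, 22)},
    {"headline": "E", "date": date(2015, 9, 23)},
]


def in_range(doc: dict, field: str, gte: date, lte: date) -> bool:
    """Return True when doc[field] lies within the inclusive bounds."""
    return gte <= doc[field] <= lte


hits = [
    d["headline"]
    for d in docs
    if in_range(d, "date", gte=date(2015, 6, 20), lte=date(2015, 9, 22))
]
print(hits)  # ['B', 'C', 'D']
```

Documents dated exactly on either boundary are included, while the day before and the day after are not.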


Aggregations

An aggregation summarizes your data as metrics, statistics, and other analytics.

Analyze the data to show the categories of news headlines in our dataset

Syntax:

GET enter_name_of_the_index_here/_search
{
  "aggs": {
    "name your aggregation here": {
      "specify aggregation type here": {
        "field": "name the field you want to aggregate here",
        "size": state how many buckets you want returned here
      }
    }
  }
}

Example:

GET news_headlines/_search
{
  "aggs": {
    "by_category": {
      "terms": {
        "field": "category",
        "size": 100
      }
    }
  }
}

Expected response from Elasticsearch:

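The response lists each category as a bucket, along with the number of documents in that category. Buckets are sorted by document count, largest first.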
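
Conceptually, a terms aggregation groups documents by the value of a field and counts each group. The sketch below mimics that idea in plain Python on a few invented documents. The function name `terms_buckets` is hypothetical, and the `key` / `doc_count` shape only mirrors the bucket fields Elasticsearch returns. It is an illustration of the concept, not the actual implementation.

```python
from collections import Counter

# Toy documents; the category values are invented for illustration.
docs = [
    {"category": "POLITICS"},
    {"category": "ENTERTAINMENT"},
    {"category": "POLITICS"},
    {"category": "TRAVEL"},
    {"category": "ENTERTAINMENT"},
    {"category": "POLITICS"},
]


def terms_buckets(docs: list, field: str, size: int) -> list:
    """Count documents per field value and return the `size` largest groups."""
    counts = Counter(doc[field] for doc in docs)
    return [
        {"key": key, "doc_count": count}
        for key, count in counts.most_common(size)
    ]


for bucket in terms_buckets(docs, "category", size=2):
    print(bucket)
# {'key': 'POLITICS', 'doc_count': 3}
# {'key': 'ENTERTAINMENT', 'doc_count': 2}
```

With `size=2`, only the two largest buckets come back, which is why the example request asks for a generous `size` of 100 to see every category.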

A combination of query and aggregation request

Search for the most significant term in a category

Syntax:

GET enter_name_of_the_index_here/_search
{
  "query": {
    "match": {
      "Enter the name of the field": "Enter the value you are looking for"
    }
  },
  "aggregations": {
    "Name your aggregation here": {
      "significant_text": {
        "field": "Enter the name of the field you are searching for"
      }
    }
  }
}

Example:

GET news_headlines/_search
{
  "query": {
    "match": {
      "category": "ENTERTAINMENT"
    }
  },
  "aggregations": {
    "popular_in_entertainment": {
      "significant_text": {
        "field": "headline"
      }
    }
  }
}

Expected response from Elasticsearch:

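The response lists the terms that appear unusually often in the headlines of the ENTERTAINMENT category compared with headlines in the rest of the index.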

Precision and Recall

Increasing Recall

Syntax:

GET enter_name_of_index_here/_search
{
  "query": {
    "match": {
      "Specify the field you want to search": {
        "query": "Enter search terms"
      }
    }
  }
}

Example:

GET news_headlines/_search
{
  "query": {
    "match": {
      "headline": {
        "query": "Khloe Kardashian Kendall Jenner"
      }
    }
  }
}

Expected response from Elasticsearch:

By default, the match query uses "OR" logic. If a document contains at least one of the search terms, Elasticsearch considers that document a hit.

"OR" logic results in a higher number of hits, thereby increasing recall. However, many of the hits are only loosely related to the query, which lowers precision.
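
The trade-off can be put into numbers. Precision is the share of returned documents that are relevant, and recall is the share of relevant documents that are returned. The sketch below computes both from two sets of document IDs. The IDs, the set of relevant documents, and the helper `precision_recall` are all invented for illustration. In practice, deciding which documents are relevant requires human judgment.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[int], relevant: Set[int]) -> Tuple[float, float]:
    """Return (precision, recall) for a set of retrieved document IDs."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs: documents 1-4 are the ones we actually want.
relevant = {1, 2, 3, 4}
broad_results = {1, 2, 3, 4, 5, 6, 7, 8}  # many hits, some loosely related
narrow_results = {1, 2}                   # few hits, all relevant

print(precision_recall(broad_results, relevant))   # (0.5, 1.0)
print(precision_recall(narrow_results, relevant))  # (1.0, 0.5)
```

The broad result set finds every relevant document but half of what it returns is noise. The narrow result set returns only relevant documents but misses half of them.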


Increasing Precision

We can increase precision by adding an "and" operator to the query.

Syntax:

GET enter_name_of_index_here/_search
{
  "query": {
    "match": {
      "Specify the field you want to search": {
        "query": "Enter search terms",
        "operator": "and"
      }
    }
  }
}

Example:

GET news_headlines/_search
{
  "query": {
    "match": {
      "headline": {
        "query": "Khloe Kardashian Kendall Jenner",
        "operator": "and"
      }
    }
  }
}

Expected response from Elasticsearch:

With the "and" operator, a document is a hit only if it contains all of the search terms. This returns more precise matches, thereby increasing precision. However, it reduces the number of hits returned, resulting in lower recall.


minimum_should_match

This parameter allows you to specify the minimum number of search terms a document must contain to be included in the search results.

This parameter gives you more control over fine-tuning the precision and recall of your search.

Syntax:

GET enter_name_of_index_here/_search
{
  "query": {
    "match": {
      "Specify the field you want to search": {
        "query": "Enter search term here",
        "minimum_should_match": Enter a number here
      }
    }
  }
}

Example:

GET news_headlines/_search
{
  "query": {
    "match": {
      "headline": {
        "query": "Khloe Kardashian Kendall Jenner",
        "minimum_should_match": 3
      }
    }
  }
}

Expected response from Elasticsearch:

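With the minimum_should_match parameter, we were able to fine-tune both precision and recall! A document is now a hit only if its headline contains at least three of the four search terms.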
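
The effect of the threshold can be simulated on a few headlines in plain Python. This is a toy model under stated assumptions: the headlines are invented, the helpers `count_matching_terms` and `search` are hypothetical, and terms are produced by lowercasing and splitting on whitespace, which is much simpler than the text analysis Elasticsearch performs.

```python
def count_matching_terms(headline: str, query: str) -> int:
    """Count how many distinct query terms appear in the headline."""
    headline_terms = set(headline.lower().split())
    query_terms = set(query.lower().split())
    return len(headline_terms & query_terms)


def search(headlines: list, query: str, minimum_should_match: int = 1) -> list:
    """Keep headlines that contain at least `minimum_should_match` query terms."""
    return [
        h for h in headlines
        if count_matching_terms(h, query) >= minimum_should_match
    ]


# Invented headlines, for illustration only.
headlines = [
    "Khloe Kardashian and Kendall Jenner attend gala",  # 4 matching terms
    "Kardashian family welcomes Kendall Jenner home",   # 3 matching terms
    "Kendall Jenner walks the runway",                  # 2 matching terms
    "Khloe Kardashian shares family photo",             # 2 matching terms
    "Jenner wins gold",                                 # 1 matching term
    "Stock markets rally",                              # 0 matching terms
]
query = "Khloe Kardashian Kendall Jenner"

print(len(search(headlines, query, minimum_should_match=1)))  # 5
print(len(search(headlines, query, minimum_should_match=3)))  # 2
print(len(search(headlines, query, minimum_should_match=4)))  # 1
```

In this toy model, a threshold of 1 behaves like the default "OR" logic and a threshold of 4 behaves like the "and" operator. A threshold of 3 sits in between, trading a little recall for noticeably better precision.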
