Beginner's Crash Course to Elastic Stack Series

Part 5: Understanding Mapping with Elasticsearch and Kibana

Welcome to the Beginner's Crash Course to Elastic Stack!

This repo contains all resources shared during Part 5: Understanding Mapping with Elasticsearch and Kibana.

Have you ever encountered the error “Field type is not supported for [whatever you are trying to do with Elasticsearch]”?

The most likely culprit of this error is the mapping of your index!

Mapping is the process of defining how a document and its fields are indexed and stored. It defines the type and format of the fields in the documents. As a result, mapping can significantly affect how Elasticsearch searches and stores data.

Understanding how mapping works will help you define mapping that best serves your use case.

By the end of this workshop, you will be able to define what a mapping is and define your own mapping to make indexing and searching more efficient.

Resources

Table of Contents: Beginner's Crash Course to Elastic Stack:

This workshop is a part of the Beginner's Crash Course to Elastic Stack series. Check out this table of contents to access all the workshops in the series thus far. This table will continue to be updated as more workshops in the series are released!

Free Elastic Cloud Trial

Instructions on how to access Elasticsearch and Kibana on Elastic Cloud

Instructions for downloading Elasticsearch and Kibana

Video recording of the workshop

Mini Beginner's Crash Course to Elasticsearch & Kibana playlist

Do you prefer learning by watching shorter videos? Check out this playlist to watch short clips of the full-length Beginner's Crash Course workshops. The Part 5 workshop is broken down into episodes 19-22. Season 2 clips will be uploaded here in the future!

Presentation Slides

What's next? Eager to continue your learning after mastering the concepts from this workshop? Move on to Part 6: Troubleshooting Beginner Level Elasticsearch Errors here!

What is a Mapping?

image

Review from Previous Workshops

image

Indexing a Document

The following request indexes a document into the specified index.

Syntax:

POST Enter-name-of-the-index/_doc
{
  "field": "value"
}

Example:

POST temp_index/_doc
{
  "name": "Pineapple",
  "botanical_name": "Ananas comosus",
  "produce_type": "Fruit",
  "country_of_origin": "New Zealand",
  "date_purchased": "2020-06-02T12:15:35",
  "quantity": 200,
  "unit_price": 3.11,
  "description": "a large juicy tropical fruit consisting of aromatic edible yellow flesh surrounded by a tough segmented skin and topped with a tuft of stiff leaves.These pineapples are sourced from New Zealand.",
  "vendor_details": {
    "vendor": "Tropical Fruit Growers of New Zealand",
    "main_contact": "Hugh Rose",
    "vendor_location": "Whangarei, New Zealand",
    "preferred_vendor": true
  }
}

Expected response from Elasticsearch:

Elasticsearch will confirm that this document has been successfully indexed into the temp_index.

image

Mapping Explained

Mapping determines how a document and its fields are indexed and stored by defining the type of each field.

image

It contains a list of the names and types of fields in an index. Depending on its type, each field is indexed and stored differently in Elasticsearch.

Dynamic Mapping

When a user does not define mapping in advance, Elasticsearch creates or updates the mapping as needed by default. This is known as dynamic mapping.

image

With dynamic mapping, Elasticsearch looks at each field and tries to infer the data type from the field content. Then, it assigns a type to each field and creates a list of field names and types known as the mapping.

Depending on the assigned field type, each field is indexed and primed for different types of requests (full-text search, aggregations, sorting). This is why mapping plays an important role in how Elasticsearch stores and searches for data.
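As a rough illustration of the inference step described above, the Python sketch below guesses a field type from each JSON value. It is a simplified approximation, not Elasticsearch's actual implementation: the function names (infer_field, infer_mapping) are made up for this example, and the date check only recognizes ISO 8601 style strings like the one in the sample document.

```python
import re

# Simplified stand-in for date detection (assumption: ISO 8601 style strings only).
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2})?)?$")


def infer_field(value):
    """Return a mapping entry for one JSON value, imitating dynamic mapping."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int in Python
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, dict):  # nested JSON object: map each inner field
        return {"properties": infer_mapping(value)}
    if isinstance(value, str):
        if ISO_DATE.match(value):
            return {"type": "date"}
        # Strings are mapped twice: text, plus a keyword multi-field.
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    raise TypeError(f"unsupported value in this sketch: {value!r}")


def infer_mapping(document):
    """Build a field-name -> type listing, in alphabetical order."""
    return {name: infer_field(value) for name, value in sorted(document.items())}


sample = {
    "name": "Pineapple",
    "date_purchased": "2020-06-02T12:15:35",
    "quantity": 200,
    "unit_price": 3.11,
    "vendor_details": {"preferred_vendor": True},
}
for field, definition in infer_mapping(sample).items():
    print(field, definition)
```

Running it on the trimmed sample document shows "name" mapped as text with a keyword multi-field, "quantity" as long, "unit_price" as float, and "date_purchased" as date, which matches the kind of result you see when you view the dynamic mapping.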

View the Mapping

Syntax:

GET Enter_name_of_the_index_here/_mapping

Example:

GET temp_index/_mapping

Expected response from Elasticsearch:

Elasticsearch returns the mapping of the temp_index. It lists all the fields of the document in alphabetical order, along with the type of each field (text, keyword, long, float, date, boolean, etc.).

image image image

For the list of all field types, click here!

Indexing Strings

There are two kinds of string field types:

  1. Text
  2. Keyword

By default, every string gets mapped twice: as a text field and as a keyword multi-field. Each field type is primed for different types of requests.

The text field type is designed for full-text searches.

The keyword field type is designed for exact searches, aggregations, and sorting.

You can customize your mapping by assigning the field type as either text or keyword or both!

Text Field Type

Text Analysis

Ever notice that when you search in Elasticsearch, the search is not case sensitive and punctuation does not seem to matter? This is because text analysis occurs when your fields are indexed.

By default, strings are analyzed when they are indexed. Each string is broken up into individual words, also known as tokens. The analyzer then lowercases each token and removes punctuation.

image
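To make the analysis step concrete, here is a minimal Python sketch that tokenizes, lowercases, and strips punctuation. The analyze function is a hypothetical stand-in: the real default analyzer follows Unicode text segmentation rules and handles many edge cases (apostrophes, for example) differently from this regular expression.

```python
import re


def analyze(text):
    """Lowercase the text, then split it into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


print(analyze("These pineapples are sourced from New Zealand."))
# ['these', 'pineapples', 'are', 'sourced', 'from', 'new', 'zealand']
```

Because the same steps are applied to the search terms, a search for "new zealand" and a search for "New Zealand!" end up looking for the same tokens.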

Inverted Index

image

Once the string is analyzed, the individual tokens are stored in a sorted list known as the inverted index. Each unique token is stored in the inverted index along with the IDs of the documents that contain it.

The same process occurs every time you index a new document.
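The idea of an inverted index can be sketched in a few lines of Python. This is a toy model under stated assumptions: the document IDs (1 and 2) and the helper names are invented for the example, the two strings are short excerpts from the sample descriptions, and the real on-disk structure is far more sophisticated.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lowercase, split into word tokens, drop punctuation."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each unique token to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    # Sort tokens alphabetically, like the sorted list described above.
    return {token: sorted(ids) for token, ids in sorted(index.items())}


def search(index, term):
    """Look up a single term; the term is analyzed the same way as the documents."""
    tokens = analyze(term)
    return index.get(tokens[0], []) if tokens else []


docs = {
    1: "edible yellow flesh surrounded by a tough segmented skin",
    2: "The flesh is deep yellow, thick, and soft",
}
index = build_inverted_index(docs)
print(search(index, "Yellow"))  # [1, 2]
print(search(index, "skin"))    # [1]
```

A full-text search only has to look up the token to get the matching document IDs, which is why this structure suits text fields.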

image image image image

Keyword Field Type

The keyword field type is used for aggregations, sorting, and exact searches. These actions look up the document ID to find the values it has in its fields.

A keyword field is suited to these actions because it uses a data structure called doc values to store data.

For each document, the document ID along with the field value (the original string) is added to the table. This data structure (doc values) is designed for actions that require looking up the document ID to find the values it has in its fields.

image
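The sketch below models doc values as a simple table keyed by document ID and shows why that layout helps aggregations and sorting. It is only an analogy in Python: the helper names are invented, documents 1 and 2 mirror the workshop data, and document 3 is a hypothetical extra row added so the counts differ.

```python
from collections import Counter

# Doc values for a keyword field: document ID -> original, unanalyzed string.
# Document 3 is hypothetical and exists only to make the counts interesting.
country_doc_values = {1: "New Zealand", 2: "Indonesia", 3: "New Zealand"}


def terms_aggregation(doc_values, doc_ids):
    """Count how often each exact value occurs among the given documents."""
    counts = Counter(doc_values[doc_id] for doc_id in doc_ids)
    return counts.most_common()


def sort_by_field(doc_values, doc_ids):
    """Order document IDs by the value stored for each of them."""
    return sorted(doc_ids, key=lambda doc_id: doc_values[doc_id])


print(terms_aggregation(country_doc_values, [1, 2, 3]))
# [('New Zealand', 2), ('Indonesia', 1)]
print(sort_by_field(country_doc_values, [1, 2, 3]))
# [2, 1, 3]
```

Both functions start from document IDs and read the stored value, which is the access pattern doc values are built for. Note that "New Zealand" stays one value here instead of being split into "new" and "zealand".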

When Elasticsearch dynamically creates a mapping for you, it does not know what you want to use a string for, so it maps all strings to both field types.

In cases where you do not need both field types, the default setting is wasteful. Since both field types require creating either an inverted index or doc values, creating both field types for unnecessary fields will slow down indexing and take up more disk space.

This is why we define our own mapping as it helps us store and search data more efficiently.

Mapping Exercise

Project: Build an app for a client who manages a produce warehouse

This app must enable users to:

  1. search for produce name, country of origin and description

  2. identify top countries of origin with the most frequent purchase history

  3. sort produce by produce type (Fruit or Vegetable)

  4. get the summary of monthly expense

Sample data

{
  "name": "Pineapple",
  "botanical_name": "Ananas comosus",
  "produce_type": "Fruit",
  "country_of_origin": "New Zealand",
  "date_purchased": "2020-06-02T12:15:35",
  "quantity": 200,
  "unit_price": 3.11,
  "description": "a large juicy tropical fruit consisting of aromatic edible yellow flesh surrounded by a tough segmented skin and topped with a tuft of stiff leaves.These pineapples are sourced from New Zealand.",
  "vendor_details": {
    "vendor": "Tropical Fruit Growers of New Zealand",
    "main_contact": "Hugh Rose",
    "vendor_location": "Whangarei, New Zealand",
    "preferred_vendor": true
  }
}

Plan of Action

image image image image image image image

Defining your own mapping

Rules

  1. If you do not define a mapping ahead of time, Elasticsearch dynamically creates the mapping for you.
  2. If you do decide to define your own mapping, you can do so at index creation.
  3. ONE mapping is defined per index. Once the index has been created, we can only add new fields to a mapping. We CANNOT change the mapping of an existing field.
  4. If you must change the type of an existing field, you must create a new index with the desired mapping, then reindex all documents into the new index.

Step 1: Index a sample document into a test index.

The sample document must contain the fields that you want to define. These fields must also contain values that map closely to the field types you want.

Syntax:

POST Name-of-test-index/_doc
{
  "field": "value"
}

Example:

POST test_index/_doc
{
  "name": "Pineapple",
  "botanical_name": "Ananas comosus",
  "produce_type": "Fruit",
  "country_of_origin": "New Zealand",
  "date_purchased": "2020-06-02T12:15:35",
  "quantity": 200,
  "unit_price": 3.11,
  "description": "a large juicy tropical fruit consisting of aromatic edible yellow flesh surrounded by a tough segmented skin and topped with a tuft of stiff leaves.These pineapples are sourced from New Zealand.",
  "vendor_details": {
    "vendor": "Tropical Fruit Growers of New Zealand",
    "main_contact": "Hugh Rose",
    "vendor_location": "Whangarei, New Zealand",
    "preferred_vendor": true
  }
}

Expected response from Elasticsearch:

The test_index is successfully created.

image

Step 2: View the dynamic mapping

Syntax:

GET Name-the-index-whose-mapping-you-want-to-view/_mapping

Example:

GET test_index/_mapping

Expected response from Elasticsearch:

Elasticsearch will display the mapping it has created. It lists the fields in alphabetical order. This document is identical to the one we indexed into the temp_index. To save space, the screenshots of the mapping have not been included here.

Step 3: Edit the mapping

Copy and paste the mapping from step 2 into the Kibana console. From the pasted results, remove the "test_index" along with its opening and closing brackets. Then, edit the mapping to satisfy the requirements outlined in the figure below.

image
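If you would rather strip the index name programmatically than by hand, the unwrapping part of this step can be sketched in Python. The helper name unwrap_mapping is hypothetical, and the response below is a shortened stand-in for the real output of the mapping request, which contains every field.

```python
import json


def unwrap_mapping(mapping_response, index_name):
    """Drop the outer index name so only the "mappings" object remains."""
    return {"mappings": mapping_response[index_name]["mappings"]}


# Shortened stand-in for the response to GET test_index/_mapping.
response = {
    "test_index": {
        "mappings": {
            "properties": {
                "quantity": {"type": "long"},
                "unit_price": {"type": "float"},
            }
        }
    }
}

body = unwrap_mapping(response, "test_index")
print(json.dumps(body, indent=2))
```

The printed JSON has the same shape as the request body used when creating an index with a custom mapping, ready for you to edit field by field.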

The optimized mapping should look like the following:

{
  "mappings": {
    "properties": {
      "botanical_name": {
        "enabled": false
      },
      "country_of_origin": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "date_purchased": {
        "type": "date"
      },
      "description": {
        "type": "text"
      },
      "name": {
        "type": "text"
      },
      "produce_type": {
        "type": "keyword"
      },
      "quantity": {
        "type": "long"
      },
      "unit_price": {
        "type": "float"
      },
      "vendor_details": {
        "enabled": false
      }
    }
  }
}

image

Step 4: Create a new index with the optimized mapping from step 3.

Syntax:

PUT Name-of-your-final-index
{
  copy and paste your edited mapping here
}

Example:

PUT produce_index
{
  "mappings": {
    "properties": {
      "botanical_name": {
        "enabled": false
      },
      "country_of_origin": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "date_purchased": {
        "type": "date"
      },
      "description": {
        "type": "text"
      },
      "name": {
        "type": "text"
      },
      "produce_type": {
        "type": "keyword"
      },
      "quantity": {
        "type": "long"
      },
      "unit_price": {
        "type": "float"
      },
      "vendor_details": {
        "enabled": false
      }
    }
  }
}

Expected response from Elasticsearch:

Elasticsearch creates a produce_index with the customized mapping we defined above!

image

Step 5: Check the mapping of the new index to make sure all the fields have been mapped correctly

Syntax:

GET Name-of-your-final-index/_mapping

Example:

GET produce_index/_mapping

Expected response from Elasticsearch:

Compared to the dynamic mapping, our optimized mapping looks simpler and more concise! The current mapping satisfies the requirements that are marked with green check marks.

image image image

Step 6: Index your dataset into the new index

For simplicity's sake, we will index two documents.

Index the first document

POST produce_index/_doc
{
  "name": "Pineapple",
  "botanical_name": "Ananas comosus",
  "produce_type": "Fruit",
  "country_of_origin": "New Zealand",
  "date_purchased": "2020-06-02T12:15:35",
  "quantity": 200,
  "unit_price": 3.11,
  "description": "a large juicy tropical fruit consisting of aromatic edible yellow flesh surrounded by a tough segmented skin and topped with a tuft of stiff leaves.These pineapples are sourced from New Zealand.",
  "vendor_details": {
    "vendor": "Tropical Fruit Growers of New Zealand",
    "main_contact": "Hugh Rose",
    "vendor_location": "Whangarei, New Zealand",
    "preferred_vendor": true
  }
}

Expected response from Elasticsearch:

Elasticsearch successfully indexes the first document.

image

Index the second document

The second document has almost the same fields as the first document, except that it has an extra field called "organic" set to true!

POST produce_index/_doc
{
  "name": "Mango",
  "botanical_name": "Harum Manis",
  "produce_type": "Fruit",
  "country_of_origin": "Indonesia",
  "organic": true,
  "date_purchased": "2020-05-02T07:15:35",
  "quantity": 500,
  "unit_price": 1.5,
  "description": "Mango Arumanis or Harum Manis is originated from East Java. Arumanis means harum dan manis or fragrant and sweet just like its taste. The ripe Mango Arumanis has dark green skin coated with thin grayish natural wax. The flesh is deep yellow, thick, and soft with little to no fiber. Mango Arumanis is best eaten when ripe.",
  "vendor_details": {
    "vendor": "Ayra Shezan Trading",
    "main_contact": "Suharto",
    "vendor_location": "Binjai, Indonesia",
    "preferred_vendor": true
  }
}

Expected response from Elasticsearch:

Elasticsearch successfully indexes the second document.

image

Let's see what happens to the mapping by sending this request below:

GET produce_index/_mapping

Expected response from Elasticsearch:

The new field ("organic") and its field type (boolean) have been added to the mapping. This is in line with the rules of mapping we discussed earlier, since you can add new fields to the mapping. We just cannot change the mapping of an existing field!

image image
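The rule at work here (new fields may be added, existing fields may not change type) can be modeled with a small Python sketch. It is a deliberately simplified model with an invented helper name, merge_mapping; in practice Elasticsearch does allow a few parameters of an existing field to be updated, which this sketch ignores.

```python
def merge_mapping(existing, incoming):
    """Add new fields to a mapping; refuse to redefine a field that already exists."""
    merged = dict(existing)
    for field, definition in incoming.items():
        if field in merged and merged[field] != definition:
            raise ValueError(f"cannot change the mapping of existing field '{field}'")
        merged[field] = definition
    return merged


mapping = {"name": {"type": "text"}, "quantity": {"type": "long"}}

# Adding a brand-new field works.
mapping = merge_mapping(mapping, {"organic": {"type": "boolean"}})
print(sorted(mapping))  # ['name', 'organic', 'quantity']

# Changing the type of an existing field is rejected.
try:
    merge_mapping(mapping, {"quantity": {"type": "keyword"}})
except ValueError as error:
    print(error)
```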

What if you do need to make changes to the mapping of an existing field?

Let's say your client changed his mind. He wants to run only full text search on the field "botanical_name" we disabled earlier.

Remember, you CANNOT change the mapping of an existing field. If you do need to make changes to an existing field, you must create a new index with the desired mapping, then reindex all documents into the new index.

STEP 1: Create a new index (produce_v2) with the latest mapping.

We removed the "enabled" parameter from the field "botanical_name" and changed its type to "text".

Example:

PUT produce_v2
{
  "mappings": {
    "properties": {
      "botanical_name": {
        "type": "text"
      },
      "country_of_origin": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "date_purchased": {
        "type": "date"
      },
      "description": {
        "type": "text"
      },
      "name": {
        "type": "text"
      },
      "organic": {
        "type": "boolean"
      },
      "produce_type": {
        "type": "keyword"
      },
      "quantity": {
        "type": "long"
      },
      "unit_price": {
        "type": "float"
      },
      "vendor_details": {
        "type": "object",
        "enabled": false
      }
    }
  }
}

Expected response from Elasticsearch:

Elasticsearch creates a new index (produce_v2) with the latest mapping.

image

If you check the mapping, you will see that the field "botanical_name" has been typed as text.

View the mapping of produce_v2:

GET produce_v2/_mapping

Expected response from Elasticsearch:

image image

STEP 2: Reindex the data from the original index (produce_index) to the one you just created (produce_v2).

POST _reindex
{
  "source": {
    "index": "produce_index"
  },
  "dest": {
    "index": "produce_v2"
  }
}

Expected response from Elasticsearch:

This request reindexes data from the produce_index to the produce_v2 index. The produce_v2 index can now be used to run the requests that the client has specified.

image

Runtime Field

image image image

Step 1: Create a runtime field and add it to the mapping of the existing index.

Syntax:

PUT Enter-name-of-index/_mapping
{
  "runtime": {
    "Name-your-runtime-field-here": {
      "type": "Specify-field-type-here",
      "script": {
        "source": "Specify the formula you want executed"
      }
    }
  }
}

Example:

PUT produce_v2/_mapping
{
  "runtime": {
    "total": {
      "type": "double",
      "script": {
        "source": "emit(doc['unit_price'].value* doc['quantity'].value)"
      }
    }
  }
}

Expected response from Elasticsearch:

Elasticsearch successfully adds the runtime field to the mapping.

image

Step 2: Check the mapping:

GET produce_v2/_mapping

Expected response from Elasticsearch:

Elasticsearch adds a runtime field to the mapping (red box).

image

Note that the runtime field is not listed under the "properties" object, which includes the fields in our documents. This is because the runtime field "total" is not indexed!

Step 3: Run a request on the runtime field to see it perform its magic!

Please note that the following request does not aggregate the monthly expense here. We are running a simple aggregation request to demonstrate how runtime field works!

The following request runs a sum aggregation against the runtime field total of all documents in our index.

Syntax:

GET Enter_name_of_the_index_here/_search
{
  "size": 0,
  "aggs": {
    "Name your aggregations here": {
      "Specify the aggregation type here": {
        "field": "Name the field you want to aggregate on here"
      }
    }
  }
}

Example:

GET produce_v2/_search
{
  "size": 0,
  "aggs": {
    "total_expense": {
      "sum": {
        "field": "total"
      }
    }
  }
}

Expected response from Elasticsearch:

When this request is sent, a runtime field called "total" is created and calculated for documents within the scope of our request (the entire index). Then, the sum aggregation is run on the field "total" over all documents in our index.

image

The runtime field is only created and calculated when a request made on the runtime field is being executed. Runtime fields are not indexed, so they do not take up disk space.
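The behavior described above, where a value is computed at query time and never stored, can be mimicked in plain Python. This is only a conceptual sketch: the function names are invented, the two documents are trimmed copies of the workshop data, and Python's floating point arithmetic will not match Elasticsearch's result digit for digit because unit_price is mapped as a float there.

```python
import math

# Trimmed copies of the two indexed documents; "total" is not stored in either one.
stored_docs = [
    {"name": "Pineapple", "unit_price": 3.11, "quantity": 200},
    {"name": "Mango", "unit_price": 1.5, "quantity": 500},
]


def total(doc):
    """Runtime field: computed from stored values each time a request needs it."""
    return doc["unit_price"] * doc["quantity"]


def sum_aggregation(docs, runtime_field):
    """Evaluate the runtime field for every document in scope, then add up the results."""
    return sum(runtime_field(doc) for doc in docs)


total_expense = sum_aggregation(stored_docs, total)
print(round(total_expense, 2))
print(math.isclose(total_expense, 1372.0))  # 3.11 * 200 + 1.5 * 500
print("total" in stored_docs[0])            # the documents themselves are unchanged
```

The trade-off is the same as in Elasticsearch: nothing extra is stored, but the calculation is repeated on every request that touches the field.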

We also did not have to reindex in order to add a new field to existing documents. For more information on runtime fields, check out this blog!

Questions from the workshop

Q: If possible please explain the _meta in mapping which was part of previous video.

A: Of course! This question refers to the _meta field in the mapping shown in the Part 4 workshop.

image

I should have just removed the _meta part before I published this repo. Thank you for submitting a pull request on GitHub @radhakrishnaakamat!

So the _meta field was automatically created by the ML File Data Visualizer. This is a field where you can store any information about the index or the app for the developers who manage it. Think of it as a place to include information about the app so developers have the info they need to debug.

The _meta field is optional and deleting the _meta field will not affect the mapping in any way whatsoever. I am going to delete this field in my part 4 workshop repo as it has been causing a lot of confusion!

Q: After you create a new mapping, how do you configure your ingest to use the new mapping?

A: I should have asked for clarification as this question can be interpreted in many different ways.

If I didn't interpret it correctly, please let me know via Twitter @LisaHJung and I will add the answer to this repo!

If you were referring to a situation where you have an old index with outdated mapping that needed to be changed:

Remember, we cannot change the mapping of an existing field. Even if you add a new field to a mapping, it only adds the new field to the list of field names and types. It does not add the new field to documents that have been indexed prior to adding a new field to the mapping.

You must create a new index with the desired mapping, then reindex documents from the old index to the new one, and direct requests to the new index!