Welcome to the Beginner's Crash Course to Elastic Stack!
This repo contains all resources shared during Part 5: Understanding Mapping with Elasticsearch and Kibana.
Have you ever encountered the error “Field type is not supported for [whatever you are trying to do with Elasticsearch]”?
The most likely culprit of this error is the mapping of your index!
Mapping is the process of defining how a document and its fields are indexed and stored. It defines the type and format of the fields in the documents. As a result, mapping can significantly affect how Elasticsearch searches and stores data.
Understanding how mapping works will help you define mapping that best serves your use case.
By the end of this workshop, you will be able to define what a mapping is and define your own mapping to make indexing and searching more efficient.
Table of Contents: Beginner's Crash Course to Elastic Stack:
This workshop is a part of the Beginner's Crash Course to Elastic Stack series. Check out this table of contents to access all the workshops in the series thus far. This table will continue to get updated as more workshops in the series are released!
Instructions on how to access Elasticsearch and Kibana on Elastic Cloud
Instructions for downloading Elasticsearch and Kibana
Video recording of the workshop
Mini Beginner's Crash Course to Elasticsearch & Kibana playlist
Do you prefer learning by watching shorter videos? Check out this playlist to watch short clips of the full-length Beginner's Crash Course workshops. The Part 5 workshop is broken down into episodes 19-22. Season 2 clips will be uploaded here in the future!
What's next? Eager to continue your learning after mastering the concept from this workshop? Move on to Part 6: Troubleshooting Beginner Level Elasticsearch Errors here!
The following request indexes a document into an index.
Syntax:
POST Enter-name-of-the-index/_doc
{
"field": "value"
}
Example:
POST temp_index/_doc
{
"name": "Pineapple",
"botanical_name": "Ananas comosus",
"produce_type": "Fruit",
"country_of_origin": "New Zealand",
"date_purchased": "2020-06-02T12:15:35",
"quantity": 200,
"unit_price": 3.11,
"description": "a large juicy tropical fruit consisting of aromatic edible yellow flesh surrounded by a tough segmented skin and topped with a tuft of stiff leaves.These pineapples are sourced from New Zealand.",
"vendor_details": {
"vendor": "Tropical Fruit Growers of New Zealand",
"main_contact": "Hugh Rose",
"vendor_location": "Whangarei, New Zealand",
"preferred_vendor": true
}
}
Expected response from Elasticsearch:
Elasticsearch will confirm that this document has been successfully indexed into the temp_index.
Mapping determines how a document and its fields are indexed and stored by defining the type of each field.
It contains a list of the names and types of fields in an index. Depending on its type, each field is indexed and stored differently in Elasticsearch.
When a user does not define mapping in advance, Elasticsearch creates or updates the mapping as needed by default. This is known as dynamic mapping.
With dynamic mapping, Elasticsearch looks at each field and tries to infer the data type from the field content. Then, it assigns a type to each field and creates a list of field names and types known as mapping.
Depending on the assigned field type, each field is indexed and primed for different types of requests (full-text search, aggregations, sorting). This is why mapping plays an important role in how Elasticsearch stores and searches for data.
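To make the idea of type inference more concrete, here is a rough sketch in Python. It is not how Elasticsearch is implemented. The function name infer_field_type is hypothetical, and ISO 8601 parsing is an assumption standing in for Elasticsearch's own date detection. It only shows the general idea of looking at a value and guessing a field type from it.

```python
from datetime import datetime


def infer_field_type(value):
    """Return a simplified guess at the type dynamic mapping would assign."""
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        try:
            # Assumption: ISO 8601 parsing stands in for date detection.
            datetime.fromisoformat(value)
            return "date"
        except ValueError:
            # Other strings get a text field plus a keyword multi-field.
            return "text + keyword"
    return "unknown"


# A shortened version of the sample document from this workshop.
sample = {
    "name": "Pineapple",
    "date_purchased": "2020-06-02T12:15:35",
    "quantity": 200,
    "unit_price": 3.11,
    "vendor_details": {"preferred_vendor": True},
}

inferred = {field: infer_field_type(value) for field, value in sample.items()}
for field in sorted(inferred):
    print(field, "->", inferred[field])
```

Running the sketch prints one guessed type per field, listed alphabetically, much like the mapping that Elasticsearch returns.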
Syntax:
GET Enter_name_of_the_index_here/_mapping
Example:
GET temp_index/_mapping
Expected response from Elasticsearch:
Elasticsearch returns the mapping of the temp_index. It lists all the fields of the document in alphabetical order along with the type of each field (text, keyword, long, float, date, boolean, etc.).
For the list of all field types, click here!
There are two kinds of string field types:
- Text
- Keyword
By default, every string (other than one detected as a date, such as date_purchased) gets mapped twice: as a text field and as a keyword multi-field. Each field type is primed for different types of requests.
The Text field type is designed for full-text searches.
The Keyword field type is designed for exact searches, aggregations, and sorting.
You can customize your mapping by assigning the field type as either text or keyword or both!
Ever notice that when you search in Elasticsearch, it is not case sensitive and punctuation does not seem to matter? This is because text analysis occurs when your fields are indexed.
By default, strings are analyzed when they are indexed. Each string is broken up into individual words, also known as tokens. The analyzer then lowercases each token and removes punctuation.
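Here is a toy version of that analysis step in Python. It is only a sketch: the analyze function is hypothetical, and a simple regular expression is an assumption standing in for the real default analyzer, which handles many more cases.

```python
import re


def analyze(text):
    """Lowercase the text and split it into tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


tokens = analyze("These pineapples are sourced from New Zealand.")
print(tokens)
# ['these', 'pineapples', 'are', 'sourced', 'from', 'new', 'zealand']
```

Because "New Zealand." and "new zealand" produce the same tokens, a search for either one can match the same documents.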
Inverted Index
Once the string is analyzed, the individual tokens are stored in a sorted list known as the inverted index. Each unique token is stored in the inverted index along with the IDs of the documents that contain it.
The same process occurs every time you index a new document.
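The inverted index can be pictured with a small Python sketch. This is a simplified model under stated assumptions, not the actual on-disk structure: the documents, the analyze helper, and the function names are all hypothetical.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lowercase the text and keep only alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each unique token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    # Keep the tokens in sorted order, like the sorted list described above.
    return dict(sorted(index.items()))


def search(index, query):
    """Return the IDs of documents that contain any token of the query."""
    hits = set()
    for token in analyze(query):
        hits |= index.get(token, set())
    return hits


# Hypothetical documents keyed by document ID.
docs = {1: "Pineapple from New Zealand", 2: "Mango from Indonesia"}
index = build_inverted_index(docs)

for token, doc_ids in index.items():
    print(token, sorted(doc_ids))

print(search(index, "NEW ZEALAND!"))  # {1}
```

Looking up a token leads straight to the matching document IDs, which is why this structure suits full-text search.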
The Keyword field type is used for aggregations, sorting, and exact searches. These actions look up the document ID to find the values it has in its fields.
The Keyword field is suited to perform these actions because it uses a data structure called doc values to store data.
For each document, the document ID along with the field value (original string) is added to the table. This data structure (doc values) is designed for actions that require looking up the document ID to find the values it has in its fields.
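As a rough mental model, doc values can be sketched in Python as a table from document ID to the original field value. The data and the helper names below are hypothetical, and the real structure is a column-oriented format on disk rather than a dictionary.

```python
from collections import Counter

# Hypothetical doc values for a keyword field: document ID -> original string.
doc_values = {1: "New Zealand", 2: "Indonesia", 3: "New Zealand"}


def terms_aggregation(doc_values, matching_ids):
    """Count how often each value appears among the matching documents."""
    return Counter(doc_values[doc_id] for doc_id in matching_ids).most_common()


def sort_by_value(doc_values, matching_ids):
    """Order the matching document IDs by their field value."""
    return sorted(matching_ids, key=lambda doc_id: doc_values[doc_id])


print(terms_aggregation(doc_values, [1, 2, 3]))
# [('New Zealand', 2), ('Indonesia', 1)]
print(sort_by_value(doc_values, [1, 2, 3]))
# [2, 1, 3]
```

Both helpers start from document IDs and look up the stored value, which is the access pattern that aggregations and sorting need.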
When Elasticsearch dynamically creates a mapping for you, it does not know what you want to use a string for so it maps all strings to both field types.
In cases where you do not need both field types, the default setting is wasteful. Since both field types require creating either an inverted index or doc values, creating both field types for unnecessary fields will slow down indexing and take up more disk space.
This is why we define our own mapping as it helps us store and search data more efficiently.
Project: Build an app for a client who manages a produce warehouse
This app must enable users to:
- search for produce name, country of origin and description
- identify top countries of origin with the most frequent purchase history
- sort produce by produce type (Fruit or Vegetable)
- get the summary of monthly expense
Sample data
{
"name": "Pineapple",
"botanical_name": "Ananas comosus",
"produce_type": "Fruit",
"country_of_origin": "New Zealand",
"date_purchased": "2020-06-02T12:15:35",
"quantity": 200,
"unit_price": 3.11,
"description": "a large juicy tropical fruit consisting of aromatic edible yellow flesh surrounded by a tough segmented skin and topped with a tuft of stiff leaves.These pineapples are sourced from New Zealand.",
"vendor_details": {
"vendor": "Tropical Fruit Growers of New Zealand",
"main_contact": "Hugh Rose",
"vendor_location": "Whangarei, New Zealand",
"preferred_vendor": true
}
}
Rules
- If you do not define a mapping ahead of time, Elasticsearch dynamically creates the mapping for you.
- If you do decide to define your own mapping, you can do so at index creation.
- ONE mapping is defined per index. Once the index has been created, we can only add new fields to a mapping. We CANNOT change the mapping of an existing field.
- If you must change the type of an existing field, you must create a new index with the desired mapping, then reindex all documents into the new index.
Step 1: Index a sample document into a test index.
The sample document must contain the fields that you want to define. These fields must also contain values that map closely to the field types you want.
Syntax:
POST Name-of-test-index/_doc
{
"field": "value"
}
Example:
POST test_index/_doc
{
"name": "Pineapple",
"botanical_name": "Ananas comosus",
"produce_type": "Fruit",
"country_of_origin": "New Zealand",
"date_purchased": "2020-06-02T12:15:35",
"quantity": 200,
"unit_price": 3.11,
"description": "a large juicy tropical fruit consisting of aromatic edible yellow flesh surrounded by a tough segmented skin and topped with a tuft of stiff leaves.These pineapples are sourced from New Zealand.",
"vendor_details": {
"vendor": "Tropical Fruit Growers of New Zealand",
"main_contact": "Hugh Rose",
"vendor_location": "Whangarei, New Zealand",
"preferred_vendor": true
}
}
Expected response from Elasticsearch:
The test_index is successfully created.
Step 2: View the dynamic mapping
Syntax:
GET Name-the-index-whose-mapping-you-want-to-view/_mapping
Example:
GET test_index/_mapping
Expected response from Elasticsearch:
Elasticsearch will display the mapping it has created. It lists the fields in alphabetical order. This document is identical to the one we indexed into the temp_index. To save space, the screenshots of the mapping have not been included here.
Step 3: Edit the mapping
Copy and paste the mapping from step 2 into the Kibana console. From the pasted results, remove the "test_index" along with its opening and closing brackets. Then, edit the mapping to satisfy the requirements outlined in the figure below.
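If you would rather script the unwrapping part of this step than do it by hand, a sketch in Python could look like the following. The response below is a heavily shortened, hypothetical example of what a dynamic mapping response might contain, and unwrap_mapping is a made-up helper name.

```python
import copy
import json


def unwrap_mapping(response, index_name):
    """Drop the outer index name so the body can be reused with a PUT request."""
    return copy.deepcopy(response[index_name])


# Hypothetical, shortened response from GET test_index/_mapping.
response = {
    "test_index": {
        "mappings": {
            "properties": {
                "produce_type": {
                    "type": "text",
                    "fields": {
                        "keyword": {"type": "keyword", "ignore_above": 256}
                    },
                }
            }
        }
    }
}

body = unwrap_mapping(response, "test_index")

# Example edit: produce_type is only used for sorting, so keep keyword only.
body["mappings"]["properties"]["produce_type"] = {"type": "keyword"}

print(json.dumps(body, indent=2))
```

The printed JSON starts at "mappings", which is the shape expected when creating a new index with a custom mapping.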
The optimized mapping should look like the following:
{
"mappings": {
"properties": {
"botanical_name": {
"enabled": false
},
"country_of_origin": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"date_purchased": {
"type": "date"
},
"description": {
"type": "text"
},
"name": {
"type": "text"
},
"produce_type": {
"type": "keyword"
},
"quantity": {
"type": "long"
},
"unit_price": {
"type": "float"
},
"vendor_details": {
"enabled": false
}
}
}
}
Step 4: Create a new index with the optimized mapping from step 3.
Syntax:
PUT Name-of-your-final-index
{
copy and paste your edited mapping here
}
Example:
PUT produce_index
{
"mappings": {
"properties": {
"botanical_name": {
"enabled": false
},
"country_of_origin": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"date_purchased": {
"type": "date"
},
"description": {
"type": "text"
},
"name": {
"type": "text"
},
"produce_type": {
"type": "keyword"
},
"quantity": {
"type": "long"
},
"unit_price": {
"type": "float"
},
"vendor_details": {
"enabled": false
}
}
}
}
Expected response from Elasticsearch:
Elasticsearch creates a produce_index with the customized mapping we defined above!
Step 5: Check the mapping of the new index to make sure all the fields have been mapped correctly
Syntax:
GET Name-of-your-final-index/_mapping
Example:
GET produce_index/_mapping
Expected response from Elasticsearch:
Compared to the dynamic mapping, our optimized mapping looks simpler and more concise! The current mapping satisfies the requirements that are marked with green check marks.
Step 6: Index your dataset into the new index
For simplicity's sake, we will index two documents.
Index the first document
POST produce_index/_doc
{
"name": "Pineapple",
"botanical_name": "Ananas comosus",
"produce_type": "Fruit",
"country_of_origin": "New Zealand",
"date_purchased": "2020-06-02T12:15:35",
"quantity": 200,
"unit_price": 3.11,
"description": "a large juicy tropical fruit consisting of aromatic edible yellow flesh surrounded by a tough segmented skin and topped with a tuft of stiff leaves.These pineapples are sourced from New Zealand.",
"vendor_details": {
"vendor": "Tropical Fruit Growers of New Zealand",
"main_contact": "Hugh Rose",
"vendor_location": "Whangarei, New Zealand",
"preferred_vendor": true
}
}
Expected response from Elasticsearch:
Elasticsearch successfully indexes the first document.
Index the second document
The second document has nearly the same fields as the first document, except that it has an extra field called "organic" set to true!
POST produce_index/_doc
{
"name": "Mango",
"botanical_name": "Harum Manis",
"produce_type": "Fruit",
"country_of_origin": "Indonesia",
"organic": true,
"date_purchased": "2020-05-02T07:15:35",
"quantity": 500,
"unit_price": 1.5,
"description": "Mango Arumanis or Harum Manis is originated from East Java. Arumanis means harum dan manis or fragrant and sweet just like its taste. The ripe Mango Arumanis has dark green skin coated with thin grayish natural wax. The flesh is deep yellow, thick, and soft with little to no fiber. Mango Arumanis is best eaten when ripe.",
"vendor_details": {
"vendor": "Ayra Shezan Trading",
"main_contact": "Suharto",
"vendor_location": "Binjai, Indonesia",
"preferred_vendor": true
}
}
Expected response from Elasticsearch:
Elasticsearch successfully indexes the second document.
Let's see what happens to the mapping by sending this request below:
GET produce_index/_mapping
Expected response from Elasticsearch:
The new field ("organic") and its field type (boolean) have been added to the mapping. This is in line with the rules of mapping we discussed earlier: you can add new fields to the mapping, but you cannot change the mapping of an existing field!
Let's say your client changed his mind. He wants to run only full text search on the field "botanical_name" we disabled earlier.
Remember, you CANNOT change the mapping of an existing field. If you do need to make changes to an existing field, you must create a new index with the desired mapping, then reindex all documents into the new index.
STEP 1: Create a new index(produce_v2) with the latest mapping.
We removed the "enabled" parameter from the field "botanical_name" and changed its type to "text".
Example:
PUT produce_v2
{
"mappings": {
"properties": {
"botanical_name": {
"type": "text"
},
"country_of_origin": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"date_purchased": {
"type": "date"
},
"description": {
"type": "text"
},
"name": {
"type": "text"
},
"organic": {
"type": "boolean"
},
"produce_type": {
"type": "keyword"
},
"quantity": {
"type": "long"
},
"unit_price": {
"type": "float"
},
"vendor_details": {
"type": "object",
"enabled": false
}
}
}
}
Expected response from Elasticsearch:
Elasticsearch creates a new index (produce_v2) with the latest mapping.
If you check the mapping, you will see that the field "botanical_name" has been typed as text.
View the mapping of produce_v2:
GET produce_v2/_mapping
Expected response from Elasticsearch:
STEP 2: Reindex the data from the original index(produce_index) to the one you just created(produce_v2).
POST _reindex
{
"source": {
"index": "produce_index"
},
"dest": {
"index": "produce_v2"
}
}
Expected response from Elasticsearch:
This request reindexes data from the produce_index to the produce_v2 index. The produce_v2 index can now be used to run the requests that the client has specified.
Step 1: Create a runtime field and add it to the mapping of the existing index.
Syntax:
PUT Enter-name-of-index/_mapping
{
"runtime": {
"Name-your-runtime-field-here": {
"type": "Specify-field-type-here",
"script": {
"source": "Specify the formula you want executed"
}
}
}
}
Example:
PUT produce_v2/_mapping
{
"runtime": {
"total": {
"type": "double",
"script": {
"source": "emit(doc['unit_price'].value* doc['quantity'].value)"
}
}
}
}
Expected response from Elasticsearch:
Elasticsearch successfully adds the runtime field to the mapping.
Step 2: Check the mapping:
GET produce_v2/_mapping
Expected response from Elasticsearch:
Elasticsearch adds a runtime field to the mapping (red box).
Note that the runtime field is not listed under the "properties" object, which includes the fields in our documents. This is because the runtime field "total" is not indexed!
Step 3: Run a request on the runtime field to see it perform its magic!
Please note that the following request does not aggregate the monthly expense. We are running a simple aggregation request to demonstrate how a runtime field works!
The following request runs a sum aggregation against the runtime field "total" of all documents in our index.
Syntax:
GET Enter_name_of_the_index_here/_search
{
"size": 0,
"aggs": {
"Name your aggregations here": {
"Specify the aggregation type here": {
"field": "Name the field you want to aggregate on here"
}
}
}
}
Example:
GET produce_v2/_search
{
"size": 0,
"aggs": {
"total_expense": {
"sum": {
"field": "total"
}
}
}
}
Expected response from Elasticsearch:
When this request is sent, a runtime field called "total" is created and calculated for documents within the scope of our request (the entire index). Then, the sum aggregation is run on the field "total" over all documents in our index.
The runtime field is only created and calculated when a request made on the runtime field is being executed. Runtime fields are not indexed, so they do not take up disk space.
We also did not have to reindex in order to add a new field to existing documents. For more information on runtime fields, check out this blog!
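To picture what happens at query time, here is a small Python sketch of the same calculation. It is an illustration under assumptions, not Elasticsearch code: the documents are plain dictionaries holding only the two fields the script reads, and the function names are hypothetical.

```python
# The two documents from this workshop, reduced to the fields the script uses.
docs = [
    {"unit_price": 3.11, "quantity": 200},
    {"unit_price": 1.5, "quantity": 500},
]


def total(doc):
    """Stand-in for the runtime field script: unit_price * quantity."""
    return doc["unit_price"] * doc["quantity"]


def sum_aggregation(docs, runtime_field):
    """Compute the runtime field for each document on the fly and add it up."""
    return sum(runtime_field(doc) for doc in docs)


total_expense = sum_aggregation(docs, total)
print(round(total_expense, 2))

# The stored documents are unchanged: "total" was never written into them.
print("total" in docs[0])  # False
```

The value of "total" exists only while the request runs, which mirrors why a runtime field takes up no disk space.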
Q: If possible please explain the _meta in mapping which was part of previous video.
A: Of course! This question refers to the _meta field in the mapping from the Part 4 workshop.
I should have just removed the _meta part before I published this repo. Thank you for submitting a pull request on GitHub @radhakrishnaakamat!
The _meta field was automatically created by the ML File Data Visualizer. This is a field where you can store any information regarding the index or the app for the developers who manage it. Think of this field as a place to include information about the app so that developers have what they need to debug.
The _meta field is optional and deleting the _meta field will not affect the mapping in any way whatsoever. I am going to delete this field in my part 4 workshop repo as it has been causing a lot of confusion!
Q: After you create a new mapping, how do you configure your ingest to use the new mapping?
A: I should have asked for clarification as this question can be interpreted in many different ways.
If I didn't interpret it correctly, please let me know via Twitter @LisaHJung and I will add the answer to this repo!
If you were referring to a situation where you have an old index with outdated mapping that needed to be changed:
Remember, we cannot change the mapping of an existing field. Even if you add a new field to a mapping, it only adds the new field to the list of field names and types. It does not add the new field to documents that have been indexed prior to adding a new field to the mapping.
You must create a new index with the desired mapping, then reindex documents from the old index to the new one, and direct requests to the new index!