1st Hour:

Crush the Cluster, Node, Index, and Shard


Objectives:

  • Configure a cluster
  • Run multiple nodes on 1 machine
  • Understand indices, shards and shard replication
  • Understand how Elasticsearch uses Lucene
  • Perform CRUD for indices and documents
  • Understand Optimistic Concurrency Control (_version)
  • Perform GET to retrieve multiple objects in 1 request

What is Elasticsearch?

Elasticsearch is an open-source, horizontally scalable search engine built on the Lucene text search library. It is accessed through an extensive REST API and can power very fast searches for your data discovery applications.


What is Kibana?

"Kibana lets you visualize your Elasticsearch data and navigate the Elastic Stack, so you can do anything from learning why you're getting paged at 2:00 a.m. to understanding the impact rain might have on your quarterly numbers." - Elastic


Fire up the engines!

# Launch Elasticsearch
$ cd ~/bonobos/elasticsearch
$ bin/elasticsearch

# Launch Kibana
$ cd ~/bonobos/kibana
$ bin/kibana

# Open Kibana in the browser
http://localhost:5601

Memory allocation issues?


It's alive!

GET /_cluster/health

# Compact and Aligned Text - or cat :)
GET _cat/health
GET _cat/shards
GET _cat/nodes

I love you Elasticsearch.

You're perfect.

Now change.

  • ctrl + c to stop Elasticsearch
  • Open config/elasticsearch.yml
  • Change: cluster.name: bonobos
  • $ bin/elasticsearch to start
  • Note - may need to adjust memory for the new cluster

2 Nodes Are Better Than 1

  • Open config/elasticsearch.yml
  • Add: node.max_local_storage_nodes: 2
  • Open new terminal on same machine
  • $ bin/elasticsearch

2+ Nodes on 2+ Machines Is Best!


Concepts:

Cluster, node, shard

Index, type, document, field


Cluster: A self-organizing collection of nodes that share data and workload.
Node: A running instance of Elasticsearch. Nodes with the same cluster.name form a cluster.
Shard: A single Lucene index that holds a slice of one Elasticsearch index's data.
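
How does a document end up on a particular shard? A minimal sketch, assuming the routing value is the document id and using zlib.crc32 as a stand-in hash. Elasticsearch uses its own Murmur3-based hash, so real shard numbers will differ. pick_shard is a hypothetical helper, not an Elasticsearch API.

```python
import zlib


def pick_shard(doc_id: str, number_of_primary_shards: int) -> int:
    """Map a document id to a primary shard number (illustrative only)."""
    # Stand-in hash: Elasticsearch itself hashes the routing value with Murmur3.
    routing_hash = zlib.crc32(doc_id.encode("utf-8"))
    return routing_hash % number_of_primary_shards


if __name__ == "__main__":
    for doc_id in ["1", "2", "3", "4"]:
        print(doc_id, "-> shard", pick_shard(doc_id, 3))
```

The same id always maps to the same shard only while the shard count stays the same, which is one reason the number of primary shards is fixed when the index is created.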

generic cluster


specific cluster


structure

Multiple "types" per index is a version 5 concept. A version 6 index holds a single type (here, _doc).


Dude stop talking!


Create

// Create index
PUT /products
{
   "settings" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 1
   }
}

// Create document
POST /products/_doc/1
{
  "type":"shirt",
  "color":"black"
}
// 5.6
POST /products/shirt/1
{
  "color":"black"
}

Read

//Read one document
GET /products/_doc/1

//Read all documents
GET /products/_search

//Read some documents
GET /products/_search
{
  "query": {
    "match": {
      "type":"shirt"
    }
  }
}

Update

// Update document
POST /products/_doc/1/_update
{
  "doc": {
    "sleeves": "short"
  }
}
// 5.6
POST /products/shirt/1/_update
{
  "doc": {
    "sleeves":"short"
  }
}

Delete

// Delete document
DELETE /products/_doc/1

// 5.6
DELETE /products/shirt/1

// Delete index
DELETE /products
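
The four document operations above follow one pattern. A small sketch, assuming the 6.x-style /index/_doc/id paths used on these slides. doc_request is a hypothetical helper that only builds the request pieces and sends nothing.

```python
import json


def doc_request(action, index, doc_id, body=None):
    """Return (method, path, payload) for a 6.x-style single-document request."""
    base = f"/{index}/_doc/{doc_id}"
    if action == "create":
        return ("POST", base, json.dumps(body))
    if action == "read":
        return ("GET", base, None)
    if action == "update":
        # Partial updates wrap the changed fields in a "doc" object.
        return ("POST", base + "/_update", json.dumps({"doc": body}))
    if action == "delete":
        return ("DELETE", base, None)
    raise ValueError(f"unknown action: {action}")


if __name__ == "__main__":
    print(doc_request("create", "products", 1, {"type": "shirt", "color": "black"}))
    print(doc_request("update", "products", 1, {"sleeves": "short"}))
    print(doc_request("delete", "products", 1))
```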

Concept:

Optimistic Concurrency Control?

(and why do I care?)

documentation
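_version : Every change to a document increments its version number. A write that names an out-of-date version is rejected, so concurrent updates don't silently overwrite each other.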
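
Why do I care? A toy simulation of the idea, not Elasticsearch code: TinyStore and VersionConflict are hypothetical names, and the store is just a dictionary. Two clients read version 1, the first write succeeds, and the second is rejected because its version is stale.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version of a document."""


class TinyStore:
    """In-memory stand-in for an index; not an Elasticsearch client."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]

    def index(self, doc_id, source, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        # Reject the write if the caller's view of the document is out of date.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, dict(source))
        return new_version


if __name__ == "__main__":
    store = TinyStore()
    store.index("1", {"color": "black"})
    version, _ = store.get("1")  # both clients read the same version
    store.index("1", {"color": "blue"}, expected_version=version)
    try:
        store.index("1", {"color": "red"}, expected_version=version)
    except VersionConflict as err:
        print("conflict:", err)
```

In Elasticsearch 6.x the expected version travels as a ?version=N query parameter on the write, and a mismatch comes back as an HTTP 409 conflict.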


Concept:

Single GET for multiple objects? Yep!

GET /products/_mget
{
   "ids" : [ "1", "2" ]
}

Objectives:

  • Configure a cluster
  • Run multiple nodes on 1 machine
  • Understand indices, shards and shard replication
  • Understand how Elasticsearch uses Lucene
  • Perform CRUD for indices and documents
  • Understand Optimistic Concurrency Control (_version)
  • Perform GET to retrieve multiple objects in 1 request

Awesome!

Red light, green light?
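
The status colors reported by the health APIs follow from shard allocation. A minimal sketch of the rule, where cluster_status and the shard dictionaries are hypothetical stand-ins for what _cat/shards reports:

```python
def cluster_status(shards):
    """shards: list of dicts with 'primary' (bool) and 'assigned' (bool)."""
    # Red: at least one primary shard is unassigned, so some data is unavailable.
    if any(s["primary"] and not s["assigned"] for s in shards):
        return "red"
    # Yellow: all primaries are assigned, but at least one replica is not.
    if any(not s["assigned"] for s in shards):
        return "yellow"
    # Green: every primary and every replica is assigned.
    return "green"


if __name__ == "__main__":
    # One node, one primary, one replica: the replica has nowhere to go.
    single_node = [
        {"primary": True, "assigned": True},
        {"primary": False, "assigned": False},
    ]
    print(cluster_status(single_node))
```

A replica is never placed on the same node as its primary, so an index with number_of_replicas set to 1 stays yellow until a second node joins.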


2nd Hour:

Search This!


Objectives:

  • Load JSON datasets from the console
  • Request body searches
  • Perform fuzzy searches
  • Perform searches based on phrases, proximity, wildcards and regular expressions
  • Perform bool queries

_bulk

// Create customers index
PUT /customers
{
   "settings" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 0
   }
}

POST _bulk
{ "index" : { "_index" : "customers", "_type" : "_doc", "_id":1} }
{ "name" : "Brian" }
{ "index" : { "_index" : "customers", "_type" : "_doc", "_id":2} }
{ "name" : "Jonathan" }

NDJSON

_bulk uses Newline Delimited JSON


convert csv to json

Remember to put an index action line (the metadata map) before each document line, and end the file with a newline
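
A minimal sketch of that conversion in Python, assuming a CSV with a header row and using the row number as the document _id (an assumption for illustration). csv_to_bulk is a hypothetical helper; it produces the same action-line-then-document layout as the _bulk example above.

```python
import csv
import io
import json


def csv_to_bulk(csv_text, index, doc_type="_doc"):
    """Turn CSV text into a _bulk body: one action line, then one document line."""
    lines = []
    rows = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(rows, start=1):
        # Assumption: the row number doubles as the document id.
        action = {"index": {"_index": index, "_type": doc_type, "_id": row_number}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(row))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(csv_to_bulk("name\nBrian\nJonathan\n", "customers"), end="")
```

Note that every CSV value arrives as a string, so numeric columns such as a price would need converting before they are indexed as numbers.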

load json dataset

$ cp ./data/products.json ./elasticsearch/data/products.json
$ curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/products/_bulk?pretty' --data-binary @data/products.json

Confirm everything loaded

// URI search
GET /products/_search?q=name:golf

Request Body Search


Search Query

// Search 1 field for 1 value
GET /products/_search
{
  "query": {
    "term": {"description": "shirt"}
  }
}

// Search 1 field for 2 values
GET /products/_search
{
  "query": {
    "terms": {"description": ["shirt", "shirts"]}
  }
}

Sort & Size

GET /products/_search
{
  "sort" : [
    { "price" : {"order" : "desc"}}
  ],
  "query": {
    "terms": {"name": ["shirt", "shirts"]}
  },
  "size": 14
}

Range

GET /products/_search
{
  "sort" : [
        { "price" : {"order" : "desc"}}
    ],
    "query": {
        "range" : {
            "price" : {
                "lte" : 200
            }
        }
    }
}

Date, timezone, and more


Filtering

GET /products/_search
{
  "sort" : [
    { "price" : {"order" : "desc"}}
  ],
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "fields" : ["name","description"],
          "query" : "shirt"
        }
      },
      "must_not": {
        "term": {"name": "pants"}
      }
    }
  },
  "size": 20
}

Fuzzy Search

GET /products/_search
{
    "query": {
       "fuzzy" : {"name" : "shorts" }
    }
}
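
What counts as "fuzzy"? The fuzzy query matches terms within a small edit distance of the search term. A sketch of the classic Levenshtein distance, which is a simplification: by default Elasticsearch also counts swapping two adjacent characters as a single edit, and this version does not.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance: inserts, deletes, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    for candidate in ["shirts", "shirt", "skirt"]:
        print(candidate, edit_distance("shorts", candidate))
```

With the default fuzziness (AUTO), a six-letter term such as shorts is allowed up to two edits, so shirts and shirt are within range and skirt is not.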

Phrase Search

GET /products/_search
{
    "query": {
       "match_phrase" : {
         "name" : "short shirt"
       }
    }
}

Proximity Search

GET /products/_search
{
    "query": {
       "match_phrase" : {
         "name": {
           "query" : "short shirt",
           "slop": 1
         }
       }
    }
}
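
What does slop measure? Roughly, how far the terms may move from their positions in the phrase. A simplified sketch that approximates how Lucene scores a sloppy phrase, assuming each query term occurs at most once in the field. slop_needed is a hypothetical helper.

```python
def slop_needed(query_terms, doc_tokens):
    """Smallest slop at which the phrase matches, or None if a term is missing.

    Simplification: only the first occurrence of each term is considered.
    """
    shifts = []
    for query_pos, term in enumerate(query_terms):
        if term not in doc_tokens:
            return None
        # How far the term sits from where the phrase expects it.
        shifts.append(doc_tokens.index(term) - query_pos)
    return max(shifts) - min(shifts)


if __name__ == "__main__":
    phrase = ["short", "shirt"]
    for name in (["short", "shirt"], ["short", "sleeve", "shirt"], ["shirt", "short"]):
        print(name, slop_needed(phrase, name))
```

Under this rule one intervening word needs a slop of 1, while reversing the two words needs a slop of 2.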

Wildcard

GET /products/_search
{
    "query": {
       "wildcard" : {"name" : "sh?rt" }
    }
}

Regex

GET /products/_search
{
    "query": {
       "regexp" : {"name" : "sh.rt" }
    }
}
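
Careful: ? means different things in the two syntaxes. A quick comparison using Python's fnmatch and re modules as stand-ins (Lucene's regular-expression dialect differs from Python's in other details, but these two operators behave the same way). The term list is made up for illustration.

```python
import fnmatch
import re

terms = ["shirt", "short", "shrt", "srt"]

# Wildcard syntax: ? stands for exactly one character.
wildcard_hits = [t for t in terms if fnmatch.fnmatchcase(t, "sh?rt")]

# Regular-expression syntax: ? makes the preceding character optional,
# and . stands for exactly one character. The pattern must match the whole term.
optional_h_hits = [t for t in terms if re.fullmatch("sh?rt", t)]
any_char_hits = [t for t in terms if re.fullmatch("sh.rt", t)]

if __name__ == "__main__":
    print("wildcard sh?rt:", wildcard_hits)
    print("regexp   sh?rt:", optional_h_hits)
    print("regexp   sh.rt:", any_char_hits)
```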

Bool

POST /products/_search
{
  "query": {
    "bool" : {
      "must" : {
        "term" : { "name" : "polo" }
      },
      "filter": {
        "term" : { "color_presentation": "heather" }
      },
      "must_not" : {
        "range" : {
          "price" : { "gte" : 10, "lte" : 80 }
        }
      },
      "should" : [
        { "term" : { "description": "denim" } },
        { "term" : { "description": "classic" } }
      ],
      "minimum_should_match" : 1,
      "boost" : 1.0
    }
  }
}
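
How do the four clauses combine? A sketch of the matching logic only, ignoring scoring (in Elasticsearch, must and should contribute to the score while filter and must_not do not). bool_match, term, and price_between are hypothetical helpers, and documents are plain dictionaries with pre-tokenized text fields.

```python
def as_list(clause):
    """Accept a single predicate or a list of predicates."""
    if clause is None:
        return []
    return clause if isinstance(clause, list) else [clause]


def term(field, value):
    """Predicate: the field's token list contains the exact value."""
    return lambda doc: value in doc.get(field, [])


def price_between(low, high):
    """Predicate: the price lies in the inclusive range."""
    return lambda doc: low <= doc["price"] <= high


def bool_match(doc, must=None, filter_=None, must_not=None, should=None,
               minimum_should_match=0):
    # must and filter: every clause has to match.
    if not all(p(doc) for p in as_list(must) + as_list(filter_)):
        return False
    # must_not: no clause may match.
    if any(p(doc) for p in as_list(must_not)):
        return False
    # should: at least minimum_should_match clauses have to match.
    should_hits = sum(1 for p in as_list(should) if p(doc))
    return should_hits >= minimum_should_match


if __name__ == "__main__":
    polo = {"name": ["classic", "polo"], "color_presentation": ["heather"],
            "description": ["classic", "fit"], "price": 98}
    print(bool_match(
        polo,
        must=term("name", "polo"),
        filter_=term("color_presentation", "heather"),
        must_not=price_between(10, 80),
        should=[term("description", "denim"), term("description", "classic")],
        minimum_should_match=1,
    ))
```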

Objectives:

  • Load JSON datasets from the console
  • Request body searches
  • Perform fuzzy searches
  • Perform searches based on phrases, proximity, wildcards and regular expressions
  • Perform bool queries

3rd Hour:

Complete the Auto-Complete


We will:

  • Launch a React application
  • Load data into Elasticsearch
  • Connect our React application to Elasticsearch
  • Build a product search box with auto-complete functionality

Let's do this!