elasticsearch: A JavaScript repository from marianmosley

1st Hour:

Crush the Cluster, Node, Index, and Shard

Objectives:

Configure a cluster
Run multiple nodes on 1 machine
Understand indices, shards and shard replication
Understand how Elasticsearch uses Lucene
Perform CRUD for indices and documents
Understand Optimistic Concurrency Control (_version)
Perform GET to retrieve multiple objects in 1 request

What is Elasticsearch?

Elasticsearch is an open-source, readily scalable, enterprise-grade search engine based on the Lucene text search library. Accessible through an extensive API, Elasticsearch can power extremely fast searches that support your data discovery applications.

What is Kibana?

"Kibana lets you visualize your Elasticsearch data and navigate the Elastic Stack, so you can do anything from learning why you're getting paged at 2:00 a.m. to understanding the impact rain might have on your quarterly numbers." - Elastic

Fire up the engines!

# Launch Elasticsearch
$ cd ~/bonobos/elasticsearch
$ bin/elasticsearch

# Launch Kibana
$ cd ~/bonobos/kibana
$ bin/kibana

# Open Kibana in the browser
http://localhost:5601

Memory allocation issues?

It's alive!

GET /_cluster/health

# Compact and Aligned Text - or cat :)
GET _cat/health
GET _cat/shards
GET _cat/nodes

I love you Elasticsearch.

You're perfect.

Now change.

ctrl + c to stop Elasticsearch
Open config/elasticsearch.yml
Change: cluster.name: bonobos
$ bin/elasticsearch to start
Note - may need to adjust memory for the new cluster

2 Nodes Are Better Than 1

Open config/elasticsearch.yml
Add: node.max_local_storage_nodes: 2
Open new terminal on same machine
$ bin/elasticsearch

2+ Nodes on 2+ Machines Is Best!

Concepts:

Cluster, node, shard

Index, type, document, field


Cluster	A self-organizing collection of nodes that share data and workload
Node	A running instance of Elasticsearch. Nodes that share the same cluster.name form a cluster.
Shard	A single Lucene instance that holds a slice of an index's data
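
To make the shard idea concrete, here is a minimal sketch of how a document ID could be mapped to a primary shard. Elasticsearch routes a document by hashing its routing value (the document ID by default) and taking the result modulo the number of primary shards. The real hash is Murmur3; `zlib.crc32` is only a stand-in here, and the function name `pick_shard` is hypothetical.

```python
import zlib


def pick_shard(doc_id: str, number_of_shards: int) -> int:
    """Map a document ID to a primary shard number (illustrative only)."""
    # Stand-in hash: Elasticsearch itself uses Murmur3, not CRC32.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


# With a single primary shard, every document lands on shard 0.
print(pick_shard("1", 1))

# The same ID always maps to the same shard for a given shard count.
print(pick_shard("1", 5) == pick_shard("1", 5))
```

Because the shard number depends on the primary shard count, that count is fixed when the index is created; changing it would send existing IDs to different shards.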

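Multiple "types" per index is a version 5 concept. Version 6 allows only one type per index (named _doc by convention), which is why the 5.6 variants below use /products/shirt while the version 6 requests use /products/_doc.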

Dude stop talking!

Create

// Create index
PUT /products
{
   "settings" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 1
   }
}
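
As a rough sketch of what these two settings imply, the helpers below count the shard copies the cluster has to place and guess the resulting health color. The function names are hypothetical, and the health rule is a simplification that ignores disk limits and allocation rules.

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Primaries plus one full set of copies per replica."""
    return number_of_shards * (1 + number_of_replicas)


def expected_health(node_count: int, number_of_replicas: int) -> str:
    """Simplified guess at the index health for a given node count."""
    # A replica is never placed on the same node as its primary,
    # so every copy of a shard needs its own node.
    if node_count < 1:
        return "red"
    return "green" if node_count >= 1 + number_of_replicas else "yellow"


print(total_shard_copies(1, 1))  # 1 primary + 1 replica
print(expected_health(1, 1))     # single node: the replica stays unassigned
print(expected_health(2, 1))     # second node: the replica can be placed
```

This is why an index with one replica typically shows yellow on a one-node cluster and turns green once a second node joins.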

// Create document
POST /products/_doc/1
{
  "type":"shirt",
  "color":"black"
}
// 5.6
POST /products/shirt/1
{
  "color":"black"
}

Read

//Read one document
GET /products/_doc/1

//Read all documents
GET /products/_search

//Read some documents
GET /products/_search
{
  "query": {
    "match": {
      "type":"shirt"
    }
  }
}

Update

// Update document
POST /products/_doc/1/_update
{
  "doc": {
    "sleeves": "short"
  }
}
// 5.6
POST /products/shirt/1/_update
{
  "doc": {
    "sleeves": "short"
  }
}

Delete

// Delete document
DELETE /products/_doc/1

// 5.6
DELETE /products/shirt/1

// Delete index
DELETE /products

Concept:

Optimistic Concurrency Control?

(and why do I care?)

documentation
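_version : Every document carries a version number that increments on each write. A write that names a stale version is rejected with a conflict instead of silently overwriting newer data.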
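
The idea can be sketched with a tiny in-memory store. This is not Elasticsearch code: `VersionedStore`, `VersionConflict`, and `expected_version` are hypothetical names, and the sketch only mimics the behavior where a write with a stale version is refused (Elasticsearch answers such a request with a 409 conflict).

```python
class VersionConflict(Exception):
    """Raised when a write carries a stale version."""


class VersionedStore:
    """Toy document store that bumps a version on every write."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        # Refuse the write if the caller read an older version.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected {expected_version}, current is {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, dict(source))
        return new_version


store = VersionedStore()
store.index("1", {"color": "black"})                      # version 1
store.index("1", {"color": "blue"}, expected_version=1)   # version 2
try:
    # A second client still believes the document is at version 1.
    store.index("1", {"color": "red"}, expected_version=1)
except VersionConflict as err:
    print("conflict:", err)
```

The losing client has to re-read the document and retry, which is the "optimistic" part: no locks are taken, and conflicts are detected only at write time.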

Concept:

Single GET for multiple objects? Yep!

GET /products/_mget
{
   "ids" : [ "1", "2" ]
}
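
A minimal sketch of the response shape, assuming a plain dictionary as a stand-in for the index: the reply holds one entry per requested ID, in request order, each with a `found` flag. The function name `mget` and the sample data are hypothetical, and real responses carry more metadata (index, type, version).

```python
def mget(store, ids):
    """Return one result per requested ID, in the order requested."""
    docs = []
    for doc_id in ids:
        if doc_id in store:
            docs.append({"_id": doc_id, "found": True, "_source": store[doc_id]})
        else:
            # Missing documents are reported, not dropped.
            docs.append({"_id": doc_id, "found": False})
    return {"docs": docs}


products = {"1": {"type": "shirt", "color": "black"}}
response = mget(products, ["1", "2"])
for doc in response["docs"]:
    print(doc["_id"], doc["found"])
```

A missing document does not fail the whole request, so callers should check `found` on each entry.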

Objectives:

Configure a cluster
Run multiple nodes on 1 machine
Understand indices, shards and shard replication
Understand how Elasticsearch uses Lucene
Perform CRUD for indices and documents
Understand Optimistic Concurrency Control (_version)
Perform GET to retrieve multiple objects in 1 request

Awesome!

Red light, green light?

2nd Hour:

Search This!

Objectives:

Load JSON datasets from the console
Request body searches
Perform fuzzy searches
Perform searches based on phrases, proximity, wildcards and regular expressions
Perform bool queries

_bulk

// Create customers index
PUT /customers
{
   "settings" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 0
   }
}

POST _bulk
{ "index" : { "_index" : "customers", "_type" : "_doc", "_id":1} }
{ "name" : "Brian" }
{ "index" : { "_index" : "customers", "_type" : "_doc", "_id":2} }
{ "name" : "Jonathan" }

NDJSON

_bulk uses Newline Delimited JSON

convert csv to json

Remember to put an action line (the index map) on its own line before each document
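
One way to do the conversion, sketched with the Python standard library. The function name `csv_to_bulk`, the `id` column, and the sample rows are assumptions for illustration; the output follows the same action-line-then-document layout as the _bulk example above.

```python
import csv
import io
import json


def csv_to_bulk(csv_text: str, index: str, id_field: str = "id") -> str:
    """Turn CSV text into an NDJSON body for the _bulk endpoint."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Action line: tells _bulk where the next document goes.
        action = {"index": {"_index": index, "_type": "_doc", "_id": row.pop(id_field)}}
        lines.append(json.dumps(action))
        # Source line: the document itself (all values stay strings here).
        lines.append(json.dumps(row))
    # The _bulk body must end with a newline.
    return "\n".join(lines) + "\n"


sample = "id,name\n1,Brian\n2,Jonathan\n"
print(csv_to_bulk(sample, "customers"), end="")
```

CSV values arrive as strings, so numeric columns such as a price would need converting before indexing if they should be treated as numbers.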

load json dataset

$ cp ./data/products.json ./elasticsearch/data/products.json
$ curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/products/_bulk?pretty' --data-binary @data/products.json

Confirm everything loaded

// URI search
GET /products/_search?q=name:golf

Request Body Search

Search Query

// Search 1 field for 1 value
GET /products/_search
{
  "query": {
    "term": {"description": "shirt"}
  }
}

// Search 1 field for 2 values
GET /products/_search
{
  "query": {
    "terms": {"description": ["shirt", "shirts"]}
  }
}
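
Why "term" looks for exact tokens can be sketched with a toy inverted index, the structure Lucene builds for each text field. Lowercasing and splitting on whitespace is only a stand-in for the real analyzer, and the function names and sample descriptions are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():  # stand-in for the analyzer
            index[token].add(doc_id)
    return index


def term_query(index, value):
    """Exact lookup: the query value itself is NOT analyzed."""
    return set(index.get(value, set()))


def terms_query(index, values):
    """Union of the exact lookups for several values."""
    hits = set()
    for value in values:
        hits |= term_query(index, value)
    return hits


descriptions = {1: "Classic black shirt", 2: "Two denim shirts", 3: "Khaki pants"}
index = build_inverted_index(descriptions)
print(term_query(index, "shirt"))
print(term_query(index, "Shirt"))  # empty: indexed tokens were lowercased
print(terms_query(index, ["shirt", "shirts"]))
```

The empty result for "Shirt" shows the usual surprise with term queries on analyzed text fields: the query value must match the token exactly as it was indexed.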

Sort & Size

GET /products/_search
{
  "sort" : [
    { "price" : {"order" : "desc"}}
  ],
  "query": {
    "terms": {"name": ["shirt", "shirts"]}
  },
  "size": 14
}

Range

GET /products/_search
{
  "sort" : [
    { "price" : {"order" : "desc"}}
  ],
  "query": {
    "range" : {
      "price" : {
        "lte" : 200
      }
    }
  }
}
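
In plain Python terms, the range, sort, and size pieces of a search behave roughly like the sketch below. The function name `range_search` and the sample products are hypothetical; the default of 10 mirrors the number of hits Elasticsearch returns when no size is given.

```python
def range_search(docs, max_price, size=10):
    """Keep docs with price <= max_price, highest price first, capped at size."""
    hits = [doc for doc in docs if doc["price"] <= max_price]  # "range" with "lte"
    hits.sort(key=lambda doc: doc["price"], reverse=True)      # "sort" ... "desc"
    return hits[:size]                                         # "size"


products = [
    {"name": "polo", "price": 98},
    {"name": "suit", "price": 450},
    {"name": "shorts", "price": 68},
    {"name": "blazer", "price": 200},
]
for hit in range_search(products, max_price=200, size=2):
    print(hit["name"], hit["price"])
```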

Date, timezone, and more

Filtering

GET /products/_search
{
  "sort" : [
    { "price" : {"order" : "desc"}}
  ],
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "fields" : ["name","description"],
          "query" : "shirt"
        }
      },
      "must_not": {
        "term": {"name": "pants"}
      }
    }
  },
  "size": 20
}

Fuzzy Search

GET /products/_search
{
    "query": {
       "fuzzy" : {"name" : "shorts" }
    }
}
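
Fuzzy matching is based on edit distance: how many single-character changes turn one term into another. The sketch below uses plain Levenshtein distance and the default AUTO thresholds (no edits for terms of 1 to 2 characters, one edit for 3 to 5, two edits for longer terms). The function names are hypothetical. One known difference: by default Elasticsearch counts swapping two adjacent characters as a single edit, while this sketch counts it as two.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: inserts, deletes, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete
                               current[j - 1] + 1,       # insert
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def auto_fuzziness(term: str) -> int:
    """Allowed edits under the default AUTO setting."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_matches(query: str, terms):
    limit = auto_fuzziness(query)
    return [term for term in terms if edit_distance(query, term) <= limit]


print(fuzzy_matches("shorts", ["shorts", "shirts", "short", "shirt", "pants"]))
```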

Phrase Search

GET /products/_search
{
    "query": {
       "match_phrase" : {
         "name" : "short shirt"
       }
    }
}

Proximity Search

GET /products/_search
{
    "query": {
       "match_phrase" : {
         "name": {
           "query" : "short shirt",
           "slop": 1
         }
       }
    }
}
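
A simplified sketch of what slop allows: the phrase terms may have up to `slop` extra positions between them. This version only handles terms that appear in the same order as in the phrase; the real query also accepts reordered terms at a higher slop cost. Whitespace tokenization and the function names are assumptions.

```python
from itertools import product


def min_slop_in_order(field_tokens, phrase_tokens):
    """Smallest slop needed for an in-order match, or None if there is none."""
    positions = [
        [i for i, token in enumerate(field_tokens) if token == term]
        for term in phrase_tokens
    ]
    best = None
    for combo in product(*positions):
        if all(a < b for a, b in zip(combo, combo[1:])):
            # Count the positions skipped between consecutive phrase terms.
            gaps = sum(b - a - 1 for a, b in zip(combo, combo[1:]))
            best = gaps if best is None else min(best, gaps)
    return best


def phrase_matches(text: str, phrase: str, slop: int = 0) -> bool:
    needed = min_slop_in_order(text.lower().split(), phrase.lower().split())
    return needed is not None and needed <= slop


print(phrase_matches("short sleeve shirt", "short shirt", slop=0))
print(phrase_matches("short sleeve shirt", "short shirt", slop=1))
```

With slop 0 this behaves like the plain phrase search; with slop 1 a single intervening word such as "sleeve" is tolerated.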

Wildcard

GET /products/_search
{
    "query": {
       "wildcard" : {"name" : "sh?rt" }
    }
}
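
The two wildcard characters behave like shell globbing: `?` matches exactly one character and `*` matches any number of characters. A quick way to try patterns offline is Python's `fnmatch`, used here purely as a stand-in (it also understands `[...]` character sets, which the wildcard query does not). The term list is hypothetical.

```python
import fnmatch

terms = ["shirt", "short", "shorts", "skirt"]

# "?" stands for exactly one character.
print([t for t in terms if fnmatch.fnmatchcase(t, "sh?rt")])

# "*" stands for zero or more characters.
print([t for t in terms if fnmatch.fnmatchcase(t, "sh*")])
```

Like term queries, wildcard patterns are compared against the indexed terms, not against the original text.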

Regex

GET /products/_search
{
    "query": {
       "regexp" : {"name" : "sh.rt" }
    }
}
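
A regexp query has to match the whole term, not just part of it. The sketch below imitates that with Python's `re.fullmatch`. Python's regex dialect is not identical to Lucene's, so treat this as an approximation for simple patterns; the function name and the term list are hypothetical.

```python
import re


def regexp_matches(pattern: str, terms):
    """Return the terms that the pattern matches in full (anchored)."""
    compiled = re.compile(pattern)
    return [term for term in terms if compiled.fullmatch(term)]


terms = ["shirt", "short", "shorts", "tshirt"]

# "." is any single character, "s?" makes the trailing "s" optional.
print(regexp_matches("sh.rts?", terms))
```

"tshirt" is left out because the pattern is anchored at both ends of the term.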

Bool

POST /products/_search
{
  "query": {
    "bool" : {
      "must" : {
        "term" : { "name" : "polo" }
      },
      "filter": {
        "term" : { "color_presentation": "heather" }
      },
      "must_not" : {
        "range" : {
          "price" : { "gte" : 10, "lte" : 80 }
        }
      },
      "should" : [
        { "term" : { "description": "denim" } },
        { "term" : { "description": "classic" } }
      ],
      "minimum_should_match" : 1,
      "boost" : 1.0
    }
  }
}
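
How the four clause types combine can be sketched over a list of in-memory documents. Scoring is ignored, each clause is just a Python predicate, and the function name `bool_query` and the sample products are hypothetical.

```python
def bool_query(docs, must=(), filters=(), must_not=(), should=(),
               minimum_should_match=0):
    """Return the docs that satisfy a bool-style combination of predicates."""
    hits = []
    for doc in docs:
        if not all(clause(doc) for clause in must):
            continue  # every "must" clause has to match
        if not all(clause(doc) for clause in filters):
            continue  # "filter" works like "must" but does not affect scoring
        if any(clause(doc) for clause in must_not):
            continue  # a single "must_not" match excludes the doc
        if sum(1 for clause in should if clause(doc)) < minimum_should_match:
            continue  # not enough optional clauses matched
        hits.append(doc)
    return hits


products = [
    {"name": "polo", "color_presentation": "heather", "price": 98, "description": "classic"},
    {"name": "polo", "color_presentation": "heather", "price": 50, "description": "classic"},
    {"name": "polo", "color_presentation": "navy", "price": 98, "description": "denim"},
    {"name": "polo", "color_presentation": "heather", "price": 120, "description": "linen"},
]

hits = bool_query(
    products,
    must=[lambda d: d["name"] == "polo"],
    filters=[lambda d: d["color_presentation"] == "heather"],
    must_not=[lambda d: 10 <= d["price"] <= 80],
    should=[lambda d: d["description"] == "denim",
            lambda d: d["description"] == "classic"],
    minimum_should_match=1,
)
print([h["price"] for h in hits])
```

Only the first product survives: the second falls in the excluded price range, the third fails the filter, and the fourth matches none of the should clauses.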

Objectives:

Load JSON datasets from the console
Request body searches
Perform fuzzy searches
Perform searches based on phrases, proximity, wildcards and regular expressions
Perform bool queries

3rd Hour:

Complete the Auto-Complete

We will:

Let's do this!