1st Hour:
Crush the Cluster, Node, Index, and Shard
Objectives:
- Configure a cluster
- Run multiple nodes on 1 machine
- Understand indices, shards and shard replication
- Understand how Elasticsearch uses Lucene
- Perform CRUD for indices and documents
- Understand Optimistic Concurrency Control (_version)
- Perform GET to retrieve multiple objects in 1 request
What is Elasticsearch?
Elasticsearch is an open-source, readily scalable, enterprise-grade search engine built on the Lucene text search library. Accessible through an extensive API, Elasticsearch powers extremely fast searches for your data discovery applications.
What is Kibana?
"Kibana lets you visualize your Elasticsearch data and navigate the Elastic Stack, so you can do anything from learning why you're getting paged at 2:00 a.m. to understanding the impact rain might have on your quarterly numbers." - Elastic
Fire up the engines!
# Launch Elasticsearch
$ cd ~/bonobos/elasticsearch
$ bin/elasticsearch
# Launch Kibana
$ cd ~/bonobos/kibana
$ bin/kibana
# Open Kibana in the browser
http://localhost:5601
Memory allocation issues?
It's alive!
GET /_cluster/health
# Compact and Aligned Text - or cat :)
GET _cat/health
GET _cat/shards
GET _cat/nodes
I love you Elasticsearch.
You're perfect.
Now change.
- ctrl + c to stop Elasticsearch
- Open config/elasticsearch.yml
- Change: cluster.name: bonobos
- $ bin/elasticsearch to start
- Note - may need to adjust memory for the new cluster
2 Nodes Are Better Than 1
- Open config/elasticsearch.yml
- Add: node.max_local_storage_nodes: 2
- Open new terminal on same machine
- $ bin/elasticsearch
2+ Nodes on 2+ Machines Is Best!
Concepts:
Cluster, node, shard
Index, type, document, field
Cluster | A self-organizing collection of nodes that share data and workload |
Node | A running instance of Elasticsearch. Nodes with the same cluster.name form a cluster. |
Shard | A single Lucene instance that holds a sliver of an index's data on a node |
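To make "a sliver of the data" concrete: in essence, Elasticsearch picks a document's primary shard by hashing its routing value (the document ID by default) and taking the remainder modulo the number of primary shards. The Python sketch below shows only that idea. `pick_shard` is a hypothetical helper, and CRC32 is a stand-in for the Murmur3 hash Elasticsearch uses, so the shard numbers will not match a real cluster.

```python
import zlib


def pick_shard(doc_id: str, number_of_shards: int) -> int:
    """Map a document ID to a shard number (illustrative only)."""
    # Stand-in hash: Elasticsearch uses Murmur3, not CRC32.
    hashed = zlib.crc32(doc_id.encode("utf-8"))
    return hashed % number_of_shards


if __name__ == "__main__":
    for doc_id in ["1", "2", "3", "4"]:
        print(doc_id, "-> shard", pick_shard(doc_id, 3))
```

Because the shard number depends on the shard count, the number of primary shards is fixed when the index is created.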
Multiple "types" per index is a version 5 concept, NOT a version 6 concept. Version 6 allows one type per index (_doc in these examples).
Dude stop talking!
Create
// Create index
PUT /products
{
"settings" : {
"number_of_shards" : 1,
"number_of_replicas" : 1
}
}
// Create document
POST /products/_doc/1
{
"type":"shirt",
"color":"black"
}
// 5.6
POST /products/shirt/1
{
"color":"black"
}
Read
//Read one document
GET /products/_doc/1
//Read all documents
GET /products/_search
//Read some documents
GET /products/_search
{
"query": {
"match": {
"type":"shirt"
}
}
}
Update
// Update document
POST /products/_doc/1/_update
{
"doc": {
"sleeves": "short"
}
}
// 5.6
POST /products/shirt/1/_update
{
"doc": {
"sleeves":"short"
}
}
Delete
// Delete document
DELETE /products/_doc/1
// 5.6
DELETE /products/shirt/1
// Delete index
DELETE /products
Concept:
Optimistic Concurrency Control?
(and why do I care?)
documentation
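_version : Every document carries a version number that increases with each write. A write that states the version it expects is rejected if the document has changed since, so one client can't silently overwrite another client's update.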
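A minimal sketch of the version check, written in Python rather than against a live cluster. `TinyStore` and `VersionConflict` are hypothetical names made up for this illustration; the point is only the compare-then-increment step.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale copy of the document."""


class TinyStore:
    """Hypothetical in-memory store that mimics _version checks."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]

    def index(self, doc_id, source, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        # Reject the write if the caller read an older version.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, dict(source))
        return new_version


if __name__ == "__main__":
    store = TinyStore()
    store.index("1", {"type": "shirt", "color": "black"})
    version, doc = store.get("1")
    # Two clients both read the same version; the first write wins.
    store.index("1", {**doc, "color": "blue"}, expected_version=version)
    try:
        store.index("1", {**doc, "sleeves": "short"}, expected_version=version)
    except VersionConflict as err:
        print("conflict:", err)
```

On the Elasticsearch versions used in this workshop, the same check is requested by passing the expected version with the write, for example `PUT /products/_doc/1?version=1`. Newer releases use `if_seq_no` and `if_primary_term` instead.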
Concept:
Single GET for multiple objects? Yep!
GET /products/_mget
{
"ids" : [ "1", "2" ]
}
Objectives:
- Configure a cluster
- Run multiple nodes on 1 machine
- Understand indices, shards and shard replication
- Understand how Elasticsearch uses Lucene
- Perform CRUD for indices and documents
- Understand Optimistic Concurrency Control (_version)
- Perform GET to retrieve multiple objects in 1 request
Awesome!
Red light, green light?
2nd Hour:
Search This!
Objectives:
- Load JSON datasets from the console
- Request body searches
- Perform fuzzy searches
- Perform searches based on phrases, proximity, wildcards and regular expressions
- Perform bool queries
// Create customers index
PUT /customers
{
"settings" : {
"number_of_shards" : 1,
"number_of_replicas" : 0
}
}
POST _bulk
{ "index" : { "_index" : "customers", "_type" : "_doc", "_id":1} }
{ "name" : "Brian" }
{ "index" : { "_index" : "customers", "_type" : "_doc", "_id":2} }
{ "name" : "Jonathan" }
NDJSON
_bulk uses Newline Delimited JSON
convert csv to json
Remember: each document needs its own index map on the line before it, so replace every \n between documents with \n + index map + \n
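One way to do the CSV conversion, sketched in Python. `csv_to_bulk` is a hypothetical helper, not part of any Elasticsearch tooling. It assumes the CSV has a header row, assigns IDs 1, 2, 3... in row order, and leaves every value as a string.

```python
import csv
import io
import json


def csv_to_bulk(csv_text: str, index: str, doc_type: str = "_doc") -> str:
    """Turn CSV text into a _bulk request body (NDJSON)."""
    lines = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for doc_id, row in enumerate(reader, start=1):
        # Action line: tells _bulk where the next document goes.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(row))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "name\nBrian\nJonathan\n"
    print(csv_to_bulk(sample, "customers"), end="")
```

Saved to a file, the output can be sent with the same kind of curl --data-binary command used for products.json.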
load json dataset
$ cp ./data/products.json ./elasticsearch/data/products.json
$ curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/products/_bulk?pretty' --data-binary @data/products.json
Confirm everything loaded
// URI search
GET /products/_search?q=name:golf
Request Body Search
Search Query
// Search 1 field for 1 value
GET /products/_search
{
"query": {
"term": {"description": "shirt"}
}
}
// Search 1 field for 2 values
GET /products/_search
{
"query": {
"terms": {"description": ["shirt", "shirts"]}
}
}
Sort & Size
GET /products/_search
{
"sort" : [
{ "price" : {"order" : "desc"}}
],
"query": {
"terms": {"name": ["shirt", "shirts"]}
},
"size": 14
}
Range
GET /products/_search
{
"sort" : [
{ "price" : {"order" : "desc"}}
],
"query": {
"range" : {
"price" : {
"lte" : 200
}
}
}
}
Filtering
GET /products/_search
{
"sort" : [
{ "price" : {"order" : "desc"}}
],
"query": {
"bool": {
"must": {
"query_string": {
"fields" : ["name","description"],
"query" : "shirt"
}
},
"must_not": {
"term": {"name": "pants"}
}
}
},
"size": 20
}
Fuzzy Search
GET /products/_search
{
"query": {
"fuzzy" : {"name" : "shorts" }
}
}
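Why does a search for "shorts" also find "shirts"? A fuzzy query matches terms within a small edit distance of the search term. The Python sketch below computes plain Levenshtein distance to show the idea. It is a simplification: by default Elasticsearch also counts swapping two adjacent characters as a single edit, which this version does not.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    for term in ["shirts", "shirt", "pants"]:
        print(term, edit_distance("shorts", term))
```

"shirts" is one edit away from "shorts" and "shirt" is two, so both can come back from a fuzzy search, depending on the fuzziness allowed.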
Phrase Search
GET /products/_search
{
"query": {
"match_phrase" : {
"name" : "short shirt"
}
}
}
Proximity Search
GET /products/_search
{
"query": {
"match_phrase" : {
"name": {
"query" : "short shirt",
"slop": 1
}
}
}
}
Wildcard
GET /products/_search
{
"query": {
"wildcard" : {"name" : "sh?rt" }
}
}
Regex
GET /products/_search
{
"query": {
"regexp" : {"name" : "sh.rt" }
}
}
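The wildcard `sh?rt` and the regular expression `sh.rt` accept the same terms: `?` and `.` each stand for exactly one character. The Python sketch below checks this against a made-up term list. Python's `fnmatch` and `re` are stand-ins here, not Lucene's pattern syntax, and they only agree with it for simple patterns like these.

```python
import fnmatch
import re

# Hypothetical terms, as they might appear in the index for the name field.
terms = ["shirt", "short", "shorts", "sheet"]

# Wildcard: ? matches exactly one character, * matches any run of characters.
wildcard_hits = [t for t in terms if fnmatch.fnmatchcase(t, "sh?rt")]

# Regular expression: . matches exactly one character. fullmatch anchors the
# pattern to the whole term, as Elasticsearch does for regexp queries.
regex_hits = [t for t in terms if re.fullmatch(r"sh.rt", t)]

if __name__ == "__main__":
    print(wildcard_hits)
    print(regex_hits)
```

Both patterns must match the whole term, which is why "shorts" is not a hit for either.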
Bool
POST /products/_search
{
"query": {
"bool" : {
"must" : {
"term" : { "name" : "polo" }
},
"filter": {
"term" : { "color_presentation": "heather" }
},
"must_not" : {
"range" : {
"price" : { "gte" : 10, "lte" : 80 }
}
},
"should" : [
{ "term" : { "description": "denim" } },
{ "term" : { "description": "classic" } }
],
"minimum_should_match" : 1,
"boost" : 1.0
}
}
}
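How the four bool clauses combine, sketched in Python over an in-memory document. `term`, `price_between` and `matches_bool` are hypothetical helpers for illustration. The sketch assumes each text field is already split into lowercase tokens and ignores scoring. In Elasticsearch, must and should clauses also contribute to the score, while filter and must_not do not.

```python
def term(field, value):
    """Hypothetical helper: true if the field contains the exact token."""
    return lambda doc: value in doc.get(field, [])


def price_between(low, high):
    """Hypothetical helper: true if low <= price <= high."""
    return lambda doc: "price" in doc and low <= doc["price"] <= high


def matches_bool(doc, must=(), filter_=(), must_not=(), should=(),
                 minimum_should_match=0):
    """Return True if doc satisfies the bool clauses (no scoring)."""
    # must and filter: every clause has to match.
    if not all(clause(doc) for clause in (*must, *filter_)):
        return False
    # must_not: no clause may match.
    if any(clause(doc) for clause in must_not):
        return False
    # should: at least minimum_should_match clauses have to match.
    return sum(1 for clause in should if clause(doc)) >= minimum_should_match


if __name__ == "__main__":
    shirt = {
        "name": ["heather", "polo"],
        "color_presentation": ["heather"],
        "description": ["classic", "fit"],
        "price": 98,
    }
    print(matches_bool(
        shirt,
        must=[term("name", "polo")],
        filter_=[term("color_presentation", "heather")],
        must_not=[price_between(10, 80)],
        should=[term("description", "denim"), term("description", "classic")],
        minimum_should_match=1,
    ))
```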
Objectives:
- Load JSON datasets from the console
- Request body searches
- Perform fuzzy searches
- Perform searches based on phrases, proximity, wildcards and regular expressions
- Perform bool queries
3rd Hour:
Complete the Auto-Complete
We will:
- Launch a React application
- Load data into Elasticsearch
- Connect our React application to Elasticsearch
- Build a product search box with auto-complete functionality