This project applies the ELK stack to build a nearline tier of the recommendation system.
All configurations and binaries of Elasticsearch, Logstash, and Kibana are placed here.
- Java runtime (JRE/JDK) 1.8+, e.g. `apt-get install openjdk-8-jdk` on Ubuntu 14.04 LTS or above
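A quick way to confirm the Java prerequisite is to parse the version string reported by `java -version 2>&1`; a minimal sketch, where `parse_major` is a hypothetical helper (the legacy `1.x` scheme carries the major version in the second field):

```shell
# parse_major: extract the Java major version from a version string,
# handling both the legacy scheme ("1.8.0_292" -> 8) and the
# current scheme ("11.0.2" -> 11).
parse_major() {
  echo "$1" | awk -F '.' '{ print (($1 == 1) ? $2 : $1) }'
}

parse_major "1.8.0_292"   # prints 8
parse_major "11.0.2"      # prints 11
```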
Most of the following instructions reference Elasticsearch's Important System Configuration guide.
- /etc/security/limits.conf (Ubuntu)
  - to enable the limits.conf file, uncomment the following line in /etc/pam.d/su:
    `#session required pam_limits.so` => `session required pam_limits.so`
  - set the maximum number of open files for the Elasticsearch user, e.g. elk
  - grant the Elasticsearch user permission to lock memory
  - make sure that the number of threads the Elasticsearch user (elk) can create is at least 2048

```
# /etc/security/limits.conf
elk - nofile 65536
elk - memlock unlimited
elk - nproc 2048
```
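Once the limits are in place, the effective values can be checked from a fresh shell session of the elk user; a minimal sketch using the shell's `ulimit` builtin (the numbers should match the configuration above on a correctly set-up host):

```shell
# Print the effective limits for the current user; after re-login as elk,
# these should match the values set in /etc/security/limits.conf.
echo "open files:    $(ulimit -n)"
echo "locked memory: $(ulimit -l)"
echo "max processes: $(ulimit -u)"
```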
- mmapfs (reboot required)
  - increase the limit of mmap counts, and run `sysctl vm.max_map_count` to verify

```
# /etc/sysctl.conf
vm.max_map_count=262144
```
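After rebooting (or applying the setting with `sysctl -p`), the live kernel value can be verified; a minimal sketch reading procfs directly, so it works even where the `sysctl` binary is not on the PATH:

```shell
# Compare the live mmap limit against the value required above.
required=262144
current="$(cat /proc/sys/vm/max_map_count)"
if [ "$current" -ge "$required" ]; then
  echo "vm.max_map_count=$current OK"
else
  echo "vm.max_map_count=$current is below $required"
fi
```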
- download and extract from GitHub, i.e. click Download ZIP
- enter CLI mode and change the working directory to `elasticsearch\`
- enlarge the heap size for performance (optional; reserve at least half of the memory for the OS)
- start an Elasticsearch instance with `bin/elasticsearch` (or `bin/elasticsearch -d` for daemon mode on a Linux-like OS)
- browse `http://localhost:9200` and check whether the response message looks like the one below.
{"status" : 200, "name" : "Thing", "cluster_name" : "elasticsearch", ... }
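For scripting this smoke test, the `status` field can be pulled out of the response without extra tooling; a minimal sed-based sketch, where `es_status` is a hypothetical helper (a real setup would likely use jq):

```shell
# es_status: extract the numeric "status" field from the root-endpoint JSON.
es_status() {
  echo "$1" | sed -n 's/.*"status" *: *\([0-9][0-9]*\).*/\1/p'
}

resp='{"status" : 200, "name" : "Thing", "cluster_name" : "elasticsearch"}'
es_status "$resp"   # prints 200
```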
- install the Chrome extension ElasticSearch Head
- check `http://localhost:9200/_plugin/head/` for the management console; see elasticsearch-head for details.
- (optional) System call filter check (seccomp)
seccomp unavailable: CONFIG_SECCOMP not compiled into kernel, CONFIG_SECCOMP and CONFIG_SECCOMP_FILTER are needed ...
On an OS that does not support CONFIG_SECCOMP, set `bootstrap.system_call_filter: false` in `elasticsearch.yml`.
For more info, see elastic.
Data in ES can be accessed through the RESTful API, e.g. `http://localhost:9200/{index}/{type}/{id}`.
The relationship among {index}, {type}, and {id} in ES can be pictured as in the following picture.
The structure of indices and types with respect to venraas and customers is illustrated in the following image.
Each customer (tenant) consists of 4 indices (DB) for different purposes.
- {cust}_gocc - batch data of Goods, Order, Category (and GoodsCateCode), Customer
- {cust}_mod - recom'd models
- {cust}_oua - online user alignment
- {cust}_opp - online preference pool
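Since every tenant gets the same four suffixes, the index names can be generated from the customer codename; a minimal sketch, where `cust_indices` is a hypothetical helper following the list above:

```shell
# cust_indices: emit the four per-customer index names for a tenant codename.
cust_indices() {
  for suffix in gocc mod oua opp; do
    echo "${1}_${suffix}"
  done
}

cust_indices goshopping
# prints:
# goshopping_gocc
# goshopping_mod
# goshopping_oua
# goshopping_opp
```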
An index can be created with a RESTful API request.
For example, if our customer is titled goshopping, then the creation requests for the 4 indices look as follows.
- `POST http://localhost:9200/goshopping_gocc/`
- `POST http://localhost:9200/goshopping_mod/`
- `POST http://localhost:9200/goshopping_oua/`
- `POST http://localhost:9200/goshopping_opp/`
For each request, Elasticsearch responds as follows if the index has been created successfully.
{
"acknowledged": true
}
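When scripting the index creation, the response body can be checked for that flag; a minimal grep-based sketch, where `is_acknowledged` is a hypothetical helper:

```shell
# is_acknowledged: succeed when an Elasticsearch response body
# contains "acknowledged": true.
is_acknowledged() {
  echo "$1" | grep -q '"acknowledged" *: *true'
}

if is_acknowledged '{ "acknowledged": true }'; then
  echo "index created"
fi
# prints: index created
```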
venraas index creation:
POST
http://localhost:9200/venraas
The AAA info is available under:
- `http://localhost:9200/venraas/action` stores one action content per record.
- `http://localhost:9200/venraas/package` stores all package information per record.
- `http://localhost:9200/venraas/company` stores all company information per record.
sync info of AAA:
POST
http://localhost:9200/venraas/action
{
"updateTime":"2015-10-08 17:26:01",
"account":"eric@goshopping.com",
"accountId":1,
"actionType":"新增公司",
"actionContent":{
"companyId":19,
"companyName":"goshopping",
"codename":"goshopping",
"domainName":"www.goshopping.com",
"companyEnabled":true,
"offlinemodelstatus":true,
"token":"5Lc57e8WpD3",
"userAdminId":163,
"venraasptstatus":true,
"packageId":68
}
}
POST
http://localhost:9200/venraas/company
{
"updateTime":"2015-10-08 17:26:01",
"companies":[
{
"companyId":1,
"domainName":"www.goshopping.com",
"companyName":"goshopping",
"companyEnabled":true,
"codename":"goshopping",
"token":"5Lc57e8WpD3",
"offlinemodelstatus":true,
"venraasptstatus":true,
"package":{
"packageId":69,
"packageName":"B套餐",
"packageEnabled":true,
"apiRequestMax":1000,
"userAccountMax":2,
"fee":0,
"extraImpressionFee":0
}
}
]
}
We store the daily recommendation API request counts under {cust}_bill/api_count.
POST
http://localhost:9200/gohappy_bill/api_count
{
"page_type": "gop",
"fea_id": "b01",
"count": 1,
"update_time": "2015-09-02 13:15:10"
}
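Such a document can be assembled at request time; a minimal sketch, where `api_count_doc` is a hypothetical helper and the timestamp format follows the example above:

```shell
# api_count_doc: build an api_count JSON body from page type, feature id,
# and count, stamping it with the current time.
api_count_doc() {
  printf '{"page_type": "%s", "fea_id": "%s", "count": %d, "update_time": "%s"}\n' \
    "$1" "$2" "$3" "$(date '+%Y-%m-%d %H:%M:%S')"
}

api_count_doc gop b01 1
```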
- Query the histogram of hourly API request counts
POST
http://localhost:9200/gohappy_bill/api_count/_search
{
"size": 0,
"query": {
"query_string": {
"query": "*"
}
},
"aggs": {
"name_histo": {
"date_histogram": {
"field": "update_time",
"interval": "1h"
},
"aggs": {
"name_cnt": {
"sum": {
"field": "count"
}
}
}
}
}
}
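The response nests each hourly sum under `aggregations.name_histo.buckets[].name_cnt.value`; a minimal sketch pulling those numbers out of a response body (a real pipeline would likely use jq):

```shell
# bucket_values: list every "value" number found in an aggregation response.
bucket_values() {
  echo "$1" | grep -o '"value" *: *[0-9.]*' | sed 's/.*: *//'
}

resp='{"aggregations":{"name_histo":{"buckets":[{"name_cnt":{"value":3.0}},{"name_cnt":{"value":7.0}}]}}}'
bucket_values "$resp"
# prints:
# 3.0
# 7.0
```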
- Query the histogram of hourly API request counts for a specified page type, e.g. Category (`page_type:cap`)
POST
http://localhost:9200/gohappy_bill/api_count/_search
{
"size": 0,
"query": {
"query_string": {
"query": "page_type:cap"
}
},
"aggs": {
"name_histo": {
"date_histogram": {
"field": "update_time",
"interval": "1h"
},
"aggs": {
"name_cnt": {
"sum": {
"field": "count"
}
}
}
}
}
}
- Query the daily API request counts, grouped by $index and $page_type, within a specific duration
POST
http://localhost:9200/*_bill/api_count/_search
{
"size": 0,
"query": {
"filtered": {
"query": {
"query_string": {
"query": "_type:api_count AND update_time:[2015-10-01 TO 2015-10-06]",
"analyze_wildcard": true
}
}
}
},
"aggs": {
"group_by_index": {
"terms": {
"field": "_index"
},
"aggs": {
"group_by_pageType": {
"terms": {
"field": "page_type"
},
"aggs": {
"day_sum": {
"date_histogram": {
"field": "update_time",
"interval": "1d"
},
"aggs": {
"sum_count": {
"sum": {
"field": "count"
}
}
}
}
}
}
}
}
}
}
- Query the overall histogram of hourly API request counts
POST http://localhost:9200/*/api_count/_search
- `cd logstash` to switch the working directory
- `tar -xvzf logstash-${version}.tar.gz` to unpack the package
- `rm ~/.since*` to reset the sync cursor
- (optional, when the package is updated) `./logstash-1.5.4/bin/plugin update logstash-input-lumberjack` to update the lumberjack plugin (Ref: elk/wiki/Logstash-Configuration-for-Lumberjack-Input)
- `./logstash-1.5.4/bin/logstash -f conf/weblog.conf` to instance a Logstash for the weblog
- `nohup ./logstash-1.5.4/bin/logstash -f conf/weblog.conf > /dev/null 2>&1 &` to run in the background with nohup
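After backgrounding the instance, it is worth confirming it is still alive; a minimal sketch using `pgrep`, where `is_running` is a hypothetical helper (the `[l]` bracket trick keeps the pattern from matching its own command line):

```shell
# is_running: succeed when a process whose command line matches the
# pattern is alive.
is_running() {
  pgrep -f "$1" > /dev/null
}

if is_running '[l]ogstash -f conf/weblog.conf'; then
  echo "logstash is running"
else
  echo "logstash is not running"
fi
```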