/elk


Introduction

This project applies the ELK stack to build a nearline tier of the recommendation system.
All configurations and binaries of Elasticsearch, Logstash and Kibana are placed here.

Prerequisite

  • Java runtime (JRE/JDK) 1.8+
    apt-get install openjdk-8-jdk on Ubuntu 14.04 LTS or above

Elasticsearch

System Configuration

Most of the following instructions are referenced from Important System Configuration.

  • /etc/security/limits.conf
    • ubuntu and limits.conf
      • to enable the limits.conf file, uncomment the line in /etc/pam.d/su: #session required pam_limits.so => session required pam_limits.so
    • set the maximum number of open files for the elasticsearch user, e.g. elk
    • grant elasticsearch user permission to lock memory
    • make sure that the number of threads that the Elasticsearch user (elk) can create is at least 2048
# /etc/security/limits.conf
elk    -    nofile       65536
elk    -    memlock      unlimited
elk    -    nproc        2048
  • mmapfs (reboot required)
    • increase the limit on the mmap count, and run sysctl vm.max_map_count to verify
# /etc/sysctl.conf
vm.max_map_count=262144
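The settings above can be spot-checked from the Elasticsearch user's session. Below is a minimal sketch using Python's standard resource module; it assumes a Linux host (the /proc path does not exist elsewhere), and the recommended values are the ones listed in this README.

```python
# Sketch: verify the limits described above from the elk user's session.
# Assumes a Linux host; adjust expectations to your own configuration.
import resource

def check_limits():
    """Return the effective limits relevant to Elasticsearch."""
    nofile_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    memlock_soft, _ = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    nproc_soft, _ = resource.getrlimit(resource.RLIMIT_NPROC)
    try:
        with open("/proc/sys/vm/max_map_count") as f:
            max_map_count = int(f.read().strip())
    except OSError:
        max_map_count = None  # not a Linux host
    return {
        "nofile": nofile_soft,           # want >= 65536
        "memlock": memlock_soft,         # -1 means "unlimited"
        "nproc": nproc_soft,             # want >= 2048
        "vm.max_map_count": max_map_count,  # want >= 262144
    }

print(check_limits())
```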

Installation

  1. download and extract from GitHub, i.e. click Download ZIP
  2. enter CLI mode and change working dir to elasticsearch\
  3. enlarge the heap size for performance (optional; reserve at least half of the memory for the OS)
  4. start elasticsearch instance
    • bin/elasticsearch (bin/elasticsearch -d for daemon mode) on a Linux-like OS
  5. browse http://localhost:9200 and check whether the response message looks as below.
    {"status" : 200, "name" : "Thing", "cluster_name" : "elasticsearch", ... }
  6. install Chrome extension ElasticSearch Head
  7. check http://localhost:9200/_plugin/head/ for the management console.
    see elasticsearch-head for detail.
  8. (optional) System call filter check, seccomp
    seccomp unavailable: CONFIG_SECCOMP not compiled into kernel, CONFIG_SECCOMP and CONFIG_SECCOMP_FILTER are needed ...
    On an OS that does not support CONFIG_SECCOMP, set bootstrap.system_call_filter: false in elasticsearch.yml.
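The response check in step 5 can also be done programmatically. The sketch below only parses the sample banner shown above; fetching it live (e.g. with urllib.request.urlopen("http://localhost:9200")) is left to the reader.

```python
# Sketch: check the root-endpoint response from step 5. The sample JSON
# is the banner quoted in this README, not a live capture.
import json

def parse_banner(body):
    """Extract status and cluster name from the ES root-endpoint response."""
    doc = json.loads(body)
    return doc.get("status"), doc.get("cluster_name")

sample = '{"status": 200, "name": "Thing", "cluster_name": "elasticsearch"}'
status, cluster = parse_banner(sample)
print(status, cluster)  # 200 elasticsearch
```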

For more info, see elastic.

Preliminary

Data in ES can be accessed via the RESTful API, e.g. http://localhost:9200/{index}/{type}/{id}.
The data structure relationship among {index}, {type} and {id} in ES can be thought of as in the following picture.
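The addressing scheme can be sketched as a simple URL builder. The host and the sample index/type/id values below are assumptions for illustration only.

```python
# Sketch of the {index}/{type}/{id} addressing scheme described above.
# ES_HOST and the sample identifiers are illustrative assumptions.
ES_HOST = "http://localhost:9200"

def doc_url(index, doc_type, doc_id):
    """Build the RESTful URL of a single document."""
    return "{}/{}/{}/{}".format(ES_HOST, index, doc_type, doc_id)

print(doc_url("goshopping_gocc", "goods", "g0001"))
# http://localhost:9200/goshopping_gocc/goods/g0001
```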

Create indices

Overview

The structure of indices and types with respect to venraas and customers is illustrated in the following image.

Customer data structure

Each customer (tenant) consists of 4 indices (DB) for different purposes.

  • {cust}_gocc - batch data of Goods, Order, Category (and GoodsCateCode), Customer
  • {cust}_mod - recom'd models
  • {cust}_oua - online user alignment
  • {cust}_opp - online preference pool

An index can be created with a RESTful API request.
For example, if our customer is named goshopping, the creation requests of the 4 indices look as follows.

  • POST http://localhost:9200/goshopping_gocc/

  • POST http://localhost:9200/goshopping_mod/

  • POST http://localhost:9200/goshopping_oua/

  • POST http://localhost:9200/goshopping_opp/

For each request, Elasticsearch responds as follows if the index has been created successfully.

{
  "acknowledged": true
}
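The four creation requests can be generated for any tenant, since only the {cust} prefix varies. The sketch below builds the URLs and checks the acknowledged flag; issuing the actual POSTs (e.g. via urllib.request) is omitted.

```python
# Sketch: generate the four index-creation URLs for a tenant and check the
# "acknowledged" flag of the response. Suffixes come from this README.
import json

SUFFIXES = ("gocc", "mod", "oua", "opp")

def creation_urls(cust, host="http://localhost:9200"):
    """URLs to POST for creating a tenant's four indices."""
    return ["{}/{}_{}/".format(host, cust, s) for s in SUFFIXES]

def created_ok(response_body):
    """True if Elasticsearch acknowledged the index creation."""
    return json.loads(response_body).get("acknowledged") is True

print(creation_urls("goshopping"))
print(created_ok('{"acknowledged": true}'))  # True
```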
VenRaaS AAA (authentication, authorization, and accounting) sync

venraas index creation:

POST 
http://localhost:9200/venraas

The AAA info is available under

  • http://localhost:9200/venraas/action stores an action content per record.
  • http://localhost:9200/venraas/package stores all package information per record.
  • http://localhost:9200/venraas/company stores all company information per record.

Sync the AAA info:

POST
http://localhost:9200/venraas/action
{  
  "updateTime":"2015-10-08 17:26:01",
  "account":"eric@goshopping.com",
  "accountId":1,
  "actionType":"新增公司",
  "actionContent":{  
    "companyId":19,
    "companyName":"goshopping",
    "codename":"goshopping",
    "domainName":"www.goshopping.com",
    "companyEnabled":true,
    "offlinemodelstatus":true,
    "token":"5Lc57e8WpD3",
    "userAdminId":163,
    "venraasptstatus":true,
    "packageId":68
  }
}
POST
http://localhost:9200/venraas/company
{  
  "updateTime":"2015-10-08 17:26:01",
  "companies":[  
    {  
      "companyId":1,
      "domainName":"www.goshopping.com",
      "companyName":"goshopping",
      "companyEnabled":true,
      "codename":"goshopping",
      "token":"5Lc57e8WpD3",
      "offlinemodelstatus":true,
      "venraasptstatus":true,
      "package":{  
        "packageId":69,
        "packageName":"B套餐",
        "packageEnabled":true,
        "apiRequestMax":1000,
        "userAccountMax":2,
        "fee":0,
        "extraImpressionFee":0
      }
    }
  ]
}
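A consumer of the company sync document above might filter for enabled tenants like this. The payload below is abbreviated from the README sample; the field names match it.

```python
# Sketch: pick out the enabled tenants from the company sync document.
# The payload is abbreviated from the README's /venraas/company example.
payload = {
    "updateTime": "2015-10-08 17:26:01",
    "companies": [
        {"companyId": 1, "codename": "goshopping", "companyEnabled": True,
         "package": {"packageId": 69, "apiRequestMax": 1000}},
    ],
}

def enabled_codenames(doc):
    """Codenames of all companies flagged companyEnabled."""
    return [c["codename"] for c in doc.get("companies", [])
            if c.get("companyEnabled")]

print(enabled_codenames(payload))  # ['goshopping']
```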

Counting API requests

We store the daily recom'd API request counts under {cust}_bill/api_count.

API for counting recom'd requests
POST
http://localhost:9200/gohappy_bill/api_count
{
  "page_type": "gop",
  "fea_id": "b01",
  "count": 1,
  "update_time": "2015-09-02 13:15:10"
}
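A client posting such a counting document would build it like the sketch below. The field names and sample values follow the README example; the update_time format matches the one shown there.

```python
# Sketch: build one api_count document like the example above.
# Field names and sample values follow this README.
from datetime import datetime

def api_count_doc(page_type, fea_id, count=1, now=None):
    """One daily api_count record, timestamped like the README sample."""
    now = now or datetime.now()
    return {
        "page_type": page_type,
        "fea_id": fea_id,
        "count": count,
        "update_time": now.strftime("%Y-%m-%d %H:%M:%S"),
    }

doc = api_count_doc("gop", "b01", now=datetime(2015, 9, 2, 13, 15, 10))
print(doc["update_time"])  # 2015-09-02 13:15:10
```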
API to query counting information
  • Query the histogram of hourly api request count
POST 
http://localhost:9200/gohappy_bill/api_count/_search
{
  "size": 0,
  "query": {
    "query_string": {
      "query": "*"
    }
  },
  "aggs": {
    "name_histo": {
      "date_histogram": {
        "field": "update_time",
        "interval": "1h"
      },
      "aggs": {
        "name_cnt": {
          "sum": {
            "field": "count"
          }
        }
      }
    }
  }
}
  • Query the histogram of hourly api request count with the specified page type, i.e. Category
POST
http://localhost:9200/gohappy_bill/api_count/_search
{
  "size": 0,
  "query": {
    "query_string": {
      "query": "page_type:cap"
    }
  },
  "aggs": {
    "name_histo": {
      "date_histogram": {
        "field": "update_time",
        "interval": "1h"
      },
      "aggs": {
        "name_cnt": {
          "sum": {
            "field": "count"
          }
        }
      }
    }
  }
}
  • Query the histogram grouped by $index and $page_type for daily api request counts within a specific duration
POST
http://localhost:9200/*_bill/api_count/_search
{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "_type:api_count AND update_time:[2015-10-01 TO 2015-10-6]",
          "analyze_wildcard": true
        }
      }
    }
  },
  "aggs": {
    "group_by_index": {
      "terms": {
        "field": "_index"
      },
      "aggs": {
        "group_by_pageType": {
          "terms": {
            "field": "page_type"
          },
          "aggs": {
            "day_sum": {
              "date_histogram": {
                "field": "update_time",
                "interval": "1d"
              },
              "aggs": {
                "sum_count": {
                  "sum": {
                    "field": "count"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
  • Query the overall histogram of hourly api request count
POST http://localhost:9200/*/api_count/_search
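The first two _search bodies above differ only in the query string, so one helper can build them; a second helper then walks the nested aggregation response of the grouped query. The sample response below is hand-made following the standard ES aggregation response shape, not captured from a live cluster.

```python
# Sketch: build the histogram query bodies shown above, and flatten the
# grouped response. Agg names name_histo/name_cnt follow the README.
def histo_query(query="*", interval="1h"):
    """Request-count histogram body, parameterized by query and interval."""
    return {
        "size": 0,
        "query": {"query_string": {"query": query}},
        "aggs": {
            "name_histo": {
                "date_histogram": {"field": "update_time",
                                   "interval": interval},
                "aggs": {"name_cnt": {"sum": {"field": "count"}}},
            }
        },
    }

def flatten(resp):
    """Turn the grouped response into (index, page_type, day, count) rows."""
    rows = []
    for idx in resp["aggregations"]["group_by_index"]["buckets"]:
        for pt in idx["group_by_pageType"]["buckets"]:
            for day in pt["day_sum"]["buckets"]:
                rows.append((idx["key"], pt["key"],
                             day["key_as_string"],
                             day["sum_count"]["value"]))
    return rows

body = histo_query("page_type:cap")
print(body["query"]["query_string"]["query"])  # page_type:cap

sample = {"aggregations": {"group_by_index": {"buckets": [
    {"key": "gohappy_bill", "group_by_pageType": {"buckets": [
        {"key": "cap", "day_sum": {"buckets": [
            {"key_as_string": "2015-10-01", "sum_count": {"value": 42.0}}]}}
    ]}}]}}}
print(flatten(sample))  # [('gohappy_bill', 'cap', '2015-10-01', 42.0)]
```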

Logstash

Installation

  1. cd logstash to switch working dir
  2. tar -xvzf logstash-${version}.tar.gz to unpack package
  3. rm ~/.since* to reset sync cursor
  4. (optional, if the package has been updated)
    ./logstash-1.5.4/bin/plugin update logstash-input-lumberjack to update the lumberjack plugin (Ref: elk/wiki/Logstash-Configuration-for-Lumberjack-Input)
  5. ./logstash-1.5.4/bin/logstash -f conf/weblog.conf to launch a Logstash instance for the weblog
  • nohup ./logstash-1.5.4/bin/logstash -f conf/weblog.conf > /dev/null 2>&1 & to run in the background with nohup

Reference