Logging Infrastructure

Do you need logging services in your web apps?

Do you need a scalable, low-latency infrastructure to help with the job?

Do you need to process your logs before they reach their destination, while still keeping logging latency low?

Do you need to visualize the logs in an easy-to-read web interface and query them for useful information?

If the answer to all of these questions is yes, you are moving in the right direction. Read on.

What is this?

This project helps you create a scalable, low-latency, cost-effective, and reliable logging infrastructure using several pieces of software: RabbitMQ, Logstash, Confluent's Apache Kafka, Elasticsearch, and Kibana.

Each of these plays an important role in the infrastructure; you will learn about all of them shortly.

Ansible is used to deploy this infrastructure; Vagrant is used here for demonstration purposes.
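As a sketch of what that deployment could look like (the role and group names below are hypothetical, not taken from this repository), an Ansible playbook might map each component onto its own host group:

```yaml
# site.yml -- hypothetical layout; role and group names are illustrative only.
- hosts: webservers
  roles:
    - rabbitmq        # a local broker on every application host

- hosts: logstash
  roles:
    - logstash        # forwards messages from RabbitMQ to Kafka

- hosts: kafka
  roles:
    - kafka           # the central processing cluster

- hosts: elastic
  roles:
    - elasticsearch   # storage and search
    - kibana          # visualization, colocated for the demo
```

With Vagrant, the same playbook can be run against local VMs through Vagrant's built-in `ansible` provisioner.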

Components

Each component is described below, along with the reason we chose it.

  • RabbitMQ is the most widely deployed open source message broker. In our setup it is used to achieve low-latency logging; without it, the latency added when sending log messages would be much higher. It could be replaced by another component such as Redis, but we chose RabbitMQ for its simplicity and because we have a ready-to-use log shipper from our Symfony application to RabbitMQ and then directly on to Logstash (see the Monolog sketch after this list).

  • Logstash is used to transfer logs from RabbitMQ to Apache Kafka. It may be removed in the future, depending on the needs of the production environment (a pipeline sketch follows this list).

  • Apache Kafka is the central component of our infrastructure. It processes the logs sent from Logstash and then stores them in Elasticsearch. The processing may include replacing numeric user IDs with real user names from the database; it is done here rather than on the application server to minimize the latency that logging adds (an enrichment sketch follows this list).

  • Elasticsearch is where all of our data is stored. With Elasticsearch we can query this data to extract useful information from it.

  • Kibana, the last component, is used to visualize the data stored in Elasticsearch.
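To make the RabbitMQ piece concrete, here is a minimal sketch of shipping logs from a PHP application through Monolog's AMQP handler. The connection details and the `logs` exchange name are assumptions for illustration; the actual shipper in this setup may be wired differently:

```php
<?php
// Minimal sketch: publish application logs to a RabbitMQ exchange on the
// same host. Credentials and the 'logs' exchange name are placeholders.
require __DIR__ . '/vendor/autoload.php';

use Monolog\Logger;
use Monolog\Handler\AmqpHandler;
use PhpAmqpLib\Connection\AMQPStreamConnection;

// Connect to the broker on localhost to keep logging latency low.
$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();
$channel->exchange_declare('logs', 'fanout', false, true, false);

// Monolog publishes each log record to the exchange instead of a local file.
$logger = new Logger('app');
$logger->pushHandler(new AmqpHandler($channel, 'logs'));

$logger->info('User logged in', ['user_id' => 42]);
```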
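For the Logstash hop, a small pipeline is enough. A sketch along these lines, using Logstash's `rabbitmq` input and `kafka` output plugins (host names, queue, and topic are placeholders), reads from RabbitMQ and writes to Kafka:

```
# logstash.conf -- illustrative pipeline; hosts and names are placeholders.
input {
  rabbitmq {
    host    => "rabbitmq.example.local"
    queue   => "logs"
    durable => true
  }
}
output {
  kafka {
    bootstrap_servers => "kafka.example.local:9092"
    topic_id          => "raw-logs"
  }
}
```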
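For the Kafka enrichment step, a worker built on the php-rdkafka extension could consume raw log records, resolve the user ID, and re-publish the result. The topic names and the lookup helper below are hypothetical:

```php
<?php
// Illustrative enrichment worker using the php-rdkafka extension.
// The 'raw-logs'/'enriched-logs' topics and lookupUserName() are placeholders.

// Stub: a real worker would query the application database here.
function lookupUserName(int $id): string
{
    return "user-$id";
}

$conf = new RdKafka\Conf();
$conf->set('group.id', 'log-enricher');
$conf->set('bootstrap.servers', 'kafka.example.local:9092');
$conf->set('auto.offset.reset', 'earliest');

$consumer = new RdKafka\KafkaConsumer($conf);
$consumer->subscribe(['raw-logs']);

$producer = new RdKafka\Producer($conf);
$outTopic = $producer->newTopic('enriched-logs');

while (true) {
    $message = $consumer->consume(1000); // wait up to 1s for a record
    if ($message->err !== RD_KAFKA_RESP_ERR_NO_ERROR) {
        continue; // timeouts are expected; just poll again
    }

    $record = json_decode($message->payload, true);
    if (isset($record['user_id'])) {
        // Replace the numeric ID with a human-readable name.
        $record['user_name'] = lookupUserName((int) $record['user_id']);
    }

    $outTopic->produce(RD_KAFKA_PARTITION_UA, 0, json_encode($record));
    $producer->poll(0); // serve delivery callbacks
}
```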

Data flow

1- Configure your web application to send logs to a local RabbitMQ server installed on the same host as the application (as in the Monolog sketch above). This gives the lowest possible logging latency because no network hop is involved: the data is handed from one process to another through the host's own RAM.

2- The data is then forwarded from RabbitMQ to the Logstash servers, and from there to the Kafka cluster for processing.

3- When the data arrives at Kafka it is processed as required, then sent on to be indexed in Elasticsearch.

4- In Elasticsearch the data is indexed and stored. You can then visualize it with Kibana and create custom graphs and dashboards as your needs dictate (a query sketch follows this list).
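To illustrate the querying side of step 4, the official elasticsearch-php client can search the indexed logs directly. The index pattern and field names below are assumptions about what the pipeline writes:

```php
<?php
// Illustrative query with the elasticsearch-php client (v7-style API).
// The 'logs-*' index pattern and field names are placeholders.
require __DIR__ . '/vendor/autoload.php';

use Elasticsearch\ClientBuilder;

$client = ClientBuilder::create()
    ->setHosts(['elasticsearch.example.local:9200'])
    ->build();

// Fetch the ten most recent error-level log entries.
$response = $client->search([
    'index' => 'logs-*',
    'body'  => [
        'query' => ['match' => ['level' => 'error']],
        'sort'  => [['@timestamp' => ['order' => 'desc']]],
        'size'  => 10,
    ],
]);

foreach ($response['hits']['hits'] as $hit) {
    echo $hit['_source']['message'], PHP_EOL;
}
```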

I will keep adding to this repository until it reaches a stable state, then release it for production use :)