Log collector for logagg
Collects all the logs from the server, parses them into a common schema, and sends them to NSQ.
- We expect users to follow best practices for logging their applications.
- Most importantly, do structured logging, since parsing/formatting logs is much easier that way.
- Set up `logagg-fs` beforehand.
- `files`: Log files which are being tracked by logagg.
- `node`: The server(s) where the log `files` reside.
- `formatters`: The parser functions that the `collector` uses to format log lines into the common format.
- `nsq`: The central location where logs are sent by `collector`(s), as messages, after formatting.
- Guaranteed delivery of each log line from `files` to `nsq`
- Reduced latency between a log being generated and it being present in `nsq`
- Options to add custom `formatters`
- File polling if the log file is not yet generated
- Works on rotating log files
- Custom `formatters` to support parsing of any log file

Output format of processed log lines (a dictionary):
- `id` (str): A unique id per log with time ordering. Useful to avoid storing duplicates.
- `timestamp` (str): ISO format time.
- `data` (dict): Parsed log data.
- `raw` (str): Raw log line read from the log file.
- `host` (str): Hostname of the node where this log was generated.
- `formatter` (str): Name of the formatter that processed the raw log line.
- `file` (str): Full path of the file on the host where this log line came from.
- `type` (str): One of "log", "metric". (Is there one more type?)
- `level` (str): Log level of the log line.
- `event` (str): Log event.
- `error` (bool): True if the collection handler failed during processing.
- `error_tb` (str): Error traceback.
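For illustration, a processed log line in this schema might look like the sketch below. All values are hypothetical; the raw line and formatter name are borrowed from the custom-formatter example later in this document.

```python
# A hypothetical processed log line in the common schema (values are made up).
processed_log = {
    'id': '20180207063700297610000',             # time-ordered unique id
    'timestamp': '2018-02-07T06:37:00.297610Z',  # ISO format time
    'data': {'message': 'Hello_there'},          # parsed log data
    'raw': '2018-02-07T06:37:00.297610Z [Some_event] [Info] [Hello_there]',
    'host': 'server1',
    'formatter': 'myformatters.formatters.sample_formatter',
    'file': '/var/log/sample/sample.log',
    'type': 'log',
    'level': 'Info',
    'event': 'Some_event',
    'error': False,
    'error_tb': '',
}
```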
Prerequisites: Python 3.5
Install the `nsq` package on the node where we need to bring up the `nsq` server.
- Run the following commands to install `nsq`:
```
$ sudo apt-get install libsnappy-dev
$ wget https://s3.amazonaws.com/bitly-downloads/nsq/nsq-1.0.0-compat.linux-amd64.go1.8.tar.gz
$ tar zxvf nsq-1.0.0-compat.linux-amd64.go1.8.tar.gz
$ sudo cp nsq-1.0.0-compat.linux-amd64.go1.8/bin/* /usr/local/bin
```
Install the Docker package on the `collector` nodes.
- Run the following commands to install Docker:
```
$ sudo apt-get update
$ sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
    $(lsb_release -cs) \
    stable"
$ sudo apt-get update
$ sudo apt-get install docker-ce
```
- Check that the Docker version is >= 17.12.1:
```
$ sudo docker -v
Docker version 18.03.1-ce, build 9ee9f40
```
- Run serverstats to store server metrics in /var/log/serverstats:
```
$ docker plugin install deepcompute/docker-file-log-driver:1.0 --alias file-log-driver
$ docker run --hostname $HOSTNAME \
    --name serverstats \
    --restart unless-stopped \
    --label formatter=logagg_collector.formatters.basescript \
    --log-driver file-log-driver \
    --log-opt labels=formatter \
    --log-opt fpath=/serverstats/serverstats.log \
    --detach \
    deepcompute/serverstats:latest
```
Install the `logagg-collector` package on the nodes where we collect the logs and on the nodes from where we forward the logs:
- Run the following commands to pip install `logagg-collector`:
```
$ pip3 install https://github.com/deep-compute/pygtail/tarball/master/#egg=pygtail-0.6.1
$ pip3 install git+https://github.com/supriyopaul/logagg_utils.git
$ pip3 install git+https://github.com/supriyopaul/logagg_collector.git
```
- NOTE: Run each of the following commands in a separate terminal window.
- `nsqlookupd`:
```
$ nsqlookupd
```
- `nsqd`:
```
$ nsqd -lookupd-tcp-address localhost:4160
```
- `nsqadmin`:
```
$ nsqadmin -lookupd-http-address localhost:4161
```
- The collector:
```
$ logagg-collector runserver --no-master --data-dir /tmp/ --logaggfs-dir /logcache/
```
- To see the files the collector is supposed to collect:
```
$ curl -i 'http://localhost:1099/collector/v1/get_files'
```
- To set the NSQ details:
```
$ curl -i 'http://localhost:1099/collector/v1/set_nsq?nsqd_http_address=<nsq-server-ip-or-DNS>:4151&topic_name=logagg'
```
- To start collecting:
```
$ curl -i 'http://localhost:1099/collector/v1/collect'
```
Note: Replace `<nsq-server-ip-or-DNS>` with the IP address or DNS name of the `nsq` server, e.g. 192.168.0.211.
Note: The `--volume` argument mounts the local directory of the log file into the Docker container.
Note: The `--hostname` argument makes the container use the host's hostname rather than the Docker container's hostname.
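The three endpoints above can also be driven from Python. A minimal sketch, assuming the third-party `requests` package is installed and using the address and placeholder from the curl examples above:

```python
import requests  # third-party; pip3 install requests

COLLECTOR = 'http://localhost:1099/collector/v1'   # address from the curl examples
NSQD_HTTP_ADDRESS = '<nsq-server-ip-or-DNS>:4151'  # replace, as noted above

# List the files the collector is supposed to collect.
print(requests.get(COLLECTOR + '/get_files').text)

# Point the collector at NSQ.
print(requests.get(COLLECTOR + '/set_nsq',
                   params={'nsqd_http_address': NSQD_HTTP_ADDRESS,
                           'topic_name': 'logagg'}).text)

# Start collecting.
print(requests.get(COLLECTOR + '/collect').text)
```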
- You can check message traffic at `nsq` by visiting http://<nsq-server-ip-or-DNS>:4171/ in a browser (use http://localhost:4171/ for a local setup).
- You can see the collected logs in real time using the following command:
```
$ nsq_tail --topic=logagg --channel=test --lookupd-http-address=<nsq-server-ip-or-DNS>:4161
```
| Formatter name | Comments |
|---|---|
| `logagg_collector.formatters.nginx_access` | See configuration here |
| `logagg_collector.formatters.mongodb` | |
| `logagg_collector.formatters.basescript` | |
| `logagg_collector.formatters.docker_log_file_driver` | See example here |
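Formatters are referred to by their dotted import path, as in the table above and in the `file=<path>:<formatter>` argument later. For intuition, a minimal sketch of how such a name can be resolved to a callable using the standard library; this is an illustration of the addressing scheme, not necessarily how logagg implements it:

```python
import importlib

def load_formatter(dotted_path):
    """Resolve a dotted path such as 'myformatters.formatters.sample_formatter'
    to the function it names."""
    module_path, _, func_name = dotted_path.rpartition('.')
    module = importlib.import_module(module_path)
    return getattr(module, func_name)

# Hypothetical usage with the custom formatter built in the next section:
# fmt = load_formatter('myformatters.formatters.sample_formatter')
# record = fmt('2018-02-07T06:37:00.297610Z [Some_event] [Info] [Hello_there]')
```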
```
$ echo $PYTHONPATH

$ mkdir customformatters
$ # Now append the path to $PYTHONPATH
$ export PYTHONPATH=$PYTHONPATH:/home/path/to/customformatters/
$ echo $PYTHONPATH
:/home/path/to/customformatters
$ cd customformatters/
$ mkdir myformatters
$ cd myformatters/
$ touch formatters.py
$ touch __init__.py
$ echo 'from . import formatters' >> __init__.py
$ # Now write your formatter functions inside the formatters.py file
```
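After these commands, the layout should look like this:

```
customformatters/
└── myformatters/
    ├── __init__.py
    └── formatters.py
```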
Important:
- Only Python standard library modules can be imported in the formatters.py file.
- A formatter function should return a `dict`.
- The `dict` should only contain keys which are mentioned in the log structure above.
- Sample formatter function:
```python
import re

sample_log_line = '2018-02-07T06:37:00.297610Z [Some_event] [Info] [Hello_there]'

def sample_formatter(log_line):
    # Strip the square brackets and split the line on spaces.
    log = re.sub(r'[\[\]]', '', log_line).split(' ')
    timestamp = log[0]
    event = log[1]
    level = log[2]
    data = {'message': log[3]}
    return dict(
        timestamp=timestamp,
        event=event,
        level=level,
        data=data,
    )
```
To see more examples, look here
- Check that the custom formatter works in the Python interpreter, the same way logagg will use it:
```python
>>> import myformatters
>>> sample_log_line = '2018-02-07T06:37:00.297610Z [Some_event] [Info] [Hello_there]'
>>> output = myformatters.formatters.sample_formatter(sample_log_line)
>>> from pprint import pprint
>>> pprint(output)
{'data': {'message': 'Hello_there'},
 'event': 'Some_event',
 'level': 'Info',
 'timestamp': '2018-02-07T06:37:00.297610Z'}
```
- Pseudo `logagg collect` command:
```
$ sudo logagg collect --file file=logfile.log:myformatters.formatters.sample_formatter --nsqtopic logagg --nsqd-http-address localhost:4151
```
You can store logagg collector/forwarder logs in files using the basescript `--log-file` argument or the Docker file log driver:
```
$ sudo logagg --log-file /var/log/logagg/collector.log collect file=/var/log/serverstats/serverstats.log:formatter=logagg.formatters.basescript --nsqtopic logagg --nsqd-http-address <nsq-server-ip-or-DNS>:4151
```
If there are multiple files being tracked by multiple collectors on multiple nodes, the collector information can be seen in the "Heartbeat" topic of NSQ. Every running collector sends a heartbeat to this topic (default interval: 30 seconds). The heartbeat format is as follows:
- `timestamp`: Timestamp of the received heartbeat.
- `heartbeat_number`: The heartbeat number since the collector started running.
- `host`: Hostname of the node on which the collector is running.
- `nsq_topic`: The nsq topic which the collector is using.
- `files_tracked`: List of files that are being tracked by the collector, followed by the formatter.
You can run the following command to see the information:
```
$ nsq_tail --topic=Heartbeat --channel=test --lookupd-http-address=<nsq-server-ip-or-DNS>:4161
```
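For illustration, a heartbeat message with the fields above might look like this (all values hypothetical):

```python
# A hypothetical heartbeat message (values are made up).
heartbeat = {
    'timestamp': '2018-02-07T06:37:00.297610Z',
    'heartbeat_number': 120,
    'host': 'server1',
    'nsq_topic': 'logagg',
    'files_tracked': [
        '/var/log/serverstats/serverstats.log:logagg_collector.formatters.basescript',
    ],
}
```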
You're more than welcome to hack on this :-)
```
$ git clone https://github.com/deep-compute/logagg
$ cd logagg
$ sudo python setup.py install
$ docker build -t logagg .
```