/ChistaDATA-Anansi-Profiler

ChistaDATA Profiler for ClickHouse

Primary LanguageGoGNU General Public License v3.0GPL-3.0

ChistaDATA Anansi Profiler

Introduction

ChistaDATA Anansi is a log analysing tool built for ClickHouse. It is a CLI tool written in pure GO. The reports generated by the profiler sheds light on the different aspects of queries like execution time, memory usage, bytes read and other basic but crucial information.

The "anansi" takes the log output from a running ClickHouse/PostgreSQL database instance and processes it into a txt file. The report it provides presents all the information found in a pleasant and easy to understand reporting style.

To allow the Anansi profiler to parse the Postgres log file, you will need to provide the following parameters:

  1. Just specify for the which database. It should be specified using the flag "--database-type=postgres".

  2. If you are using a custom prefix in your Postgres log file, you will need to specify it using the flag "--log-prefix=". This will enable the Anansi profiler to correctly identify and parse the relevant information in your log file.

  3. This tool captures all of the DML/DDL queries by default. If you discard the any query type, just specify it into like "--discard-queries=insert"

  4. Just set the these parameters on PostgreSQL side: log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h '; log_min_duration_statement = 5000 #if we need that queries take longer than 5 second log_checkpoints = on log_connections = on log_disconnections = on log_lock_waits = on log_temp_files = 0 log_autovacuum_min_duration = 0 log_error_verbosity = default log_statement = off lc_messages='C'

Features

  • Multiple log file support
  • Report customization using CLI arguments
  • Multiple report formats are available (text, markdown)
  • Can be compiled to binaries compatible with multiple OS and architectures
  • Capable of processing huge files
  • Supports compressed files
  • Capable of fetching files from S3
  • These reports can help shed light on top n queries for now. Errors occurring in the system, checkpoint behaviour, vacuum behaviour, trends and other basic but critical information for a these database systems are also available in the future.

Installation

Prerequisites

  • Go
  • git
  • make

Clone the repo

git clone https://github.com/ChistaDATA/ChistaDATA-Anansi-Profiler.git

Build

make build

Your executable anansi-profiler will be generated in the working directory.

Example

Run with a sample file

./anansi-profiler -n 2 ./server.log

Output

# Current date: 2022-10-06 13:06:39.088724 +0530 IST m=+19.943042793
# Hostname: local
# Files:
 * server.log
# Overall: 243122, Unique: 17 , QPS: 9.10
# Time range: 2022-10-03 18:42:08.317458 +0000 UTC to 2022-10-04 02:07:25.907276 +0000 UTC
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======
# Exec time        13676.0   0.00s   0.93s   0.06s   0.72s   0.18s   0.00s
# Rows read         53.44B    0.00   0.53M   0.22M   0.52M   0.16M   0.26M
# Bytes read        1.43TB   0.00B 26.02MB  6.16MB 24.91MB  7.74MB  2.27MB
# Peak Memory                0.00B  1.67GB  0.12GB  1.55GB  0.39GB  4.00MB

# Profile
# Rank Response time   Calls R/Call Query
# ==== =============== ===== ====== =====
#    1 12340.9  90.24% 16995  0.73s select distinct(city) from salary s1 LEFT JOIN salary s2 ON s1.event_id = s2.eve
#    2  43.53s   0.32% 17070  0.00s select min(amount) from salary where rand > 10000
#    3   0.49s   0.13%     1  0.49s insert into salary select * from salary
#    4   0.01s   0.00%     1  0.01s select * from land_registry_price_paid_uk limit 5;

# Query 1 : 1.243 QPS
# Time range: From 2022-10-03 18:42:08.4814 +0000 UTC to 2022-10-04 02:05:44.907986 +0000 UTC
# ====================================================================
# Attribute      total     min     max     avg     95%  stddev  median
# ============ ======= ======= ======= ======= ======= ======= =======
# Count         17.00K 
# Exec time    12340.9   0.00s   0.93s   0.73s   0.75s   0.04s   0.73s
# Rows read      8.77B    0.00   0.53M   0.52M   0.53M  21.50K   0.52M
# Bytes read    0.35TB   0.00B 22.03MB 21.33MB 21.97MB  0.89MB 21.37MB
# Peak Memory            0.00B  1.67GB  1.54GB  1.55GB 64.00MB  1.55GB
# ====================================================================
# Databases    default (16995/16995)  
# Hosts        [::1] (16995/16995)  
# Users        default (16995/16995)  
# Completion   16988/16995
# Query_time distribution
# ====================================================================
#   1us  
#  10us  
# 100us  
#   1ms  
#  10ms  
# 100ms  ###########################################################
#    1s  
#  10s+  
# ====================================================================
# Query
select distinct(city) from salary s1 LEFT JOIN salary s2 ON s1.event_id = s2.event_id 

# Query 2 : 1.248 QPS
# Time range: From 2022-10-03 18:42:11.876586 +0000 UTC to 2022-10-04 02:07:25.693396 +0000 UTC
# ====================================================================
# Attribute      total     min     max     avg     95%  stddev  median
# ============ ======= ======= ======= ======= ======= ======= =======
# Count         17.07K 
# Exec time     43.53s   0.00s   0.13s   0.00s   0.00s   0.00s   0.00s
# Rows read      4.41B    0.00   0.27M   0.26M   0.27M  10.29K   0.26M
# Bytes read   65.66GB   0.00B  4.07MB  3.94MB  4.05MB  0.16MB  3.95MB
# Peak Memory           0.53MB  4.20MB  2.88MB  2.89MB 90.90KB  2.88MB
# ====================================================================
# Databases    default (17070/17070)  
# Hosts        [::1] (17070/17070)  
# Users        default (17070/17070)  
# Completion   17066/17070
# Query_time distribution
# ====================================================================
#   1us  
#  10us  
# 100us  
#   1ms  ###########################################################
#  10ms  
# 100ms  
#    1s  
#  10s+  
# ====================================================================
# Query
select min(amount) from salary where rand > 10000

Usage

anansi-profiler <file-paths> ...

Arguments:

<file-paths> ...    Paths of log files

Flags:

Flags:
  -h, --help                                 Show context-sensitive help.
  -n, --top-query-count=10                   Count of queries for top x table
  -r, --report-type="text"                   Report type to be generated, types: md, text
  -c, --minimum-query-call-count=1           Minimum no of query calls needed
      --database-type="clickhouse"           Which database? Possible values: clickhouse, postgres
      --database-version="0"                 Database version
      --sort-field="ExecTime"                Sort queries by the given field, possible values: ExecTime, RowsRead, BytesRead, PeakMemory, QPS, QueryCount
      --sort-field-operation="max"           Sort queries by the given operation on field, possible values: sum, min, max, avg, per95, stdDev, median
      --sort-order="desc"                    Sort order, possible values: asc, desc
      --log-level="error"                    Log level, possible values: panic, fatal, error, warn, info, debug, trace
      --log-prefix=""                        Prefix of log
      --discard-queries=,...                 It will consider all the query types by default but type of queries can be discarded, possible values: select, update, delete, insert
      --s3-access-key-id=STRING
      --s3-secret-access-key=STRING
      --s3-session-token=STRING
      --s3-region=STRING
      --s3-object-urls=S3-OBJECT-URLS,...