devops-works/slowql

MariaDB Slow Query log timestamp causing errors?

Closed this issue · 3 comments

Very interesting project you have @eze-kiel :)

On AlmaLinux 8 with MariaDB 10.3.34 via MariaDB official YUM repo, I tried doing a quick test for digest and replayer tools using mysqlslap generated data but the tools gives errors trying to parse MariaDB slow query log's timestamp?

Enabling slow query log and running digest on empty slow query log is fine

SLOWLOG=$(mysqladmin var | grep slow_query_log_file | awk '{ print $4}')
echo $SLOWLOG
# set long query time
secs=1; mysql -e "SET GLOBAL long_query_time=$secs; SELECT @@GLOBAL.long_query_time;"
mysql -e "SET GLOBAL log_queries_not_using_indexes=1; SELECT @@GLOBAL.log_queries_not_using_indexes;"

# enable
mysql -e "SET GLOBAL slow_query_log=1; SELECT @@GLOBAL.slow_query_log;"

# digest run
digest -f $SLOWLOG -k mariadb -no-cache -top 5
INFO[0000] log file has 3 lines                         
INFO[0000] digest duration: 7.984µs                     
INFO[0000] parsed 0 queries                             
INFO[0000] found 0 different queries hashs              

=-= Server meta =-=

Binary              : /usr/sbin/mysqld
Port                : 3306
Socket              : /var/lib/mysql/mysql.sock
Version             : 10.3.34-MariaDB
Version short       : 10.3.34
Version description : MariaDB Server

Digest duration     : 7.984µs
Real duration       : 0s

Bytes handled       : 0

=-= Queries stats =-=

Sorted by: random, increasing
Showing top 3 queries

INFO[0000] saving results in cache file

Using mysqlslap to generate some queries where slow query log ended up with

/usr/sbin/mysqld, Version: 10.3.34-MariaDB (MariaDB Server). started with:
Tcp port: 3306  Unix socket: /var/lib/mysql/mysql.sock
Time                Id Command  Argument
# Time: 220504  0:49:26
# User@Host: root[root] @ localhost []
# Thread_id: 38  Schema:   QC_hit: No
# Query_time: 0.002688  Lock_time: 0.001031  Rows_sent: 1  Rows_examined: 1
# Rows_affected: 0  Bytes_sent: 85
SET timestamp=1651639766;
SELECT FROM_UNIXTIME(UNIX_TIMESTAMP() - variable_value) AS server_start FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE variable_name='Uptime';
# Time: 220504  0:49:28
# User@Host: root[root] @ localhost []
# Thread_id: 39  Schema:   QC_hit: No
# Query_time: 0.024253  Lock_time: 0.000855  Rows_sent: 1  Rows_examined: 31
# Rows_affected: 0  Bytes_sent: 77
SET timestamp=1651639768;
SELECT CONCAT(ROUND(SUM(index_length)/(1024*1024), 2), ' MB') AS 'Total Index Size' FROM information_schema.TABLES WHERE table_schema LIKE 'mysql';
# User@Host: root[root] @ localhost []
# Thread_id: 40  Schema:   QC_hit: No
# Query_time: 0.000643  Lock_time: 0.000095  Rows_sent: 1  Rows_examined: 31
# Rows_affected: 0  Bytes_sent: 76
SET timestamp=1651639768;
SELECT CONCAT(ROUND(SUM(data_length)/(1024*1024), 2), ' MB') AS 'Total Data Size'
FROM information_schema.TABLES WHERE table_schema LIKE 'mysql';
# Time: 220504  0:51:12
# User@Host: root[root] @ localhost []
# Thread_id: 41  Schema: mysqlslap  QC_hit: No
# Query_time: 0.000182  Lock_time: 0.000027  Rows_sent: 99  Rows_examined: 99
# Rows_affected: 0  Bytes_sent: 750
use mysqlslap;
SET timestamp=1651639872;
SELECT id from t1;
# Time: 220504  0:51:38
# User@Host: root[root] @ localhost []
# Thread_id: 46  Schema:   QC_hit: No
# Query_time: 0.000753  Lock_time: 0.000130  Rows_sent: 1  Rows_examined: 1
# Rows_affected: 0  Bytes_sent: 332
SET timestamp=1651639898;
SELECT CONCAT(table_schema,'.',table_name) AS 'Table Name', CONCAT(ROUND(table_rows,2),' Rows') AS 'Number of Rows',ENGINE AS 'Storage Engine',CONCAT(ROUND(data_length/(1024*1024),2),'MB') AS 'Data Size',CONCAT(ROUND(index_length/(1024*1024),2),'MB') AS 'Index Size' ,CONCAT(ROUND((data_length+index_length)/(1024*1024),2),'MB') AS'Total'FROM information_schema.TABLES WHERE table_schema LIKE 'mysqlslap';

Using digest tool with timestamp parse errors

digest -f $SLOWLOG -k mariadb -no-cache -top 5
INFO[0000] log file has 39 lines                        
ERRO[0000] time: error converting 220504 to time: parsing time "220504 " as "060102 15:04:05": cannot parse "" as "15" 
ERRO[0000] time: error converting 220504 to time: parsing time "220504 " as "060102 15:04:05": cannot parse "" as "15" 
ERRO[0000] time: error converting 220504 to time: parsing time "220504 " as "060102 15:04:05": cannot parse "" as "15" 
ERRO[0000] time: error converting 220504 to time: parsing time "220504 " as "060102 15:04:05": cannot parse "" as "15" 
INFO[0000] digest duration: 539.903µs                   
INFO[0000] parsed 5 queries                             
INFO[0000] found 5 different queries hashs              

=-= Server meta =-=

Binary              : /usr/sbin/mysqld
Port                : 3306
Socket              : /var/lib/mysql/mysql.sock
Version             : 10.3.34-MariaDB
Version short       : 10.3.34
Version description : MariaDB Server

Digest duration     : 539.903µs
Real duration       : 0s

Bytes handled       : 1320

=-= Queries stats =-=

Sorted by: random, increasing
Showing top 5 queries

Query #1
Calls                  : 1
Hash                   : a25f0c342e79b215c07e1d8d39698013
Fingerprint            : select concat(round(sum(index_length)/(1024*1024), 2), ' mb') as 'total index size' from information_schema.tables where table_schema not like ?;
Schema                 : 
Min/Max/Mean time      : 24.253ms/24.253ms/24.253ms
p50/p95                : 24.253ms/24.253ms
Concurrency            : +Inf%
Standard deviation     : 0s
Cum Query Time         : 24.253ms
Cum Lock Time          : 855µs
Cum Bytes sent         : 77
Cum Rows Examined/Sent : 31/1
Cum Killed             : 0

Query #2
Calls                  : 1
Hash                   : cf1e7ac3d6c8ea1fc5e850c202e14387
Fingerprint            : select concat(table_schema,'.',table_name) as 'table name', concat(round(table_rows,2),' rows') as 'number of rows',engine as 'storage engine',concat(round(data_length/(1024*1024),2),'mb') as 'data size',concat(round(index_length/(1024*1024),2),'mb') as 'index size' ,concat(round((data_length+index_length)/(1024*1024),2),'mb') as'total'from information_schema.tables where table_schema not like ?;
Schema                 : 
Min/Max/Mean time      : 753µs/753µs/753µs
p50/p95                : 753µs/753µs
Concurrency            : +Inf%
Standard deviation     : 0s
Cum Query Time         : 753µs
Cum Lock Time          : 130µs
Cum Bytes sent         : 332
Cum Rows Examined/Sent : 1/1
Cum Killed             : 0

Query #3
Calls                  : 1
Hash                   : d329bd02c28b7aed6f1f1bc0936dec9c
Fingerprint            : select from_unixtime(unix_timestamp() - variable_value) as server_start from information_schema.global_status where variable_name = ?;
Schema                 : 
Min/Max/Mean time      : 2.688ms/2.688ms/2.688ms
p50/p95                : 2.688ms/2.688ms
Concurrency            : +Inf%
Standard deviation     : 0s
Cum Query Time         : 2.688ms
Cum Lock Time          : 1.031ms
Cum Bytes sent         : 85
Cum Rows Examined/Sent : 1/1
Cum Killed             : 0

Query #4
Calls                  : 1
Hash                   : 887441f2e423743ac8c498fd627ec970
Fingerprint            : use mysqlslap;select id from t1;
Schema                 : mysqlslap
Min/Max/Mean time      : 182µs/182µs/182µs
p50/p95                : 182µs/182µs
Concurrency            : +Inf%
Standard deviation     : 0s
Cum Query Time         : 182µs
Cum Lock Time          : 27µs
Cum Bytes sent         : 750
Cum Rows Examined/Sent : 99/99
Cum Killed             : 0

Query #5
Calls                  : 1
Hash                   : c7454f79d5610e48654e9a98ca5b1f74
Fingerprint            : select concat(round(sum(data_length)/(1024*1024), 2), ' mb') as 'total data size'from information_schema.tables where table_schema not like ?;
Schema                 : 
Min/Max/Mean time      : 643µs/643µs/643µs
p50/p95                : 643µs/643µs
Concurrency            : +Inf%
Standard deviation     : 0s
Cum Query Time         : 643µs
Cum Lock Time          : 95µs
Cum Bytes sent         : 76
Cum Rows Examined/Sent : 31/1
Cum Killed             : 0

and replayer tool with same timestamp parsing errors

SLOWLOG=$(mysqladmin var | grep slow_query_log_file | awk '{ print $4}')
DBNAME=database_name
DBUSER=root
DBPASS=password
DBIP=dbserver_ip_address
DBPORT=3306
mysqladmin create $DBNAME

replayer -u $DBUSER -password $DBPASS -h ${DBIP}:${DBPORT} -k mariadb -f $SLOWLOG -db $DBNAME
INFO[0000] getting real execution time                  
ERRO[0000] time: error converting 220504 to time: parsing time "220504 " as "060102 15:04:05": cannot parse "" as "15" 
ERRO[0000] time: error converting 220504 to time: parsing time "220504 " as "060102 15:04:05": cannot parse "" as "15" 
ERRO[0000] time: error converting 220504 to time: parsing time "220504 " as "060102 15:04:05": cannot parse "" as "15" 
ERRO[0000] time: error converting 220504 to time: parsing time "220504 " as "060102 15:04:05": cannot parse "" as "15" 
INFO[0000] 100 workers will be created                  
WARN[0000] replaying with dry run                       
INFO[0000] replay started on Wed May 4 01:12:42         
INFO[0000] estimated time of end: Wed May 4 01:12:42    
ERRO[0000] time: error converting 220504 to time: parsing time "220504 " as "060102 15:04:05": cannot parse "" as "15" 
ERRO[0000] time: error converting 220504 to time: parsing time "220504 " as "060102 15:04:05": cannot parse "" as "15" 
ERRO[0000] time: error converting 220504 to time: parsing time "220504 " as "060102 15:04:05": cannot parse "" as "15" 
ERRO[0000] time: error converting 220504 to time: parsing time "220504 " as "060102 15:04:05": cannot parse "" as "15" 
0 / 5 [....................................................................................................................................................................................................] ? p/s 0s 0.00%
INFO[0000] replay ended on Wed May 4 01:12:42           

=-= Results =-=

Replay duration:  685.959µs
Real duration:    0s
Log file:         /var/lib/mysql/slowq.log
Dry run:          true
Workers:          100

Database
  ├─ kind:      mariadb
  ├─ user:      root
  ├─ use pass:  false
  └─ address:   127.0.0.1:3306

Statistics
  ├─ Queries:                5
  ├─ Errors:                 0
  ├─ Queries success rate:   100.0000%
  ├─ Speed factor:           1.0000
  ├─ Duration difference:    n/a
  └─ Replayer speed:         -+Inf%

Note: the replayer may take a little more time due to the numerous conditions that are verified during the replay.

Hi @centminmod, thanks for raising the issue! I'll investigate this as soon as possible.

I think I found the issue. In your slow query log, in the headers the # Time: contains two spaces between the date and the time:

/usr/sbin/mysqld, Version: 10.3.34-MariaDB (MariaDB Server). started with:
Tcp port: 3306  Unix socket: /var/lib/mysql/mysql.sock
Time                Id Command  Argument
# Time: 220504  0:49:26
# User@Host: root[root] @ localhost []
# Thread_id: 38  Schema:   QC_hit: No
# Query_time: 0.002688  Lock_time: 0.001031  Rows_sent: 1  Rows_examined: 1
# Rows_affected: 0  Bytes_sent: 85

which leads to an error here in the code:

date := parts[idx+1] + " " + parts[idx+2]

I could easily fix this by replacing parts[idx+2] by parts[idx+3]. However, there is something strange. I started a Docker image with the same version of MariaDB as you (10.3.34), and the # Time: header does not have two spaces between the date and the time:

mysqld, Version: 10.3.34-MariaDB-1:10.3.34+maria~focal (mariadb.org binary distribution). started with:
Tcp port: 3306  Unix socket: /var/run/mysqld/mysqld.sock
Time		    Id Command	Argument
mysqld, Version: 10.3.34-MariaDB-1:10.3.34+maria~focal (mariadb.org binary distribution). started with:
Tcp port: 3306  Unix socket: /var/run/mysqld/mysqld.sock
Time		    Id Command	Argument
# Time: 220504 13:17:36
# User@Host: root[root] @  [172.17.0.1]
# Thread_id: 14  Schema:   QC_hit: No
# Query_time: 0.000052  Lock_time: 0.000000  Rows_sent: 0  Rows_examined: 0
# Rows_affected: 0  Bytes_sent: 11
SET timestamp=1651670256;
SET GLOBAL min_examined_row_limit=0;

But in my case there is an additional space character in the # User@Host header between "@" and "[172.17.0.1]", which is probably due to no hostname 🤔

Did you manually edit or use any parser than slowql on the slow query log file?

Interesting I don't think I used a parser and definitely didn't edit the slow query log file. But trying again doesn't give the error now and there's no additional space in the Time line

cat slowq.log
/usr/sbin/mysqld, Version: 10.3.34-MariaDB (MariaDB Server). started with:
Tcp port: 3306  Unix socket: /var/lib/mysql/mysql.sock
Time                Id Command  Argument
mysql -e "SELECT FROM_UNIXTIME(UNIX_TIMESTAMP() - variable_value) AS server_start FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE variable_name='Uptime';"
+---------------------+
| server_start        |
+---------------------+
| 2022-03-31 22:38:33 |
+---------------------+

no extra space in slow query log

# Time: 220504 12:46:40
# User@Host: root[root] @ localhost []
# Thread_id: 59  Schema:   QC_hit: No
# Query_time: 0.000563  Lock_time: 0.000137  Rows_sent: 1  Rows_examined: 1
# Rows_affected: 0  Bytes_sent: 85
SET timestamp=1651682800;

Though wouldn't using the timestamp epoch value be more uniform?

date -d @1651682800
Wed May  4 12:46:40 EDT 2022

full slow query log

cat slowq.log
/usr/sbin/mysqld, Version: 10.3.34-MariaDB (MariaDB Server). started with:
Tcp port: 3306  Unix socket: /var/lib/mysql/mysql.sock
Time                Id Command  Argument
# Time: 220504 12:46:40
# User@Host: root[root] @ localhost []
# Thread_id: 59  Schema:   QC_hit: No
# Query_time: 0.000563  Lock_time: 0.000137  Rows_sent: 1  Rows_examined: 1
# Rows_affected: 0  Bytes_sent: 85
SET timestamp=1651682800;
SELECT FROM_UNIXTIME(UNIX_TIMESTAMP() - variable_value) AS server_start FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE variable_name='Uptime';
digest -f $SLOWLOG -k mariadb
INFO[0000] cache found: /var/lib/mysql/slowq.log.cache. Trying to restore it 
ERRO[0000] cannot restore cache: hashes does not match, log file must have changed since cache creation 
WARN[0000] continuing without cache                     
INFO[0000] cache will not be used                       
INFO[0000] log file has 10 lines                        
INFO[0000] digest duration: 144.643µs                   
INFO[0000] parsed 1 queries                             
INFO[0000] found 1 different queries hashs              

=-= Server meta =-=

Binary              : /usr/sbin/mysqld
Port                : 3306
Socket              : /var/lib/mysql/mysql.sock
Version             : 10.3.34-MariaDB
Version short       : 10.3.34
Version description : MariaDB Server

Digest duration     : 144.643µs
Real duration       : 0s

Bytes handled       : 85

=-= Queries stats =-=

Sorted by: random, increasing
Showing top 3 queries

Query #1
Calls                  : 1
Hash                   : d329bd02c28b7aed6f1f1bc0936dec9c
Fingerprint            : select from_unixtime(unix_timestamp() - variable_value) as server_start from information_schema.global_status where variable_name = ?;
Schema                 : 
Min/Max/Mean time      : 563µs/563µs/563µs
p50/p95                : 563µs/563µs
Concurrency            : +Inf%
Standard deviation     : 0s
Cum Query Time         : 563µs
Cum Lock Time          : 137µs
Cum Bytes sent         : 85
Cum Rows Examined/Sent : 1/1
Cum Killed             : 0

INFO[0000] saving results in cache file                 
ERRO[0000] cannot save results in cache file: json: unsupported value: +Inf 

Not sure if ERRO[0000] cannot save results in cache file: json: unsupported value: +Inf is related?

and replayer

replayer -u $DBUSER -password $DBPASS -h ${DBIP}:${DBPORT} -k mariadb -f $SLOWLOG -db $DBNAME
INFO[0000] getting real execution time                  
INFO[0000] 100 workers will be created                  
WARN[0000] replaying with dry run                       
INFO[0000] replay started on Wed May 4 12:49:09         
INFO[0000] estimated time of end: Wed May 4 12:49:09    
1 / 1 [▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉] ? p/s 0s 100.00%
INFO[0000] replay ended on Wed May 4 12:49:09           

=-= Results =-=

Replay duration:  1.03876ms
Real duration:    0s
Log file:         /var/lib/mysql/slowq.log
Dry run:          true
Workers:          100

Database
  ├─ kind:      mariadb
  ├─ user:      root
  ├─ use pass:  false
  └─ address:   127.0.0.1:3306

Statistics
  ├─ Queries:                1
  ├─ Errors:                 0
  ├─ Queries success rate:   100.0000%
  ├─ Speed factor:           1.0000
  ├─ Duration difference:    n/a
  └─ Replayer speed:         -+Inf%

Note: the replayer may take a little more time due to the numerous conditions that are verified during the replay.

So no idea where the extra space came from as grepping my command history doesn't show anything like that history | egrep 'slowq|SLOWLOG'