sysown/proxysql

crashes during cluster shutdown in CI testing

mirostauder opened this issue · 1 comments

Cluster nodes occasionally crash during shutdown.
This was seen in CI testing on the k8s-testing Jenkins job 808.

The job is archived with all logs and crash dumps.
Crash dump backtraces are below.

[2024-05-08 15:08:44] >>> WARN - Core file found 'test/cluster/node07/core.1397908' ...
[2024-05-08 15:08:44] >>> ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from '../../src/proxysql -D /var/lib/jenkins/workspace/ProxySQL-Automated-Build-Testi', real uid: 112, effective uid: 112, real gid: 120, effective gid: 120, execfn: '../../src/proxysql', platform: 'x86_64'
The program is not being run.
[2024-05-08 15:08:44] >>> Reading symbols from ./proxysql...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `../../src/proxysql -D /var/lib/jenkins/workspace/ProxySQL-Automated-Build-Testi'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000056c8ff803c78 in ProxySQL_HTTP_Server::~ProxySQL_HTTP_Server (
    this=0x5551494e55202c20, __in_chrg=<optimized out>)
    at ProxySQL_HTTP_Server.cpp:876
876		if (variables.proxysql_latest_version) {
(gdb) (gdb) #0  0x000056c8ff803c78 in ProxySQL_HTTP_Server::~ProxySQL_HTTP_Server (
    this=0x5551494e55202c20, __in_chrg=<optimized out>)
    at ProxySQL_HTTP_Server.cpp:876
#1  0x000056c8ff5a13f3 in ProxySQL_Admin::admin_shutdown (this=0x7d7cc1231800)
    at ProxySQL_Admin.cpp:6945
#2  0x000056c8ff5a1cae in ProxySQL_Admin::~ProxySQL_Admin (
    this=0x7d7cc1231800, __in_chrg=<optimized out>) at ProxySQL_Admin.cpp:7013
#3  0x000056c8ff291a9d in ProxySQL_Main_shutdown_all_modules () at main.cpp:975
#4  0x000056c8ff293b86 in ProxySQL_Main_init_phase4___shutdown ()
    at main.cpp:1243
#5  0x000056c8ff2a0556 in main (argc=5, argv=0x7ffec3dd4e28) at main.cpp:2526
(gdb) 
quit
[2024-05-08 15:08:47] >>> Compressing 'test/cluster/node07/core.1397908' ...
[2024-05-08 15:08:50] >>> WARN - Core file found 'test/cluster/node07/core.1355228' ...
[2024-05-08 15:08:50] >>> ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from '../../src/proxysql -D /var/lib/jenkins/workspace/ProxySQL-Automated-Build-Testi', real uid: 112, effective uid: 112, real gid: 120, effective gid: 120, execfn: '../../src/proxysql', platform: 'x86_64'
413	../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S: No such file or directory.
The program is not being run.
[2024-05-08 15:08:50] >>> Reading symbols from ./proxysql...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `../../src/proxysql -D /var/lib/jenkins/workspace/ProxySQL-Automated-Build-Testi'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __memcmp_avx2_movbe ()
    at ../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S:413
(gdb) (gdb) #0  __memcmp_avx2_movbe ()
    at ../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S:413
#1  0x00007d7cc154ea0c in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const ()
   from /lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x000056c8ff2b382f in std::operator< <char, std::char_traits<char>, std::allocator<char> > (
    __lhs=<error: Cannot access memory at address 0x6c75725f79726575>, 
    __rhs="mysql1:3306") at /usr/include/c++/11/bits/basic_string.h:6343
#3  0x000056c8ff2abc1d in std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::operator() (this=0x7d7cc05a0cd8, 
    __x=<error: Cannot access memory at address 0x6c75725f79726575>, 
    __y="mysql1:3306") at /usr/include/c++/11/bits/stl_function.h:400
#4  0x000056c8ff7187f0 in std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, MyGR_monitor_node*>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, MyGR_monitor_node*> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, MyGR_monitor_node*> > >::_M_lower_bound (
    this=0x7d7cc05a0cd8, __x=0x7d7cb76b25a0, __y=0x7d7cb76b23c0, 
    __k="mysql1:3306") at /usr/include/c++/11/bits/stl_tree.h:1905
#5  0x000056c8ff70c806 in std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, MyGR_monitor_node*>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, MyGR_monitor_node*> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, MyGR_monitor_node*> > >::find (this=0x7d7cc05a0cd8, 
    __k="mysql1:3306") at /usr/include/c++/11/bits/stl_tree.h:2523
#6  0x000056c8ff7010f7 in std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, MyGR_monitor_node*, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, MyGR_monitor_node*> > >::find (this=0x7d7cc05a0cd8, 
    __x="mysql1:3306") at /usr/include/c++/11/bits/stl_map.h:1170
#7  0x000056c8ff6c214f in gr_update_hosts_map (start_time=86789911026, 
    gr_srv_st=..., mmsd=0x7d7cb646e500) at MySQL_Monitor.cpp:3745
#8  0x000056c8ff6c3b1e in async_gr_mon_actions_handler (mmsd=0x7d7cb646e500)
    at MySQL_Monitor.cpp:3970
#9  0x000056c8ff6c49a6 in monitor_GR_thread_HG (arg=0x7d7cb95e7048)
    at MySQL_Monitor.cpp:4096
#10 0x00007d7cc1094ac3 in start_thread (arg=<optimized out>)
    at ./nptl/pthread_create.c:442
#11 0x00007d7cc1126850 in clone3 ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) 
quit
[2024-05-08 15:08:52] >>> Compressing 'test/cluster/node07/core.1355228' ...
[2024-05-08 15:08:56] >>> WARN - Core file found 'src/core.1354957' ...
[2024-05-08 15:08:56] >>> ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from './proxysql --clickhouse-server --sqlite3-server --idle-threads -f -c /var/lib/j', real uid: 112, effective uid: 112, real gid: 120, effective gid: 120, execfn: './proxysql', platform: 'x86_64'
44	./nptl/pthread_kill.c: No such file or directory.
The program is not being run.
[2024-05-08 15:08:56] >>> Reading symbols from ./proxysql...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./proxysql --clickhouse-server --sqlite3-server --idle-threads -f -c /var/lib/j'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=129351770711616)
    at ./nptl/pthread_kill.c:44
(gdb) (gdb) #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=129351770711616)
    at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=129351770711616)
    at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=129351770711616, signo=signo@entry=6)
    at ./nptl/pthread_kill.c:89
#3  0x000075a527042476 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/posix/raise.c:26
#4  0x000075a5270287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x000075a52702871b in __assert_fail_base (
    fmt=0x75a5271dd130 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=0x57c5dc3eb60c "prevflags != -1", 
    file=0x57c5dc3e9b12 "MySQL_Thread.cpp", line=2902, 
    function=<optimized out>) at ./assert/assert.c:92
#6  0x000075a527039e96 in __GI___assert_fail (
    assertion=0x57c5dc3eb60c "prevflags != -1", 
    file=0x57c5dc3e9b12 "MySQL_Thread.cpp", line=2902, 
    function=0x57c5dc3eb5c0 "MySQL_Session* MySQL_Thread::create_new_session_and_client_data_stream(int)") at ./assert/assert.c:101
#7  0x000057c5db8cc6c1 in MySQL_Thread::create_new_session_and_client_data_stream (this=0x75a523e0c000, _fd=7) at MySQL_Thread.cpp:2902
#8  0x000057c5db9e3397 in child_mysql (arg=0x75a5254783f0)
    at ProxySQL_Admin.cpp:5539
#9  0x000075a527094ac3 in start_thread (arg=<optimized out>)
    at ./nptl/pthread_create.c:442
#10 0x000075a527126850 in clone3 ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) 
quit
[2024-05-08 15:08:58] >>> Compressing 'src/core.1354957' ...
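
The bad addresses in the first two backtraces (`this=0x5551494e55202c20` in the `~ProxySQL_HTTP_Server` frame and `0x6c75725f79726575` in the `std::map::find` frames) look more like text than like heap pointers. Below is a minimal sketch for checking that, assuming a little-endian x86-64 layout. `pointer_as_ascii` is a hypothetical helper written for this illustration and is not part of ProxySQL.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not ProxySQL code): interpret a 64-bit value as the
// 8 bytes it occupies in little-endian memory, lowest byte first, and render
// printable ASCII bytes as-is and everything else as '.'.
std::string pointer_as_ascii(std::uint64_t value) {
    std::string out;
    for (int i = 0; i < 8; ++i) {
        unsigned char byte = static_cast<unsigned char>((value >> (8 * i)) & 0xff);
        out += (byte >= 0x20 && byte < 0x7f) ? static_cast<char>(byte) : '.';
    }
    return out;
}
```

Under that assumption, `0x5551494e55202c20` decodes to ` , UNIQU` and `0x6c75725f79726575` decodes to `uery_rul`. That would be consistent with the memory having been freed and reused for string data before it was accessed, but it is only a hint and not a confirmed root cause.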

Mitigations for the first two types of crashes (the two SIGSEGVs) were introduced in this PR. There is no mitigation yet for the assert (`prevflags != -1` at MySQL_Thread.cpp:2902).
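
For the assert, the backtrace only shows that `prevflags != -1` failed in `MySQL_Thread::create_new_session_and_client_data_stream(int)` with `_fd=7`. Assuming `prevflags` holds the result of `fcntl(fd, F_GETFL)` (the backtrace does not confirm this), one possible direction is to treat an unusable descriptor as a recoverable error during shutdown instead of aborting. The sketch below shows that idea in isolation. `set_nonblocking` is a hypothetical helper and not the actual ProxySQL code.

```cpp
#include <fcntl.h>

// Hypothetical helper (not ProxySQL code): switch fd to non-blocking mode.
// Returns false instead of asserting when the descriptor is unusable,
// for example because it was already closed (fcntl fails with EBADF).
bool set_nonblocking(int fd) {
    int prevflags = fcntl(fd, F_GETFL, 0);
    if (prevflags == -1) {
        return false;  // caller can drop the connection instead of aborting
    }
    return fcntl(fd, F_SETFL, prevflags | O_NONBLOCK) != -1;
}
```

A caller would skip session creation when this returns false. Whether that is acceptable depends on why the descriptor is invalid at that point, which the core dump alone does not show.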