Percona-Lab/pacemaker-replication-agents

mysql_stop does not really kill mysql on Debian


Hi,
I found a case where mysql_stop does not behave as it should.

The problem is in this part of the code:
pid=$(cat ${OCF_RESKEY_pid}.starting 2> /dev/null)
/bin/kill $pid > /dev/null
rc=$?
if [ $rc != 0 ]; then
    ocf_log err "MySQL couldn't be stopped"
    return $OCF_ERR_GENERIC
fi

Debug:

+ '[' '!' -f /var/run/mysqld/mysqld.pid.starting ']'
++ cat /var/run/mysqld/mysqld.pid.starting
+ pid=32752
+ /bin/kill 32752
+ rc=0
+ '[' 0 '!=' 0 ']'
+ '[' 0 -eq 1 ']'
+ shutdown_timeout=15
+ '[' -n 900000 ']'
+ shutdown_timeout=895
+ count=0
+ '[' 0 -lt 895 ']'
+ kill -s 0 32752
+ rc=0
+ '[' 0 -ne 0 ']'
++ expr 0 + 1
+ count=1
+ sleep 1
+ ocf_log debug 'MySQL still hasn'\''t stopped yet. Waiting...'
+ '[' 2 -lt 2 ']'
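The `shutdown_timeout` values in the trace are consistent with a 15-second default that is replaced, when Pacemaker supplies the stop operation's timeout (900000 ms here, matching `op stop interval="0" timeout="900s"` in the primitive shown later in this thread), by that timeout in seconds minus a 5-second margin: 900000 / 1000 - 5 = 895. The sketch below reproduces that arithmetic. The function name is hypothetical and the formula is inferred from the trace, not copied from the agent; Pacemaker normally exports the operation timeout to agents as `OCF_RESKEY_CRM_meta_timeout`.

```shell
#!/bin/sh
# Hypothetical helper reproducing the shutdown_timeout arithmetic seen in the trace.
# $1: operation timeout in milliseconds (e.g. $OCF_RESKEY_CRM_meta_timeout); may be empty.
compute_shutdown_timeout() {
    op_timeout_ms=$1
    shutdown_timeout=15                      # default when no operation timeout is passed
    if [ -n "$op_timeout_ms" ]; then
        # Convert ms to s and keep a 5 s margin before Pacemaker's own timeout fires.
        shutdown_timeout=$(( op_timeout_ms / 1000 - 5 ))
    fi
    echo "$shutdown_timeout"
}

compute_shutdown_timeout 900000   # prints 895
compute_shutdown_timeout ""       # prints 15
```

With a 900 s stop timeout the polling loop can therefore run for up to 895 s before escalating, which would explain why a surviving mysqld holds up failover for so long.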

It does kill the /usr/bin/mysqld_safe process, but its child process /usr/sbin/mysqld stays alive until the timeout expires and mysqld is finally killed with SIGKILL. That makes failover unworkable.
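The underlying mechanism can be reproduced without MySQL: sending SIGTERM to a wrapper process does not forward the signal to the child it started. In this minimal demonstration a `sh -c` wrapper stands in for mysqld_safe and a `sleep` stands in for mysqld; the function name is hypothetical.

```shell
#!/bin/sh
# Demonstration with harmless stand-ins: a wrapper shell plays mysqld_safe and
# a `sleep` child plays mysqld. Signalling the wrapper's PID does not signal the child.
demo_kill_wrapper_only() {
    pidfile=$(mktemp)
    # Wrapper: start a long-running child, record the child's PID, then wait on it.
    sh -c 'sleep 30 >/dev/null 2>&1 & echo $! > "$1"; wait' wrapper "$pidfile" >/dev/null 2>&1 &
    wrapper_pid=$!
    # Wait until the wrapper has started the child and written its PID.
    while [ ! -s "$pidfile" ]; do sleep 1; done
    child_pid=$(cat "$pidfile")

    kill "$wrapper_pid"                 # analogous to killing the PID from the .starting file
    wait "$wrapper_pid" 2>/dev/null     # reap the terminated wrapper

    if kill -s 0 "$child_pid" 2>/dev/null; then
        echo "child still running"
    else
        echo "child stopped"
    fi
    kill "$child_pid" 2>/dev/null       # clean up the orphaned stand-in
    rm -f "$pidfile"
}

demo_kill_wrapper_only   # prints: child still running
```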

OS: Debian 7
MySQL: Percona Server 5.6.22

I tried changing
pid=$(cat ${OCF_RESKEY_pid}.starting 2> /dev/null)
/bin/kill $pid > /dev/null
rc=$?
to
pid=$(cat ${OCF_RESKEY_pid} 2> /dev/null)
/bin/kill $pid > /dev/null
rc=$?

and it helped.
I have only tested this on Debian.
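The change works because the file named by `--pid-file` is written by mysqld itself (see the `ps` output later in this thread), whereas the `.starting` file held the PID of the process the agent launched, here mysqld_safe. A minimal sketch of a stop routine built on that idea, assuming the pid file belongs to mysqld: send SIGTERM, poll with `kill -s 0` like the loop in the trace, and fall back to SIGKILL. The function name and arguments are hypothetical; this is not the agent's code.

```shell
#!/bin/sh
# Hypothetical stop helper (not the agent's code): signal the PID that the server
# itself wrote to its --pid-file, poll until it exits, and escalate to SIGKILL.
# $1: pid file path, $2: seconds to wait before SIGKILL.
# Returns 1 if the pid file is missing or empty, 0 otherwise.
stop_pid_from_file() {
    pidfile=$1
    timeout_s=$2
    pid=$(cat "$pidfile" 2>/dev/null)
    [ -n "$pid" ] || return 1

    kill "$pid" 2>/dev/null            # SIGTERM asks mysqld for a clean shutdown
    count=0
    while [ "$count" -lt "$timeout_s" ]; do
        # kill -s 0 sends no signal; it only checks that the PID still exists
        # (and that we are allowed to signal it).
        kill -s 0 "$pid" 2>/dev/null || return 0
        count=$((count + 1))
        sleep 1
    done
    kill -s KILL "$pid" 2>/dev/null    # last resort once the timeout is used up
    return 0
}
```

Called as `stop_pid_from_file /var/run/mysqld/mysqld.pid 895`, this would signal mysqld directly instead of its wrapper.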

mysqld_safe should not be used with Pacemaker, but I realize the defaults are not correct; I'll fix them. Do you see the same behavior if you call mysqld directly?

Do you mean if I start MySQL through the init script?
In my.cnf we have:
[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0
malloc-lib = /usr/lib/x86_64-linux-gnu/libjemalloc.so.1

root 8713 0.0 0.0 4180 736 ? S 12:45 0:00 /bin/sh /usr/bin/mysqld_safe --defaults-file=/etc/mysql/my.cnf --enforce_gtid_consistency=1 --gtid_mode=on --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --datadir=/mdata01/mysql/db --user=mysql --skip-slave-start --read-only
mysql 10050 0.3 64.0 4226856 2602048 ? Sl 12:45 0:15 /usr/sbin/mysqld --defaults-file=/etc/mysql/my.cnf --basedir=/usr --datadir=/mdata01/mysql/db --plugin-dir=/usr/lib/mysql/plugin --user=mysql --enforce-gtid-consistency=1 --gtid-mode=on --skip-slave-start --read-only --log-error=/var/log/mysql/mysql-error.log --open-files-limit=65535 --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306

MySQL is supposed to be started by the agent, not by the init.d script; the init.d script should be disabled when using PRM. The "binary" parameter of the agent should point to mysqld. See here for more details:

https://github.com/percona/percona-pacemaker-agents/blob/master/doc/PRM-setup-guide.rst#the-mysql-resource-primitive

Yes, it was started via the agent and runs this way.

Just to confirm: in your Pacemaker primitive, are you using binary="/usr/sbin/mysqld", and was MySQL stopped before you put the node online in Pacemaker? If so, there's no way mysqld_safe could have been running. The issue you have is that the agent is recording the PID of the mysqld_safe script instead of mysqld's. Can you show your Pacemaker primitive for mysqld?
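One way to check which process the agent actually recorded is to look up the command name of the PID stored in the pid file. A minimal Linux-only sketch using `/proc/<pid>/comm`; the function name is hypothetical. If the result is anything other than `mysqld`, the agent is tracking a different process, such as the wrapper.

```shell
#!/bin/sh
# Hypothetical diagnostic (Linux only): print the command name of the process whose
# PID is stored in a pid file, or "none" if the file is empty or the PID is gone.
pidfile_process_name() {
    pid=$(cat "$1" 2>/dev/null)
    if [ -z "$pid" ] || [ ! -r "/proc/$pid/comm" ]; then
        echo "none"
        return 1
    fi
    cat "/proc/$pid/comm"
}

# Example: pidfile_process_name /var/run/mysqld/mysqld.pid.starting
```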

Yes, now I see it: binary is set to /usr/bin/mysqld_safe:
configure primitive p_db1_mysql ocf:percona:mysql \
  params config="/etc/mysql/my.cnf" pid="/var/run/mysqld/mysqld.pid" \
  socket="/var/run/mysqld/mysqld.sock" replication_user="someuser" \
  replication_passwd="somepassword" max_slave_lag="60" \
  evict_outdated_slaves="false" binary="/usr/bin/mysqld_safe" \
  test_user="clusteruser" test_passwd="password" \
  reader_attribute="db1_readable" \
  op start interval="0" timeout="900s" \
  op stop interval="0" timeout="900s" \
  op monitor interval="5s" role="Master" OCF_CHECK_LEVEL="1" \
  op monitor interval="2s" role="Slave" OCF_CHECK_LEVEL="1"

Can you name some reasons why it's better not to use mysqld_safe with Pacemaker?

Pacemaker must know when mysqld crashes; a lot of the agent logic is built around that. With mysqld_safe, a mysqld crash is masked: mysqld_safe restarts mysqld itself instead of letting Pacemaker do it. You simply won't get the right behavior with mysqld_safe. You also raised another issue: malloc-lib is a mysqld_safe option, so it is lost when mysqld is started directly. I'll add support for malloc-lib to the agent; that's straightforward.
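The masking effect can be illustrated with a toy restart loop. This is a deliberately simplified sketch, not mysqld_safe's actual code, and the function name is hypothetical: because the loop keeps running while the command it supervises fails and is restarted, anything watching only the loop's PID never observes the crash.

```shell
#!/bin/sh
# Toy supervisor loop (NOT mysqld_safe's real logic): rerun a command whenever it
# exits with a non-zero status, up to $1 restarts, then report how many happened.
restart_loop() {
    max_restarts=$1
    shift
    restarts=0
    until "$@"; do
        if [ "$restarts" -ge "$max_restarts" ]; then
            break
        fi
        restarts=$((restarts + 1))
    done
    echo "restarts=$restarts"
}

restart_loop 3 false   # prints: restarts=3
restart_loop 3 true    # prints: restarts=0
```

Under Pacemaker the agent's monitor operation is meant to see the failure and let the cluster decide what to do (restart locally, or fail over), which a loop like this hides.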

OK, it works now without mysqld_safe.

When do you plan to add support for malloc-lib?

I hope to have time today.

I have a version with the parameter in the 1.0-beta branch, but it is still failing a test. I'll continue debugging tomorrow.

Any results?

Hi,
I'll only be able to resume work on it tomorrow.

Regards,

Yves

On Tue, 17 Mar 2015 07:52:04 -0700,
Vladimir Zulin-Tarelkin <notifications@github.com> wrote:

> Any results?


Reply to this email directly or view it on GitHub:
#50 (comment)

Any news?
We are considering putting 5.6 into use, but without jemalloc it's not possible.

Hi,
sorry for the delay, I had some disruptions in my personal life. For now, a quick fix would be to add LD_PRELOAD to the mysqld command line directly in the mysql_start_low function, like:

LD_PRELOAD=/usr/lib/libjemalloc.so ${OCF_RESKEY_binary} --defaults-file=$OCF_RESKEY_config \
    --pid-file=$OCF_RESKEY_pid --socket=$OCF_RESKEY_socket \
    --datadir=$OCF_RESKEY_datadir \
    --user=$OCF_RESKEY_user $OCF_RESKEY_additional_parameters \
    $mysql_extra_params >/dev/null 2>&1 &
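A variant of this quick fix could preload a library only when one is configured, so the same agent works with or without jemalloc. In this minimal sketch `OCF_RESKEY_malloc_lib` is a hypothetical parameter name chosen for illustration; the parameter that shipped in 1.0.0 may be named and validated differently.

```shell
#!/bin/sh
# Hypothetical helper: run a command with LD_PRELOAD set to the library named in
# OCF_RESKEY_malloc_lib (an assumed parameter), or run it unchanged if unset.
run_with_malloc_lib() {
    if [ -n "${OCF_RESKEY_malloc_lib:-}" ]; then
        if [ ! -r "$OCF_RESKEY_malloc_lib" ]; then
            echo "malloc library $OCF_RESKEY_malloc_lib is not readable" >&2
            return 1
        fi
        # The prefix assignment applies only to this one command's environment.
        LD_PRELOAD=$OCF_RESKEY_malloc_lib "$@"
    else
        "$@"
    fi
}

# Usage inside a start function, with the same mysqld arguments as the quick fix:
# run_with_malloc_lib ${OCF_RESKEY_binary} --defaults-file=$OCF_RESKEY_config ... &
```

Failing early on an unreadable library path keeps a typo in the parameter from silently starting mysqld with the default allocator.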

Regards,

Yves

On Thu, 26 Mar 2015 07:26:07 -0700,
Vladimir Zulin-Tarelkin <notifications@github.com> wrote:

> Any news?
> We are considering putting 5.6 into use, but without jemalloc it's not possible.



Fixed in 1.0.0