Spurious bf aborts under ps protocol
sciascid opened this issue · 1 comment
Running the galera suites with the --ps-protocol option may cause some tests to fail with unexpected deadlock errors, which cannot happen when the same tests are executed with the normal protocol.
These deadlock errors occur because the target tables of the statement may be opened during COM_STMT_PREPARE processing, which makes the prepare stage vulnerable to bf aborts triggered by concurrent DDLs.
Also, there is no wsrep sync wait before COM_STMT_PREPARE commands are processed.
Any test that runs statements concurrently with DDLs, and that relies on sync wait to keep those statements from failing, is potentially affected by this issue.
A deterministic test has been devised:
--source include/galera_cluster.inc
--source include/have_debug_sync.inc
if (`SELECT $PS_PROTOCOL = 0`)
{
--skip Test requires: ps-protocol enabled
}
CREATE TABLE t1 (f1 INTEGER PRIMARY KEY, f2 CHAR(6)) ENGINE=InnoDB;
--connection node_1
SET GLOBAL DEBUG = "+d,sync.wsrep_apply_cb";
--connection node_2
OPTIMIZE TABLE t1;
--connection node_1
SET DEBUG_SYNC = "now WAIT_FOR sync.wsrep_apply_cb_reached";
SET DEBUG_SYNC = "stmt_prepare_before_mdl_release SIGNAL signal.wsrep_apply_cb WAIT_FOR bf_abort";
UPDATE t1 SET f2 = 2 WHERE f1 = 1;
The test requires a new debug sync point:
diff --git a/sql/sql_prepare.cc b/sql/sql_prepare.cc
index 569499bbc44..c5d4a9ccf3e 100644
--- a/sql/sql_prepare.cc
+++ b/sql/sql_prepare.cc
@@ -128,6 +128,7 @@ When one supplies long data for a placeholder:
#include <limits>
using std::max;
using std::min;
+#include "debug_sync.h"
/**
A result class used to send cursor rows using the binary protocol.
@@ -3498,6 +3499,7 @@ bool Prepared_statement::prepare(const char *packet, uint packet_len)
/* No need to commit statement transaction, it's not started. */
DBUG_ASSERT(thd->transaction.stmt.is_empty());
+ DEBUG_SYNC(thd, "stmt_prepare_before_mdl_release");
close_thread_tables(thd);
thd->mdl_context.rollback_to_savepoint(mdl_savepoint);
I see two potential solutions:
- Add a sync wait before COM_STMT_PREPARE
- Make DDLs wait while conflicting COM_STMT_PREPARE commands are executing
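To illustrate the first option, here is a minimal, single-threaded toy model of what a sync wait before prepare buys: the prepare step does not touch any table until the node has applied every write set (including any DDL) that was already committed cluster-wide. This is only a sketch. `ToyNode`, `apply_next`, `sync_wait_satisfied` and `prepare_with_sync_wait` are hypothetical names invented for the example. They are not part of the wsrep API or of sql_prepare.cc, and real waiting is modeled here by simply draining the pending write sets.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a cluster node. Hypothetical: not the wsrep API.
struct ToyNode {
    std::uint64_t applied_seqno;  // last write set applied locally

    explicit ToyNode(std::uint64_t applied) : applied_seqno(applied) {}

    // True once the node has caught up with the given cluster-wide seqno.
    bool sync_wait_satisfied(std::uint64_t cluster_seqno) const {
        return applied_seqno >= cluster_seqno;
    }

    // Model the applier finishing one more replicated write set (e.g. a DDL).
    void apply_next() { ++applied_seqno; }
};

// Hypothetical prepare path: wait for causality first, open tables only after.
// Returns how many write sets had to be applied before prepare could proceed.
std::uint64_t prepare_with_sync_wait(ToyNode& node, std::uint64_t cluster_seqno) {
    std::uint64_t waited_for = 0;
    while (!node.sync_wait_satisfied(cluster_seqno)) {
        node.apply_next();  // in a real server the session would block here
        ++waited_for;
    }
    // Only at this point would the statement's target tables be opened, so
    // the write sets counted above can no longer bf abort the prepare.
    assert(node.sync_wait_satisfied(cluster_seqno));
    return waited_for;
}
```

With a node that has applied up to seqno 3 while the cluster is at seqno 5, the prepare is held back for two write sets. A second prepare against the same cluster seqno proceeds immediately.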
This issue no longer reproduces. Sync wait before COM_STMT_PREPARE was added.