bloomberg/comdb2

(hasql): Support testing resumed snapshot consistency

morgando opened this issue · 0 comments

Context

High availability SQL (hasql) allows Comdb2 to mask hardware failures by seamlessly resuming SQL execution on a different node when there's a failure on the original node.

Problem

We should have tests that verify that a resumed query runs at the same snapshot point-in-time as it did before the failure (manual testing reveals that this is buggy). To write such tests, it would help to have a tool that runs requests from multiple clients in a particular order and can also disconnect from the database at a particular point during the test.

Solution

We already have a tool that runs requests from multiple clients in a particular order (see stepper.c), and another that can disconnect from the database at a particular point during the test (see hatest.c). stepper.c could possibly be extended to respond to a "BOUNCE_CONNECTION" directive by disconnecting from the database the same way hatest.c does.

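After these changes, stepper.c would be able to support a test like the one below, in which both of client 1's selects should return the same rows if the snapshot is preserved across the reconnect: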

1 set hasql on
1 set transaction snapshot isolation
1 create table t(i int primary key)
1 insert into t values(1)
1 begin
2 insert into t values(2)
1 select * from t
BOUNCE_CONNECTION
1 select * from t
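
As a rough illustration of the script format above, the following C sketch classifies one script line as either a client statement or the BOUNCE_CONNECTION directive. It is not taken from stepper.c: the names `parse_step`, `struct step`, and the `STEP_*` constants are hypothetical, and the actual disconnect logic (which this issue proposes borrowing from hatest.c) is deliberately left out. It also assumes the directive carries no client id, as in the example, so which connection gets bounced is left open.

```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical step kinds for illustration; not the types used by stepper.c. */
enum step_kind { STEP_SQL, STEP_BOUNCE, STEP_BLANK, STEP_INVALID };

struct step {
    enum step_kind kind;
    int client;      /* client id; meaningful only for STEP_SQL */
    const char *sql; /* points into the input line; only for STEP_SQL */
};

/* Advance past any leading whitespace. */
static const char *skip_spaces(const char *s)
{
    while (*s && isspace((unsigned char)*s))
        s++;
    return s;
}

/*
 * Classify one script line.
 *   "<id> <sql>"        -> STEP_SQL with client = id
 *   "BOUNCE_CONNECTION" -> STEP_BOUNCE (assumed to take no client id)
 *   empty / whitespace  -> STEP_BLANK
 *   anything else       -> STEP_INVALID
 * The returned sql pointer aliases the input, so it keeps any trailing
 * newline and must not outlive the line buffer.
 */
static struct step parse_step(const char *line)
{
    static const char directive[] = "BOUNCE_CONNECTION";
    const size_t dlen = sizeof(directive) - 1;
    struct step st = { STEP_INVALID, -1, NULL };
    const char *p = skip_spaces(line);
    char *end;
    long id;

    if (*p == '\0') {
        st.kind = STEP_BLANK;
        return st;
    }

    /* The directive must be the only token on the line. */
    if (strncmp(p, directive, dlen) == 0 && *skip_spaces(p + dlen) == '\0') {
        st.kind = STEP_BOUNCE;
        return st;
    }

    /* Otherwise expect a non-negative client id followed by a statement. */
    id = strtol(p, &end, 10);
    if (end == p || id < 0 || id > INT_MAX || !isspace((unsigned char)*end))
        return st;

    p = skip_spaces(end);
    if (*p == '\0')
        return st; /* id with no statement */

    st.kind = STEP_SQL;
    st.client = (int)id;
    st.sql = p;
    return st;
}
```

A driver loop could call `parse_step` on each line and, on `STEP_BOUNCE`, invoke whatever disconnect routine ends up being shared with hatest.c before continuing with the next statement.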