bloomberg/comdb2

Potential data loss with lua triggers

sarahmonod opened this issue · 4 comments

Describe the bug

When the connection is closed in the middle of consuming items from a lua trigger, we would expect the current item to be acknowledged and a subsequent re-opening of the connection to resume at the next item in the queue. Instead, the next items in the queue are not re-delivered.

To Reproduce
Steps to reproduce the behavior:

  1. Compile cdb2api and install it
  2. Create a mattdb local instance and start it
  3. Add the config to connect to it (e.g. echo "$COMDB2_DBNAME 1234 $(hostname -f)" > /opt/bb/etc/cdb2/config/comdb2db.cfg)
  4. Compile https://gist.github.com/gusmonod/0c43bc81e37a024d7139c8175c07daa5
  5. Run it
  6. See that it is hanging when trying to consume the event with id 3
  7. Insert into the monitored table (e.g. /opt/bb/bin/cdb2sql mattdb dev "insert into monitored(id, message) values (5, 'five')")
  8. Notice that the event consumed is now of id 5, not of id 3
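For context, the consumer side of the gist presumably follows the standard cdb2api consume loop. A hedged sketch of that shape (the procedure name `watch`, the column layout, and the error handling are assumptions for illustration, not the gist's actual code; it requires a running mattdb instance):

```c
#include <stdio.h>
#include <cdb2api.h> /* build with -lcdb2api */

int main(void)
{
    cdb2_hndl_tp *hndl = NULL;
    /* "mattdb"/"dev" match the local instance from the steps above */
    int rc = cdb2_open(&hndl, "mattdb", "dev", 0);
    if (rc != 0) {
        fprintf(stderr, "cdb2_open rc=%d\n", rc);
        return 1;
    }

    /* "watch" is a placeholder name for the consumer procedure */
    rc = cdb2_run_statement(hndl, "exec procedure watch()");
    if (rc != 0) {
        fprintf(stderr, "run_statement rc=%d: %s\n", rc, cdb2_errstr(hndl));
        cdb2_close(hndl);
        return 1;
    }

    /* each record delivered here is one queued trigger event */
    while ((rc = cdb2_next_record(hndl)) == CDB2_OK)
        printf("id=%lld\n", *(long long *)cdb2_column_value(hndl, 0));

    /* closing mid-consume (before CDB2_OK_DONE) is the scenario in
     * which the not-yet-delivered events appear to be lost */
    cdb2_close(hndl);
    return 0;
}
```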

Expected behavior

We should be able to consume the event showing 3, NULL, bye and not hang.

Screenshots

(screenshot: demo)

Additional context

To see how we create the environment, compile comdb2, and create and start the local instance, see: https://github.com/bloomberg/python-comdb2/blob/bf9f758ef7eb291fe5c0072fa0fb0e39bc5e421c/.github/workflows/build.yml

You're right, it looks like there's no way to bypass the auto-consume logic in cdb2_close().

@gusmonod Context about the auto-consume logic: it is there to paper over a common misuse of the Comdb2 API, namely calling cdb2_run_statement but not calling cdb2_next_record to read all responses until the server returns CDB2_OK_DONE. There are a few of these in your sample reproducer as well ;)
The idea is to reuse the connection whenever possible (even between different processes on a machine, through sockpool), since new connections are expensive; but a connection cannot be reused while there is still data waiting to be read on it.

Our python wrapper (python-comdb2) makes sure to consume everything up to CDB2_OK_DONE before a new cdb2_run_statement call is made; however, it does NOT do so when the connection is closed while the current statement still has responses to consume. I can change the reproducer so that it consumes everything, which would be closer to what our python wrapper does.

No need - I was just pointing out that the misuse is widespread (and easy to fall into), which is why the kludge was added.