Potential data loss with lua triggers
sarahmonod opened this issue · 4 comments
Describe the bug
When closing the connection in the middle of consuming items from a Lua trigger, we would expect the current item to be acknowledged, and a subsequent re-opening of the connection to resume at the next item in the queue. Instead, it seems that the next items on the queue are not re-delivered.
To Reproduce
Steps to reproduce the behavior:
- Compile cdb2api and install it
- Create a mattdb local instance and start it
- Add the config to connect to it, e.g. `echo "$COMDB2_DBNAME 1234 $(hostname -f)" > /opt/bb/etc/cdb2/config/comdb2db.cfg`
- Compile https://gist.github.com/gusmonod/0c43bc81e37a024d7139c8175c07daa5
- Run it
- See that it hangs when trying to consume the event with id 3
- Insert into the `monitored` table, e.g. `/opt/bb/bin/cdb2sql mattdb dev "insert into monitored(id, message) values (5, 'five')"`
- Notice that the event consumed is of id 5 and not of id 3
Expected behavior
We should be able to consume the event showing `3, NULL, bye` and not hang.
Environment (please complete the following information):
- Operating System and Version: Ubuntu Linux 20.04 running on GitHub Actions (so in Azure)
- Comdb2 version: https://github.com/gusmonod/comdb2/tree/hostname_length
Additional context
To see how we create the environment, compile comdb2, and create and start the local instance, see: https://github.com/bloomberg/python-comdb2/blob/bf9f758ef7eb291fe5c0072fa0fb0e39bc5e421c/.github/workflows/build.yml
You're right, it looks like there's no way to bypass the auto-consume logic in `cdb2_close()`.
@gusmonod Context about the auto-consume logic: it is there to paper over a common misuse of the Comdb2 API, namely calling `cdb2_run_statement` but not calling `cdb2_next_record` to finish reading all responses until the server returns `CDB2_OK_DONE`. There are a few of these in your sample reproducer as well ;)
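To make the drain-until-done pattern concrete, here is a minimal sketch in Python using a toy stand-in for the handle (hypothetical `MockHandle`; the real calls are the C functions `cdb2_run_statement` and `cdb2_next_record` in cdb2api):

```python
CDB2_OK = 0        # a record was read
CDB2_OK_DONE = 1   # no more records for this statement

class MockHandle:
    """Toy stand-in for a cdb2 handle, NOT the real cdb2api:
    run_statement queues the rows the server would send, and
    next_record reads them back one at a time."""
    def __init__(self):
        self.pending = []   # rows the server has queued for us

    def run_statement(self, sql, server_rows):
        # Misuse: issuing a new statement while rows from the
        # previous one are still unread leaves the socket dirty.
        if self.pending:
            raise RuntimeError("previous statement not drained")
        self.pending = list(server_rows)

    def next_record(self):
        if self.pending:
            self.pending.pop(0)
            return CDB2_OK
        return CDB2_OK_DONE

def run_and_drain(hndl, sql, server_rows):
    """Correct pattern: always read until CDB2_OK_DONE."""
    hndl.run_statement(sql, server_rows)
    rows_read = 0
    while hndl.next_record() == CDB2_OK:
        rows_read += 1
    return rows_read

h = MockHandle()
print(run_and_drain(h, "select 1", ["row"]))     # 1
print(run_and_drain(h, "select 2", ["a", "b"]))  # 2; no error, handle was drained
```

If the first statement were run without the drain loop, the second `run_statement` would hit a handle that still has unread responses, which is exactly the situation the auto-consume logic papers over.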
The idea is to reuse the connection if possible (even between different processes on a machine, through sockpool), as new connections are expensive; but a connection cannot be reused if there is data waiting to be read on it.
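This is where the reported data loss can come from: for an ordinary query, draining on close throws away rows the client no longer wants, but for a trigger consumer it effectively consumes events the application never processed. A hedged toy model of that interaction (hypothetical `MockTriggerQueue`/`MockConsumerHandle` names, not the real cdb2api):

```python
CDB2_OK, CDB2_OK_DONE = 0, 1

class MockTriggerQueue:
    """Toy server-side queue of Lua trigger events."""
    def __init__(self, events):
        self.events = list(events)

class MockConsumerHandle:
    """Toy consumer connection, NOT the real cdb2api."""
    def __init__(self, queue):
        self.queue = queue

    def next_record(self):
        # The server hands the next queued event to this consumer.
        if self.queue.events:
            self.queue.events.pop(0)
            return CDB2_OK
        return CDB2_OK_DONE

    def close(self):
        # Auto-consume on close: drain whatever is still queued so
        # the socket is clean and can be donated to sockpool.
        while self.next_record() == CDB2_OK:
            pass

q = MockTriggerQueue([1, 2, 3, 4])
h = MockConsumerHandle(q)
h.next_record()   # application processes event 1
h.next_record()   # application processes event 2
h.close()         # events 3 and 4 are silently drained here

h2 = MockConsumerHandle(q)               # reconnect
print(h2.next_record() == CDB2_OK_DONE)  # True: 3 and 4 were never re-delivered
```

In this model the drain loop inside `close()` is the auto-consume kludge; the events it reads on the way out are exactly the ones the reconnected consumer never sees.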
Our Python wrapper (python-comdb2) makes sure to consume everything up to `CDB2_OK_DONE` before a new `cdb2_run_statement` call is made; however, it will NOT do so if the connection is closed while the current statement still has more to consume. I can change the reproducer code so that it consumes everything, making it closer to the behavior of our Python wrapper.
No need - I was just pointing out that the misuse is widespread (and easy to fall into), which is why the kludge was added.