armon/bloomd

Crash on filter recreate

kazjote opened this issue · 5 comments

Hi,

Bloomd has been very stable for us in production, where we only do adds and checks. We handle hundreds of queries per second and it has never crashed.

However, it crashes quite often during our automated tests. My guess is that it crashes when we recreate the filters between tests.

I can easily crash bloomd by running this script https://gist.github.com/4682164 repeatedly, one run after another, until it crashes, like this:

for i in `seq 100`; do ruby crash.rb; sleep 1; done

Thanks for reporting this! I can reproduce with your script, so I will work on a fix.

Without having looked into it further, I suspect it has to do with how "drop" is handled. To make drop non-blocking, the delete is deferred to a separate thread that runs only when it is safe, meaning there are no pending operations against that filter.
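To make the deferred-delete idea concrete, here is a minimal Ruby sketch. It is a toy in-memory model under stated assumptions, not bloomd's actual C implementation; the names `Filter`, `FilterRegistry`, `with_filter`, `reap` and `start_reaper` are hypothetical. `drop` returns immediately by moving the filter out of the namespace into a pending list, and a background reaper deletes only the filters that have no operations in flight.

```ruby
# Toy model of a non-blocking, deferred drop (hypothetical; not bloomd's C code).
Filter = Struct.new(:name, :in_flight, :deleted)

class FilterRegistry
  def initialize
    @lock = Mutex.new
    @live = {}     # name => Filter visible to clients
    @pending = []  # filters dropped but not yet physically deleted
  end

  def create(name)
    @lock.synchronize do
      return :exists if @live.key?(name)
      @live[name] = Filter.new(name, 0, false)
      :done
    end
  end

  # Pins the filter for the duration of an operation such as a check or set,
  # so the reaper cannot delete it underneath the caller.
  def with_filter(name)
    filter = @lock.synchronize do
      f = @live[name]
      f.in_flight += 1 if f
      f
    end
    return :not_found unless filter

    begin
      yield filter
    ensure
      @lock.synchronize { filter.in_flight -= 1 }
    end
  end

  # Non-blocking: the filter leaves the namespace at once and is queued for deletion.
  def drop(name)
    @lock.synchronize do
      filter = @live.delete(name)
      return :not_found unless filter
      @pending << filter
      :done
    end
  end

  # Deletes only queued filters with no operations in flight; returns their names.
  def reap
    @lock.synchronize do
      idle, busy = @pending.partition { |f| f.in_flight.zero? }
      idle.each { |f| f.deleted = true } # stand-in for freeing memory and unlinking files
      @pending = busy
      idle.map(&:name)
    end
  end

  # Background thread that periodically runs the deferred deletes.
  def start_reaper(interval = 0.1)
    Thread.new do
      loop do
        reap
        sleep interval
      end
    end
  end
end
```

Note that in this model `create("a")` succeeds as soon as the old "a" has been dropped, even while it is still queued for deletion. If both were backed by the same files on disk, the reaper could remove data belonging to the new filter, which is the kind of race suspected here.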

I'm willing to bet there is a race condition between that deferred delete thread and the create command that causes this.

If you build from the latest master branch, bloomd should no longer crash; however, its behavior is still poorly defined. There are three basic options for the drop/create situation:

  1. Create blocks until drop is complete. Due to the MVCC architecture, there is a chance this could deadlock.
  2. Reject the create, probably with a "Delete in progress" error.
  3. Allow the create and try to cancel the delete. If the delete is already in progress, it cannot be canceled, and the system may be left inconsistent.

At this point, I'm leaning toward a combination of 3 and 2: we will try to abort the delete, and if that is not possible, we will return a "Delete in progress" error, and the client can choose to retry later.
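A minimal sketch of that combined policy, again as a hypothetical Ruby model rather than bloomd's real code: each dropped name is tracked as `:queued` until the deleter claims it, after which it is `:running` and can no longer be canceled. The class and method names, and the `:delete_in_progress` symbol standing in for the "Delete in progress" reply, are illustrative assumptions.

```ruby
# Hypothetical model of "cancel the delete if possible, else reject" (options 3 + 2).
class DropCreatePolicy
  def initialize
    @lock = Mutex.new
    @live = {}     # name => true for filters visible to clients
    @deletes = {}  # name => :queued or :running
  end

  def drop(name)
    @lock.synchronize do
      return :not_found unless @live.delete(name)
      @deletes[name] = :queued
      :done
    end
  end

  # Called by the deleter thread; once a delete is :running it cannot be canceled.
  def start_delete(name)
    @lock.synchronize do
      return false unless @deletes[name] == :queued
      @deletes[name] = :running
      true
    end
  end

  def finish_delete(name)
    @lock.synchronize { @deletes.delete(name) }
  end

  def create(name)
    @lock.synchronize do
      return :exists if @live.key?(name)
      case @deletes[name]
      when :running
        return :delete_in_progress # option 2: reject; the client retries later
      when :queued
        @deletes.delete(name)      # option 3: cancel the delete before it starts
      end
      @live[name] = true
      :done
    end
  end
end
```

Because the check and the cancellation happen under the same lock that `start_delete` uses, a create can never cancel a delete that the deleter has already begun.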

I probably won't have a chance to implement this for a few weeks, as a number of features are planned for the next release of bloomd (for example, sets will become non-blocking during a resize).

In the meantime, if you use the latest build and use in-memory filters for the tests, it should work without any problems!

Thanks for fixing this so quickly 👍

Just a heads up: v0.6.0 has shipped, and this bug is now resolved. If a delete is pending, the create command now returns "Delete in progress". Thanks for reporting this!
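On the client side, this means a test suite that drops and immediately recreates filters should be prepared to retry. A minimal Ruby sketch, assuming bloomd's line-based text protocol (one `create <name>` command per line, one reply line back) and its default TCP port 8673; `create_with_retry` and `tcp_sender` are hypothetical helpers, not part of any bloomd client library.

```ruby
require "socket"

# Hypothetical transport: sends one command line over TCP and returns one reply line.
# Port 8673 is assumed to be bloomd's default TCP port.
def tcp_sender(host = "localhost", port = 8673)
  sock = TCPSocket.new(host, port)
  ->(line) do
    sock.write("#{line}\n")
    sock.gets.to_s.chomp
  end
end

# Retries "create" while the server reports that an earlier drop is still being deleted.
def create_with_retry(send_command, name, attempts: 10, delay: 0.5)
  attempts.times do |i|
    reply = send_command.call("create #{name}")
    return reply unless reply == "Delete in progress"
    sleep(delay) if i < attempts - 1 # no point waiting after the final attempt
  end
  raise "filter #{name} is still being deleted after #{attempts} attempts"
end
```

Usage would look like `create_with_retry(tcp_sender, "test_filter")`. Taking the transport as a callable keeps the retry logic testable without a running server; a fixed delay keeps the sketch simple, though exponential backoff would also work.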

Thanks for fixing 👍