basho/bitcask

A 0-length file can indefinitely hold up puts.

Closed this issue · 0 comments

This issue affects 2.0 and the 1.4 branch post 1.4.4.

The fix to avoid file server usage (#118) changed the way that the largest file ID is accounted for. Instead of listing the entire directory (potentially very expensive, directly in the put path), it keeps track of the file id in the keydir. Unfortunately, there is a corner case.

On the initial population of the keydir, each file is folded over in turn. If it has any keys at all, the keydir's largest id counter will be increased. However, if it has no keys, it will not be, because no keydir_put will happen. This can cause a problem when a keyless hintfile/datafile pair has the largest file id in the directory on startup, e.g.:

-rw-rw-r--  1 riak riak 1453263 Feb  3 14:11 186.bitcask.data
-rw-rw-r--  1 riak riak   60798 Feb  3 14:11 186.bitcask.hint
-rw-rw-r--  1 riak riak 1613200 Feb  3 14:10 187.bitcask.data
-rw-rw-r--  1 riak riak  124681 Feb  3 14:10 187.bitcask.hint
-rw-rw-r--  1 riak riak 1469163 Feb  3 22:24 188.bitcask.data
-rw-rw-r--  1 riak riak   59843 Feb  3 22:54 188.bitcask.hint
-rw-------  1 riak riak       0 Jan 30 19:57 189.bitcask.data
-rw-------  1 riak riak      18 Jan 30 19:57 189.bitcask.hint

Since in this case the biggest_file_id will be 188, the code here:
https://github.com/basho/bitcask/blob/develop/src/bitcask_fileops.erl#L73-L88
will fail, causing the vnode to crash and be restarted, leaving us exactly where we started.
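
To make the corner case concrete, here is a toy model in Python. It is not bitcask's code: the `files` dictionary, the `fold_biggest_file_id` helper, and the idea that the next file simply takes `biggest_file_id + 1` are assumptions made for illustration. It only shows how a counter that moves on key puts can end up behind the directory's real maximum.

```python
# Toy model, not bitcask's code: file IDs mapped to the keys each file holds.
# 189 stands in for the keyless data/hint pair from the listing above.
files = {186: ["a"], 187: ["b", "c"], 188: ["d"], 189: []}


def fold_biggest_file_id(files):
    """Mimic the startup fold: the counter moves only when a key is put."""
    biggest = 0
    for file_id in sorted(files):
        for _key in files[file_id]:
            # Stands in for keydir_put; never reached for a keyless file.
            biggest = max(biggest, file_id)
    return biggest


biggest_file_id = fold_biggest_file_id(files)

# Assumption: the next file to be created takes the next ID after the counter.
next_id = biggest_file_id + 1

# The ID the counter hands out is already taken by the keyless pair.
collides = next_id in files

print(biggest_file_id, max(files), collides)
```

In this model the counter stops at 188 while the directory already contains 189, so the next ID collides with an existing file on every restart.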

A workaround is to manually remove the empty hint and data file pair. A fix should be forthcoming shortly.
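
A minimal sketch for spotting the offending pair before removing it by hand. The helper name `empty_tail_pair` is hypothetical, and the check assumes that a 0-byte data file with the highest ID in the directory is the keyless pair described above. It only reports paths and deletes nothing.

```python
import os
import re

# File naming taken from the directory listing in this report.
DATA_RE = re.compile(r"^(\d+)\.bitcask\.data$")


def empty_tail_pair(dirname):
    """Return (data_path, hint_path) if the highest-numbered data file
    in dirname is 0 bytes long, otherwise None."""
    ids = []
    for name in os.listdir(dirname):
        match = DATA_RE.match(name)
        if match:
            ids.append(int(match.group(1)))
    if not ids:
        return None

    top = max(ids)
    data_path = os.path.join(dirname, f"{top}.bitcask.data")
    if os.path.getsize(data_path) != 0:
        return None

    # The hint file may be non-empty (18 bytes in the listing above),
    # so only its path is returned, without checking its size.
    hint_path = os.path.join(dirname, f"{top}.bitcask.hint")
    return data_path, hint_path
```

Calling `empty_tail_pair` on a vnode's bitcask directory would return the two paths to inspect, or `None` if the highest-numbered data file is not empty.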