basho/bitcask

Bitcask cannot recover from invalid or empty lock files

Closed this issue · 9 comments

Repro steps: Start a single node with bitcask, fill with enough data to have a few data files, stop node, create empty lock file, start node.

Errors:

09:38:29.529 [error] Failed to read lock data from ./data/bitcask/0/bitcask.create.lock: {invalid_data,<<>>}
09:38:29.529 [error] gen_fsm <0.692.0> in state active terminated with reason: lock_failure in bitcask_fileops:get_create_lock/2 line 106
09:38:29.529 [error] CRASH REPORT Process <0.692.0> with 1 neighbours exited with reason: lock_failure in bitcask_fileops:get_create_lock/2 line 106 in gen_fsm:terminate/7 line 611
09:38:29.530 [error] Supervisor riak_core_vnode_sup had child undefined started with {riak_core_vnode,start_link,undefined} at <0.692.0> exit with reason lock_failure in bitcask_fileops:get_create_lock/2 line 106 in context child_terminated
09:38:29.530 [error] gen_fsm <0.800.0> in state ready terminated with reason: lock_failure in bitcask_fileops:get_create_lock/2 line 106
09:38:29.530 [error] CRASH REPORT Process <0.800.0> with 10 neighbours exited with reason: lock_failure in bitcask_fileops:get_create_lock/2 line 106 in gen_fsm:terminate/7 line 611
09:38:29.531 [error] Supervisor {<0.801.0>,poolboy_sup} had child riak_core_vnode_worker started with riak_core_vnode_worker:start_link([{worker_module,riak_core_vnode_worker},{worker_args,[0,[],worker_props,<0.798.0>]},{worker_callback_mod,...},...]) at undefined exit with reason lock_failure in bitcask_fileops:get_create_lock/2 line 106 in context shutdown_error
09:38:29.531 [error] gen_server <0.801.0> terminated with reason: lock_failure in bitcask_fileops:get_create_lock/2 line 106
09:38:29.531 [error] CRASH REPORT Process <0.801.0> with 0 neighbours exited with reason: lock_failure in bitcask_fileops:get_create_lock/2 line 106 in gen_server:terminate/6 line 747

The node sits in a sad loop, never recovering.

see also: #99 (comment)

This is kind of a dup, except that it's more specific, I guess. As I noted in the other bug, I don't see a good way to solve this without adding a lot of logic that I think should be handled at a higher level. I'll open an issue against riak that details what I think should happen.

The Erlang VM doesn't provide a file I/O interface to any of the common UNIX/POSIX lock-related functions: fcntl(2), flock(2), or lockf(3). So Bitcask is trying to do something sane that doesn't rely on them.

Putting one of them into the Bitcask NIF is possible, but it introduces headaches for portability across UNIX variants. And no two of them offer exactly the same semantics.

Note that @bsparrow435's case of a zero-byte file can happen when a file system fills: there's enough room to create the file but not enough room to store any data.

Can we use rename to implement this? Write the file to some temp location, and iff the write succeeds do we move it to the lockfile'd file name?

rename(2) won't work as it overwrites the destination.
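
To illustrate the difference, here is a minimal Python sketch (not Erlang, and not Bitcask code), assuming a POSIX file system. The function name `demo_rename_vs_link` and the `owner-A`/`owner-B` payloads are made up for the example. It shows that linking onto an existing name is refused, while renaming onto it silently replaces the existing lock.

```python
import os
import tempfile


def demo_rename_vs_link():
    """Compare link(2) and rename(2) when the destination already exists.

    Hypothetical demo: file names and payloads are illustrative only.
    """
    with tempfile.TemporaryDirectory() as d:
        lock = os.path.join(d, "lock")
        tmp = os.path.join(d, "tmp")
        with open(lock, "wb") as f:
            f.write(b"owner-A\n")  # lock already held by someone else
        with open(tmp, "wb") as f:
            f.write(b"owner-B\n")  # our candidate lock contents

        # link(2) fails with EEXIST if the destination name exists.
        try:
            os.link(tmp, lock)
            link_refused = False
        except FileExistsError:
            link_refused = True

        # rename(2) on POSIX replaces the destination without complaint.
        os.rename(tmp, lock)
        with open(lock, "rb") as f:
            contents_after_rename = f.read()

        return link_refused, contents_after_rename
```

On a POSIX system this should report that the link was refused and that the rename clobbered the first owner's lock data.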

I'd do it as

1. Create a temporary file.
2. Write the contents of the lock to the temp file.
3. Close the temp file.
4. Attempt to hard link(2) the temp file to the lock file.
5. On failure, retry as many times as you want.
6. On success, unlink(2) the temp file.
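
The steps above can be sketched roughly as follows. This is a hedged Python illustration, not Bitcask's implementation (which is Erlang). The helper name `acquire_lock`, its `retries` and `delay` parameters, and the payload bytes are all assumptions made for the example. The point is that the lock file name only ever appears with its contents fully written, so a failed write (for example on a full file system) cannot leave behind an empty lock file.

```python
import os
import tempfile
import time


def acquire_lock(lock_path, payload, retries=5, delay=0.1):
    """Try to create lock_path with payload already written into it.

    Hypothetical helper: the name, parameters, and payload format are
    illustrative and not taken from Bitcask.
    Returns True if the lock was acquired, False if it is held elsewhere.
    """
    lock_dir = os.path.dirname(os.path.abspath(lock_path))
    # The temp file lives next to the lock file because link(2)
    # cannot cross file systems.
    fd, tmp_path = tempfile.mkstemp(prefix=".lock-", dir=lock_dir)
    try:
        # Write and close the temp file first. If this raises (e.g. ENOSPC),
        # the lock file name is never created.
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())

        for attempt in range(retries):
            try:
                # link(2) fails with EEXIST if someone else holds the lock.
                os.link(tmp_path, lock_path)
                return True
            except FileExistsError:
                if attempt + 1 < retries:
                    time.sleep(delay)
        return False
    finally:
        # The temp name is no longer needed whether or not the link succeeded.
        os.unlink(tmp_path)
```

A caller might use it as `acquire_lock("./bitcask.create.lock", b"12345\n")`, treating a `False` return as "lock held by someone else". Deciding whether an existing lock is stale is a separate problem that this sketch does not address.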

Jon

On Wed, Apr 16, 2014 at 9:04 AM, Reid Draper notifications@github.com wrote:

> Can we use rename to implement this? Write the file to some temp
> location, and iff the write succeeds do we move it to the lockfile'd file
> name?


Jon Meredith
VP, Engineering
Basho Technologies, Inc.
jmeredith@basho.com

Seems reasonable to me

We already have a locking implementation; it isn't the best, but unless I
misunderstand the situation, improving it won't fix this particular
problem. Please see basho/riak#535 and its
associated links.

I was just saying the locking algorithm we have is racy, and possibly
giving rise to the

2014-04-16 07:00:59.943 [error] <0.23170.542> Failed to read lock data from /data/riak/riak-data/riak/bitcask/456719261665907161938651510223838443642478919680/bitcask.create.lock: {invalid_data,<<>>}

(harmless) messages we're seeing on a customer system.
As you say, basho/riak#535 is worth discussing.

On Wed, Apr 16, 2014 at 10:14 AM, Evan Vigil-McClanahan <notifications@github.com> wrote:

> We already have a locking implementation; it isn't the best, but unless I
> misunderstand the situation, improving it won't fix this particular
> problem. Please see basho/riak#535 and its
> associated links.



@evanmcc cool. I probably commented without understanding the issue well enough. I just knew that rename(2) was atomic, so it might be another way of approaching the problem.

Between #72 and basho/riak#535, I think this issue is covered. Closing.