basho/bitcask

bitcask:init_keydir(): Avoid unnecessary wait by bitcask:poll_deferred_delete_queue_empty/0

Closed this issue · 3 comments

It is possible for Bitcask instance A's open(), which calls init_keydir(), to block for long periods of time while Bitcask instance B performs a long-running merge. In turn, such a long-blocking open() can make the riak_core_vnode_manager process very unhappy, since that process is responsible for starting new Riak vnode instances, which may in turn open Bitcask instances.

Below is a stack trace from a blocked Bitcask instance (instance A in the description above). The same cluster-info report shows that the bitcask_merge_delete server has a queue of hundreds of files from instance B to delete. The report also shows a third instance performing a long-running fold (repair handoff), which prevents the bitcask_merge_delete server from deleting B's old merge files.

https://github.com/basho/bitcask/blob/1.6.3-release/src/bitcask_merge_delete.erl#L109

One possible solution would be to split the monolithic bitcask_merge_delete server into per-Bitcask-instance servers.
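For context, the polling at the linked line behaves roughly like the sketch below (simplified, not the exact 1.6.3 source; `queue_length/0` is an assumed name for however the caller observes the pending-delete queue). The key point is that the loop waits on the *global* bitcask_merge_delete queue, so instance A's open() stalls on deletes enqueued by instance B:

```erlang
%% Simplified sketch of the polling pattern in bitcask:open/2 ->
%% init_keydir/3. The loop spins until the shared deferred-delete
%% queue drains, regardless of which instance enqueued the work.
poll_deferred_delete_queue_empty() ->
    case bitcask_merge_delete:queue_length() of   % assumed API; see linked source
        0 -> ok;
        _ ->
            timer:sleep(1000),                    % interval is illustrative
            poll_deferred_delete_queue_empty()
    end.
```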

=proc:<0.9668.5878>
State: Waiting
Spawned as: proc_lib:init_p/5
Spawned by: <0.247.0>
Started: Thu Mar 20 11:25:11 2014
Message queue length: 1
Message queue: [{'$gen_sync_event',{<0.9667.5878>,#Ref<0.0.4474.115775>},wait_for_init}]
Number of heap fragments: 0
Heap fragment data: 0
Link list: []
Dictionary: [{'$initial_call',{riak_core_vnode,init,1}},{'$ancestors',[riak_core_vnode_sup,riak_core_sup,<0.244.0>]},{random_seed,{26833,7398,23484}}]
Reductions: 128073
Stack+heap: 987
OldHeap: 4181
Heap unused: 683
OldHeap unused: 3540
Stack dump:
Program counter: 0x00007f8539fccc28 (bitcask:poll_deferred_delete_queue_empty/0 + 88)
CP: 0x0000000000000000 (invalid)
arity = 0

0x00007f83b256aeb8 Return addr 0x00007f8539fc9570 (bitcask:init_keydir/3 + 632)

0x00007f83b256aec0 Return addr 0x00007f8539fc3c78 (bitcask:open/2 + 568)
y(0)     <<0 bytes>>
y(1)     Catch 0x00007f8539fc9660 (bitcask:init_keydir/3 + 872)
y(2)     true
y(3)     <<0 bytes>>
y(4)     "/var/lib/riak/bitcask/702205864811332261480676696969151607100311339008"

0x00007f83b256aef0 Return addr 0x00007f853baef280 (riak_kv_bitcask_backend:start/2 + 1336)
y(0)     []
y(1)     []
y(2)     2147483648
y(3)     fresh
y(4)     [{data_root,"/var/lib/riak/bitcask"},{read_write,true}]
y(5)     "/var/lib/riak/bitcask/702205864811332261480676696969151607100311339008"

0x00007f83b256af28 Return addr 0x00007f853b1d4160 (riak_cs_kv_multi_backend:start_backend/4 + 152)
y(0)     []
y(1)     []
y(2)     [{data_root,"/var/lib/riak/bitcask"},{read_write,true}]
y(3)     "/var/lib/riak/bitcask"
y(4)     "702205864811332261480676696969151607100311339008"
y(5)     702205864811332261480676696969151607100311339008

0x00007f83b256af60 Return addr 0x00007f853b1d9208 (riak_cs_kv_multi_backend:'-start_backend_fun/1-fun-0-'/3 + 256)
y(0)     be_blocks
y(1)     riak_kv_bitcask_backend
y(2)     Catch 0x00007f853b1d4258 (riak_cs_kv_multi_backend:start_backend/4 + 400)

0x00007f83b256af80 Return addr 0x00007f855c674960 (lists:foldl/3 + 120)
y(0)     riak_kv_bitcask_backend
y(1)     []
y(2)     [{be_default,riak_kv_eleveldb_backend,{state,<<0 bytes>>,"/var/lib/riak/leveldb/702205864811332261480676696969151607100311339008",[{create_if_missing,true},{max_open_files,50},{use_bloomfilter,true},{write_buffer_size,59927635}],[{create_if_missing,true},{data_root,"/var/lib/riak/leveldb"},{included_applications,[]},{max_open_files,50},{use_bloomfilter,true},{write_buffer_size,59927635}],[],[],[{fill_cache,false}],true,false}}]
y(3)     Catch 0x00007f853b1d9320 (riak_cs_kv_multi_backend:'-start_backend_fun/1-fun-0-'/3 + 536)

0x00007f83b256afa8 Return addr 0x00007f853b1d3e90 (riak_cs_kv_multi_backend:start/2 + 808)
y(0)     #Fun<riak_cs_kv_multi_backend.1.32308858>
y(1)     []

0x00007f83b256afc0 Return addr 0x00007f853ae86778 (riak_kv_vnode:init/1 + 872)
y(0)     be_default
y(1)     [{<<3 bytes>>,be_blocks}]
y(2)     []
y(3)     []

0x00007f83b256afe8 Return addr 0x00007f853ae6d220 (riak_core_vnode:do_init/1 + 248)
y(0)     Catch 0x00007f853ae86778 (riak_kv_vnode:init/1 + 872)
y(1)     3000
y(2)     <<8 bytes>>
y(3)     10
y(4)     100
y(5)     100
y(6)     1000
y(7)     true
y(8)     riak_cs_kv_multi_backend
y(9)     702205864811332261480676696969151607100311339008

0x00007f83b256b040 Return addr 0x00007f853ae6ce28 (riak_core_vnode:started/2 + 200)
y(0)     []
y(1)     []
y(2)     []
y(3)     []
y(4)     []
y(5)     []
y(6)     []
y(7)     []
y(8)     []
y(9)     []
y(10)    undefined
y(11)    702205864811332261480676696969151607100311339008
y(12)    riak_kv_vnode
y(13)    {state,702205864811332261480676696969151607100311339008,riak_kv_vnode,undefined,undefined,none,undefined,undefined,undefined,undefined,undefined,0}

0x00007f83b256b0b8 Return addr 0x00007f853ae64e98 (gen_fsm:handle_msg/7 + 224)
y(0)     0

0x00007f83b256b0c8 Return addr 0x00007f855c6675b8 (proc_lib:init_p_do_apply/3 + 56)
y(0)     undefined
y(1)     Catch 0x00007f853ae64e98 (gen_fsm:handle_msg/7 + 224)
y(2)     riak_core_vnode
y(3)     {state,702205864811332261480676696969151607100311339008,riak_kv_vnode,undefined,undefined,none,undefined,undefined,undefined,undefined,undefined,0}
y(4)     started
y(5)     <0.9668.5878>
y(6)     <0.247.0>
y(7)     {'$gen_event',timeout}

0x00007f83b256b110 Return addr 0x0000000000847c38 (<terminate process normally>)
y(0)     Catch 0x00007f855c6675d8 (proc_lib:init_p_do_apply/3 + 88)

See also: #95

Why do we even wait on the deferred delete queue to be empty? It seems that deletion should be entirely asynchronous.
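For illustration, a fully asynchronous scheme would hand the files to the delete server with a cast and return immediately, leaving init_keydir() with nothing to poll for (a hedged sketch; the function and message names here are assumptions, not the actual bitcask API):

```erlang
%% Hypothetical fire-and-forget deferred deletion: enqueue the files
%% and return at once, so open()/init_keydir() never blocks on another
%% instance's delete backlog.
defer_delete_async(Files) ->
    gen_server:cast(bitcask_merge_delete, {defer_delete, Files}),
    ok.
```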

Staring at it some more, I think that since we no longer load setuid-tagged files, we can simply remove the polling and merge #95.

The original commit adding the polling is here: edfa3aa

Since merge should no longer ever call that function, I think that it's safe to remove it.

fixed by #150, closing.