basho/bitcask

bitcask:merge/1 keeps file descriptors of deleted files open [JIRA: RIAK-2814]


I am using Bitcask standalone (without Riak) with Bitcask's default configuration. I noticed that file descriptors of deleted files are not closed when I call bitcask:merge/1.
I have reproduced this on Erlang/OTP 18.1 and 17.5.

The following snippet fills Bitcask with random binaries and then calls merge:

Dir = "/run/shm/bitcask".
Cask = bitcask:open(Dir, [read_write]).
RandSize = fun() -> random:uniform(1020*1024) + (4 * 1024) end.
RandBin = fun() -> crypto:rand_bytes(RandSize()) end.
Fill = fun(From, To) -> [bitcask:put(Cask, integer_to_binary(X), RandBin()) || X <- lists:seq(From, To)] end.
[Fill(1, 3000) || _X <- lists:seq(1, 3)].
bitcask:merge(Dir).

lsof output after running the snippet:

bzcskrzy@cumu02-15:/run/shm/bitcask$ lsof -c beam | grep /run/shm
beam.smp 21255 bzcskrzy    8u   REG   0,21         38    119191 /run/shm/bitcask/bitcask.write.lock
beam.smp 21255 bzcskrzy   10u   REG   0,21 2147352145    115060 /run/shm/bitcask/1.bitcask.data (deleted)
beam.smp 21255 bzcskrzy   11u   REG   0,21 2146712969    119194 /run/shm/bitcask/2.bitcask.data (deleted)
beam.smp 21255 bzcskrzy   12u   REG   0,21  424816644    117958 /run/shm/bitcask/3.bitcask.data
beam.smp 21255 bzcskrzy   13u   REG   0,21      35596    119291 /run/shm/bitcask/3.bitcask.hint  

Repeatedly filling and merging the cask accumulates more and more such FDs, and they are not released until bitcask:close/1 is called.
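For reference, a loop like the following (reusing the Fill fun and Dir bound in the snippet above) makes the accumulation easy to observe with lsof after each iteration:

```erlang
%% Each fill rolls over to new data files; each merge deletes the old
%% ones but (with this bug) leaves their descriptors open, so the
%% count of "(deleted)" entries in lsof grows by ~2 per iteration.
[begin
     Fill(1, 3000),
     bitcask:merge(Dir)
 end || _ <- lists:seq(1, 5)].
```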

UPDATE:
It seems I can work around this by manually retrieving the open read files from Bitcask's #bc_state{} tuple in the process dictionary and calling bitcask_fileops:close_all/1 on them.

8> State = erlang:get(Cask).
{bc_state,"/run/shm/bitcask",
          {filestate,read_write,"/run/shm/bitcask/3.bitcask.data",3,
           <0.52.0>,<0.53.0>,3626079127,424816644,423940347,22,
                     2058954717},
          <<>>,
          [{filestate,read_only,"/run/shm/bitcask/2.bitcask.data",2,
                      <0.50.0>,undefined,0,2146712969,2145783270,22,1423779256},
           {filestate,read_only,"/run/shm/bitcask/1.bitcask.data",1,
                      <0.48.0>,undefined,0,2147352145,2146439343,22,2556499941}],
          2147483648,
          [{expiry_secs,-1},read_write],
          #Fun<bitcask.21.19258060>,<<>>,1,2}
9> Files = element(5, State).
[{filestate,read_only,"/run/shm/bitcask/2.bitcask.data",2,
            <0.50.0>,undefined,0,2146712969,2145783270,22,1423779256},
 {filestate,read_only,"/run/shm/bitcask/1.bitcask.data",1,
            <0.48.0>,undefined,0,2147352145,2146439343,22,2556499941}]
10> bitcask_fileops:close_all(Files).
ok
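Packaged as a helper, the workaround above looks like this. It is only a sketch: it relies on the undocumented internal layout of #bc_state{} (element 5 being the list of open read files), which may change between Bitcask versions.

```erlang
%% Close the read-only file descriptors held in a Bitcask handle's
%% internal state. Cask is the reference returned by bitcask:open/2;
%% the handle's state lives in the calling process's dictionary.
close_stale_fds(Cask) ->
    State = erlang:get(Cask),          %% #bc_state{} tuple
    Files = element(5, State),         %% internal: open read filestates
    ok = bitcask_fileops:close_all(Files).
```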

UPDATE 2:
I just noticed that bitcask:needs_merge/2 already closes file descriptors left open by a previous merge. This fixes my problem without resorting to hacks on internal state. Maybe this behavior should be documented somewhere, in case someone else runs into the same problem.
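So for anyone hitting the same symptom, the supported cleanup appears to be a needs_merge call on the handle after merging (a sketch; Cask and Dir as in the snippet above, and the FD cleanup is a side effect rather than a documented contract):

```erlang
%% needs_merge inspects the handle's open read files and, as a side
%% effect, closes any whose data file no longer exists on disk
%% (i.e. was deleted by a previous merge).
bitcask:merge(Dir),
_ = bitcask:needs_merge(Cask).
```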

Thanks,
Jan

This is fixed by also fixing #251. We need to bring NIF/Erlang mode into lockstep with the options passed into the open call.

[posted via JIRA by Brian Sparrow]

Addressed by adding the O_CREAT flag to open_file in NIF mode and educating CSEs on the needs_merge logic, which cleans up leftover file descriptors.

[posted via JIRA by Brian Sparrow]