basho/bitcask

A truncated hintfile can cause expiration of a file to repeatedly fail

Closed this issue · 7 comments

A truncated hintfile causes the hintfile fold in the expiration logic to fail. It does not seem to fall back to a data file fold as it is supposed to. Example logs:

2014-06-05 03:20:31.217 [error] <0.27283.8119> Hintfile 'DATA_ROOT/10836.bitcask.hint' contains pointer 18446744073709551615 3206534561 that is greater than total data size 15032468 
2014-06-05 03:20:31.217 [error] <0.27283.8119> Error folding keys for "DATA_ROOT/10836.bitcask.data": {trunc_hintfile,ok} 

The only mitigation is to remove the hintfile, after which the default fold is used correctly.

Hm, I wonder whether this is some other problem. That 'pointer' is huge, so there's clearly some corruption there. Still, you're right, it should be falling back. What version?
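
For reference, the first number in the log line above is exactly 2^64 - 1, that is, a 64-bit field with every bit set. Below is a minimal Python sketch of the kind of bounds check the log message implies. It is not Bitcask's Erlang code: the function name is hypothetical, and reading the two logged numbers as an offset and an entry size is an assumption.

```python
MAX_U64 = 2**64 - 1

def pointer_in_bounds(offset: int, total_size: int, data_size: int) -> bool:
    """Return True if the entry [offset, offset + total_size) fits in the data file."""
    return offset >= 0 and total_size >= 0 and offset + total_size <= data_size

# Values copied from the log line; their interpretation is assumed.
offset, total_size, data_size = 18446744073709551615, 3206534561, 15032468
is_all_ones = offset == MAX_U64          # every bit of a 64-bit field is set
in_bounds = pointer_in_bounds(offset, total_size, data_size)
```

Both numbers individually exceed the data size, so the entry would be rejected under any reasonable form of this check.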

Observed on 1.4.8 and 1.4.9. Have not tested against develop or 2.0.

this case statement is the problem:

https://github.com/basho/bitcask/blob/1.6/src/bitcask_fileops.erl#L304-L323

that's super worrying, though, as it indicates that either has_valid_hintfile/1 is lying to us here:

https://github.com/basho/bitcask/blob/1.6/src/bitcask_fileops.erl#L304

or we're writing bad offsets into otherwise good hintfiles. Would love to see one of these invalid hintfiles if that's possible.

Interesting: with the new tombstone bit in hintfiles in 2.0, downgrading will make old Bitcask think the offset is too high. So this is the case we would hit on a downgrade, which I was hoping would be annoying but workable.
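
A rough Python sketch of why a downgrade would trip the same check. It assumes that the 2.0 hintfile format stores the tombstone flag in the most significant bit of the 64-bit offset field; that layout is an assumption here and is not confirmed in this thread. Under that assumption, an older reader that treats all 64 bits as an offset sees a value of at least 2^63 for every tombstone entry.

```python
import struct
from typing import Tuple

TOMBSTONE_BIT = 1 << 63  # assumed position: top bit of the 64-bit offset field

def pack_offset_field(offset: int, tombstone: bool) -> bytes:
    """Encode a 63-bit offset plus a tombstone flag into 8 big-endian bytes."""
    if not 0 <= offset < TOMBSTONE_BIT:
        raise ValueError("offset must fit in 63 bits")
    return struct.pack(">Q", offset | (TOMBSTONE_BIT if tombstone else 0))

def read_as_old_format(field: bytes) -> int:
    """Old-style reader: all 64 bits are interpreted as the offset."""
    return struct.unpack(">Q", field)[0]

def read_as_new_format(field: bytes) -> Tuple[int, bool]:
    """New-style reader: split the field into (offset, tombstone flag)."""
    raw = struct.unpack(">Q", field)[0]
    return raw & (TOMBSTONE_BIT - 1), bool(raw & TOMBSTONE_BIT)
```

For example, an offset of 1024 with the flag set reads back as `2**63 + 1024` through `read_as_old_format`, which is far larger than any real data file.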

So, there are a couple of issues here:

  1. The end of the file is detected incorrectly: if the file size is an exact multiple of the chunk size chosen by the fold, fold_hintfile_loop falls into the second case and you get errors like the ones reported here.
  2. As stated, when you run into this case, it doesn't correctly fall back to the data file.

Issue 2 is pretty easy to fix, although I think that I want to alter the atoms so that they make more sense (it's actually a truncated data file that this code is supposed to be detecting).
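
The intended fallback behaviour can be sketched roughly as follows. This is illustrative Python, not the Erlang in bitcask_fileops; `TruncatedFile`, `fold_keys`, and the two fold callables are hypothetical names introduced for the example.

```python
class TruncatedFile(Exception):
    """Raised by a fold when an entry points past the end of the data file."""

def fold_keys(fold_hintfile, fold_datafile, fun, acc, has_hintfile=True):
    """Prefer the hintfile fold; fall back to the data file fold if it fails."""
    if has_hintfile:
        try:
            return fold_hintfile(fun, acc)
        except TruncatedFile:
            # Discard any partial hintfile result and restart from the
            # original accumulator using the data file instead.
            pass
    return fold_datafile(fun, acc)
```

The sketch assumes the accumulator is not mutated in place, so that restarting from `acc` after a failed hintfile fold does not double-count keys.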

Issue 1 is a bit harder; I'll have to think about it some. Hope to have a PR out tomorrow.
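
One way this kind of boundary case can arise, shown as a generic Python sketch rather than the actual fold_hintfile_loop: a chunked reader that treats only a short read as end of file will, for a file whose size is an exact multiple of the chunk size, issue one extra read that returns nothing. If that empty read is classified as an error, the fold fails on a perfectly valid file. The record size, chunk size, and function names are assumptions chosen for illustration.

```python
import io

RECORD_SIZE = 4   # hypothetical fixed record size
CHUNK_SIZE = 8    # hypothetical read size (two records per chunk)

def fold_naive(f, fun, acc):
    """Buggy: only a short read counts as EOF; an empty read is an error."""
    while True:
        chunk = f.read(CHUNK_SIZE)
        if len(chunk) == 0:
            # Wrongly reached when the file size is a multiple of CHUNK_SIZE.
            raise IOError("unexpected empty read")
        for i in range(0, len(chunk), RECORD_SIZE):
            acc = fun(chunk[i:i + RECORD_SIZE], acc)
        if len(chunk) < CHUNK_SIZE:
            return acc

def fold_fixed(f, fun, acc):
    """An empty read at a record boundary is treated as a clean EOF."""
    while True:
        chunk = f.read(CHUNK_SIZE)
        if not chunk:
            return acc
        if len(chunk) % RECORD_SIZE != 0:
            raise IOError("truncated record")
        for i in range(0, len(chunk), RECORD_SIZE):
            acc = fun(chunk[i:i + RECORD_SIZE], acc)
```

With these constants, `fold_naive` handles a 12-byte file but raises on a 16-byte one, while `fold_fixed` handles both and still rejects a file that ends in the middle of a record.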

Fixed in #172 and #173.