ned14/llfio

Hole punching in tests

Opened this issue · 5 comments

bigerl commented

Going through the tests I didn't find any tests for copying files with holes.
The only thing I found for for file copying was this:

static inline void TestCloneOrCopyFileWhole()

Is the code for copying with holes being tested elsewhere? Or is there a reason why it doesn't need to be tested?

ned14 commented

The test you identified is for testing that function.

Contribution of better tests would be welcome.

bigerl commented

Many thanks for the quick reply. In our team we do not have capacity to work on that ATM. We still need to evaluate if using llfio makes sense for our use case. If so, we would probably make a PR with extended testing of file copying.

ned14 commented

I can tell you that function copies Terabytes of data every day in 99.999% uptime cloud systems.

It doesn't mean you wouldn't find a bug when you use it, but for how that system uses that function, it has been extremely reliable.

bigerl commented

That is good to know and I read about it in the readme. I really like the library and that it safes you from a lot of error prone fiddling with OS dependent sys calls for file handling.
It's just because we make heavy use of hole punching so we want to make sure that the implementation we use is correct with regards to copying files with holes and also covers all corner cases that could happen.

ned14 commented

To my best knowledge the hole punching extents cloning implementation is correct. Implementation of something correct in all corner cases whilst retaining performance and is semantically equivalent across all platforms is surprisingly hard. In fact, in all my career, I'd rank it in the top five hardest things I've ever implemented. It really is quite challenging, and it took me some months of outside-of-work time to get it debugged.

You should be aware that holes don't clone exactly, and there is absolutely nothing anybody can do about it. The code will enumerate the holes in the source, and exactly replicate that in the destination. However afterwards the holes won't match!

I am absolutely certain it isn't my code, so therefore it must be some internal property of the filing system. For large holes, a few Kb one side or another makes no difference, however I had a test file which consisted of a 4Kb allocated region with about 100 Kb of randomly varying hole between them. Most of the holes copied over exactly, some however became bigger, others become smaller, and in a few cases they disappeared.

The test program checked that the contents of the files were identical, and they were. I had the code write out exactly which extents were copied to a log. I examined and compared, and what my code tells the kernel is simply not always honoured.

What my code doesn't do but other hole copying file copying tools do is scan either side of an allocated extent for non zero bytes to "trim" the allocated region down to minimum. My code doesn't do this to be fast, it adds up to a good chunk of performance on fast SSDs.

So consider this a "reasonable best effort" hole cloning implementation. You can if you want always poke in zero byte region trimming yourself if you needed it.