eellak/build-recorder

Regarding `sync(2)`, `syncfs(2)`, `fsync(2)`, `fdatasync(2)`

In my understanding:

Facts

  • sync(2)
    POSIX. The standard only requires the writes to be scheduled, so sync() may return before they are done. Linux, however, waits for I/O completion before returning.

sync() causes all pending modifications to filesystem metadata and
cached file data to be written to the underlying filesystems.

  • syncfs(2)
    Linux-specific, not in POSIX. It has the same guarantee sync(2) does on Linux: it waits for I/O completion before returning.

syncfs() is like sync(), but synchronizes just the filesystem
containing file referred to by the open file descriptor fd.

  • fsync(2)
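    POSIX. On Linux the call blocks until the device reports that the transfer has completed. Unlike with sync(2), POSIX asks for this too: fsync() is not supposed to return until the transfer has completed or an error is detected.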

  • fdatasync(2)
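    POSIX. Like fsync(2), but it skips metadata that isn't needed to read the data back correctly (e.g. timestamps). The file data itself is still flushed.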
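
To make the differences concrete, here is a minimal sketch that issues all four calls on one descriptor. `flush_demo` and whatever path is passed to it are made up for illustration; the comments summarise the Linux man pages and are not a statement about what build-recorder does.

```c
#define _GNU_SOURCE /* needed for the syncfs(2) declaration on glibc */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: writes `text` to `path`, then issues each call of
 * the sync family in turn. Returns 0 if every call succeeded, -1 otherwise. */
int
flush_demo(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(text);
    int ok = write(fd, text, len) == (ssize_t) len;

    ok = ok && fdatasync(fd) == 0; /* file data, plus metadata needed to read it back */
    ok = ok && fsync(fd) == 0;     /* file data and all of the file's metadata */
    ok = ok && syncfs(fd) == 0;    /* everything on the filesystem containing fd */
    sync();                        /* every filesystem; returns void */

    if (close(fd) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Note that only sync(2) takes no descriptor, which is why it is the one call that cannot be tied to a single file.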

Proposal

Obviously the above isn't the full picture: for example, it implies that syncfs(2) and fsync(2) are equivalent, which is far from true, since syncfs(2) covers the whole filesystem containing fd. But for our purposes, they are to be treated mostly the same.
So, assuming the above statements are correct, my proposal is:

  • sync(2)
    We rehash all open files with pending write operations and store them as new files, almost as if they had been close(2)d and open(2)ed again.

  • syncfs(2)
    We only rehash the open file identified by fd. Basically what I described for sync(2), but for a single file.

  • fsync(2)
    Same as for syncfs(2).

  • fdatasync(2)
    This one looks tricky, but as far as I can tell from the man page it only defers metadata that isn't needed to read the data back; the file contents are flushed just as with fsync(2). Since we hash contents, I believe we can treat it the same as fsync(2) and create the new file with the new hash when we encounter this system call. The hash checker that I proposed removing because of its very poor performance would cover this case as well. What's the chance a compiler uses any of this?
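
The proposal boils down to a small mapping from system call number to action, sketched below. The enum and `rehash_action_for` are hypothetical names, not existing build-recorder code. fdatasync(2) is grouped with fsync(2) on the assumption that file contents are flushed in both cases; give it its own case if that assumption turns out not to hold.

```c
#include <assert.h>
#include <sys/syscall.h>

/* Hypothetical action codes; not taken from build-recorder. */
enum rehash_action {
    REHASH_NONE,     /* not a sync-family call */
    REHASH_ALL_OPEN, /* sync(2): every open file with pending writes */
    REHASH_ONE_FD    /* syncfs(2), fsync(2), fdatasync(2): the file behind fd */
};

/* A zero-initialised action must mean "do nothing". */
static_assert(REHASH_NONE == 0, "REHASH_NONE must be the zero value");

/* Maps a system call number seen by the tracer to the proposed action. */
enum rehash_action
rehash_action_for(long sysno)
{
    switch (sysno) {
    case SYS_sync:
        return REHASH_ALL_OPEN;
    case SYS_syncfs:
    case SYS_fsync:
    case SYS_fdatasync: /* assumption: treated like fsync(2) */
        return REHASH_ONE_FD;
    default:
        return REHASH_NONE;
    }
}
```

A tracer would call this with the system call number it sees and, for `REHASH_ONE_FD`, read fd from the first argument.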

It's also worth mentioning that the hash checker already addresses the problems this entire issue tries to solve by tracking the sync family of system calls: if one were to sync a file in any of the ways described above, the hash would be updated upon the next read and a new file would be created as a result.
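
The hash checker itself isn't shown in this issue, so the following is only a rough sketch of the idea: hash the contents again on the next read and compare with the hash we stored. FNV-1a is a stand-in chosen for brevity, not necessarily the hash build-recorder uses, and `hash_file` and `contents_changed` are made-up names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* FNV-1a (64-bit) over a file's contents. Returns 0 on success and stores
 * the hash in *out, or -1 on I/O error. */
int
hash_file(const char *path, uint64_t *out)
{
    assert(path != NULL && out != NULL);

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    uint64_t h = 0xcbf29ce484222325ULL; /* FNV offset basis */
    int c;
    while ((c = fgetc(f)) != EOF) {
        h ^= (uint64_t) (unsigned char) c;
        h *= 0x100000001b3ULL;          /* FNV prime */
    }

    int failed = ferror(f);
    fclose(f);
    if (failed)
        return -1;
    *out = h;
    return 0;
}

/* Returns 1 if the contents differ from the hash in *last (and updates
 * *last), 0 if they are unchanged, -1 on I/O error. */
int
contents_changed(const char *path, uint64_t *last)
{
    uint64_t now;

    if (hash_file(path, &now) != 0)
        return -1;
    if (now == *last)
        return 0;
    *last = now;
    return 1;
}
```

Every check reads the whole file again, which is where the performance cost comes from.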

Oh the price we pay for performance.