ned14/llfio

[question] mmap write only

jeremy-coulon opened this issue · 1 comments

Hi,

I am currently using Linux memory mapping to cache remote files. I would like to know whether I can replace my implementation with LLFIO (and perhaps gain Linux/Windows portability).

Overview of my process:

  1. mmap a cache file. Pages that are already available are read-only; missing pages are write-only.
  2. When the application tries to read a write-only page, catch the signal.
  3. Download the missing pages from elsewhere.
  4. Mark the pages read-only and signal the application that they are now available.

Reading through the documentation and code, it seems that LLFIO can't mmap a file write-only?

I am currently mmap'ing my file this way on Linux:

std::size_t size = ...
int fd = ... // ::open() or ::shm_open()
std::byte* mapping = reinterpret_cast<std::byte*>(::mmap(nullptr, size, PROT_WRITE, MAP_SHARED | MAP_NORESERVE, fd, 0));

Is it possible to get the same behavior with LLFIO?

ned14 commented

Obviously I'd advise in the strongest possible terms against using signals to be notified when a page is not in cache. Use any other way except that. POSIX signals are fraught with nasty surprises. Also, doing network i/o in random 4 KB lumps has high latency compared with alternative approaches.

Setting that aside, yes, you're right that mapping a file for write in LLFIO also maps it for read. The reason is portability: Linux is unusual in supporting write-only maps. LLFIO also doesn't expose any way of changing the read/write status of individual pages; all you can do is commit and decommit them, which is analogous to allocate/deallocate. This is because the C abstract machine has no concept of memory having readability or writability separate from C's static typing.

A portable design of your solution would need to be less Linux-specific. You also want to avoid the implicit kernel transition when a page fault happens: Linux has just about the quickest implementation out there, so on any other platform you'd see performance suffer.
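
As one possible illustration of a design that needs neither signals nor page protection, the sketch below tracks block presence explicitly and fetches missing blocks before handing out a pointer. Everything in it is an assumption made for illustration: `BlockCache`, its `Fetch` callback and the block size are hypothetical names, not LLFIO APIs, and `base` could be any read/write mapping (for example one obtained through LLFIO).

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper (not part of LLFIO): records which fixed-size blocks of
// a cache region are populated and fetches missing ones on explicit request,
// so no page fault or signal handler is involved.
class BlockCache {
public:
    // Assumed callback: fills `length` bytes at `dest` with block `index`.
    using Fetch = std::function<void(std::size_t index, std::byte* dest, std::size_t length)>;

    BlockCache(std::byte* base, std::size_t size, std::size_t block_size, Fetch fetch)
        : base_(base), size_(size), block_size_(block_size),
          present_((size + block_size - 1) / block_size, false),
          fetch_(std::move(fetch)) {}

    // Ensure [offset, offset + length) is populated, then return a pointer to
    // it. Returns nullptr for an empty or out-of-range request.
    const std::byte* read(std::size_t offset, std::size_t length) {
        if (length == 0 || offset > size_ || length > size_ - offset) {
            return nullptr;
        }
        const std::size_t first = offset / block_size_;
        const std::size_t last = (offset + length - 1) / block_size_;
        for (std::size_t b = first; b <= last; ++b) {
            if (!present_[b]) {
                const std::size_t start = b * block_size_;
                const std::size_t len = std::min(block_size_, size_ - start);
                fetch_(b, base_ + start, len);  // e.g. one network request
                present_[b] = true;
                ++fetches_;
            }
        }
        return base_ + offset;
    }

    // Number of blocks fetched so far.
    std::size_t fetch_count() const { return fetches_; }

private:
    std::byte* base_;
    std::size_t size_;
    std::size_t block_size_;
    std::vector<bool> present_;
    Fetch fetch_;
    std::size_t fetches_ = 0;
};
```

Because the caller states the range it needs up front, the block size can be much larger than a page, and a real fetch callback could coalesce adjacent missing blocks into one request. This sketch is single-threaded; a concurrent version would need synchronisation around the presence bitmap.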

You may be aware already that there was a commercial product in the 1990s implementing exactly what you have implemented, except for C++ objects. This is why the early STL implemented Allocators with an indirecting memory model, and Bjarne designed early C++ around a cache-on-demand memory model driven by page faults.