Build breaks on BSD
Closed this issue · 17 comments
In file included from /usr/ports/math/oink/work/oink-0eadb9ef3dd1927c955b449c3270af2db2577abf/src/game.hpp:26:
/usr/ports/math/oink/work/oink-0eadb9ef3dd1927c955b449c3270af2db2577abf/src/bitset.hpp:132:77: error: use of undeclared identifier 'MREMAP_MAYMOVE'
_bits = (uint64_t*)mremap(_bits, _allocsize, new_allocsize, MREMAP_MAYMOVE);
^
There's no mremap on FreeBSD.
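One way to keep the mremap call on Linux while still compiling elsewhere is a small wrapper that falls back to map, copy, and unmap. This is only a sketch: `portable_remap` is a hypothetical name, not something in Oink, and it assumes the region is an anonymous private mapping like the one in bitset.hpp.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // exposes mremap() in glibc's <sys/mman.h>
#endif
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <cstring>

#if !defined(MAP_ANONYMOUS) && defined(MAP_ANON)
#define MAP_ANONYMOUS MAP_ANON  // alternative spelling on some BSD-derived systems
#endif

// Hypothetical helper: resize an anonymous private mapping.
// Returns the new address, or nullptr on failure (the old mapping stays valid).
void* portable_remap(void* old_ptr, std::size_t old_size, std::size_t new_size) {
    assert(old_ptr != nullptr && new_size > 0);
#if defined(__linux__) && defined(MREMAP_MAYMOVE)
    // Linux: let the kernel grow or move the mapping without copying.
    void* p = mremap(old_ptr, old_size, new_size, MREMAP_MAYMOVE);
    return p == MAP_FAILED ? nullptr : p;
#else
    // Elsewhere: map a fresh region, copy the old contents, drop the old region.
    void* p = mmap(nullptr, new_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return nullptr;
    std::memcpy(p, old_ptr, old_size < new_size ? old_size : new_size);
    munmap(old_ptr, old_size);
    return p;
#endif
}
```

Note that the fallback path copies the whole old region, so every copied page becomes resident in the new mapping, including pages that were never touched in the old one.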
Why don't you just use malloc/realloc/free? They also use mmap but are much more portable.
You can use the google-perftools library to allocate memory too.
I use mmap because it lets me allocate a lot of virtual memory without allocating and zeroing real memory pages. This way, I can preallocate, say, 1 TB of memory, then use only as much as I need. I don't think malloc/realloc/free lets me do the same.
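The reservation pattern described here can be sketched roughly as follows. `reserve_zeroed` is a hypothetical helper, and the zero-fill behavior is what the Linux man page documents for anonymous mappings; whether other systems behave the same should be checked against their own documentation.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

#if !defined(MAP_ANONYMOUS) && defined(MAP_ANON)
#define MAP_ANONYMOUS MAP_ANON  // alternative spelling on some BSD-derived systems
#endif

// Hypothetical helper: reserve `bytes` of virtual address space.
// Physical pages are only assigned when a page is first touched.
std::uint64_t* reserve_zeroed(std::size_t bytes) {
    assert(bytes > 0);
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
#ifdef MAP_NORESERVE
    flags |= MAP_NORESERVE;  // hint: do not reserve swap space (not defined everywhere)
#endif
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, flags, -1, 0);
    return p == MAP_FAILED ? nullptr : static_cast<std::uint64_t*>(p);
}
```

A caller would reserve a large region once, index into it freely, and release it with `munmap(ptr, bytes)` when done; only the pages actually read or written cost real memory.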
malloc doesn't zero memory, and C++ operator new[] also doesn't zero memory.
What about performance hits when using realloc? Currently I don't need to resize the array as often because I overallocate anyway, but if I use malloc and realloc, I imagine I will need to call realloc much more often to avoid allocating significantly more memory than I actually need.
Both malloc and realloc allocate only virtual memory, exactly like mmap does. In fact, they allocate memory using mmap.
Okay so you're saying that I can just use malloc where I currently use mmap, etc, and will not see a performance hit from that?
Yes, because malloc and realloc are implemented through mmap for large memory blocks.
By the way, I forgot to mention that mmap zeros the memory lazily, or at least that is the intention. If malloc has the same behavior, cool. The idea is that the allocated bytes are definitely all initialized to zero, but as long as I don't access the entire array, the memory pages I don't access are also not initialized. Does malloc have the same behavior?
Ah right, I thought malloc did not do this lazy zeroing. So if I allocate large enough blocks, will malloc automatically zero the memory without extra effort from me?
malloc's effect is pretty much the same as mmap's because malloc is implemented through mmap. There's simply no other way to allocate memory than through mmap.
Okay, so if I allocate, say, 1024 bytes using malloc and then access the array, are all bytes initialized to zero like when I use mmap?
Maybe a better example would be allocating 1 megabyte and then accessing, say, byte 4096, which ordinarily lies in the second memory page. The kernel will take a page fault, allocate a real page, map it to the virtual page, and zero the memory. As far as I know, malloc does not guarantee this behavior.
mmap can't guarantee more than malloc does.
According to the man pages, in Linux mmap gives this guarantee when allocating anonymous memory. Hence, in Linux, I can use mmap to allocate memory that is guaranteed to be initialized to zero when used, but that will not actually allocate real pages until I touch those bytes. That is the behavior I'm looking for.
To do this with malloc and realloc, I would need to zero memory manually, because I would have no idea whether memory is zero or not.
Maybe BSD's mmap doesn't give the guarantee?
For similar reasons I cannot build oink on macOS. Can you confirm that macOS is not supported, @trolando?
@trolando malloc provides basically the same functionality as mmap in a more system-independent and standard way.
realloc also calls mremap or whatever other functions are available on the system.
malloc and operator new do not zero or access memory when they allocate it, so virtual-to-physical mapping works basically in the same way as with mmap.
OK, my progress on this so far is that I figured out that Oink will probably have to take a performance hit for this. If I allocate a large block of memory with malloc, the memory is not always zeroed out, so I will need to allocate using calloc instead. mmap guarantees that the memory is zero, but it sets the memory to zero lazily, when it is first used. I just made a version where I use malloc and realloc, and it just crashes because not all the memory is zero when accessed.
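If the malloc/realloc route were taken anyway, the missing zeroing could be handled roughly as below: start from calloc and clear only the newly added tail after each realloc. `grow_zeroed` is a hypothetical helper written for illustration, not code from Oink.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: resize an array of 64-bit words with realloc and
// zero the words that were added. Returns nullptr on failure, in which
// case `bits` is still valid and unchanged.
std::uint64_t* grow_zeroed(std::uint64_t* bits, std::size_t old_words,
                           std::size_t new_words) {
    assert(new_words > 0);
    void* p = std::realloc(bits, new_words * sizeof(std::uint64_t));
    if (p == nullptr) return nullptr;
    auto* out = static_cast<std::uint64_t*>(p);
    if (new_words > old_words) {
        // realloc leaves the new tail uninitialized, so clear it explicitly.
        std::memset(out + old_words, 0,
                    (new_words - old_words) * sizeof(std::uint64_t));
    }
    return out;
}
```

The initial buffer would come from `std::calloc(words, sizeof(std::uint64_t))`. The explicit memset touches every new page, which is exactly the cost that lazy zeroing with mmap avoids, so this trades performance for portability.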
Furthermore, there seem to be no guarantees about alignment, so the parallel code will not access cache lines optimally.
I guess I'll just have to choose between OSX/BSD support and a high performance library.
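On the alignment point, POSIX does offer posix_memalign, which lets the caller request a specific alignment. The sketch below assumes a 64-byte cache line and uses a hypothetical helper name, `alloc_aligned_zeroed`; it zeroes eagerly, so it does not recover the lazy behavior of mmap.

```cpp
#include <stdlib.h>  // posix_memalign, free
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: zeroed buffer whose address is a multiple of
// `alignment` (must be a power of two and a multiple of sizeof(void*)).
// The default of 64 is an assumed cache-line size.
void* alloc_aligned_zeroed(std::size_t bytes, std::size_t alignment = 64) {
    assert(bytes > 0);
    void* p = nullptr;
    if (posix_memalign(&p, alignment, bytes) != 0) return nullptr;
    std::memset(p, 0, bytes);  // posix_memalign does not zero the memory
    return p;
}
```

Memory obtained this way is released with `free`. There is no standard aligned counterpart to realloc, so growing such a buffer means allocating a new one and copying.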