Issues with mmap of files larger than 4GB on Windows 64-bit operating systems
unbemannt opened this issue · 6 comments
First, thanks for making this library available.
I am having an issue with some Java test code that uses larray mmap. I am unable to mmap files larger than 4GB on Windows 64-bit operating systems, while the same Java test code runs fine on Linux (30GB is no problem). Maybe I am not using the API properly.
I'm getting a core dump with a 5GB file on Windows. It almost feels as if the 32-bit larray native binary is being loaded, since the file limit appears to be around 4GB (4GB is okay, 5GB does not work).
System info:
Windows 10 Pro 64-bit or Ubuntu Linux 17.10 64-bit (same machine, dual boot)
JDK 1.8u151
larray-buffer:0.4.1
larray-mmap:0.4.1
Example code:
package testmmap;

import xerial.larray.mmap.MMapBuffer;
import xerial.larray.mmap.MMapMode;

import java.io.File;
import java.io.IOException;
import java.util.concurrent.ThreadLocalRandom;

/**
 * Creates a 4GB and 5GB file of random data, then reads entire file for min/max values.
 */
public class TestMmap {

    private static void createLargeFile(File file, long bytes) throws IOException {
        long t0 = System.currentTimeMillis();
        System.out.println("\n\nCreating file = " + file);
        MMapBuffer data = new MMapBuffer(file, 0, bytes, MMapMode.READ_WRITE);
        for (long index = 0; index < bytes; index += 8) {
            long value = ThreadLocalRandom.current().nextLong(0, Long.MAX_VALUE);
            data.putLong(index, value);
        }
        data.flush();
        System.out.println("Time = " + (System.currentTimeMillis() - t0));
        System.out.println("Size = " + data.size());
        data.close();
    }

    private static void readLargeFile(File file) throws IOException {
        long t0 = System.currentTimeMillis();
        System.out.println("\n\nReading file = " + file);
        MMapBuffer data = new MMapBuffer(file, MMapMode.READ_ONLY);
        long bytes = data.size();
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long index = 0; index < bytes; index += 8) {
            long value = data.getLong(index);
            min = (value < min) ? value : min;
            max = (value > max) ? value : max;
        }
        System.out.println("Time = " + (System.currentTimeMillis() - t0));
        System.out.println("Size = " + bytes);
        System.out.println("Min = " + min);
        System.out.println("Max = " + max);
        data.close();
    }

    public static void main(String[] args) throws Exception {
        long[] byteSizes = {4000000000L, 5000000000L};
        for (long bytes : byteSizes) {
            File file = new File(String.format("mmap%d.out", bytes));
            if (!file.exists())
                TestMmap.createLargeFile(file, bytes);
            TestMmap.readLargeFile(file);
        }
    }
}
I also tried building the jars locally on Ubuntu with the g++-mingw-w64-x86-64 package installed. I cloned from master, created the *-0.4.2-SNAPSHOT jars, copied them over to Windows, and rebuilt the project. Same results: 4GB works, 5GB does not.
I ran "make win64" then ./sbt compile and ./sbt package to get the jars.
Thanks for reporting. Considering that it works on Ubuntu, this looks like a Windows-specific problem. 4GB is the 2^32 boundary, so it seems some larray code might be using an invalid address when accessing an mmap memory region beyond 4GB.
The Windows-specific code is around here:
larray/larray-mmap/src/main/java/xerial/larray/impl/LArrayNative.c, lines 46 to 73 at commit 598607d
I'm not using Windows these days, so it will be difficult for me to address this issue soon. It's great that you can build larray yourself; I think tweaking the code around https://github.com/xerial/larray/blob/598607d2c7ec56b328bd856c6913f5d26773910f/larray-mmap/src/main/java/xerial/larray/mmap/MMapBuffer.java and the native code above would be helpful for fixing this issue.
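(For illustration only, not from the larray source: narrowing a 64-bit length to a 32-bit DWORD keeps just the low 32 bits, which matches the observed 4GB-works / 5GB-fails behaviour.)

#include <stdio.h>
#include <stdint.h>

int main(void) {
    long long fourGB = 4000000000LL;    /* below 2^32 = 4294967296 */
    long long fiveGB = 5000000000LL;    /* above 2^32 */
    printf("%u\n", (uint32_t) fourGB);  /* 4000000000 -- survives narrowing, so a 4GB mapping works */
    printf("%u\n", (uint32_t) fiveGB);  /* 705032704  -- truncated, so a 5GB mapping misbehaves */
    return 0;
}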
Thanks
Finally got to take another look. I've traced it down to the MapViewOfFile call on line 70 in LArrayNative.c: the size parameter should be cast to (size_t), not (DWORD). It's working as expected now.
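A minimal sketch of the change, assuming a call shaped roughly like the one in LArrayNative.c (the helper name, handle, and access flags here are illustrative, not the actual larray code). MapViewOfFile's last parameter is a SIZE_T, which is 64-bit on Win64, so the (DWORD) cast was capping the mapped view at 2^32 bytes:

#include <windows.h>

/* Hypothetical helper, for illustration only. */
void *mapView(HANDLE hMapping, long long size) {
    /* Buggy: (DWORD) size truncates lengths above 2^32.
       return MapViewOfFile(hMapping, FILE_MAP_READ, 0, 0, (DWORD) size); */

    /* Fixed: keep the full 64-bit length. */
    return MapViewOfFile(hMapping, FILE_MAP_READ, 0, 0, (size_t) size);
}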
@unbemannt Good catch! Could you create a PR for this?
I hope it solves the problem, thank you.
I also see that the WinAPI nicely accepts zero for the size, since the mapping size is fixed anyway. So, passing 0 to both CreateFileMapping and MapViewOfFile is OK, according to the documentation.
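A sketch of what the documentation allows (illustrative helper, not larray's actual code):

#include <windows.h>

/* Map an entire, already-opened file read-only.
   With dwMaximumSizeHigh/Low = 0, CreateFileMapping sizes the mapping to the
   current file size; with dwNumberOfBytesToMap = 0, MapViewOfFile maps from
   the offset to the end of the mapping. */
void *mapWholeFile(HANDLE hFile) {
    HANDLE hMapping = CreateFileMapping(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
    if (hMapping == NULL)
        return NULL;
    return MapViewOfFile(hMapping, FILE_MAP_READ, 0, 0, 0);
}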
I will be glad to test this or any other bugfixes. Luckily I have access to a fat Windows Server machine with 1.5 TB of RAM :)
But right now the problem still exists for a 5 GB file.