Matfile created by MFL not readable by MATLAB/Octave (Error using load, Unable to read MAT-file, File might be corrupt)
Hello,
I am trying to create MAT files containing large matrices of double values.
I created a MAT file with two matrices of the same length:
matrix1 with constant values and matrix2 with variable values.
Once the MAT file reaches a size of 977 MB, opening it in MATLAB R2019b Update 1 64-bit (9.7.0.1216025) fails with
"Error using load, Unable to read MAT-file //mymatfile.mat, File might be corrupt."
Strangely, when I create a MAT file with two matrices of the same length where
matrix1 and matrix2 both hold variable values, the error does not occur.
In that case I can even open 1.48 GB MAT files in MATLAB R2019b Update 1 64-bit (9.7.0.1216025) without an error.
I attached a Java class that shows both cases.
Any suggestions are welcome.
UPDATE:
When I change the saving method so that I can set the deflate level:
    try (Sink sink = Sinks.newStreamingFile(FILENAME)) {
        Mat5.newWriter(sink)
            .setDeflateLevel(Deflater.BEST_COMPRESSION)
            .writeMat(matFile);
    } catch (IOException e) {
        e.printStackTrace();
    }
I see the following errors when opening the MAT file in Octave.
With .setDeflateLevel(Deflater.NO_COMPRESSION):
    error: load: invalid element type = 0
    error: parse error
With .setDeflateLevel(Deflater.BEST_COMPRESSION):
    error: load: error uncompressing data element (buf error from zlib)
    error: parse error
Thank you for reporting this. You are using version 0.5.4, right?
Yes, I am using version 0.5.4.
While debugging MFL, we found that it can write a MAT file whose numBytes value exceeds Integer.MAX_VALUE (and therefore no longer fits the 4-byte numBytes field specified by the MAT-file standard) without throwing an exception.
We only get an exception when reading the file, because tag.getNumBytes() returns the overflowed integer value, which does not match the expected numBytes (at Mat5Reader.java:349).
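To illustrate the overflow, here is a minimal standalone sketch. It does not use MFL; the class and method names are hypothetical, and it only assumes that the size ends up in a signed 32-bit Java int, as described above.

```java
// Hypothetical demo, not MFL code: shows how a byte count above
// Integer.MAX_VALUE silently wraps when narrowed to a 32-bit int.
final class NumBytesOverflowDemo {

    /** Payload size in bytes of a rows x cols matrix of doubles, computed in 64-bit arithmetic. */
    static long doubleMatrixBytes(long rows, long cols) {
        return Math.multiplyExact(Math.multiplyExact(rows, cols), (long) Double.BYTES);
    }

    /** Value that a signed 32-bit field holds after the size is narrowed (no exception is thrown). */
    static int narrowToInt(long numBytes) {
        return (int) numBytes;
    }

    public static void main(String[] args) {
        long actual = doubleMatrixBytes(300_000_000L, 1L); // 2,400,000,000 bytes
        int stored = narrowToInt(actual);                  // wraps to a negative value
        System.out.println("actual numBytes = " + actual);
        System.out.println("stored numBytes = " + stored);
        System.out.println("values match    = " + (actual == stored));
    }
}
```

A reader that compares the stored value against the real payload length sees a mismatch, which is consistent with the read-side exception we observed.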
Sorry for the confusion about constant vs. variable values, which are obviously not relevant to the problem.
To conclude: an exception would be helpful when MFL tries to write a file with numBytes > Integer.MAX_VALUE, because that file will not be readable.
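The kind of write-side check we have in mind could look roughly like the sketch below. This is not the MFL implementation; checkedNumBytes and its parameters are hypothetical names used only for illustration.

```java
// Hypothetical guard, not MFL code: fail fast at write time instead of
// producing a file whose size tag has overflowed.
final class NumBytesGuard {

    /**
     * Returns numBytes as an int if it fits into a signed 32-bit value,
     * otherwise throws so that no unreadable element is written.
     */
    static int checkedNumBytes(String elementName, long numBytes) {
        if (numBytes < 0 || numBytes > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                    "Element '" + elementName + "' needs " + numBytes
                    + " bytes, which exceeds the limit of " + Integer.MAX_VALUE);
        }
        return (int) numBytes;
    }

    public static void main(String[] args) {
        System.out.println(checkedNumBytes("small", 8L * 1_000_000L));
        try {
            checkedNumBytes("tooLarge", 8L * 300_000_000L);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Math.toIntExact(numBytes) from the JDK would give similar behaviour with an ArithmeticException, but without the element name in the message.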
The size limit only applies to individual entries; e.g., with 10 variables of 2 GB each you can have a valid 20 GB MAT file. I thought I had checked all the relevant spots for overflows and underflows, but apparently not.
I'll try to add a check. Thanks for doing the debugging.
Thank you for the reply and the hint.
The 2 GB limit also applies to the root element "data" (which in this case contains all variables in the file).
However, removing the root element and putting all variables at the root level seems to be advantageous (n x 2 GB).
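A rough way to see the difference between the two layouts is sketched below. It is a simplification that ignores header and padding overhead, and the class and method names are hypothetical rather than part of MFL.

```java
import java.util.Arrays;

// Hypothetical sketch: compares wrapping all variables in one struct
// (one entry, so the sum must fit) with storing each variable at root level
// (each entry is checked on its own). Header overhead is ignored.
final class LayoutLimitSketch {

    static final long MAX_ENTRY_BYTES = Integer.MAX_VALUE;

    /** One root struct holding everything: the combined size must stay under the limit. */
    static boolean fitsAsSingleStruct(long... variableBytes) {
        return Arrays.stream(variableBytes).sum() <= MAX_ENTRY_BYTES;
    }

    /** Variables at root level: only each individual variable must stay under the limit. */
    static boolean fitsAtRootLevel(long... variableBytes) {
        return Arrays.stream(variableBytes).allMatch(b -> b <= MAX_ENTRY_BYTES);
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;
        long[] sizes = {oneGiB, oneGiB, oneGiB}; // three variables of 1 GiB each
        System.out.println("single struct: " + fitsAsSingleStruct(sizes));
        System.out.println("root level:    " + fitsAtRootLevel(sizes));
    }
}
```

With three 1 GiB variables, the single-struct layout exceeds the limit while the root-level layout does not.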
Yes, we typically move all large matrices to the root level so we can have much larger size limits.
You can combine them back into a struct using load, e.g., data = load('file') will create a data struct that contains all MAT-file entries. That's also nice for loading only subsets of the data, if needed.
Added checks in 0.5.5.