Undefined behaviour and correctness discrepancy while parsing a BMP file
m-carrasco opened this issue ยท 7 comments
Hi CImg ๐
We found an undefined behaviour bug in CImg triggered when parsing a BMP file. The BMP is correct according to CImg; no exceptions or crashes when parsed without sanitisers. However, ImageMagick's identify
flags the BMP as incorrect.
After debugging, we understand this correctness discrepancy is triggering the undefined behaviour.
Bug
CImg.h:54462:39: runtime error: signed integer overflow: 1048616 - -2147483608 cannot be represented in type 'int'
The UB happens in function CImg<T>& _load_bmp(std::FILE *const file, const char *const filename)
and the line is:
const int xoffset = offset - 14 - header_size - 4*nb_colors;
offset
has a value of 1048630 and header_size
of -2147483608 while file size is 238.
Possible fix
offset
and header_size
could be compared to file_size
to check that they are within range.
If this sounds reasonable, I can provide a PR for it.
How to reproduce:
#include "CImg.h"
using namespace cimg_library;
int main(int argc,char **argv) {
const char *const file_i = cimg_option("-i",(char*)0,"Input image");
CImg<unsigned char> image(file_i);
return 0;
}
testcase.zip (Github does not allow uploading BMP files)
This has been tested with clang++-14.
unzip testcase.zip testcase.bmp
clang++ -g -O0 -fsanitize=undefined,address -fno-sanitize-recover=all -Dcimg_display=0 main.cpp
./a.out -i testcase.bmp
Discrepancy with identify
identify -verbose testcase.bmp
reports
identify-im6.q16: length and filesize do not match testcase.bmp @ error/bmp.c/ReadBMPImage/854.
identify-im6.q16: insufficient image data in file testcase.bmp @ error/bmp.c/ReadBMPImage/1001.
We have discovered additional BMP files triggering discrepancies with identify
, like above, meaning that CImg may wrongly consider them correct. Would you find it useful if we report them?
Thanks for sharing this project!
Best,
Manuel.
Thanks for reporting.
I've made a fix attempt : 55da1f4
Could you tell me if that looks good?
Thanks for the quick response!
I think a fix should take this into account:
- A malformed BMP can provide at least inconsistent values for the
offset
,header_size
andfile_size
variables.
Those variables can come from the input file's content. - A wrong combination of
offset
andheader_size
can lead to the reported UB.
To fix this, we could check if they are in range with the file's size.
I would propose a fix which performs these checks in this particular order:
- Sanitize
file_size
so it matches the actual file size. - Check that
offset
andheader_size
are consistent with the now sanitizedfile_size
.
From what I grasp, the fix is not sanitizing file_size
. Also, it attempts to sanitize header_size
using an unsanitized file_size
. Last, it is not checking offset
.
Here is my fix (on top of yours):
diff --git a/CImg.h b/CImg.h
index b5a934d..21db640 100644
--- a/CImg.h
+++ b/CImg.h
@@ -54419,6 +54419,11 @@ namespace cimg_library {
const ulongT fsiz = file?(ulongT)cimg_max_buf_size:(ulongT)cimg::fsize(filename);
std::FILE *const nfile = file?file:cimg::fopen(filename,"rb");
+
+ cimg::fseek(nfile,0,SEEK_END);
+ cimg_long filesystem_size=cimg::ftell(nfile);
+ std::rewind(nfile);
+
CImg<ucharT> header(54);
cimg::fread(header._data,54,nfile);
if (*header!='B' || header[1]!='M') {
@@ -54440,17 +54445,25 @@ namespace cimg_library {
nb_colors = header[0x2E] + (header[0x2F]<<8) + (header[0x30]<<16) + (header[0x31]<<24),
bpp = header[0x1C] + (header[0x1D]<<8);
+ if (file_size != filesystem_size){
+ throw CImgIOException(_cimg_instance
+ "load_bmp(): Invalid file_size %i specified in filename '%s', expected %i.",
+ cimg_instance,
+ file_size,filename?filename:"(FILE*)", filesystem_size);
+ }
+
if (header_size<0 || header_size>file_size)
throw CImgIOException(_cimg_instance
"load_bmp(): Invalid header size %d specified in filename '%s'.",
cimg_instance,
header_size,filename?filename:"(FILE*)");
- if (!file_size || file_size==offset) {
- cimg::fseek(nfile,0,SEEK_END);
- file_size = (int)cimg::ftell(nfile);
- cimg::fseek(nfile,54,SEEK_SET);
- }
+ if (offset<0 || offset>file_size)
+ throw CImgIOException(_cimg_instance
+ "load_bmp(): Invalid offset %i specified in filename '%s'.",
+ cimg_instance,
+ offset,filename?filename:"(FILE*)");
+
if (header_size>40) cimg::fseek(nfile,header_size - 40,SEEK_CUR);
const int
From what I tested, this fixes the UB and should also fix the first reported discrepancy with identify
.
The second discrepancy with ImageMagick seems to be about the image pixels' data. ImageMagick ensures that the total number of pixels read is consistent with the width and height.
I could also try to provide a patch if you like.
Best,
Manuel.
Thanks.
Here is my proposal then :
Variable fsize
actually already estimates the file size, with the std::tell()
procedure.
Thanks!
It looks good to me for the case in which a file path is provided instead of a FILE*
.
The issue is with fsiz
is that it is computed as const ulongT fsiz = file?(ulongT)cimg_max_buf_size:(ulongT)cimg::fsize(filename);
.
If a file pointer is provided (file
is not NULL
), you could still have issues similar to those reported here.
fsiz
is set to cimg_max_buf_size
, which could be higher than the actual size, thus allowing incorrect values for offset
and header_size
again. cimg_max_buf_size
seems high enough to allow the UB to happen again.
Best,
Manuel.
Yes, it looks good to me in regard to the UB ๐
A minor comment would be that the new fsize is not restoring the original file position. In our case, this is not an issue but it may be if used elsewhere.
I would save the position at the beginning long originalPos = std::ftell(file);
and restore before returning std::fseek(file, originalPos, SEEK_SET);
Best,
Manuel.
Indeed, thanks ! I've made the changes.
Regards,
David.