CAS-CLab/quantized-cnn

Not able to read binary file

navneet1083 opened this issue · 13 comments

I am not able to read the binary files in the AlexNet/Bin.Files directory, which I downloaded from the URL mentioned in the README. I get the following error:

static bool FileIO::ReadBinFile(const string&, Matrix<T>*) [with T = float; std::__cxx11::string = std::__cxx11::basic_string<char>]: Assertion `rtnVal == dataCntInBuffer' failed.

It would be a great help if you could tell me the cause of this.

Hi @navneet1083

Could you please run the "ls -R" command in your main "quantized-cnn" directory and post the result here? It looks like some files failed to open.

$ ls -R
.:
AlexNet bin Bmp.Files Cls.Names cmake-build-debug cpplint.py ILSVRC12.227x227.IMG images include Makefile Makefile.bkp Makefile.native Makefile.noblas obj README.md src

./AlexNet:
Bin.Files imagenet_mean.single.bin

./AlexNet/Bin.Files:
bkp_files_earlier bvlc_alexnet_aCaF.asmtLst.13.cbn bvlc_alexnet_aCaF.biasVec.05.bin bvlc_alexnet_aCaF.biasVec.19.bin bvlc_alexnet_aCaF.ctrdLst.11.bin temp
bvlc_alexnet_aCaF.asmtLst.01.cbn bvlc_alexnet_aCaF.asmtLst.16.cbn bvlc_alexnet_aCaF.biasVec.09.bin bvlc_alexnet_aCaF.biasVec.22.bin bvlc_alexnet_aCaF.ctrdLst.13.bin
bvlc_alexnet_aCaF.asmtLst.05.cbn bvlc_alexnet_aCaF.asmtLst.19.cbn bvlc_alexnet_aCaF.biasVec.11.bin bvlc_alexnet_aCaF.ctrdLst.01.bin bvlc_alexnet_aCaF.ctrdLst.16.bin
bvlc_alexnet_aCaF.asmtLst.09.cbn bvlc_alexnet_aCaF.asmtLst.22.cbn bvlc_alexnet_aCaF.biasVec.13.bin bvlc_alexnet_aCaF.ctrdLst.05.bin bvlc_alexnet_aCaF.ctrdLst.19.bin
bvlc_alexnet_aCaF.asmtLst.11.cbn bvlc_alexnet_aCaF.biasVec.01.bin bvlc_alexnet_aCaF.biasVec.16.bin bvlc_alexnet_aCaF.ctrdLst.09.bin bvlc_alexnet_aCaF.ctrdLst.22.bin

./AlexNet/Bin.Files/temp:
AlexNet.extra AlexNet.extra.tar.gz

./AlexNet/Bin.Files/temp/AlexNet.extra:
bvlc_alexnet_aCaF.convKnl.01.bin bvlc_alexnet_aCaF.convKnl.09.bin bvlc_alexnet_aCaF.convKnl.13.bin bvlc_alexnet_aCaF.fcntWei.19.bin
bvlc_alexnet_aCaF.convKnl.05.bin bvlc_alexnet_aCaF.convKnl.11.bin bvlc_alexnet_aCaF.fcntWei.16.bin bvlc_alexnet_aCaF.fcntWei.22.bin

./bin:
QuanCNN

./Bmp.Files:
ILSVRC2012_val_00000001.BMP ILSVRC2012_val_00000003.BMP ILSVRC2012_val_00000005.BMP ILSVRC2012_val_00000007.BMP ILSVRC2012_val_00000009.BMP
ILSVRC2012_val_00000002.BMP ILSVRC2012_val_00000004.BMP ILSVRC2012_val_00000006.BMP ILSVRC2012_val_00000008.BMP ILSVRC2012_val_00000010.BMP

./Cls.Names:
class_names.txt image_labels.txt

./cmake-build-debug:
CMakeFiles

./cmake-build-debug/CMakeFiles:
clion-log.txt

./ILSVRC12.227x227.IMG:
dataMatTst.single.bin lablVecTst.uint16.bin

./include:
bitmap_image.hpp BlasWrapper.h BmpImgIO.h CaffeEva.h CaffeEvaWrapper.h CaffePara.h Common.h FileIO.h Inference.h Matrix.h StopWatch.h UnitTest.h

./obj:
BlasWrapper.o BmpImgIO.o CaffeEva.o CaffeEvaWrapper.o CaffePara.o Inference.o Main.o UnitTest.o

./src:
BlasWrapper.cc BmpImgIO.cc CaffeEva.cc CaffeEvaWrapper.cc CaffePara.cc Inference.cc Main.cc UnitTest.cc

Which version of AlexNet are you running, original or quantized? For the original AlexNet, you need to put all the *convKnl* and *fcntWei* files under the ./AlexNet/Bin.Files directory.

P.S.: If this error occurs when you call the UnitTest::UT_CaffePara() function, it is okay, since this function converts *.bin to *.cbn for all *asmtLst* files. The *.bin versions are not provided in this repository to save space, and they are not required when you run the quantized AlexNet.

I am running UnitTest::UT_CaffeEva() and trying to use the quantized AlexNet. Can I train on a new set of BMP images and use that model for inference? If not, could you please share the models you have trained (e.g. on ImageNet, CIFAR, VOC)?

I trained the quantized AlexNet on a subset of ImageNet, and the resulting model is the one provided in the repository. I recommend that you print out the file path in each call to FileIO::ReadBinFile() to find out which file is not loaded correctly.
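As an illustration of that suggestion, here is a minimal sketch of a read routine that reports the file path together with the expected and actual element counts when a read comes up short. `ReadFloatsChecked` is a hypothetical standalone helper, not the repository's FileIO::ReadBinFile(), and it assumes a headerless file of raw floats, which may not match the actual file layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not part of the repository): reads exactly `count`
// floats from `path` and names the offending file on any failure.
bool ReadFloatsChecked(const std::string& path, std::size_t count,
                       std::vector<float>* out) {
  assert(out != nullptr);
  std::FILE* fp = std::fopen(path.c_str(), "rb");
  if (fp == nullptr) {
    std::fprintf(stderr, "[ERROR] cannot open %s\n", path.c_str());
    return false;
  }
  out->resize(count);
  // fread returns the number of complete elements actually read.
  const std::size_t got = std::fread(out->data(), sizeof(float), count, fp);
  std::fclose(fp);
  if (got != count) {
    std::fprintf(stderr, "[ERROR] short read on %s: expected %zu, got %zu\n",
                 path.c_str(), count, got);
    return false;
  }
  return true;
}
```

Returning false with a message, rather than asserting, lets the caller see every failing path in a single run instead of stopping at the first one.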

I have attached the log files.

log.log ===> if kEnblAprxComp = true

log2.log ===> if kEnblAprxComp = false (getting segmentation fault @ExecForwardPass)

https://www.dropbox.com/sh/muenm5ou8hsuu12/AAAp2NIFXtWRPL5i-APgaMxwa?dl=0

Have I missed some bin file?

There seems to be something wrong with the *asmtLst* files. Can you post the result of running these two commands?

  1. ls -l ./AlexNet/Bin.Files/*asmtLst*
  2. md5sum ./AlexNet/Bin.Files/*asmtLst*

$ ls -l ./AlexNet/Bin.Files/*asmtLst*
-rw-rw-r-- 1 navneet navneet 8 Apr 4 14:42 ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.01.cbn
-rw-rw-r-- 1 navneet navneet 8 Apr 4 14:42 ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.05.cbn
-rw-rw-r-- 1 navneet navneet 8 Apr 4 14:42 ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.09.cbn
-rw-rw-r-- 1 navneet navneet 8 Apr 4 14:42 ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.11.cbn
-rw-rw-r-- 1 navneet navneet 8 Apr 4 14:42 ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.13.cbn
-rw-rw-r-- 1 navneet navneet 8 Apr 4 14:42 ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.16.cbn
-rw-rw-r-- 1 navneet navneet 8 Apr 4 14:42 ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.19.cbn
-rw-rw-r-- 1 navneet navneet 8 Apr 4 14:42 ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.22.cbn

$ md5sum ./AlexNet/Bin.Files/*asmtLst*
4d77cba7579dbbe93f514f58b03da564 ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.01.cbn
4d77cba7579dbbe93f514f58b03da564 ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.05.cbn
4d77cba7579dbbe93f514f58b03da564 ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.09.cbn
4d77cba7579dbbe93f514f58b03da564 ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.11.cbn
4d77cba7579dbbe93f514f58b03da564 ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.13.cbn
4d77cba7579dbbe93f514f58b03da564 ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.16.cbn
4d77cba7579dbbe93f514f58b03da564 ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.19.cbn
4d77cba7579dbbe93f514f58b03da564 ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.22.cbn

Your *.cbn files are corrupted: each one is only 8 bytes long, and they all have the same checksum. Please use the original files provided in the repository. Below are my md5sum outputs:

jxwu@jxwu-titan:~/quantized-cnn$ md5sum ./AlexNet/Bin.Files/*asmtLst*
4073a72d3281902e322b7e3d24b3f749 ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.01.cbn
36dfcc12ef862a126a52c59b6ba7f5b2 ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.05.cbn
3fb4550b869e8cb745ec4511ba43defc ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.09.cbn
f5925d47112269b73cda5dcd4f3358bc ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.11.cbn
67f6904e7dd81c0bebf3d2b5d4c176fe ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.13.cbn
0a598dbffb6e39c7a7ec7d99f2fe9033 ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.16.cbn
87b905382fda5a2c5b2e0c7fa13ad529 ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.19.cbn
bdaf5b10b632bb79f172e068527809d6 ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.22.cbn
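A size check like the one done by hand with "ls -l" can also run inside the program before any file is loaded. The sketch below flags files that are missing or smaller than a threshold. `FileSizeBytes`, `FindSuspectFiles`, and the `minBytes` threshold are hypothetical names introduced for illustration. The correct sizes of the *.cbn files are not stated in this thread, so the threshold is an assumption you would have to choose yourself.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Returns the size of `path` in bytes, or -1 if the file cannot be opened.
std::int64_t FileSizeBytes(const std::string& path) {
  // std::ios::ate positions the stream at the end, so tellg() is the size.
  std::ifstream in(path, std::ios::binary | std::ios::ate);
  if (!in.is_open()) return -1;
  return static_cast<std::int64_t>(in.tellg());
}

// Returns every path that is missing or smaller than `minBytes`.
std::vector<std::string> FindSuspectFiles(const std::vector<std::string>& paths,
                                          std::int64_t minBytes) {
  assert(minBytes >= 0);
  std::vector<std::string> suspects;
  for (const std::string& p : paths) {
    // A missing file yields -1, which is below any non-negative threshold.
    if (FileSizeBytes(p) < minBytes) suspects.push_back(p);
  }
  return suspects;
}
```

With any reasonable threshold, all eight of the 8-byte files listed earlier would be flagged. A size check does not replace comparing checksums, which also catches files of the right size with the wrong contents.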

After replacing the files, I am still getting the segmentation fault.

$ md5sum ./AlexNet/Bin.Files/*asmtLst*
4073a72d3281902e322b7e3d24b3f749 ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.01.cbn
36dfcc12ef862a126a52c59b6ba7f5b2 ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.05.cbn
3fb4550b869e8cb745ec4511ba43defc ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.09.cbn
f5925d47112269b73cda5dcd4f3358bc ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.11.cbn
67f6904e7dd81c0bebf3d2b5d4c176fe ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.13.cbn
0a598dbffb6e39c7a7ec7d99f2fe9033 ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.16.cbn
87b905382fda5a2c5b2e0c7fa13ad529 ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.19.cbn
bdaf5b10b632bb79f172e068527809d6 ./AlexNet/Bin.Files/bvlc_alexnet_aCaF.asmtLst.22.cbn

$ ./bin/QuanCNN > log.log
Segmentation fault (core dumped)

log file - https://www.dropbox.com/s/nn3v7pzgq2pt5tw/log%20%281%29.log?dl=0

Could you please debug the program with gdb, and post the call stack where the segmentation fault took place?

P.S.: I re-ran the code just now, and no segmentation fault happened. You may try a clean copy of the repository fetched from GitHub, put only the "dataMatTst.single.bin" file into the "./ILSVRC12.227x227.IMG" directory, and see whether the problem persists.

Thanks @jiaxiang-wu. Could you please provide the steps involved in training on a new dataset? (For example, if I would like to train on the full ImageNet 2016, how can I do it?) Also, is it possible to train for multi-label classification, as with the VOC dataset (with ROIs and bounding boxes for multiple objects in an image)?

  1. You need a pre-trained ConvNet, and you should follow the instructions in Section 5.2.3 of our CVPR '16 paper. The code for training a quantized network is not provided, so you need to implement it yourself.
  2. Single-label and multi-label classification are both okay. The quantized network aims at approximating the original network's output in each layer, regardless of the specific form (or meaning) of that output.
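To make the second point concrete, here is a minimal sketch of how a quantized layer can approximate an inner product without reference to labels. It uses a generic product-quantization scheme: each weight vector is stored as one centroid index per subspace, and the layer output is a sum of precomputed table entries. This is an assumption about the general flavor of the approach, not code from the repository or a faithful copy of the paper's method. The names `BuildLookupTable` and `ApproxDot` and the data layout are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;

// codebooks[m][k] is the k-th centroid of subspace m (all centroids within a
// subspace have the same length). Returns lut[m][k] = <x_m, codebooks[m][k]>,
// where x_m is the slice of x that belongs to subspace m.
std::vector<Vec> BuildLookupTable(const Vec& x,
                                  const std::vector<std::vector<Vec>>& codebooks) {
  std::vector<Vec> lut(codebooks.size());
  std::size_t offset = 0;
  for (std::size_t m = 0; m < codebooks.size(); ++m) {
    const std::size_t d = codebooks[m].empty() ? 0 : codebooks[m][0].size();
    assert(offset + d <= x.size());
    lut[m].resize(codebooks[m].size());
    for (std::size_t k = 0; k < codebooks[m].size(); ++k) {
      float s = 0.0f;
      for (std::size_t i = 0; i < d; ++i) s += x[offset + i] * codebooks[m][k][i];
      lut[m][k] = s;
    }
    offset += d;
  }
  return lut;
}

// assignment[m] is the centroid index chosen for subspace m of one weight
// vector; the approximate inner product is a sum of table entries.
float ApproxDot(const std::vector<Vec>& lut,
                const std::vector<std::size_t>& assignment) {
  assert(assignment.size() == lut.size());
  float s = 0.0f;
  for (std::size_t m = 0; m < lut.size(); ++m) s += lut[m][assignment[m]];
  return s;
}
```

The table is built once per input and reused for every output unit of the layer, so each unit costs one lookup per subspace. Nothing in it depends on whether the final output is a single-label or multi-label prediction.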