rasbt/python-machine-learning-book-2nd-edition

Chap15 CNN code issues

sameervk opened this issue · 1 comment

Hi Sebastian,

I am working on the CNN code in Chapter 15 step by step and realised there are some discrepancies and issues.

In the function conv_layer, you define the dimensions of the input tensor as [batch x width x height x channels_in]. However, the formats that tf.nn.conv2d accepts are 'NHWC' and 'NCHW', i.e. height comes before width. Since the example images are 28 x 28, it shouldn't really matter, at least in this case.
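To make the axis-order point concrete, here is a minimal pure-Python sketch. The helper `describe_axes` and the example shapes are hypothetical and only for illustration; they are not part of the book's code or of TensorFlow.

```python
# Hypothetical helper: label each dimension of a shape tuple according to
# a data_format string such as 'NHWC' or 'NCHW'.
AXIS_NAMES = {"N": "batch", "H": "height", "W": "width", "C": "channels"}


def describe_axes(shape, data_format="NHWC"):
    """Map each axis letter in data_format to the matching dimension size."""
    if len(shape) != len(data_format):
        raise ValueError("shape and data_format must have the same length")
    return {AXIS_NAMES[letter]: dim for letter, dim in zip(data_format, shape)}


# Square 28 x 28 images: swapping height and width changes nothing.
print(describe_axes((64, 28, 28, 1), "NHWC"))

# A non-square image (assumed 32 rows by 48 columns) shows why order matters.
print(describe_axes((64, 32, 48, 3), "NHWC"))
```

For square inputs the two readings give the same shape, which is why the naming mismatch is harmless here; with non-square images the second dimension has to be the height.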

Secondly, I have an NVIDIA GeForce GTX 960M, and when training the model I get a "ResourceExhaustedError".
[screenshot: resource_exhausted_error]

So I changed the 'use_cudnn_on_gpu' parameter of conv2d to False and I get a 'UnimplementedError'.
[screenshot: unimplemented_error]

So I used os.environ['CUDA_VISIBLE_DEVICES'] = "" to run on the CPU, and it runs fine. The average training loss I get after 20 epochs is 5.151, compared to 3.965 in your book. I checked the memory, and there is a jump in usage at the end of each epoch. In total it consumes around 2 GB of RAM. Do you think this is the reason for the OOM ResourceExhaustedError? The NVIDIA card has a total dedicated memory of 2004 MB.
[screenshot: memory_while_training]
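For anyone reproducing the CPU-only run, a minimal sketch of the workaround described above follows. The one assumption worth stating is ordering: the variable needs to be set before TensorFlow is imported (or at least before it initializes CUDA), otherwise the GPU may already be visible to the process. The TensorFlow import is left as a comment so the snippet runs without it installed.

```python
import os

# Hide all CUDA devices from this process. An empty string means that no
# GPU is exposed, so TensorFlow falls back to the CPU.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# Import TensorFlow only after the variable is set, e.g.:
# import tensorflow as tf

print("CUDA_VISIBLE_DEVICES =", repr(os.environ["CUDA_VISIBLE_DEVICES"]))
```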

Thank you
Sameer

rasbt commented

Hi there,

Sorry about these issues. I think, like you said, this is due to the relatively small (2 GB) GPU memory. One thing that could make it work nonetheless is lowering the batch size (e.g., from 64 to 16). If this is still not enough, you can try making the fully connected layer smaller (right now, it has 1024 units, which is quite large). A fully connected layer is usually what takes up most of the memory.
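As a rough back-of-the-envelope sketch of why both suggestions help, the snippet below estimates float32 memory for a dense layer's parameters and for one feature map stored across a batch. The sizes used (a flattened input of 1024 values, a 24 x 24 x 32 feature map) are assumptions chosen for illustration, not values confirmed from the book's network, and real usage is higher because gradients, optimizer state, and workspace buffers are not counted.

```python
BYTES_PER_FLOAT32 = 4


def dense_param_bytes(n_in, n_out):
    """Bytes for the weight matrix plus bias vector of one dense layer."""
    return (n_in * n_out + n_out) * BYTES_PER_FLOAT32


def activation_bytes(batch_size, height, width, channels):
    """Bytes for one float32 feature map stored for a whole batch."""
    return batch_size * height * width * channels * BYTES_PER_FLOAT32


# Assumed (hypothetical) flattened input size feeding the dense layer.
N_IN = 1024

# Parameter memory scales linearly with the number of dense units.
for n_out in (1024, 256):
    mib = dense_param_bytes(N_IN, n_out) / 2**20
    print(f"dense layer, {n_out} units: {mib:.2f} MiB of parameters")

# Activation memory scales linearly with the batch size.
for batch_size in (64, 16):
    mib = activation_bytes(batch_size, 24, 24, 32) / 2**20
    print(f"batch size {batch_size}: {mib:.2f} MiB for one 24x24x32 feature map")
```

Both quantities scale linearly, so going from a batch size of 64 to 16 cuts per-batch activation memory by a factor of four, and shrinking the dense layer reduces its parameter memory in proportion to the number of units.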