pickle.dump gives SystemError
xdvx opened this issue · 19 comments
pickle.dump gives me a SystemError if I try a larger batch of images. I thought it was a memory issue, but I tried the same batch on a 32GB RAM machine and got the same error.
Thanks for the report! Could you please provide more details? What are you trying to run - pcr.py or nnpcr.py? Are you trying to train a new model? What exact SystemError do you have (please attach the full exception trace)? How large is your training set?
I got this while running nnpcr.py. This is the error message:
pickle.dump(obj, open(fileName + '.tmp', 'wb'), -1)
SystemError: error return without exception set
I had 10k positive / 10k negative images. While googling I found only one suggested solution: run the same script on Python 3. It took me a while, but there is no error message anymore.
I have one more question. I see that this is set to a constant number:
def train(self, numIterations=1500):
Should this represent the size of the training set?
Seems like this dataset is too large to fit into the in-memory cache. You can comment out line 114 (saveCache((trainX, trainY, testX, testY), 'nncache.bin')) - this cache is only used to improve speed when running training multiple times.
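For reference, a rough sketch of the change (assuming the call still sits on line 114 of nnpcr.py):
# nnpcr.py, around line 114 - disable the on-disk dataset cache:
# saveCache((trainX, trainY, testX, testY), 'nncache.bin')
Since the cache only speeds up repeated training runs, nothing else should need to change.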
I have one more question. I see that this is set to a constant number:
def train(self, numIterations=1500):
Should this represent the size of the training set?
No (at least not directly). But you can try different numbers here. If numIterations is too small or too large, the accuracy will be poor. 1500 was optimal for my training set (3K images total).
BTW, what accuracy do you get with your dataset? Could you share your model (or how you gathered it)?
I haven't gotten over 80% yet and am looking for a better dataset. Basically, I aim to train it to recognize photos that are not suitable for advertisement networks, so even minor nudity is not acceptable here.
Should I train it only with photos of people, or should I provide all kinds of different samples as negatives? Do photo dimensions matter, or can I collect photos with smaller resolution? At the final stage I hope to use this script to go through 20 million photos on my website and mark which ones are not suited for showing advertisements.
I am collecting data samples by crawling reddit, so I could gather even huge datasets of hundreds of thousands of images.
Should I train it only with photos of people, or should I provide all kinds of different samples as negatives?
Not only people - just arbitrary images. The more, the better.
Do photo dimensions matter, or can I collect photos with smaller resolution?
Currently all images are converted to 128x128. You could try smaller ones; they will be upscaled.
I am collecting data samples by crawling reddit, so I could gather even huge datasets of hundreds of thousands of images.
Crawling reddit is a good idea; maybe I'll try it later too.
Would it be smart to double the image size? Would it be enough to change this constant:
IMG_SIZE = 128
to
IMG_SIZE = 256
Must positive and negative data samples be the same size?
Would it be smart to double the image size? Would it be enough to change this constant:
IMG_SIZE = 128
to
IMG_SIZE = 256
It's not that simple - you also need to change the architecture of the neural network, e.g. add additional conv & max_pool layers.
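For illustration, a minimal sketch of such a change, assuming the conv2d / max_pool_2x2 helpers already defined in nnpcr.py (the layer names here are hypothetical):
IMG_SIZE = 256
x_image = tf.reshape(x, [-1, IMG_SIZE, IMG_SIZE, 3])  # 256x256 input
# Extra conv + max-pool pair that was not needed at 128x128; it keeps 3 output
# channels so the rest of the original network can stay unchanged.
W_conv0 = tf.get_variable("W_conv0", shape=[3, 3, 3, 3], initializer=xavier())
b_conv0 = tf.get_variable('b_conv0', [1, 1, 1, 3])
h_conv0 = tf.nn.relu(conv2d(x_image, W_conv0) + b_conv0)
h_pool0 = max_pool_2x2(h_conv0)  # back down to 128x128
# ... the existing conv/max_pool stack then takes h_pool0 instead of x_image.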
Must positive and negative data samples be the same size?
Yep.
def train(self, numIterations=1500):
Is this enough for 60K training samples?
Not sure. Try increasing it to 3K and check whether the quality is better or worse than with 1.5k.
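In other words, something like this (where model stands in for however nnpcr.py instantiates the class that defines train(); the name is hypothetical):
model.train(numIterations=3000)  # compare test accuracy against numIterations=1500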
It really takes a lot of RAM. I could only run a 40k sample on 32GB of RAM. I will try 80k on 64GB tomorrow. My model size always ends up being 13MB - is this right? I still can't get accuracy over 80%.
It really takes a lot of RAM. I could only run a 40k sample on 32GB of RAM.
Seems like it is currently not optimized for large datasets. Right now it loads the whole dataset into memory; handling of large datasets needs to be fixed.
My model size always ends up being 13MB - is this right? I still can't get accuracy over 80%.
The current network architecture is not very complex - maybe you should try a more complicated architecture, e.g. Inception. You can also try the following:
- tune the number of iterations
- tune the size of the pre-outer layer (now 1024 - you can try to increase or decrease it)
- tune the number of channels in the convolutional layers (6, 12, 24)
- add additional layers
- try a 5x5 kernel instead of the 3x3 one (shape=[3, 3, ...] => shape=[5, 5, ...])
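To make a couple of these concrete (the kernel size and the pre-outer layer size), here is a rough sketch; variable names follow the nnpcr.py snippets quoted later in this thread, and the sizes are only examples:
# 5x5 kernel instead of 3x3 for the first convolutional layer:
W_conv1 = tf.get_variable("W_conv1", shape=[5, 5, 3, 6], initializer=xavier())
# Larger pre-output fully connected layer (2048 instead of 1024); the output
# layer's weights have to match the new size:
W_fc1 = tf.get_variable("W_fc1", shape=[8 * 8 * 24, 2048], initializer=xavier())
b_fc1 = tf.get_variable("b_fc1", [2048], initializer=init_ops.zeros_initializer())
W_fcO = tf.get_variable("W_fcO", shape=[2048, 2], initializer=xavier())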
By the way, in the new TensorFlow library you have to call init_ops.zeros_initializer like this, otherwise you will get an error:
init_ops.zeros_initializer()
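For example, in the context of tf.get_variable (this mirrors the calls in the snippet further down):
# Newer TensorFlow expects an initializer instance, not the class itself:
b_fc1 = tf.get_variable('b_fc1', [1024], initializer=init_ops.zeros_initializer())
# Passing init_ops.zeros_initializer without the trailing () raises an error there.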
I'll test with all those different parameters.
Also I get these warnings:
WARNING:tensorflow:From nnpcr.py:216: softmax_cross_entropy (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30. Instructions for updating: Use tf.losses.softmax_cross_entropy instead.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/losses/python/losses/loss_ops.py:394: compute_weighted_loss (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30. Instructions for updating: Use tf.losses.compute_weighted_loss instead.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/losses/python/losses/loss_ops.py:151: add_loss (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30. Instructions for updating: Use tf.losses.add_loss instead.
Which line is for the pre-outer layer?
W_fc1 = tf.get_variable("W_fc1", shape=[8 * 8 * 24, 1024], initializer=xavier())
By the way, in the new TensorFlow library you have to call init_ops.zeros_initializer like this, otherwise you will get an error
I haven't ported to the new version yet. Will do it soon.
x_image = tf.reshape(x, [-1, IMG_SIZE, IMG_SIZE, 3]) # 128
W_conv1 = tf.get_variable("W_conv1", shape=[3, 3, 3, 64], initializer=xavier())
b_conv1 = tf.get_variable('b_conv1', [1, 1, 1, 64])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1) # 64
W_conv2 = tf.get_variable("W_conv2", shape=[3, 3, 64, 128], initializer=xavier())
b_conv2 = tf.get_variable('b_conv2', [1, 1, 1, 128])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2) # 32
W_conv3 = tf.get_variable("W_conv3", shape=[3, 3, 128, 256], initializer=xavier())
b_conv3 = tf.get_variable('b_conv3', [1, 1, 1, 256])
h_conv3 = tf.nn.relu(conv2d(h_pool2, W_conv3) + b_conv3)
h_pool3 = max_pool_2x2(h_conv3) # 16
W_conv4 = tf.get_variable("W_conv4", shape=[3, 3, 256, 512], initializer=xavier())
b_conv4 = tf.get_variable('b_conv4', [1, 1, 1, 512])
h_conv4 = tf.nn.relu(conv2d(h_pool3, W_conv4) + b_conv4)
h_pool4 = max_pool_2x2(h_conv4) # 8
W_conv5 = tf.get_variable("W_conv5", shape=[3, 3, 512, 512], initializer=xavier())
b_conv5 = tf.get_variable('b_conv5', [1, 1, 1, 512])
h_conv5 = tf.nn.relu(conv2d(h_pool4, W_conv5) + b_conv5)
h_pool5 = max_pool_2x2(h_conv5) # 4
h_pool5_flat = tf.reshape(h_pool5, [-1, 4 * 4 * 512])
W_fc1 = tf.get_variable("W_fc1", shape=[4 * 4 * 512, 4096], initializer=xavier())
b_fc1 = tf.get_variable('b_fc1', [4096], initializer=init_ops.zeros_initializer())
h_fc1 = tf.nn.relu(tf.matmul(h_pool5_flat, W_fc1) + b_fc1)
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
W_fcO = tf.get_variable("W_fcO", shape=[4096, 2], initializer=xavier())
b_fcO = tf.get_variable('b_fcO', [2], initializer=init_ops.zeros_initializer())
logits = tf.matmul(h_fc1_drop, W_fcO) + b_fcO
y_conv = tf.nn.softmax(logits)
cross_entropy = loss_ops.softmax_cross_entropy(logits, y_)
train_step = tf.train.AdamOptimizer(0.0005).minimize(cross_entropy)
self.results = predictions = tf.argmax(y_conv, 1)
self.probabilities = y_conv
correct_prediction = tf.equal(predictions, tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
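For anyone trying to run this snippet standalone: it assumes roughly the following context from nnpcr.py (names are taken from the snippet itself; the exact definitions there may differ):
import tensorflow as tf
from tensorflow.python.ops import init_ops
from tensorflow.contrib.layers import xavier_initializer as xavier
from tensorflow.contrib.losses.python.losses import loss_ops

IMG_SIZE = 128
x = tf.placeholder(tf.float32, [None, IMG_SIZE * IMG_SIZE * 3])
y_ = tf.placeholder(tf.float32, [None, 2])

def conv2d(x, W):
    # stride-1 convolution with SAME padding
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    # halves the spatial dimensions
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')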
I got the best results with this network, but accuracy didn't go over 93%. I ran 80 epochs.
It is giving me pretty good results in real life, but I got curious about its predictions, so I checked what softmax returns, and the result wasn't good: it is usually 1 (or very close to 1) or 0. It shouldn't be like this. What do you think my mistake was? Is the neural network too big for 128x128 pixels, or is a 64k dataset too small for such a big network?
Sorry, I can't understand what the problem is. 93% is rather good accuracy. The whole dataset is split into a train (80%) and a test (20%) set, and accuracy is calculated over the test set. So if you have 93% accuracy, the same accuracy should hold in real life too.
Or maybe you want to improve the quality even more?
No, the problem, I think, is how confident the results returned by softmax are. My guess is the results are too polarized. But I'm new to this field, as you can see - I've just learnt a lot in a couple of weeks. If I print what softmax returns, it outputs numbers around 1.00, 0.00 or 0.9999, 0.0001 - something along those lines. I'm just curious why my neural network is so confident in its results: should the probability ever go to 100%, and in most cases sit between 99-100%?
One more question: would it be practical to add a second fully connected layer on top of the first fully connected layer, add another dropout, and only then retrieve the 2 final classes?
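A rough sketch of that idea, reusing the names from the snippet above (the layer sizes are arbitrary choices):
# Hypothetical second fully connected layer with its own dropout, inserted
# between h_fc1_drop and the final 2-class output; the existing W_fcO / b_fcO /
# logits lines would be replaced by these:
W_fc2 = tf.get_variable("W_fc2", shape=[4096, 1024], initializer=xavier())
b_fc2 = tf.get_variable('b_fc2', [1024], initializer=init_ops.zeros_initializer())
h_fc2 = tf.nn.relu(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
h_fc2_drop = tf.nn.dropout(h_fc2, keep_prob)

W_fcO = tf.get_variable("W_fcO", shape=[1024, 2], initializer=xavier())
b_fcO = tf.get_variable('b_fcO', [2], initializer=init_ops.zeros_initializer())
logits = tf.matmul(h_fc2_drop, W_fcO) + b_fcO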
I also played with different kernel sizes; it didn't have any effect, it just slowed down my training.
AdamOptimizer gave me quite an improvement in results.
Just happy to share what I've learnt.