pickle.dump gives SystemError
xdvx opened this issue · 19 comments
pickle.dump gives me a SystemError if I try a larger batch of images. I thought it was a memory issue, but I tried the same batch on a 32GB RAM machine and got the same error.
Thanks for the report! Could you please provide more details? What are you trying to run - pcr.py or nnpcr.py? Are you trying to train a new model? What exact SystemError do you have (please attach the full exception trace)? How large is your training set?
I got this while running nnpcr.py. This is the error message:
pickle.dump(obj, open(fileName + '.tmp', 'wb'), -1)
SystemError: error return without exception set
I had 10k positive / 10k negative images. While googling I found only one suggested solution: run the same script on Python 3. It took me a while, but there is no error message anymore.
I have one more question. I see that this is set to a constant number:
def train(self, numIterations=1500):
Should this represent the size of the training set?
Seems like this dataset is too large to fit into the in-memory cache. You can comment out line 114 (saveCache((trainX, trainY, testX, testY), 'nncache.bin')) - this cache is only used to improve speed when running training multiple times.
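For reference, a rough sketch of the change (assuming the call still sits on line 114 of nnpcr.py):
# nnpcr.py, around line 114 - disable the on-disk dataset cache:
# saveCache((trainX, trainY, testX, testY), 'nncache.bin')
Since the cache only speeds up repeated training runs, nothing else should need to change.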
I have one more question. I see that this is set to a constant number:
def train(self, numIterations=1500):
Should this represent the size of the training set?
No (at least not directly). But you can try different numbers here. If numIterations is too small or too large, the accuracy will be poor. 1500 was optimal for my training set (3K images total).
BTW, what accuracy do you get with your dataset? Could you share your model (or how you gathered it)?
I haven't gotten over 80% yet and am looking for a better dataset. Basically, I aim to train it to recognize photos that are not suitable for advertisement networks, so even minor nudity is not acceptable here.
Should I train it only with photos of people, or should I provide all kinds of different samples as negatives? Do photo dimensions matter, or can I collect photos with smaller resolution? At the final stage I hope to use this script to go through 20 million photos on my website and mark which ones are not suited for showing advertisements.
I am collecting data samples by crawling reddit, so I could gather even huge datasets of hundreds of thousands of images.
Should I train it only with photos of people, or should I provide all kinds of different samples as negatives?
Not only people - just arbitrary images. The more, the better.
Do photo dimensions matter, or can I collect photos with smaller resolution?
Currently all images are converted to 128x128. You could try smaller ones; they will be upscaled.
I am collecting data samples by crawling reddit, so I could gather even huge datasets of hundreds of thousands of images.
Crawling reddit is a good idea; maybe I'll try it later too.
Would it be smart to double the image size? Would it be enough to change this constant:
IMG_SIZE = 128
to
IMG_SIZE = 256
Must positive and negative data samples be the same size?
Would it be smart to double the image size? Would it be enough to change this constant:
IMG_SIZE = 128
to
IMG_SIZE = 256
It's not that simple - you also need to change the architecture of the neural network, e.g. add additional conv & max_pool layers.
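For illustration, a minimal sketch of such a change, assuming the conv2d / max_pool_2x2 helpers already defined in nnpcr.py (the layer names here are hypothetical):
IMG_SIZE = 256
x_image = tf.reshape(x, [-1, IMG_SIZE, IMG_SIZE, 3])  # 256x256 input
# Extra conv + max-pool pair that was not needed at 128x128; it keeps 3 output
# channels so the rest of the original network can stay unchanged.
W_conv0 = tf.get_variable("W_conv0", shape=[3, 3, 3, 3], initializer=xavier())
b_conv0 = tf.get_variable('b_conv0', [1, 1, 1, 3])
h_conv0 = tf.nn.relu(conv2d(x_image, W_conv0) + b_conv0)
h_pool0 = max_pool_2x2(h_conv0)  # back down to 128x128
# ... the existing conv/max_pool stack then takes h_pool0 instead of x_image.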
Must positive and negative data samples be the same size?
Yep.
def train(self, numIterations=1500):
Is this enough for 60K training samples?
Not sure. Try increasing it to 3K and check whether the quality is better or worse than with 1.5k.
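In other words, something like this (where model stands in for however nnpcr.py instantiates the class that defines train(); the name is hypothetical):
model.train(numIterations=3000)  # compare test accuracy against numIterations=1500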
It really takes a lot of RAM. I could only run a 40k sample on 32GB of RAM. I will try 80k on 64GB tomorrow. My model size always ends up being 13MB - is this right? I still can't get accuracy over 80%.
It really takes a lot of RAM. I could only run a 40k sample on 32GB of RAM.
Seems like it is currently not optimized for large datasets. Right now it loads the whole dataset into memory; handling of large datasets needs to be fixed.
My model size always ends up being 13MB - is this right? I still can't get accuracy over 80%.
The current network architecture is not very complex - maybe you should try a more complicated architecture, e.g. Inception. You can also try the following:
- tune the number of iterations
- tune the size of the pre-outer layer (now 1024 - you can try to increase or decrease it)
- tune the number of channels in the convolutional layers (6, 12, 24)
- add additional layers
- try a 5x5 kernel instead of the 3x3 one (shape=[3, 3, ...] => shape=[5, 5, ...])
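To make a couple of these concrete (the kernel size and the pre-outer layer size), here is a rough sketch; variable names follow the nnpcr.py snippets quoted later in this thread, and the sizes are only examples:
# 5x5 kernel instead of 3x3 for the first convolutional layer:
W_conv1 = tf.get_variable("W_conv1", shape=[5, 5, 3, 6], initializer=xavier())
# Larger pre-output fully connected layer (2048 instead of 1024); the output
# layer's weights have to match the new size:
W_fc1 = tf.get_variable("W_fc1", shape=[8 * 8 * 24, 2048], initializer=xavier())
b_fc1 = tf.get_variable("b_fc1", [2048], initializer=init_ops.zeros_initializer())
W_fcO = tf.get_variable("W_fcO", shape=[2048, 2], initializer=xavier())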
By the way, in the new TensorFlow library you have to call init_ops.zeros_initializer like this, otherwise you will get an error:
init_ops.zeros_initializer()
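For example, in the context of tf.get_variable (this mirrors the calls in the snippet further down):
# Newer TensorFlow expects an initializer instance, not the class itself:
b_fc1 = tf.get_variable('b_fc1', [1024], initializer=init_ops.zeros_initializer())
# Passing init_ops.zeros_initializer without the trailing () raises an error there.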
I'll test with all those different parameters.
Also I get these warnings:
WARNING:tensorflow:From nnpcr.py:216: softmax_cross_entropy (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30. Instructions for updating: Use tf.losses.softmax_cross_entropy instead.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/losses/python/losses/loss_ops.py:394: compute_weighted_loss (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30. Instructions for updating: Use tf.losses.compute_weighted_loss instead.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/losses/python/losses/loss_ops.py:151: add_loss (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30. Instructions for updating: Use tf.losses.add_loss instead.
Which line is for the pre-outer layer?
W_fc1 = tf.get_variable("W_fc1", shape=[8 * 8 * 24, 1024], initializer=xavier())
By the way, in the new TensorFlow library you have to call init_ops.zeros_initializer like this, otherwise you will get an error
I haven't ported to the new version yet. Will do it soon.
x_image = tf.reshape(x, [-1, IMG_SIZE, IMG_SIZE, 3]) # 128
W_conv1 = tf.get_variable("W_conv1", shape=[3, 3, 3, 64], initializer=xavier())
b_conv1 = tf.get_variable('b_conv1', [1, 1, 1, 64])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1) # 64
W_conv2 = tf.get_variable("W_conv2", shape=[3, 3, 64, 128], initializer=xavier())
b_conv2 = tf.get_variable('b_conv2', [1, 1, 1, 128])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2) # 32
W_conv3 = tf.get_variable("W_conv3", shape=[3, 3, 128, 256], initializer=xavier())
b_conv3 = tf.get_variable('b_conv3', [1, 1, 1, 256])
h_conv3 = tf.nn.relu(conv2d(h_pool2, W_conv3) + b_conv3)
h_pool3 = max_pool_2x2(h_conv3) # 16
W_conv4 = tf.get_variable("W_conv4", shape=[3, 3, 256, 512], initializer=xavier())
b_conv4 = tf.get_variable('b_conv4', [1, 1, 1, 512])
h_conv4 = tf.nn.relu(conv2d(h_pool3, W_conv4) + b_conv4)
h_pool4 = max_pool_2x2(h_conv4) # 8
W_conv5 = tf.get_variable("W_conv5", shape=[3, 3, 512, 512], initializer=xavier())
b_conv5 = tf.get_variable('b_conv5', [1, 1, 1, 512])
h_conv5 = tf.nn.relu(conv2d(h_pool4, W_conv5) + b_conv5)
h_pool5 = max_pool_2x2(h_conv5) # 4
h_pool5_flat = tf.reshape(h_pool5, [-1, 4 * 4 * 512])
W_fc1 = tf.get_variable("W_fc1", shape=[4 * 4 * 512, 4096], initializer=xavier())
b_fc1 = tf.get_variable('b_fc1', [4096], initializer=init_ops.zeros_initializer())
h_fc1 = tf.nn.relu(tf.matmul(h_pool5_flat, W_fc1) + b_fc1)
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
W_fcO = tf.get_variable("W_fcO", shape=[4096, 2], initializer=xavier())
b_fcO = tf.get_variable('b_fcO', [2], initializer=init_ops.zeros_initializer())
logits = tf.matmul(h_fc1_drop, W_fcO) + b_fcO
y_conv = tf.nn.softmax(logits)
cross_entropy = loss_ops.softmax_cross_entropy(logits, y_)
train_step = tf.train.AdamOptimizer(0.0005).minimize(cross_entropy)
self.results = predictions = tf.argmax(y_conv, 1)
self.probabilities = y_conv
correct_prediction = tf.equal(predictions, tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
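For anyone trying to run this snippet standalone: it assumes roughly the following context from nnpcr.py (names are taken from the snippet itself; the exact definitions there may differ):
import tensorflow as tf
from tensorflow.python.ops import init_ops
from tensorflow.contrib.layers import xavier_initializer as xavier
from tensorflow.contrib.losses.python.losses import loss_ops

IMG_SIZE = 128
x = tf.placeholder(tf.float32, [None, IMG_SIZE * IMG_SIZE * 3])
y_ = tf.placeholder(tf.float32, [None, 2])

def conv2d(x, W):
    # stride-1 convolution with SAME padding
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    # halves the spatial dimensions
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')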
I got the best results with this network, but accuracy didn't go over 93%. I ran 80 epochs.
It is giving me pretty good results in real life, but I got curious about its predictions, so I checked what softmax returns, and the result wasn't good: it is usually 1 (or very close to 1) or 0. It shouldn't be like this. What do you think my mistake was? Is the neural network too big for 128x128 pixels, or is a 64k dataset too small for such a big network?
Sorry, I can't understand what the problem is. 93% is rather good accuracy. The whole dataset is split into a train (80%) and a test (20%) set, and accuracy is calculated over the test set. So if you have 93% accuracy, the same accuracy should hold in real life too.
Or maybe you want to improve the quality even more?
No, the problem, I think, is how confident the results returned by softmax are. My guess is the results are too polarized. But I'm new to this field, as you can see - I've just learnt a lot in a couple of weeks. If I print what softmax returns, it outputs numbers around 1.00, 0.00 or 0.9999, 0.0001 - something along those lines. I'm just curious why my neural network is so confident in its results: should the probability ever go to 100%, and in most cases sit between 99-100%?
One more question: would it be practical to add a second fully connected layer on top of the first fully connected layer, add another dropout, and only then retrieve the 2 final classes?
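A rough sketch of that idea, reusing the names from the snippet above (the layer sizes are arbitrary choices):
# Hypothetical second fully connected layer with its own dropout, inserted
# between h_fc1_drop and the final 2-class output; the existing W_fcO / b_fcO /
# logits lines would be replaced by these:
W_fc2 = tf.get_variable("W_fc2", shape=[4096, 1024], initializer=xavier())
b_fc2 = tf.get_variable('b_fc2', [1024], initializer=init_ops.zeros_initializer())
h_fc2 = tf.nn.relu(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
h_fc2_drop = tf.nn.dropout(h_fc2, keep_prob)

W_fcO = tf.get_variable("W_fcO", shape=[1024, 2], initializer=xavier())
b_fcO = tf.get_variable('b_fcO', [2], initializer=init_ops.zeros_initializer())
logits = tf.matmul(h_fc2_drop, W_fcO) + b_fcO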
I also played with different kernel sizes; it didn't have any effect, it just slowed down my training.
AdamOptimizer gave me quite an improvement in results.
Just happy to share what I've learnt.