Arsey/keras-transfer-learning-for-oxford102

Why we have to rescale by 1. / 255

anhlt opened this issue · 5 comments

anhlt commented
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

I saw this step in the Keras blog and in your implementation, but I don't understand why it's needed.


def preprocess_input(x, dim_ordering='default'):
    if dim_ordering == 'default':
        dim_ordering = K.image_dim_ordering()
    assert dim_ordering in {'tf', 'th'}

    if dim_ordering == 'th':
        x[:, 0, :, :] -= 103.939
        x[:, 1, :, :] -= 116.779
        x[:, 2, :, :] -= 123.68
        # 'RGB'->'BGR'
        x = x[:, ::-1, :, :]
    else:
        x[:, :, :, 0] -= 103.939
        x[:, :, :, 1] -= 116.779
        x[:, :, :, 2] -= 123.68
        # 'RGB'->'BGR'
        x = x[:, :, :, ::-1]
    return x

This is the preprocessing step in imagenet_utils.py. It just subtracts the mean value of each channel.
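To make the effect concrete, here's a minimal NumPy sketch (with a made-up constant-valued image) of what the 'tf'-ordering branch above does, i.e. per-channel mean subtraction followed by RGB->BGR reordering:

```python
import numpy as np

# One 2x2 "image" batch in tf ordering (N, H, W, C), all pixels set to 128
x = np.full((1, 2, 2, 3), 128.0, dtype=np.float32)

# ImageNet per-channel means used by the original VGG/Caffe models
mean = np.array([103.939, 116.779, 123.68], dtype=np.float32)

x[..., 0] -= mean[0]
x[..., 1] -= mean[1]
x[..., 2] -= mean[2]
x = x[..., ::-1]  # RGB -> BGR

print(x[0, 0, 0])  # roughly zero-centered values, e.g. [4.32, 11.221, 24.061]
```

So the values stay on roughly the 0-255 scale, just shifted to be approximately zero-centered per channel; that is a different convention from rescaling to [0, 1].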

Arsey commented

Hi @anhlt!
Rescale is a value by which we will multiply the data before any other processing. Our original images consist of RGB coefficients in the 0-255 range, but such values would be too high for our model to process (given a typical learning rate), so we target values between 0 and 1 instead by scaling with a 1./255 factor (description taken from https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html).
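A quick way to see what rescale=1./255 does, using plain NumPy to stand in for the generator (the pixel values here are arbitrary examples):

```python
import numpy as np

# Typical 8-bit pixel values, as floats
raw = np.array([0, 64, 128, 255], dtype=np.float32)

# This is the multiplication ImageDataGenerator applies before augmentation
scaled = raw * (1. / 255)

print(scaled)  # all values now lie in [0, 1]
```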

Talking about the preprocess_input function, I don't know at which step it can/should be used. I've tried to train the model with those mean values subtracted and the channels converted to BGR, but then the model does not train at all and gets stuck at a very low accuracy (near 0.0387 for each epoch). Then I tried it at the prediction step; the results were still bad. I've also noticed that this function is not used inside Keras.
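One possible explanation (my guess, not confirmed by the source): preprocess_input assumes inputs on the 0-255 scale, so if the data has already been rescaled by 1./255 the mean subtraction completely swamps the signal, which could produce exactly the kind of stuck training described above. A sketch of the mismatch:

```python
import numpy as np

# ImageNet per-channel means, defined for 0-255 inputs
mean = np.array([103.939, 116.779, 123.68], dtype=np.float32)

# Hypothetical pixel that was already rescaled into [0, 1]
x_rescaled = np.full((1, 1, 1, 3), 0.5, dtype=np.float32)

# Applying the mean subtraction anyway
out = x_rescaled - mean

print(out)  # every value is near -100, far from zero-centered
```

If that's the cause, either the rescale or the mean subtraction should be used, not both.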

About the mean value, here's some interesting info: Lasagne/Recipes#20

anhlt commented

@Arsey thanks for your answer. Interestingly, I found a preprocess_input implementation by @karpathy in his neuraltalk repository.

import numpy as np
from scipy.misc import imresize  # deprecated in newer SciPy versions

def preprocess_image(img):
    '''
    Preprocess an input image before processing by the caffe module.


    Preprocessing include:
    -----------------------
    1- Converting image to single precision data type
    2- Resizing the input image to cropped_dimensions used in extract_features() matlab script
    3- Reorder color Channel, RGB->BGR
    4- Convert color scale from 0-1 to 0-255 range (actually because image type is a float the 
        actual range could be negative or >255 during the cubic spline interpolation for image resize.
    5- Subtract the VGG dataset mean.
    6- Reorder the image to standard caffe input dimension order ( 3xHxW) 
    '''
    img      = img.astype(np.float32)
    img      = imresize(img, (224, 224)) #resizing the image; size goes in as a tuple
    img      = img[:,:,[2,1,0]] #RGB-BGR
    img      = img*255

    mean = np.array([103.939, 116.779, 123.68]) #mean of the vgg 

    for i in range(0,3):
        img[:,:,i] = img[:,:,i] - mean[i] #subtracting the mean
    img = np.transpose(img, [2,0,1]) #HxWx3 -> 3xHxW
    return img

As you can see, he wrote:

4- Convert color scale from 0-1 to 0-255 range (actually because image type is a float the 
    actual range could be negative or >255 during the cubic spline interpolation for image resize.

I don't think high input values can affect the learning curve, but I will try both approaches and compare the results.

anhlt commented

@Arsey What accuracy did you get on the validation set? I got only about 80%.
Happy to get any comments on my repository:
https://github.com/anhlt/keras-102-flower-dataset/

Arsey commented

@anhlt I get about 81% accuracy after the 250th epoch of the fine-tuning step. I see that you also have a GTX 1070, just like me))). BTW, which OS do you use?