Lexie88rus/quick-draw-image-recognition

CropImage IndexError: tuple index out of range

Closed this issue · 9 comments

So I was trying this variant of Quick, Draw! data recognition, since it is quite different from all the other implementations, but after training the model and trying to run predictions I ran into this error during the image-cropping step.

PS: I was/am trying to run prediction on an image file (a saved file)

The error

loading eye drawings
load complete
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-8-ea2a130d0e53> in <module>
     13 
     14 # preprocess the image for the model
---> 15 image_cropped = crop_image(img) # crop the image and resize to 28x28
     16 image_normalized = normalize_image(image_cropped) # normalize color after crop
     17 

<ipython-input-2-0dbe01aed1ef> in crop_image(image)
    341         for j in range(0, height):
    342             # save coordinates of the image
--> 343             if (pixels[i,j][3] > 0):
    344                 image_strokes_cols.append(i)
    345                 image_strokes_rows.append(j)

IndexError: tuple index out of range

The code I am using

import time

from quickdraw import QuickDrawData

qd = QuickDrawData(recognized=True)
eye = qd.get_drawing("eye")

# open Image with PIL
#img = Image.open(image_data)
img = eye.image

# save original image as png (for debugging)
ts = time.time()
img.save('image' + str(ts) + '.png', 'PNG')

# preprocess the image for the model
image_cropped = crop_image(img) # crop the image and resize to 28x28
image_normalized = normalize_image(image_cropped) # normalize color after crop

# convert image from RGBA to RGB
img_rgb = convert_to_rgb(image_normalized)

# convert image to numpy
image_np = convert_to_np(img_rgb)

# apply model and print prediction
label, label_num, preds = get_prediction(model, image_np)
print("This is a {}".format(label_num))

The input image is also attached

Hi! Thank you very much for your interest in my code :)
This error is raised because your image is RGB and not RGBA.
A quick-and-dirty fix is to add:

img = img.convert("RGBA")

So the whole script looks like this and works fine with your image:

# import PIL for image manipulation
from PIL import Image
from PIL import ImageOps

import time

# import image processing
import sys
sys.path.insert(0, '../')
import image_utils
from image_utils import crop_image, normalize_image, convert_to_rgb, convert_to_np

# open Image with PIL
img = Image.open('test_image.png')
img = img.convert("RGBA")

# save original image as png (for debugging)
ts = time.time()
img.save('image' + str(ts) + '.png', 'PNG')

# preprocess the image for the model
image_cropped = crop_image(img) # crop the image and resize to 28x28
image_normalized = normalize_image(image_cropped) # normalize color after crop

# convert image from RGBA to RGB
img_rgb = convert_to_rgb(image_normalized)

# convert image to numpy
image_np = convert_to_np(img_rgb)

# apply model and print prediction
#label, label_num, preds = get_prediction(model, image_np)
#print("This is a {}".format(label_num))
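To see why the original code failed: `Image.load()` returns pixel tuples whose length matches the number of channels, so `pixels[i, j][3]` (the alpha channel) only exists in RGBA mode. A minimal demonstration:

```python
from PIL import Image

# RGB image: each pixel is a 3-tuple, so index 3 is out of range
rgb = Image.new("RGB", (4, 4), (255, 255, 255))
print(len(rgb.load()[0, 0]))   # 3 channels: (R, G, B)

# after converting to RGBA, every pixel gains an alpha channel
rgba = rgb.convert("RGBA")
print(len(rgba.load()[0, 0]))  # 4 channels: (R, G, B, A)
```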

Thank you, the code is running now, but the prediction is totally off. Most of the time it guesses "eye" as "nail".

I've attached the image and my notebook showing how I am doing it; I would really appreciate it if you could take a look. I found a few implementations that run predictions on actual images rather than strokes, but this is the first one that addresses using actual image files (most others stop at test/eval and never do real-life predictions).

If it works, I would really be interested in training it on the full data set of 345 categories. Can you point me in the right direction with the parameters to use/try for the full set?
Even on this one, I had to disable the random-data parameter (rotated/flipped copies) since it was taking too long, even on a fairly high-end Google Cloud instance.

Also, if you are interested in doing it yourself, I can provide a Google Cloud instance of whatever specs you might need; I have quite a bit of GCP credits that I would be more than happy to spend to see an actual implementation of this running on the full data set.
I have been banging my head against this for 12 days now.

Uploaded the notebook as a zip, since GitHub does not allow uploading .ipynb files.

Lexie88rus-quick-draw-image-recognition.zip

Also, without adding rotated/flipped versions of the images, does this use 3,000 images of each class by default?

I have started training a new model with 10,000 images per class to see if it makes any improvement in my predictions:

for key, value in classes_dict.items():
    lst.append(value[:3000])
doodles = np.concatenate(lst)

In this project I was exploring different possibilities on my local machine without a GPU, so yes, I am only using 10 classes and 3,000 images per class; that is why the accuracy of the model is not impressive.
To improve I would suggest:

  • Use the full dataset, not only 3,000 images per class.
  • Use PyTorch's standard transformations to rotate/flip randomly. They will be far more efficient than my implementation.
  • Try out different conv-net architectures: DenseNets, VGG, etc. Some models are already implemented in PyTorch (torchvision), and some implementations can be borrowed from this repository, for example.

We can collaborate to implement this and then try to run it on Google Cloud instance.

And of course, to get a model with better accuracy, it would be better to use the full 256 × 256 images and not the simplified 28 × 28 files.
I already wrote some code on Kaggle that converts strokes into images (but this won't be very efficient).
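A minimal version of that strokes-to-image conversion looks roughly like this (the raw Quick, Draw! format stores each stroke as an `[xs, ys]` pair on a 256 × 256 canvas; the stroke data below is made up for illustration):

```python
from PIL import Image, ImageDraw

def strokes_to_image(strokes, size=256, line_width=3):
    """Rasterize Quick, Draw!-style strokes ([[xs], [ys]] pairs) onto a canvas."""
    img = Image.new("RGB", (size, size), "white")
    draw = ImageDraw.Draw(img)
    for xs, ys in strokes:
        # each stroke is drawn as a single polyline through its points
        draw.line(list(zip(xs, ys)), fill="black", width=line_width)
    return img

# toy example: two strokes forming a rough eye-like outline
strokes = [
    [[40, 128, 216], [128, 80, 128]],   # upper arc
    [[40, 128, 216], [128, 176, 128]],  # lower arc
]
img = strokes_to_image(strokes)
print(img.size)  # (256, 256)
```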

Regarding using full images, I know a better way than converting strokes to images. There is a Python API/package called "quickdraw" which already has almost all of the images. That's how I am getting the image in my code too, via its ".image" attribute.

There was this implementation which uses full images: https://github.com/zihenglin/quick-draw-recognition
It was the first one I tried, but when using the full data set the canvases it creates were nuts (2500 × 2500 px images, and 100k of them). Even then, the accuracy and predictions were nowhere near what's shown in the demo.

But I think that's the kind of approach you mean? If so, I have written a wrapper to download 4,000 images per class (345 × 4,000 total) in parallel.

Again, I have the server/GPU resources, but I am still learning ML, so I would be happy to collaborate and assist as much as my knowledge and abilities allow.

I think the best way would be to merge the code which generates canvases out of strokes with PyTorch dataloaders and transformations.

I think I can come up with an initial version of a script tested on my local machine and then we will try to test it and optimize on GPU server.

Sure, in the meantime I will try training your code on the whole data set.
Let me know whenever you need GPU instance access or anything GCP-related; I would love to see one Quick, Draw! implementation work well on the whole data set.

ok, great!