
I can't train on my private dataset

I'm trying to use the captcha-to-text tutorial, but I can't train on my own dataset the way you do.
With the dataset you provided, training worked without any problems, but when I replaced your images with my own, I ran into problems. A few examples from my dataset of 10129 images:


I made the following change in the file:
label = os.path.splitext(file)[0] -> label = os.path.splitext(file)[0].split('-')[1]

My image names are not captcha_answer.png like yours but md5hash-captcha_answer.png, so I made this change to extract the captcha_answer part in the same way.
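For reference, the filename-to-label step described above could look like the following. This is only a minimal sketch that assumes names of the form md5hash-captcha_answer.png; `label_from_filename` is a hypothetical helper and not part of the project.

```python
import os


def label_from_filename(file_name: str) -> str:
    """Return the captcha answer from 'md5hash-captcha_answer.png' or 'captcha_answer.png'."""
    # Drop any directory part and the extension.
    stem = os.path.splitext(os.path.basename(file_name))[0]
    # Split on the first '-' only, so an answer that itself contains '-' stays intact.
    # Without a '-', the whole stem is returned (the original naming scheme).
    return stem.split("-", 1)[-1]


print(label_from_filename("d41d8cd98f00b204e9800998ecf8427e-2b827.png"))
```

Unlike `.split('-')[1]`, this does not raise an IndexError for a file that has no hash prefix.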

In the file, since all my images are 350x100, I set self.height = 100 and self.width = 350. Then I got the following error. Can you help me solve it?


You need to adapt the model to your input shapes. I believe that if you set self.height = 50 and self.width = 200, as they were initially, everything will be OK. Check this out

I changed it but I keep getting the same error

Can you try to use this project again with my images?

I think you are passing None to the model. Check this first (whether you really receive images from the dataprovider).

I'm sure about this. Please watch the video:

If you can upload your dataset and the code you showed here somewhere, I'll try it myself.

You can download 100 images of my dataset and the file I changed from this link.

OK mate, I tested it and it seems that cv2.imread(file_path) returns None. That means your images are in some format that OpenCV can't handle. So either fix your images or find a way to read them with a different method. I'd prefer fixing them: try reading them with Pillow and saving them with cv2; this should solve it.
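One possible way to narrow this down is to check whether the bytes of each file match its extension. The sketch below uses only the standard library and compares the leading "magic" bytes against a few common image formats. `sniff_image_format` and `report_mismatches` are hypothetical helper names, and a mismatch is only a hint, not proof of why cv2.imread fails.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of a few common image formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"BM": "bmp",
}


def sniff_image_format(header: bytes) -> str:
    """Guess the image format from the first bytes of a file."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    return "unknown"


def report_mismatches(folder: str):
    """Return (file name, detected format) for files whose content differs from their extension."""
    mismatches = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            actual = sniff_image_format(fh.read(16))
        ext = path.suffix.lower().lstrip(".")
        claimed = "jpeg" if ext == "jpg" else ext
        if actual != claimed:
            mismatches.append((path.name, actual))
    return mismatches
```

Calling `report_mismatches('Datasets/captcha_images_v2')` would list files that are named .png but contain something else.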

Thanks! I solved the problem using this code.

import os
import cv2
import numpy as np
from PIL import Image

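# Make sure the output folder exists; cv2.imwrite does not create it.
os.makedirs('Datasets/my_dataset', exist_ok=True)

for img_path in os.listdir('Datasets/captcha_images_v2'):
    old_path = 'Datasets/captcha_images_v2/' + img_path
    # The right-hand side was cut off in the post; Image.open is the assumed call.
    # convert('RGB') guarantees the 3 channels that COLOR_RGB2BGR expects.
    img_pil = Image.open(old_path).convert('RGB')
    img_np = np.array(img_pil)
    # Pillow uses RGB channel order, OpenCV uses BGR.
    img_cv2 = cv2.cvtColor(img_np, cv2.COLOR_RGB2BGR)

    new_path = old_path.replace('captcha_images_v2', 'my_dataset')

    # Re-encode with OpenCV so cv2.imread can read the file later.
    cv2.imwrite(new_path, img_cv2)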

But before closing the issue, I'd like to ask a few questions. How many epochs do you think I should use for this project? I noticed that it works when I set self.height=100, self.width=350 again. Do you think I should train this way or with 50, 200? (All my images are in 100x350 format.) Other than that, if you have any suggestions for this project, I'd love to hear them. Thanks again!

Train for an effectively unlimited number of epochs, for example 1000, and EarlyStopping will stop the training at some point. Then train another model with a different input size until it stops, and choose whichever model gives you better accuracy in terms of CER. 1000 images is a pretty small dataset; you may increase accuracy by adding more images. Overall, the images are pretty simple, so it shouldn't take long to train both of these models.
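To compare two trained models on CER (character error rate), something like the sketch below can be used. It assumes the common corpus-level definition, total character edits divided by total target characters; the metric built into the project may be averaged differently, and `edit_distance` and `cer` are hypothetical helpers written for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(predictions, targets) -> float:
    """Character error rate over a whole set of (prediction, target) pairs."""
    total_edits = sum(edit_distance(p, t) for p, t in zip(predictions, targets))
    total_chars = sum(len(t) for t in targets)
    return total_edits / total_chars if total_chars else 0.0


print(cer(["2b827", "x3fw"], ["2b827", "x3fwf"]))
```

Evaluate both models on the same held-out images, and prefer the one with the lower value.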

Should I leave the self.height and self.width values at their default values of 50, 200 or change them to 100, 350, the same as my images?

It saves the onnx and csv files only after all epochs are finished, but in your video they were saved every time. What is the reason for this?

Usually, a smaller input size means a faster model at inference, while a larger input size means slower inference but better accuracy. That's why you should train both models and see whether the bigger input size actually affects accuracy.

Can you explain this in more detail? What do you mean? It saves the best model as .h5 every epoch, and after training finishes it loads these weights and converts the model to onnx.

I seem to have solved that problem, but now I have a different one. I trained the model on Google Colab, starting with 350x100 input, 300 epochs and a batch_size of 256. EarlyStopping stopped it at epoch 50. When I tried the model, I got an empty prediction even for the images it was trained on. When I compared the log file from your training with my own, my loss values were almost never decreasing, whereas yours kept decreasing until they dropped below 1. I have shared my results below; can you check them and give suggestions?

@pythonlessons Please check. By the way, the 2nd tutorial gives an error when I try to use it with the version you updated.

Thanks, I'll make a fix release first.
