mvoelk/ssd_detectors

How to train CRNN using my own dataset?


I want to train the model on my own data for a specific use case. My dataset is in the ICDAR-FST2015 format, but the InputGenerator in crnn_data, which is used for training the CRNN model, seems to enter an infinite loop, and training (model.fit_generator()) shows no progress even after hours. Is this normal behavior? Should I convert my dataset to another format (such as PASCAL-VOC) and try again?
I have been stuck on this for days now, and any suggestions or help would be highly appreciated.
Thanks.
These are the training steps I followed:

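import os
import time

# Keras imports; the exact package (keras vs. tensorflow.keras) is an assumption
from keras.optimizers import SGD
from keras.callbacks import ModelCheckpoint

# project imports; the crnn_model module name and the location of
# Logger/ModelSnapshot are assumed from the repository layout
from crnn_model import CRNN
from crnn_data import InputGenerator
from utils.training import Logger, ModelSnapshot
from data_icdar2015fst import GTUtility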
gt_util_train = GTUtility('path')
gt_util_val = GTUtility('path', test=True)
from crnn_utils import alphabet87 as alphabet
input_width = 600
input_height = 800
batch_size = 8
input_shape = (input_width, input_height, 1)

# model, model_pred = CRNN(input_shape, len(alphabet), gru=False)
# experiment = 'crnn_lstm_synthtext'

model, model_pred = CRNN(input_shape, len(alphabet), gru=True)
experiment = 'crnn_gru_synthtext'

max_string_len = model_pred.output_shape[1]

gen_train = InputGenerator(gt_util_train, batch_size, alphabet, input_shape[:2], 
                           grayscale=True, max_string_len=max_string_len)
gen_val = InputGenerator(gt_util_val, batch_size, alphabet, input_shape[:2], 
                         grayscale=True, max_string_len=max_string_len)
checkdir = './checkpoints/' + time.strftime('%Y%m%d%H%M') + '_' + experiment
if not os.path.exists(checkdir):
    os.makedirs(checkdir)

with open(checkdir+'/source.py','wb') as f:
    source = ''.join(['# In[%i]\n%s\n\n' % (i, In[i]) for i in range(len(In))])
    f.write(source.encode())

optimizer = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True, clipnorm=5)
#optimizer = Adam(lr=0.02, epsilon=0.001, clipnorm=1.)

# dummy loss, loss is computed in lambda layer
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer=optimizer)

#model.summary()

model.fit_generator(generator=gen_train.generate(), # batch_size here?
                    steps_per_epoch=gt_util_train.num_objects // batch_size,
                    epochs=10,
                    validation_data=gen_val.generate(), # batch_size here?
                    validation_steps=gt_util_val.num_objects // batch_size,
                    verbose=2,
                    callbacks=[
                        ModelCheckpoint(checkdir+'/weights.{epoch:03d}.h5', verbose=1, save_weights_only=True),
                        ModelSnapshot(checkdir, 10000),
                        Logger(checkdir)
                    ],
                    initial_epoch=0)

By 'infinite loop', do you mean the while True: statement?
Why input_shape = (600, 800, 1)? The architecture is designed for (256, 32, 1). You can change the width, but choosing it too small makes it difficult to learn sequences.

Yes. What is the exit condition for the while True: loop?
About the input size: my images are 600x800. Shrinking them any further may make the text illegible. Do I need to re-annotate using cropped images of size (256, 32)? Is there any other way?

> Yeah. What is the exit condition for the while True: loop?

There is none, and none is needed: yield returns one batch each time next() is called on the generator and pauses the loop in between. Search for Python generators...
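To illustrate the point about generators, here is a minimal sketch. It is not the repository's InputGenerator; the function `generate` and the toy `samples` list are hypothetical stand-ins. The `while True:` loop never exits on its own, but it only advances when the consumer asks for the next batch, which is what `steps_per_epoch` controls in fit_generator().

```python
def generate(samples, batch_size):
    """Yield batches forever; the consumer decides how many to take."""
    i = 0
    while True:
        # wrap around at the end of the data set
        batch = [samples[(i + k) % len(samples)] for k in range(batch_size)]
        i = (i + batch_size) % len(samples)
        yield batch  # execution pauses here until next() is called again


samples = list(range(10))  # hypothetical toy data
batch_size = 4
steps_per_epoch = len(samples) // batch_size

gen = generate(samples, batch_size)
# one "epoch" pulls exactly steps_per_epoch batches, then stops asking
epoch = [next(gen) for _ in range(steps_per_epoch)]
print(epoch)
```

The generator itself stays alive after the epoch, so the next epoch simply continues where the previous one stopped.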

input_shape = (256, 32, 1) is the input size of the CRNN model. It corresponds to the size of the cropped word bounding boxes, not to the size of the raw image or of the detector stage's input. For details, see the CRNN paper...
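A rough sketch of what "cropped word bounding boxes" means in practice, assuming axis-aligned boxes given as (xmin, ymin, xmax, ymax) in pixels. The helper `crop_word` is hypothetical and uses plain nearest-neighbour sampling in NumPy; the repository's own cropping code may handle rotated boxes, interpolation, and padding differently.

```python
import numpy as np


def crop_word(image, box, target_size=(256, 32)):
    """Crop one word box from a grayscale page and resize it to (width, height, 1)."""
    xmin, ymin, xmax, ymax = box
    crop = image[ymin:ymax, xmin:xmax]  # rows are y, columns are x
    width, height = target_size
    # nearest-neighbour resize: pick source rows/columns for each target pixel
    rows = np.arange(height) * crop.shape[0] // height
    cols = np.arange(width) * crop.shape[1] // width
    resized = crop[rows[:, None], cols[None, :]]  # shape (height, width)
    # the CRNN input is laid out width-first, with a single channel
    return resized.T[..., None]


page = np.zeros((800, 600), dtype=np.uint8)  # hypothetical full page, 800 rows by 600 columns
word = crop_word(page, (100, 200, 300, 250))
print(word.shape)
```

The full page therefore never needs to be shrunk to 256x32; only each annotated word region is.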

Okay, thanks. I am not really sure how yield works, so I will look into that. Could I have an email address for further queries? Also, how can I get a progress bar while training is running? I tried passing verbose=2, but that isn't really helping. Some visualization of the progress would be helpful.

Also, why are you dropping these words in crnn_data.py?

# drop words with width > height here
mask = np.array([w.shape[1] > w.shape[0] for w in words])
words = np.asarray(words)[mask]
texts = texts[mask]

Wouldn't cropped words normally have width > height in horizontally written languages like English?

With CRNN_log.ipynb, you can plot the loss during training. That's all I've implemented...

> # drop words with width > height here
That is a typo in the comment; it should read width < height. The words are padded anyway.
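A small check of what the mask actually does, using hypothetical toy crops (the arrays and labels are made up for illustration). Since `shape[0]` is the height and `shape[1]` is the width, the mask is True for crops that are wider than tall, and those are the ones that are kept.

```python
import numpy as np

# hypothetical word crops as (height, width) arrays
words = [np.zeros((32, 100)), np.zeros((60, 20)), np.zeros((32, 64))]
texts = np.array(['hello', 'I', 'word'])

# True where width > height; these entries survive the filter
mask = np.array([w.shape[1] > w.shape[0] for w in words])
kept_words = [w for w, keep in zip(words, mask) if keep]
kept_texts = texts[mask]

print(mask, kept_texts)
```

Only the tall, narrow crop is dropped, which matches the corrected comment.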

As for e-mail: I prefer to use the issues, because I do not want to answer the same questions over and over again.

@mvoelk Thanks for the replies, and thank you for your patience. This will probably be my last query.
The CRNN model is training now.
To build the end-to-end model for my dataset, I will need to train the SegLink model on my dataset for the bounding boxes, right? Once the SL model predicts the bounding boxes, recognition can then be carried out with the trained CRNN model. Is that correct?

Also, while trying to run SL_train.ipynb, I get the following error:

ssd_detectors\sl_training.py in compute(self, y_true, y_pred)
    172             from tfkeras.utils.training import reduced_focal_loss as focal_loss
    173         else:
--> 174             from tfkeras.utils.training import focal_loss
    175 
    176         batch_size = tf.shape(y_true)[0]

ModuleNotFoundError: No module named 'tfkeras.utils.training'

I tried replacing tfkeras with tf.keras and with tensorflow.keras, but the ModuleNotFoundError still occurs. What am I doing wrong?

Found the problem: the import works when the module is imported from utils.training instead of tfkeras.utils.training.

Fixed, thanks :)