FLming/CRNN.tf2

Empty preds when trained on my own data

wiistriker opened this issue · 5 comments

Hello! I am trying to train on my own data, starting from exported_model.h5. After 100 epochs I get empty predictions:

[Screenshot: 2020-11-28, 00:44:55]

My config:

train:
    dataset_builder: &ds_builder
        table_path: 'data/table.txt'
        # 1: Grayscale image, 3: RGB image
        img_channels: 3
        # Images wider than max_img_width will be dropped.
        # Only applies when img_width is null.
        max_img_width: 400
        ignore_case: false
        # If not null, images will be resized to this width (and may be distorted).
        img_width: null
        # If change height, change the net.
        img_height: 32
    train_ann_paths:
        - 'data/dataset/annotation_train.txt'
        - 'data/dataset/annotation_val.txt'
    val_ann_paths:
        - 'data/dataset/annotation_test.txt'
    batch_size_per_replica: 256
    # The model to restore from, even if the number of characters differs
    restore: 'exported_model.h5'
    learning_rate: 0.001
    # Number of epochs to train.
    epochs: 100
    # Reduce learning rate when a metric has stopped improving.
    reduce_lr:
        factor: 0.5
        patience: 5
        min_lr: 0.0001
    # Tensorboard
    tensorboard:
        histogram_freq: 1
        profile_batch: 0

eval:
    dataset_builder:
        <<: *ds_builder
    ann_paths:
        - '/datasets/ICDAR/2013/Challenge2_Test_Task3_gt.txt'
    batch_size: 1

Tensorboard:
[Screenshot: 2020-11-28, 00:40:27]

If I run the demo with exported_model.h5 I get predictions. What am I doing wrong?

It seems that your model has not converged yet: the loss is high and the accuracy is low. The predictions are empty because the model predicts blank for the whole sequence, and CTC decoding collapses an all-blank sequence to an empty string.
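To illustrate, here is a minimal sketch of greedy CTC decoding showing how an all-blank prediction decodes to an empty sequence. The shapes, logit values, and blank index are made up for illustration, not taken from this repo.

```python
import numpy as np

# Toy logits over 4 time steps and 3 classes; index 2 plays the CTC blank.
logits = np.array([
    [0.1, 0.2, 5.0],
    [0.3, 0.1, 4.0],
    [0.2, 0.2, 6.0],
    [0.1, 0.3, 5.5],
])
blank = 2

def ctc_greedy_decode(logits, blank):
    """Greedy CTC decode: take argmax per step, collapse repeats, drop blanks."""
    best = logits.argmax(axis=-1)
    collapsed = [c for i, c in enumerate(best) if i == 0 or c != best[i - 1]]
    return [c for c in collapsed if c != blank]

print(ctc_greedy_decode(logits, blank))  # -> [] : an empty prediction
```

When the blank class dominates every time step, the decoded label sequence is empty, which matches what you are seeing.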

Thanks for your answer. So I need to continue training with more epochs? Is it okay that the model can't predict any symbols when training starts from the pre-trained exported_model.h5?

@FLming I extended the dataset, trained for more epochs, and got some results. Thanks!

But I ran into an issue: the example images from your repo are now recognized incorrectly. My dataset contains images of serial numbers, so it's just random sequences of a-z and 0-9. It looks like I need to turn off the LSTM layers?

To do so, do I need to comment out lines 48-52 in models.py?

x = layers.Bidirectional(
    layers.LSTM(units=256, return_sequences=True), name='bi_lstm1')(x)
x = layers.Bidirectional(
    layers.LSTM(units=256, return_sequences=True), name='bi_lstm2')(x)
x = layers.Dense(units=num_classes, name='fc1')(x)

Is it correct?
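If you did want to try it, a minimal sketch of the head without the recurrent layers might look like the following. The input shape and num_classes here are hypothetical stand-ins for the reshaped CNN feature sequence in this repo; only the Dense layer is kept from the snippet above.

```python
import tensorflow as tf
from tensorflow.keras import layers

num_classes = 37  # hypothetical: 26 letters + 10 digits + CTC blank
# Hypothetical (time_steps, features) sequence coming out of the CNN backbone.
inputs = tf.keras.Input(shape=(8, 512))

# Skip the two Bidirectional LSTM blocks and map CNN features
# straight to per-timestep class logits for CTC.
x = layers.Dense(units=num_classes, name='fc1')(inputs)
model = tf.keras.Model(inputs, x)
model.summary()
```

Note this only removes the recurrent context; as the maintainer points out below, it usually costs accuracy.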

One more question, about confidence. If I run prediction on an image with symbols from another language, the network still gives some result. The result will of course be incorrect, because the network has never seen such symbols. I want to filter out such predictions. Is there any way to get a confidence score for the whole word, or for each character of the word?

About the first question: according to the conclusions of the paper, removing the LSTM layers reduces the capacity of the model, so it usually only hurts performance.

About the second question: it depends on the decode method. Greedy decoding gives the predicted probability (softmax) of each character, and you can post-process these to get the probability of the whole sequence. Other decoding methods can also give a sequence probability accordingly.
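As a rough sketch of that post-processing: take the softmax output, read off the per-step maximum as the per-character confidence, and multiply them for a crude sequence confidence. The probability values and blank index below are made up for illustration.

```python
import numpy as np

# Per-timestep softmax output from the model, shape (T, num_classes).
# Values are illustrative; in this toy example index 2 would be the CTC blank.
probs = np.array([
    [0.85, 0.10, 0.05],
    [0.10, 0.75, 0.15],
    [0.05, 0.05, 0.90],
])

best = probs.argmax(axis=-1)          # greedy path (class index per step)
char_conf = probs.max(axis=-1)        # per-character confidence
seq_conf = float(np.prod(char_conf))  # crude whole-sequence confidence

print(best, char_conf, seq_conf)
```

You could then reject predictions whose sequence confidence falls below a threshold tuned on a validation set, which should filter out most images with unseen symbols.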

Thanks!