mvoelk/ssd_detectors

Encode-Decode problem

trungpham2606 opened this issue · 12 comments

It's such a great implementation. But when I tried to visualize the ground truth images after training SegLink for one epoch, I saw that some ground truth bounding boxes didn't fit the text, although I had checked them all before training.

Visualization method: starting from the normalized bounding box coordinates, I first encoded and then decoded them (using sl_utils). After that, I drew the images.

The problem doesn't appear in all images, only in some. So I wonder whether there is a bug in either the encode or the decode part. Can you spend some time taking a look at it?

If you need to know anything in detail, just let me know.

Which data set do you use? Can you provide one of these samples as well as a piece of code?

Okay, if you do something like

plt.imshow(images[i])
egt = prior_util.encode(data[i])                            # encode ground truth
prior_util.plot_gt()                                        # plot original ground truth
prior_util.plot_results(prior_util.decode(egt), color='r')  # plot encoded-then-decoded boxes in red

you may observe the following behavior

[image: index]

I confirm, this is a major issue with the segment width in the implementation and in the SegLink approach in general. When I wrote the code, there was no reference implementation available and I was not quite sure how to handle the segment width properly.

Let's look at Figure 5 (3) in the SegLink paper. There are exactly two cases that can occur on the left side of the shown segment. In the first case, another prior box is assigned to the word bounding box and the ground truth width of the corresponding segment is defined by means of the intersection between the prior and the word bounding box. This is also done in the implementation of the SegLink authors. The second case is when no further prior box can be assigned to the word bounding box and the decoded bounding box shrinks. Hence, the ground truth width of a segment is always less than or equal to the width of the prior box.
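The two cases can be sketched as follows. `segment_gt_width` is a hypothetical helper for illustration, not code from the repository; all coordinates are normalized x-values:

```python
def segment_gt_width(prior_xmin, prior_xmax, word_xmin, word_xmax):
    """Ground truth width of a segment, taken as the horizontal
    intersection of the prior box with the word bounding box.

    Because it is an intersection, the result can never exceed the
    width of the prior box itself.
    """
    left = max(prior_xmin, word_xmin)
    right = min(prior_xmax, word_xmax)
    return max(0.0, right - left)

# First case: the prior lies fully inside the word, so the segment
# keeps the full prior width.
w_inside = segment_gt_width(0.2, 0.4, 0.0, 1.0)

# Second case: the prior overhangs the word edge, so the segment
# width shrinks to the overlapping part.
w_clipped = segment_gt_width(0.2, 0.4, 0.3, 1.0)
```

This makes the shrinking visible: `w_clipped` is smaller than the prior width, which is exactly why the decoded box at the word boundary comes out too narrow.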

In my implementation, I found that only the second case is a problem when the cropped bounding box is passed to the recognition stage. For that reason, I decided to add some padding to the resulting bounding box.

A pragmatic fix could be to allow the leftmost and rightmost segments to have a width larger than the width of the prior box and then consider only the width of these segments in the loss function.
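As a sketch of that fix (hypothetical helper, not repository code), the width term of the loss could be masked so that only the two end segments of a word contribute:

```python
import numpy as np

def width_loss_mask(num_segments):
    """Boolean mask selecting only the left- and rightmost segment.

    Hypothetical helper: under the proposed fix, only these two
    segments would keep a width target (possibly wider than the prior
    box), and only their width error would enter the loss function.
    """
    mask = np.zeros(num_segments, dtype=bool)
    if num_segments > 0:
        mask[0] = True
        mask[-1] = True
    return mask

# For a word covered by five segments, only segments 0 and 4 contribute.
mask = width_loss_mask(5)
```

Multiplying the per-segment width error by such a mask before summing would leave the interior segment widths unconstrained.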

In the case of your dataset, I assume that the aspect ratio is not too large and the text is aligned almost horizontally. You would probably get better results with TextBoxes++ or even with TextBoxes.

@trungpham2606 I spent some time and took a closer look at the issue. It turned out that there is indeed an issue with the decoding as described in the SegLink paper. In Algorithm 1, step 6 only makes sense if x_p and x_q are the left and right edges of the bounding box, and step 8 only makes sense if x_p and x_q are the centers of the leftmost and rightmost segments.
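To make the distinction concrete, here is a hedged sketch (hypothetical functions, not the repository code) of the two width computations; the two readings only agree when step 6 uses edges and step 8 uses centers:

```python
def width_from_edges(x_p, x_q):
    # Step 6 of Algorithm 1: valid only if x_p and x_q are the left
    # and right edges of the combined bounding box.
    return x_q - x_p

def width_from_centers(x_p, x_q, w_left, w_right):
    # Step 8: valid only if x_p and x_q are the centers of the
    # leftmost and rightmost segments; half of each end segment's
    # width is then added back on either side.
    return (x_q - x_p) + (w_left + w_right) / 2.0

# Two segments of width 0.2, centered at 0.3 and 0.7, i.e. the
# combined box spans the edges 0.2 to 0.8. Both formulas give the
# same width only when fed the quantities they were written for.
w_edges = width_from_edges(0.2, 0.8)                 # edges
w_centers = width_from_centers(0.3, 0.7, 0.2, 0.2)   # centers
```

Mixing the two up (feeding centers into the edge formula, or vice versa) underestimates or overestimates the box width by half an end-segment on each side, which matches the misfitting boxes in the screenshot above.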

I have changed the decoding method to fix this issue and updated my previous comment to avoid confusion. The encoding works as described in the paper, but the issue I mentioned still remains.

The example from above now looks like this:
[image: index2]

The modified decoding slightly increased the f-measure of the SegLink model from 0.868 to 0.869.

Thank you!

@mvoelk thank you so much for your support. I will apply your change to my dataset and show you my results then.

@mvoelk I just tested your new decode script. The results look better than before; there are still some images where the ground truth bounding boxes don't fit the text, though. But the results are way better, at least in my case.
Thank you so much for your help. If you figure out anything else to improve or completely fix the decode part, just let me know.

F-measure of SegLink with DenseNet and Focal Loss increased from 0.922 to 0.932.

Oh nice, can you provide the parameters you chose for training with Focal Loss? I set them to your defaults, but the loss was much worse than with the normal loss.
Thanks in advance!

@trungpham2606 I'm not sure if the default values in sl_training.py are correct. Can you try lambda_segments=1.0, lambda_offsets=1.0, lambda_links=1.0 and report whether the scale is roughly the same as in the log file I provided with the model? Which f-measure do you get on segments?

@mvoelk Actually I tried it on my dataset (the images I showed you above). I observed that the focal loss started at 10000 or even higher and then decreased, but only slowly.
I will try your suggestion and show you the results as soon as possible.

I usually divide the loss terms by the number of instances. In SegLinkFocalLoss I commented this normalization out. You should get the old behavior if you uncomment the relevant lines.
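A minimal sketch of the effect (hypothetical values, not the actual loss code): summing a per-instance loss term grows with the number of instances, while dividing by the instance count keeps the scale roughly constant, which explains the very different loss magnitudes observed above:

```python
import numpy as np

# Hypothetical per-instance loss terms for one batch.
per_instance = np.array([2.0, 4.0, 6.0])

# Un-normalized: the scale grows with the number of instances,
# which can produce very large initial loss values.
summed = per_instance.sum()

# Normalized: dividing by the instance count keeps the scale
# comparable across batches with different numbers of instances.
normalized = per_instance.sum() / len(per_instance)
```

So whether the normalization lines are commented out or not mainly changes the scale of the reported loss, not what is being optimized per instance.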