clovaai/synthtiger

How to get character bbox annotation?

GuokunWang opened this issue · 5 comments

This project is very helpful for generating synthetic text for scene text recognition. It seems to generate a text image by combining several character images, but the output doesn't contain information about each character. Is it possible to get character annotations, for example each character and its location?

Hi,
It generates images by rendering several characters, but character bboxes aren't available because the transformation is applied after the characters are merged.
The rendering process needs to change to produce character bboxes.
I'll try to improve the rendering process so that it can output character bboxes (and a text mask).
Thanks.

@moonbings
Hi,
Do you have any plans to add code for getting the text box? Or is this project over?
Thanks

I'm asking whether I can get the word or sentence bbox.

Hi,
Sorry for the late reply.
Unfortunately, I can't maintain this project because of personal reasons. 😢

You can modify the code at https://github.com/clovaai/synthtiger/blob/master/examples/synthtiger/template.py#L177-L190 to get character/word bboxes.
For character bboxes, make temporary copies of the character layers, apply the same transformation to them, and then return their bboxes.
For the word bbox, merge the character bboxes.
Note that these bboxes are in world coordinates, so you need to convert them to coordinates relative to the output layer.

Here's an example.

def _generate_fg(self, color, style):
    ...

    char_layers = [layers.TextLayer(char, **font) for char in chars]
    self.shape.apply(char_layers)
    self.layout.apply(char_layers, {"meta": {"vertical": self.vertical}})

    layer = layers.Group(char_layers).merge()
    self.color.apply([layer], color)
    self.texture.apply([layer])

    self.style.apply([layer], style)
    self.style.apply(char_layers, style) # added

    transform = self.transform.sample() # added
    self.transform.apply([layer], transform) # changed
    self.transform.apply(char_layers, transform) # added

    self.fit.apply([layer])
    self.fit.apply(char_layers) # added

    self.pad.apply([layer])
    out = layer.output()

    # change coordinates
    for char_layer in char_layers:
        char_layer.topleft -= layer.topleft

    # get bboxes
    char_bboxes = [char_layer.bbox for char_layer in char_layers] # [[left, top, width, height], ...]
    word_bbox = utils.merge_bbox(char_bboxes) # [left, top, width, height]

    return out, label, char_bboxes, word_bbox
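
The two bbox steps in the example (shifting from world coordinates to layer-relative coordinates, then merging character bboxes into a word bbox) can be sketched without synthtiger. This is only an illustration in plain Python: `to_local` and `merge_bboxes` are hypothetical helpers, not the library's `utils.merge_bbox`, and the sample numbers are made up. It assumes the `[left, top, width, height]` layout noted in the comments above.

```python
from typing import Sequence, Tuple

# Assumed layout, matching the comments in the example: (left, top, width, height)
BBox = Tuple[float, float, float, float]


def to_local(bbox: BBox, origin: Tuple[float, float]) -> BBox:
    """Shift a world-coordinate bbox so it is relative to `origin` (hypothetical helper)."""
    left, top, width, height = bbox
    return (left - origin[0], top - origin[1], width, height)


def merge_bboxes(bboxes: Sequence[BBox]) -> BBox:
    """Return the smallest bbox enclosing all input bboxes (hypothetical helper)."""
    if not bboxes:
        raise ValueError("need at least one bbox to merge")
    left = min(b[0] for b in bboxes)
    top = min(b[1] for b in bboxes)
    right = max(b[0] + b[2] for b in bboxes)
    bottom = max(b[1] + b[3] for b in bboxes)
    return (left, top, right - left, bottom - top)


# Made-up sample: two character bboxes in world coordinates and the
# top-left corner of the merged text layer.
world_chars = [(110.0, 52.0, 10.0, 20.0), (122.0, 50.0, 12.0, 22.0)]
layer_topleft = (100.0, 40.0)

local_chars = [to_local(b, layer_topleft) for b in world_chars]
word_bbox = merge_bboxes(local_chars)
print(local_chars)
print(word_bbox)
```

Shifting before merging keeps both the character bboxes and the word bbox in the same layer-relative frame.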

Then you need to modify this part to save the bboxes.
https://github.com/clovaai/synthtiger/blob/master/examples/synthtiger/template.py#L132-L153
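
How the bboxes are written is up to you. As a rough sketch, one option is a single tab-separated line per image, with each bbox converted from `[left, top, width, height]` to corner form. The helper name `format_coord_line`, the sample path, and the line layout are assumptions made for illustration, not part of the template.

```python
from typing import Sequence, Tuple

# Assumed input layout: (left, top, width, height)
BBox = Tuple[int, int, int, int]


def format_coord_line(image_path: str, bboxes: Sequence[BBox]) -> str:
    """Build '<image_path>\\t<xmin>,<ymin>,<xmax>,<ymax>\\t...' (hypothetical helper)."""
    fields = [image_path]
    for left, top, width, height in bboxes:
        # Convert width/height to the bottom-right corner.
        fields.append(f"{left},{top},{left + width},{top + height}")
    return "\t".join(fields)


# Made-up sample values.
line = format_coord_line("images/0.jpg", [(10, 12, 10, 20), (22, 10, 12, 22)])
print(line)
```

Each call returns one line without a trailing newline, so the save step can append `"\n"` when writing to its own file handle.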

After changing the template, you can generate data with the following command.

python -m synthtiger -o results -w 4 -v examples/synthtiger/template.py SynthTiger examples/synthtiger/config_horizontal.yaml

Thanks.

Character bboxes and text masks are now available.
Character bbox data is in the coord.txt file and mask data is in the masks directory.
The format of coord.txt is <image_path>\t<bbox>\t<bbox>\t<bbox>..., where <bbox>=<xmin>,<ymin>,<xmax>,<ymax>.
Check out the latest code.
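
For reading coord.txt back, a minimal parser sketch based on the format described above might look like this. `parse_coord_line` is a hypothetical helper and the sample line is made up. Parsing the values as floats is an assumption, since the thread does not say whether coordinates are integers.

```python
from typing import List, Tuple

# (xmin, ymin, xmax, ymax), as described for coord.txt
CornerBBox = Tuple[float, float, float, float]


def parse_coord_line(line: str) -> Tuple[str, List[CornerBBox]]:
    """Split one coord.txt line into the image path and its bboxes (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    image_path = fields[0]
    bboxes: List[CornerBBox] = []
    for field in fields[1:]:
        if not field:
            continue  # tolerate a trailing tab
        xmin, ymin, xmax, ymax = (float(v) for v in field.split(","))
        bboxes.append((xmin, ymin, xmax, ymax))
    return image_path, bboxes


# Made-up sample line in the documented layout.
sample = "images/0.jpg\t10,12,20,32\t22,10,34,32\n"
path, boxes = parse_coord_line(sample)
print(path, boxes)
```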

Thanks.