Layout-Parser/layout-parser

Multiple text blocks returned instead of one

naarkhoo opened this issue · 1 comment

I am using layoutparser 0.3.4 in Colab, installed with:

! pip install layoutparser torchvision && pip install "detectron2@git+https://github.com/facebookresearch/detectron2.git@v0.5#egg=detectron2"

My model is:

import layoutparser as lp

# Mask R-CNN trained on PubLayNet; detections scoring below 0.5 are discarded
model = lp.models.Detectron2LayoutModel('lp://PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/config',
                                 extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.5],
                                 label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"})

Block detection seems too sensitive: instead of returning one block of text, the model returns four. The input is an article from PubMed.

What is the best practice in such a case?

  1. Labeling additional data and fine-tuning the model?
  2. Post-processing the results using the block coordinates? (feels too hacky)
  3. Using another model that is less sensitive?
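For option 2, here is a minimal sketch of coordinate-based post-processing, assuming each detection has been reduced to an `(x1, y1, x2, y2)` tuple (for rectangular layoutparser blocks, `block.coordinates` gives this). It greedily merges text boxes that overlap horizontally and are separated by only a small vertical gap. `merge_adjacent`, `max_gap` and `min_overlap` are hypothetical names, and the defaults are guesses that would need tuning to the page resolution.

```python
from typing import List, Tuple

# (x1, y1, x2, y2) in pixels; y grows downward as in image coordinates
Box = Tuple[float, float, float, float]


def horizontal_overlap(a: Box, b: Box) -> float:
    """Overlap of the two x-ranges as a fraction of the narrower box's width."""
    overlap = min(a[2], b[2]) - max(a[0], b[0])
    narrower = min(a[2] - a[0], b[2] - b[0])
    return max(0.0, overlap) / narrower if narrower > 0 else 0.0


def merge_adjacent(boxes: List[Box], max_gap: float = 10.0,
                   min_overlap: float = 0.8) -> List[Box]:
    """Greedily merge boxes that sit in the same column and are vertically close.

    Hypothetical helper: a single pass over boxes sorted top-to-bottom.
    """
    merged: List[Box] = []
    for box in sorted(boxes, key=lambda b: (b[1], b[0])):
        for i, m in enumerate(merged):
            # Distance from the bottom of m to the top of box (negative if they overlap)
            gap = box[1] - m[3]
            if gap <= max_gap and horizontal_overlap(m, box) >= min_overlap:
                # Replace m by the union of the two boxes
                merged[i] = (min(m[0], box[0]), min(m[1], box[1]),
                             max(m[2], box[2]), max(m[3], box[3]))
                break
        else:
            # No compatible group found: start a new one
            merged.append(box)
    return merged


# Three stacked lines in the left column and one box in the right column
boxes = [(50, 100, 500, 120), (50, 125, 500, 145),
         (50, 150, 480, 170), (550, 100, 1000, 300)]
print(merge_adjacent(boxes))
# [(50, 100, 500, 170), (550, 100, 1000, 300)]
```

Because it is a single greedy pass, two groups that only become adjacent after growing are not re-merged; calling it repeatedly until the number of boxes stops changing covers that case.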


It seems to return both the large text block and every line inside it as a separate block.
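Since the extra boxes here are individual lines nested inside the paragraph box, a containment filter may address this more directly than merging. A minimal sketch, again on plain `(x1, y1, x2, y2)` tuples: a box is dropped when at least `threshold` of its area lies inside a strictly larger box. `drop_nested` and the 0.9 default are assumptions for illustration, not part of layoutparser.

```python
from typing import List, Tuple

# (x1, y1, x2, y2) in pixels
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as zero."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def containment(inner: Box, outer: Box) -> float:
    """Fraction of inner's area that lies inside outer."""
    intersection = (max(inner[0], outer[0]), max(inner[1], outer[1]),
                    min(inner[2], outer[2]), min(inner[3], outer[3]))
    a = area(inner)
    return area(intersection) / a if a > 0 else 0.0


def drop_nested(boxes: List[Box], threshold: float = 0.9) -> List[Box]:
    """Keep only boxes that are not mostly covered by a larger box (hypothetical helper)."""
    keep: List[Box] = []
    for i, b in enumerate(boxes):
        nested = any(
            j != i and area(o) > area(b) and containment(b, o) >= threshold
            for j, o in enumerate(boxes)
        )
        if not nested:
            keep.append(b)
    return keep


# One paragraph box, three line boxes inside it, and a separate figure box
boxes = [(50, 100, 500, 300),
         (52, 102, 498, 120), (52, 125, 498, 145), (52, 150, 498, 170),
         (550, 100, 1000, 300)]
print(drop_nested(boxes))
# [(50, 100, 500, 300), (550, 100, 1000, 300)]
```

With layoutparser output, one would presumably apply it per label (for example, only to the coordinates of blocks whose `type` is `"Text"`). Boxes of exactly equal area never eliminate each other, so exact duplicates survive and would need a separate check.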