Multiple text blocks returned instead of one text block
naarkhoo opened this issue · 1 comments
naarkhoo commented
I am using layoutparser 0.3.4 in Colab, installed with:
! pip install layoutparser torchvision && pip install "detectron2@git+https://github.com/facebookresearch/detectron2.git@v0.5#egg=detectron2"
My model is:
import layoutparser as lp

# PubLayNet Mask R-CNN model; detections scoring below 0.5 are discarded
model = lp.models.Detectron2LayoutModel('lp://PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/config',
                                        extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.5],
                                        label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"})
What I see is that block detection is too sensitive: instead of returning one block of text, the model returns four blocks of text. The input is an article from PubMed.
What is the best practice in such a case?
- Labeling additional data and fine-tuning the model?
- Post-analysis using the coordinates? (This feels too hacky.)
- Is there another model that is less sensitive?
naarkhoo commented
It seems the model returns both the large text block and every line inside it as a separate block.
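For the coordinate-based post-analysis option, here is a minimal sketch of one possible filter: it drops any box that lies almost entirely inside a larger box that was already kept. This is not a layoutparser API. The function names, the 0.9 containment threshold, and the sample coordinates are all hypothetical, and boxes are assumed to be plain (x1, y1, x2, y2) tuples extracted from the detected blocks.

```python
def area(box):
    """Area of an (x1, y1, x2, y2) box; degenerate boxes count as 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def overlap_area(a, b):
    """Area of the intersection of two (x1, y1, x2, y2) boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def drop_nested_boxes(boxes, threshold=0.9):
    """Keep the largest boxes and drop boxes mostly contained in a kept one.

    `threshold` (hypothetical default) is the fraction of a box's own area
    that must fall inside a larger kept box for it to be dropped.
    """
    kept = []
    # Visit boxes from largest to smallest so enclosing blocks are kept first.
    for box in sorted(boxes, key=area, reverse=True):
        box_area = area(box)
        nested = box_area > 0 and any(
            overlap_area(box, big) / box_area >= threshold for big in kept
        )
        if not nested:
            kept.append(box)
    return kept


# Made-up coordinates: one paragraph block, three line-level boxes inside it,
# and a second, separate paragraph block.
boxes = [
    (50, 100, 550, 300),
    (52, 102, 548, 150),
    (52, 152, 548, 200),
    (52, 202, 548, 250),
    (50, 320, 550, 400),
]
filtered = drop_nested_boxes(boxes)
print(filtered)
```

Note that the result is ordered by area, not reading order, so it may need re-sorting by the top coordinate afterwards. Whether keeping the enclosing block or the individual lines is the better choice depends on the downstream task.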