VikParuchuri/surya

Finetuning for layout detection


Hi @VikParuchuri,
Great project! I have been using it, and it works for almost every use case of mine.
However, I now have some very complex documents, and I want to fine-tune the layout detection model on my data.
It would be great if you could provide some direction on the following:

  • Is there a data annotation tool that outputs annotations in the format needed by surya-ocr?
  • Are there any fine-tuning instructions available in the docs or a guide?

Again, a very nice open source project 🙌🏻

Hi @sky-2002,

How did you get layout detection like this? I ran detect_layout.py, but I only got a text bounding box for each row, not blocks such as paragraphs, etc.

I would be happy if you could give me a code example.

@phamkhactu sure

from surya.detection import batch_text_detection
from surya.layout import batch_layout_detection

def layout(image):
    # Text-line detection runs first; its results seed the layout model
    line_predictions = batch_text_detection([image], det_model, det_processor)
    layout_predictions = batch_layout_detection([image], layout_model, layout_processor, line_predictions)
    # Return the layout blocks (Text, Table, Figure, ...) for the single input image
    return layout_predictions[0].bboxes
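
To actually run this, the models and processors need to be loaded first. The import paths and checkpoint setting below are an assumption (they have moved between surya versions), so check them against the README of the version you have installed:

from PIL import Image

# NOTE: these module paths are an assumption; they have changed across surya
# versions, so verify them against the README of your installed version.
from surya.model.detection.segformer import load_model, load_processor
from surya.settings import settings

# Text-line detection model/processor (defaults)
det_model = load_model()
det_processor = load_processor()

# Layout model/processor, loaded from the layout checkpoint
layout_model = load_model(checkpoint=settings.LAYOUT_MODEL_CHECKPOINT)
layout_processor = load_processor(checkpoint=settings.LAYOUT_MODEL_CHECKPOINT)

image = Image.open("page.png")  # hypothetical input page
for box in layout(image):       # helper defined above
    # each layout box is expected to carry a label (Text, Table, Figure, ...)
    # and coordinates
    print(box.label, box.bbox)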

If you are aware of fine-tuning resources for layout detection models, please point me to them (for both data annotation and model training).

Hi @sky-2002

I used this config:

parser.add_argument("--images", action="store_true", help="Save images of detected layout bboxes.", default=True)

to save the layout detected by the model. I saw that it drew a bounding box for each row, not a block.
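
One way to check this is to draw the bboxes returned by the layout() function above directly and compare them against the saved images. A rough sketch, assuming Pillow and that each returned box exposes a .bbox as [x1, y1, x2, y2]:

from PIL import Image, ImageDraw

def draw_layout(image_path, out_path="layout_debug.png"):
    # Draw the layout blocks returned by the layout() helper above.
    # Block-level results should be large regions (paragraphs, tables, figures),
    # not one thin box per text line.
    image = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    for box in layout(image):
        draw.rectangle(box.bbox, outline="red", width=3)
    image.save(out_path)

draw_layout("page.png")  # hypothetical input page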

Could you share an image on which you detected the layout successfully? I want to check it; maybe I am wrong in running the code.