nlmatics/llmsherpa

Feature Request - Splitting Bounding Boxes Across Pages

amn-max opened this issue · 1 comments

I am not sure whether this is already handled. During my tests I found that the bbox coordinates do not account for a chunk that crosses a page boundary.

I am writing to propose a feature enhancement for your project, specifically regarding the current handling of bounding boxes (bbox) in the context of PDF generation.

  1. Currently, when generating a PDF, a single bbox is produced for a chunk of text that spans the intersection of two pages. While this approach works, I would like to suggest splitting the bbox into one box per page, which would represent the layout of text across pages with more granularity.

  2. The second feature is to add the PDF page width and height (from the mediabox, cropbox, or rect) for every page in the API response. This would make the bbox much more usable for adding an annotation/highlight layer.
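To illustrate why the page dimensions matter for highlighting, here is a minimal sketch of mapping a bbox onto a rendered page image. It assumes the bbox is `[x0, y0, x1, y1]` in the same units and with the same origin as the page width and height. The function name `scale_bbox` and its parameters are hypothetical and not part of the current API.

```python
def scale_bbox(bbox, page_width, page_height, render_width, render_height):
    """Scale a bbox from PDF page units to the pixel size of a rendered page.

    Assumption: bbox is [x0, y0, x1, y1] and shares its origin and units
    with page_width / page_height (hypothetical response fields).
    """
    x0, y0, x1, y1 = bbox
    sx = render_width / page_width    # horizontal scale factor
    sy = render_height / page_height  # vertical scale factor
    return [x0 * sx, y0 * sy, x1 * sx, y1 * sy]


# A 600 x 800 page rendered at twice its size.
highlight = scale_bbox([100, 200, 300, 400], 600, 800, 1200, 1600)
```

Without the page size in the response, a client has to open the PDF itself just to obtain these two scale factors.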

bbox intersection

Hi @amn-max. This is a great idea and I can see it helping with highlighting. This change needs to be made on the nlm-ingestor side though. The bbox field can be an array here: [{"page_idx": 1, "bbox": [x11, y11, x12, y12]}, {"page_idx": 2, "bbox": [x21, y21, x22, y22]}]. This will take some work and testing though. If you are interested in working on it, I can give you some pointers.
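For anyone picking this up, a client could accept both shapes during a transition. The sketch below is only an illustration: it assumes a block is a dict with a `bbox` field and, for the single-box form, a top-level `page_idx` field. The helper name `per_page_boxes` is hypothetical.

```python
def per_page_boxes(block):
    """Return a list of (page_idx, [x0, y0, x1, y1]) pairs for a block.

    Handles the proposed array form, where each entry carries its own
    page_idx, and falls back to a single flat bbox (assumed layout).
    """
    bbox = block["bbox"]
    if bbox and isinstance(bbox[0], dict):
        # Proposed form: one entry per page the chunk touches.
        return [(entry["page_idx"], list(entry["bbox"])) for entry in bbox]
    # Single-box form: one bbox on the block's own page.
    return [(block["page_idx"], list(bbox))]


split_block = {
    "bbox": [
        {"page_idx": 1, "bbox": [72, 700, 540, 760]},
        {"page_idx": 2, "bbox": [72, 40, 540, 90]},
    ]
}
single_block = {"page_idx": 3, "bbox": [72, 100, 540, 160]}

split_boxes = per_page_boxes(split_block)
single_boxes = per_page_boxes(single_block)
```

Returning the same list-of-pairs shape in both cases lets the highlighting code loop over boxes without caring which form the server sent.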