Request for LINE Aggregation Level

Question

Six-Persimmon opened this issue a year ago · 0 comments

Hi developers.
Thank you very much for this awesome work! I am writing to ask whether you could add a GCVFeatureType.LINE aggregation level to the OCR module, e.g.
layout = ocr_agent.gather_full_text_annotation(res, agg_level=lp.GCVFeatureType.LINE)
The motivation comes from some inconsistent behavior of Layout Parser that I ran into while working on historical materials that contain date information. Specifically, if I aggregate a block like "Nov 07, 1995" in a scanned PDF at the WORD level, I may get any one of the following, seemingly at random:
"Nov 07, 1995" "199507,Nov" "199507Nov,"
which can cause trouble in datetime-related operations. My guess is that the aggregation cannot determine the correct relationship between the comma and the surrounding words.
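
To make the request concrete, here is a minimal sketch of the kind of line-level grouping I have in mind. It is plain Python and does not use the Layout Parser API: WordBox, group_words_into_lines, and the tolerance parameter are hypothetical names for illustration, and it assumes each word comes with a left x coordinate, a top y coordinate, and a height, and that punctuation stays attached to its word (as in "07,"). Words whose vertical centers are close are put on the same line, and each line is then read left to right.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordBox:
    """Hypothetical word-level OCR result: text plus a simple bounding box."""
    text: str
    x: float       # left edge
    y: float       # top edge
    height: float  # box height


def group_words_into_lines(words: List[WordBox], tolerance: float = 0.5) -> List[str]:
    """Group words into lines by vertical center, then order each line by x.

    A word joins the current line when its vertical center is within
    `tolerance` times the line's mean word height of the line's mean center.
    """
    lines: List[List[WordBox]] = []
    # Visit words from top to bottom so lines are built in reading order.
    for word in sorted(words, key=lambda w: w.y + w.height / 2):
        center = word.y + word.height / 2
        if lines:
            last = lines[-1]
            ref_center = sum(w.y + w.height / 2 for w in last) / len(last)
            ref_height = sum(w.height for w in last) / len(last)
            if abs(center - ref_center) <= tolerance * ref_height:
                last.append(word)
                continue
        lines.append([word])
    # Within each line, read the words from left to right.
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.x)) for line in lines]


# The words arrive in scrambled order, as in the examples above.
words = [
    WordBox("1995", x=120, y=10, height=12),
    WordBox("Nov", x=10, y=11, height=12),
    WordBox("07,", x=60, y=9, height=12),
    WordBox("Page", x=10, y=40, height=12),
]
print(group_words_into_lines(words))  # ['Nov 07, 1995', 'Page']
```

This only handles roughly horizontal text, so skewed scans or multi-column pages would need more care; a built-in LINE level could handle those cases properly.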
Thank you!