fh2019ustc/DocGeoNet

Some questions about the textlines annotation process

Hello Hao,
Thanks for your awesome work on document image dewarping.
Could you provide more details about the textline annotation process (e.g., the kernel sizes for binarization and dilation, and the filtering rule)?

Hi, sorry for the late reply; I have been dealing with health issues.
I use cv2.adaptiveThreshold for binarization, as follows:

cv2.adaptiveThreshold(xxx, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, ADAPTIVE_WINSZ, 25)

In addition, the dilation kernel size is 1 * 10 (h * w).

Thanks for your reply.
Hope you will get well soon :)
I still have some questions: how do you choose ADAPTIVE_WINSZ in cv2.adaptiveThreshold, and how do you filter out connected regions that are not textlines?

ADAPTIVE_WINSZ = 35

def is_textline(width, height):
    # width and height are the dimensions of the textline candidate
    return not (width < 30 or height < 2 or width < 1.5 * height)

Hope this helps.

Thank you for sharing the experimental details!

@fh2019ustc
I have a question about the localization step of the textline annotation process.
When creating the textline masks, did you fill in all the pixels inside the bounding boxes, or did you shrink the heights of the bounding boxes so that the masks only pass through the middle of the boxes?