Page dewarp cuts text

Question

Opened this issue 6 years ago · 8 comments

I have been running this from the command line as python page_dewarp.py image.jpg. It works with the included images, but with an image of my own the text in the output is cropped and cut off.

The image was a 2 MB JPEG with a resolution of 3400x4600 px.

These are the original and the resulting image:

Do you know what might be the problem? Thanks

Answer 1 · 2019-05-28T03:24:55.000Z

Try setting PAGE_MARGIN_X and PAGE_MARGIN_Y to zero.
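
For context, here is a minimal sketch of why the margins can clip text. It assumes (from a reading of page_dewarp.py, not confirmed in this thread) that the margins are cut symmetrically from the edges of the working image, so anything closer to an edge than the margin is ignored. The function name page_extents is hypothetical; the defaults of 50 and 20 are the values shown in the script.

```python
def page_extents(width, height, margin_x=50, margin_y=20):
    """Return (xmin, ymin, xmax, ymax) of the region kept for processing.

    Assumption: margins are cut symmetrically from the left/right and
    top/bottom edges of the working image.
    """
    xmin, ymin = margin_x, margin_y
    xmax, ymax = width - margin_x, height - margin_y
    if xmin >= xmax or ymin >= ymax:
        raise ValueError("margins leave no usable page area")
    return xmin, ymin, xmax, ymax


# With the default margins, a strip along every edge is ignored.
default_box = page_extents(1000, 1400)
# With zero margins, the whole image is considered.
full_box = page_extents(1000, 1400, margin_x=0, margin_y=0)
print(default_box, full_box)
```

If text on your page runs closer to the edge than the margin, it falls outside this box, which would be consistent with the cropping described above.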

Answer 2 · 2020-03-12T16:11:27.000Z

I personally would prefer other defaults:

no cropping
no binarization (black/white)
no subsampling (full resolution)
= dewarping only

Answer 3 · 2022-10-11T01:51:58.000Z

I personally would prefer other defaults:

no cropping

no binarization (black/white)

no subsampling (full resolution)

= dewarping only

How do I set this in code?

Answer 4 · 2022-10-11T07:35:04.000Z

diff --git a/page_dewarp.py b/page_dewarp.py
index 6ef5b33..d095244 100755
--- a/page_dewarp.py
+++ b/page_dewarp.py
@@ -20,8 +20,8 @@ import scipy.optimize
 # for some reason pylint complains about cv2 members being undefined :(
 # pylint: disable=E1101
 
-PAGE_MARGIN_X = 50       # reduced px to ignore near L/R edge
-PAGE_MARGIN_Y = 20       # reduced px to ignore near T/B edge
+PAGE_MARGIN_X = 0       # reduced px to ignore near L/R edge
+PAGE_MARGIN_Y = 0       # reduced px to ignore near T/B edge
 
 OUTPUT_ZOOM = 1.0        # how much to zoom output relative to *original* image
 OUTPUT_DPI = 300         # just affects stated DPI of PNG, not appearance
@@ -813,17 +813,13 @@ def remap_image(name, img, small, page_dims, params):
     image_y_coords = cv2.resize(image_y_coords, (width, height),
                                 interpolation=cv2.INTER_CUBIC)
 
-    img_gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
-
-    remapped = cv2.remap(img_gray, image_x_coords, image_y_coords,
+    remapped = cv2.remap(img, image_x_coords, image_y_coords,
                          cv2.INTER_CUBIC,
                          None, cv2.BORDER_REPLICATE)
 
-    thresh = cv2.adaptiveThreshold(remapped, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
-                                   cv2.THRESH_BINARY, ADAPTIVE_WINSZ, 25)
+    thresh = remapped
 
     pil_image = Image.fromarray(thresh)
-    pil_image = pil_image.convert('1')
 
     threshfile = name + '_thresh.png'
     pil_image.save(threshfile, dpi=(OUTPUT_DPI, OUTPUT_DPI))
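
One possible caveat with this patch: if img was loaded with cv2.imread, its channels are in BGR order, while Image.fromarray treats a three-channel array as RGB, so the saved color output may have red and blue swapped. Below is a minimal sketch of the channel swap, written with plain NumPy so it runs without OpenCV. The helper name bgr_to_rgb is hypothetical and not part of page_dewarp.py.

```python
import numpy as np


def bgr_to_rgb(image):
    """Reverse the channel axis of an H x W x 3 array (BGR <-> RGB)."""
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an H x W x 3 color image")
    # Copy into a contiguous array rather than returning a reversed view.
    return np.ascontiguousarray(image[:, :, ::-1])


# A single pure-blue pixel in OpenCV's BGR channel order.
bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)
# After the swap, the same pixel is expressed in RGB channel order.
rgb = bgr_to_rgb(bgr)
print(rgb.tolist())
```

In the patched remap_image, the equivalent step would be calling cv2.cvtColor(remapped, cv2.COLOR_BGR2RGB) before Image.fromarray.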

Answer 5 · 2022-10-11T07:41:46.000Z

thank you very much

Answer 6 · 2022-10-11T07:55:16.000Z

PS: I ran a simulation of how the pages of an open book warp. The resulting page profiles (x, y) look more like x^4 than x^3, but I don't know how this maps to "text line curves".
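
To make the x^3 side of that comparison concrete, here is a small sketch of a cubic page profile. It assumes the page height is modeled as a cubic in x on [0, 1] that is zero at both page edges, with end slopes alpha and beta. That model and all the names are assumptions for illustration, not something stated in this thread. A cubic with those constraints has only two free parameters, so it cannot match an x^4-shaped profile exactly.

```python
def cubic_profile(x, alpha, beta):
    """Cubic z(x) with z(0) = z(1) = 0, z'(0) = alpha, z'(1) = beta."""
    a = alpha + beta
    b = -2.0 * alpha - beta
    c = alpha
    # Horner form of a*x^3 + b*x^2 + c*x
    return ((a * x + b) * x + c) * x


def slope(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Illustrative end slopes only; these are not fitted to any real page.
alpha, beta = 0.3, -0.2


def z(x):
    return cubic_profile(x, alpha, beta)


# The profile is pinned to zero at both edges and has the requested slopes.
print(z(0.0), z(1.0), slope(z, 0.0), slope(z, 1.0))
```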

Answer 7 · 2022-11-23T08:25:24.000Z

Hi @jbarth-ubhd @KyleWang-Hunter, I have the same problem: the text is cut off at the end of the image.

I have already set margin_x and margin_y to zero. How can I fix it? Thanks in advance.

Answer 8 · 2022-11-23T13:28:10.000Z

For text that is only slightly skewed, with no curvature from bent paper, I would use a much simpler algorithm, e.g. https://github.com/jbarth-ubhd/fix-perspective