The ocrodeg package is a small Python library implementing document image degradation for data augmentation in handwriting recognition and OCR applications.
The following illustrates the kinds of degradations available in ocrodeg.
%pylab inline
rc("image", cmap="gray", interpolation="bicubic")
figsize(10, 10)
import scipy.ndimage as ndi
import ocrodeg
image = imread("testdata/W1P0.png")
imshow(image)
For large page rotations, you can simply use scipy.ndimage; the following is just for illustration.
for i, angle in enumerate([0, 90, 180, 270]):
    subplot(2, 2, i+1)
    imshow(ndi.rotate(image, angle))
random_transform generates random transformation parameters that work reasonably well for document image degradation. You can override the range used for each of these parameters with keyword arguments.
ocrodeg.random_transform()
{'angle': -0.012292452096776464,
'scale': 1.0082362389124033,
'aniso': 1.1871904039346834,
'translation': (-0.015090714955534455, -0.011666614466062153)}
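As a rough sketch of what sampling such parameters could look like, the snippet below draws each value uniformly from a range that keyword arguments can override. The function name `sample_transform_params`, its argument names, and the default ranges are illustrative assumptions, not ocrodeg's actual implementation; only the keys of the returned dictionary follow the output shown above.

```python
import math
import random


def sample_transform_params(translation=(-0.05, 0.05), rotation_deg=(-2.0, 2.0),
                            log_scale=(-0.1, 0.1), log_aniso=(-0.1, 0.1), rng=random):
    """Hypothetical sampler; names and default ranges are assumptions."""
    dx = rng.uniform(*translation)
    dy = rng.uniform(*translation)
    # The angle is drawn in degrees and returned in radians.
    angle = math.radians(rng.uniform(*rotation_deg))
    # Scale factors are drawn on a log10 axis so that shrinking and
    # enlarging are equally likely.
    scale = 10 ** rng.uniform(*log_scale)
    aniso = 10 ** rng.uniform(*log_aniso)
    return {"angle": angle, "scale": scale, "aniso": aniso, "translation": (dx, dy)}


# Narrow the rotation range while keeping the other defaults.
params = sample_transform_params(rotation_deg=(-1.0, 1.0))
```

Drawing the scale factors on a logarithmic axis is one possible design choice; it keeps a factor of 1.1 and a factor of 1/1.1 equally probable.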
Here are four samples generated by random transforms.
for i in range(4):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, **ocrodeg.random_transform()))
You can use transform_image directly with the different parameters to get a feel for the ranges and effects of these parameters.
for i, angle in enumerate([-2, -1, 0, 1]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, angle=angle*pi/180))
for i, angle in enumerate([-2, -1, 0, 1]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, angle=angle*pi/180)[1000:1500, 750:1250])
for i, aniso in enumerate([0.5, 1.0, 1.5, 2.0]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, aniso=aniso))
for i, aniso in enumerate([0.5, 1.0, 1.5, 2.0]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, aniso=aniso)[1000:1500, 750:1250])
for i, scale in enumerate([0.5, 0.9, 1.0, 2.0]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, scale=scale))
for i, scale in enumerate([0.5, 0.9, 1.0, 2.0]):
    subplot(2, 2, i+1)
    h, w = image.shape
    imshow(ocrodeg.transform_image(image, scale=scale)[h//2-200:h//2+200, w//3-200:w//3+200])
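To make the roles of the angle, scale, and aniso parameters more concrete, here is a minimal sketch of an affine transform about the image center built on scipy.ndimage. The helper name `affine_about_center` and the exact convention chosen for aniso are assumptions for illustration; ocrodeg's own transform_image may differ in detail.

```python
import numpy as np
import scipy.ndimage as ndi


def affine_about_center(image, angle=0.0, scale=1.0, aniso=1.0):
    """Hypothetical helper: rotate and scale a 2D image about its center."""
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, -s], [s, c]])
    # affine_transform maps output coordinates to input coordinates, so
    # enlarging the content by `scale` means dividing coordinates by it.
    # Assumed convention: aniso > 1 stretches content along rows and
    # compresses it along columns.
    stretch = np.diag([1.0 / (scale * aniso), aniso / scale])
    matrix = stretch @ rotation
    center = (np.array(image.shape) - 1) / 2.0
    # Choose the offset so that the center pixel maps to itself.
    offset = center - matrix @ center
    return ndi.affine_transform(image, matrix, offset=offset, order=1, mode="nearest")


page = np.arange(20, dtype=float).reshape(4, 5)
rotated = affine_about_center(page, angle=np.pi)  # half turn about the center
```

With the default arguments the matrix is the identity and the image is returned unchanged, which is a convenient sanity check when experimenting with parameter ranges.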
Pages often also have a small degree of warping. This can be modeled by random distortions. Very small and noisy random distortions also model ink spread, while large 1D random distortions model paper curl.
for i, sigma in enumerate([1.0, 2.0, 5.0, 20.0]):
    subplot(2, 2, i+1)
    noise = ocrodeg.bounded_gaussian_noise(image.shape, sigma, 5.0)
    distorted = ocrodeg.distort_with_noise(image, noise)
    h, w = image.shape
    imshow(distorted[h//2-200:h//2+200, w//3-200:w//3+200])
for i, mag in enumerate([5.0, 20.0, 100.0, 200.0]):
    subplot(2, 2, i+1)
    noise = ocrodeg.noise_distort1d(image.shape, magnitude=mag)
    distorted = ocrodeg.distort_with_noise(image, noise)
    h, w = image.shape
    imshow(distorted[:1500])
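The general idea behind such distortions can be sketched with plain NumPy and scipy.ndimage: generate a random displacement field, smooth it so that neighboring pixels move together, bound its magnitude, and resample the image at the displaced coordinates. The helper names `smooth_displacement` and `warp` are hypothetical, and this is an illustration of the technique under those assumptions rather than a description of how ocrodeg implements it.

```python
import numpy as np
import scipy.ndimage as ndi


def smooth_displacement(shape, sigma, max_delta, rng):
    """Hypothetical helper: smooth random displacement field of shape (2, H, W)."""
    field = rng.standard_normal((2,) + tuple(shape))
    # Smooth each component separately so that neighboring pixels move together.
    field = np.stack([ndi.gaussian_filter(f, sigma) for f in field])
    # Rescale so that the largest displacement equals max_delta pixels.
    peak = np.abs(field).max()
    return field * (max_delta / peak) if peak > 0 else field


def warp(image, displacement):
    """Resample `image` at pixel coordinates shifted by `displacement`."""
    rows, cols = np.meshgrid(np.arange(image.shape[0]),
                             np.arange(image.shape[1]), indexing="ij")
    coords = np.array([rows + displacement[0], cols + displacement[1]])
    return ndi.map_coordinates(image, coords, order=1, mode="reflect")


rng = np.random.default_rng(0)
page = np.zeros((64, 64))
page[::8, :] = 1.0  # synthetic stand-in for text lines
warped = warp(page, smooth_displacement(page.shape, sigma=5.0, max_delta=3.0, rng=rng))
```

A small sigma gives jittery, ink-spread-like distortions, while a large sigma gives slow, page-warp-like bending, matching the progression in the examples above.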
There is a range of utilities for modeling imaging artifacts: blurring, noise, and ink spread.
patch = image[1900:2156, 1000:1256]
imshow(patch)
for i, s in enumerate([0, 1, 2, 4]):
    subplot(2, 2, i+1)
    blurred = ndi.gaussian_filter(patch, s)
    imshow(blurred)
for i, s in enumerate([0, 1, 2, 4]):
    subplot(2, 2, i+1)
    blurred = ndi.gaussian_filter(patch, s)
    thresholded = 1.0*(blurred>0.5)
    imshow(thresholded)
for i, s in enumerate([0.0, 1.0, 2.0, 4.0]):
    subplot(2, 2, i+1)
    blurred = ocrodeg.binary_blur(patch, s)
    imshow(blurred)
for i, s in enumerate([0.0, 0.1, 0.2, 0.3]):
    subplot(2, 2, i+1)
    blurred = ocrodeg.binary_blur(patch, 2.0, noise=s)
    imshow(blurred)
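The blur-and-threshold steps shown above can be combined with additive noise in a single helper. The following is a minimal sketch under stated assumptions: the name `blur_noise_threshold` is hypothetical, and the fixed 0.5 threshold is taken from the thresholding example above, so ocrodeg.binary_blur may well pick its threshold differently.

```python
import numpy as np
import scipy.ndimage as ndi


def blur_noise_threshold(binary, sigma, noise=0.0, rng=None):
    """Hypothetical helper: blur a 0/1 image, add noise, re-binarize at 0.5."""
    rng = np.random.default_rng() if rng is None else rng
    blurred = ndi.gaussian_filter(binary.astype(float), sigma)
    if noise > 0:
        # Gaussian pixel noise added before thresholding roughens the edges.
        blurred = blurred + noise * rng.standard_normal(blurred.shape)
    return (blurred > 0.5).astype(float)


square = np.zeros((32, 32))
square[8:24, 8:24] = 1.0  # synthetic stand-in for an ink blob
degraded = blur_noise_threshold(square, sigma=1.0, noise=0.1,
                                rng=np.random.default_rng(0))
```

Adding the noise before the threshold, rather than after, is what produces ragged but still binary edges.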
for i in range(4):
    noisy = ocrodeg.make_multiscale_noise_uniform((512, 512))
    subplot(2, 2, i+1); imshow(noisy, vmin=0, vmax=1)
for i, s in enumerate([2, 5, 10, 20]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.random_blobs(patch.shape, 3e-4, s))
blotched = ocrodeg.random_blotches(patch, 3e-4, 1e-4)
#blotched = minimum(maximum(patch, ocrodeg.random_blobs(patch.shape, 30, 10)), 1-ocrodeg.random_blobs(patch.shape, 15, 8))
subplot(121); imshow(patch); subplot(122); imshow(blotched)
imshow(ocrodeg.make_fibrous_image((256, 256), 700, 300, 0.01))
subplot(121); imshow(patch); subplot(122); imshow(ocrodeg.printlike_multiscale(patch))
subplot(121); imshow(patch); subplot(122); imshow(ocrodeg.printlike_fibrous(patch))