tesseract-ocr/tesseract

terminate called after throwing an instance of 'std::bad_alloc'

Closed this issue · 33 comments

Hello,

First, thanks for your work. I am trying to run Tesseract 4 but I am getting an error:

Info in bmfCreate: Generating pixa of bitmap fonts from string
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)

Steps to reproduce (with a Dockerfile):

FROM ubuntu
RUN apt-get update && apt-get install -y \
	autoconf \
	automake \
	libtool \
	autoconf-archive \
	pkg-config \
	libpng12-dev \
	libjpeg8-dev \
	libtiff5-dev \
	zlib1g-dev \
	libicu-dev \
	libpango1.0-dev \
	libcairo2-dev \
	git \
	curl && \
	rm -rf /var/lib/apt/lists/*

RUN curl http://www.leptonica.org/source/leptonica-1.74.1.tar.gz -o leptonica-1.74.1.tar.gz && \
	tar -zxvf leptonica-1.74.1.tar.gz && \
	cd leptonica-1.74.1 && ./configure && make && make install && \
	cd .. && rm -rf leptonica*

RUN git clone --depth 1 https://github.com/tesseract-ocr/tesseract.git && \
	cd tesseract && \
	./autogen.sh && \
	./configure --enable-debug && \
	LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make && \
	make install && \
	ldconfig && \
	make training && \
	make training-install && \
	cd .. && rm -rf tesseract

# Get basic traineddata
RUN curl https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata > eng.traineddata && \
	mv eng.traineddata /usr/local/share/tessdata/

RUN curl https://github.com/tesseract-ocr/tessdata/raw/master/fra.traineddata > fra.traineddata && \
	mv fra.traineddata /usr/local/share/tessdata/

Then:

docker build -t tesseract4 .
docker run tesseract4
docker run -t -i tesseract4 /bin/bash
mkdir test
cd test
curl http://tleyden-misc.s3.amazonaws.com/blog_images/ocr_test.png > test.png
tesseract test.png out

Can someone explain to me what is happening?

For information, I have 2471 megabytes of memory remaining.

Thanks in advance

I did not build it with Ubuntu.
I read in the referenced issue that we should not use it in a Docker image. Do you know why?
I need to use it that way.

I do not know about docker images.

I thought @amitdo was referring to --enable-debug option of configure.

I will try it without the --enable-debug option and give you the output.

I tried with and without --enable-debug, and neither works.

Still the same issue:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

My issue is not a build failure.

The build goes well. The issue occurs when I launch tesseract.

EDIT: I tried outside of a Docker image by simply running the commands manually, and I get the same error with or without --enable-debug.

Error message Info in bmfCreate: Generating pixa of bitmap fonts

is similar to #873

That error/info message is from Leptonica

https://github.com/cotdp/leptonica/blob/master/src/bmf.c

Please check where leptonica is installed. Do you have multiple versions?
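One way to answer that question from a shell, as a rough sketch: search for every copy of the library and ask the dynamic linker which one the tesseract binary resolves. The helper names `list_lept` and `linked_lept` are made up for illustration, and the default search root of /usr is an assumption that covers both /usr/lib and /usr/local/lib.

```shell
#!/bin/sh
# Hypothetical helper: list every Leptonica library file under a directory
# tree (defaults to /usr, an assumption for a typical Linux layout).
list_lept() {
    find "${1:-/usr}" -name 'liblept*' 2>/dev/null
}

# Hypothetical helper: show which Leptonica shared library a binary
# (default: tesseract) is linked against at run time.
linked_lept() {
    ldd "$(command -v "${1:-tesseract}")" | grep -i lept
}

# Usage (not executed here):
#   list_lept
#   linked_lept
```

If `list_lept` reports copies in more than one prefix, the path printed by `linked_lept` shows which one is actually loaded.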

Concerning multiple versions: I have only one installed.
I won't be able to check the installation directory tonight because I deleted my AWS instance. I will create a new one tomorrow. Can you tell me the normal installation directory so I can check then?
However, according to the "make" documentation it should be in /usr/bin.

@speedfl No need to rebuild. I have not used docker so was just guessing.

@xlight #817 (comment) may be able to help.

Here is my configuration:

root@65369dfbb4d0:/# tesseract -v
tesseract 4.00.00alpha
 leptonica-1.74.1
  libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8

 Found AVX
 Found SSE

And here is where I found tesseract packages

root@65369dfbb4d0:/# find / -name "*tesseract*"
/usr/local/include/tesseract
/usr/local/bin/tesseract
/usr/local/lib/libtesseract.so.4
/usr/local/lib/libtesseract.so
/usr/local/lib/pkgconfig/tesseract.pc
/usr/local/lib/libtesseract.la
/usr/local/lib/libtesseract.a
/usr/local/lib/libtesseract.so.4.0.0

Still the same issue...

I am going to try with leptonica-1.74

BTW, the info message from leptonica is probably not related to the terminate error.

Please try to build again with latest source of tesseract from github.

I just did it (restarted from scratch 5 minutes ago and same error)

Here is what I found:

root@1cd9578cac1d:/test/tesseract4# find / -name "*liblept*"
/usr/local/lib/liblept.so.5.0.1
/usr/local/lib/liblept.a
/usr/local/lib/liblept.la
/usr/local/lib/liblept.so.5
/usr/local/lib/liblept.so
root@1cd9578cac1d:/test/tesseract4# find / -name "*leptonica*"
/usr/local/include/leptonica

EDIT: Same error with leptonica 1.74 and 1.74.1 :(

What is the minimum resources configuration to run it?

What output do you get for

tesseract -v

Use GDB to get more info about the cause of the issue.

As in my previous comment, I had:

root@65369dfbb4d0:/# tesseract -v
tesseract 4.00.00alpha
 leptonica-1.74.1
  libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8

 Found AVX
 Found SSE

And now

root@1cd9578cac1d:/test/tesseract4# tesseract -v
tesseract 4.00.00alpha
 leptonica-1.74
  libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8

 Found AVX
 Found SSE

And same issue

If by GDB you mean --enable-debug, I ran with it inside and outside a Docker container, and this is what I got:

Info in bmfCreate: Generating pixa of bitmap fonts from string
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)

Without the debug build I just get:

  what():  std::bad_alloc
Aborted (core dumped)

If not, I have never tried it. How do I activate it?

How do you run GDB?

#256 (comment)

@Shreeshrii I just tested with JPG and TIFF files and it is still not working (same error):
http://read.pudn.com/downloads196/sourcecode/app/924338/OCR/OCR/TEST_2.JPG
https://github.com/nam-leduc/positioning/raw/master/test1.tif

@amitdo
When I use gdb

Starting program: /usr/local/bin/tesseract test.png out
warning: Error disabling address space randomization: Operation not permitted
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
During startup program terminated with signal SIGABRT, Aborted.
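For anyone who wants a backtrace rather than just the abort message, GDB can be driven non-interactively. The wrapper below is only a sketch: the function name `tess_backtrace` is hypothetical, and it assumes gdb is installed in the container. The "Error disabling address space randomization" warning usually comes from the container's default restrictions; starting the container with `docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined ...` is a commonly used way to lift them, though I have not verified it against this exact image.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command under GDB in batch mode, then print a
# backtrace once the program stops (for example on SIGABRT).
# Assumes gdb is installed.
tess_backtrace() {
    gdb -batch -ex run -ex bt --args "$@"
}

# Usage (not executed here):
#   tess_backtrace tesseract test.png out
```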

curl https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata > eng.traineddata does not get the expected data file, but gets an HTML redirection file:

<html><body>You are being <a href="https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/eng.traineddata">redirected</a>.</body></html>

Use curl -LO https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata (and similar for other languages), then Tesseract with Docker works for me. With the bad data file, I get an error message:

# tesseract ocr_test.png out -l bad
Info in bmfCreate: Generating pixa of bitmap fonts from string
Error opening data file /usr/local/share/bad.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'bad'
Tesseract couldn't load any languages!
Could not initialize tesseract.
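A quick sanity check can catch this kind of bad download before Tesseract ever runs. The following is a minimal sketch, not something Tesseract provides: the helper name `looks_like_html` is hypothetical, and the file path is the install location used by the Dockerfile in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the start of the file
# contains an HTML tag, which is what curl saves when it does not follow
# GitHub's redirect.
looks_like_html() {
    head -c 64 "$1" | grep -qi '<html'
}

# Path assumed from the Dockerfile in this thread.
f=/usr/local/share/tessdata/eng.traineddata
if [ -f "$f" ] && looks_like_html "$f"; then
    echo "$f is an HTML page, not trained data; re-download with curl -L" >&2
fi
```

A real traineddata file is binary and several megabytes in size, so a file of a few hundred bytes is another hint that the download went wrong.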

Man, it works perfectly!!!

Thanks. The issue was only due to the redirection.

With the correct download it is working.

You can close the issue. But maybe a small update to the Dockerfile with an example of the download would be great.

Here is the final Dockerfile (based on @xlight's first draft):

FROM ubuntu
RUN apt-get update && apt-get install -y \
	autoconf \
	automake \
	libtool \
	autoconf-archive \
	pkg-config \
	libpng12-dev \
	libjpeg8-dev \
	libtiff5-dev \
	zlib1g-dev \
	libicu-dev \
	libpango1.0-dev \
	libcairo2-dev \
	git \
	curl && \
	rm -rf /var/lib/apt/lists/*

RUN curl http://www.leptonica.org/source/leptonica-1.74.1.tar.gz -o leptonica-1.74.1.tar.gz && \
	tar -zxvf leptonica-1.74.1.tar.gz && \
	cd leptonica-1.74.1 && ./configure && make && make install && \
	cd .. && rm -rf leptonica*

RUN git clone --depth 1 https://github.com/tesseract-ocr/tesseract.git && \
	cd tesseract && \
	./autogen.sh && \
	./configure && \
	LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make && \
	make install && \
	ldconfig && \
	make training && \
	make training-install && \
	cd .. && rm -rf tesseract

# Get basic traineddata
RUN curl -LO https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata && \
	mv eng.traineddata /usr/local/share/tessdata/

RUN curl -LO https://github.com/tesseract-ocr/tessdata/raw/master/fra.traineddata && \
	mv fra.traineddata /usr/local/share/tessdata/

Shouldn't it be curl -LO instead of curl -Lo (upper case O instead of lower case o)?

I'm also still surprised that my docker test produced a different kind of error with the wrong trained data files.

Sorry, updated (it was a typo :))

We can also use:

git clone https://github.com/tesseract-ocr/tessdata && \
mv -v tessdata/* /usr/local/share/tessdata/ && \
rm -rf tessdata

to get all the languages.

@zdenop Issue can be closed. #893 (comment)

Tesseract should verify that the tessdata file is a TIFF file.

@stweil Should the basic traineddata be osd and eng in the dockerfile that I posted on the wiki?

@amitdo Is the tessdata a tiff file???

My installer for Windows includes both files unconditionally. I think they should be in the docker container, too.
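Fetching both files in the container could look roughly like the sketch below. The function name `fetch_traineddata` is hypothetical; the URL pattern is the one used earlier in this thread, and `-f` is added so that curl fails on an HTTP error instead of saving the error page.

```shell
#!/bin/sh
# Hypothetical helper: download the given languages into a tessdata directory.
# -L follows GitHub's redirect, -f makes curl fail on HTTP errors,
# -o names the output file explicitly.
fetch_traineddata() {
    dest=$1
    shift
    for lang in "$@"; do
        curl -fL -o "$dest/$lang.traineddata" \
            "https://github.com/tesseract-ocr/tessdata/raw/master/$lang.traineddata" \
            || return 1
    done
}

# Usage (not executed here):
#   fetch_traineddata /usr/local/share/tessdata eng osd
```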

Thanks! I have updated https://github.com/tesseract-ocr/tesseract/wiki/4.0-Dockerfile

Please review and add any other required files (eg. configs etc.) to the docker container.

@amitdo Is the tessdata a tiff file???

I thought that it is a TIFF file without the tiff extension. I was wrong.

#919 (comment)

Definitions of Docker containers and scripts that help to compile and run Tesseract 4 are available at:

https://github.com/tesseract-shadow/tesseract-ocr-compilation