terminate called after throwing an instance of 'std::bad_alloc'
Closed this issue · 33 comments
Hello,
First, thanks for your work. I am trying to run Tesseract 4 but I am getting an error:
Info in bmfCreate: Generating pixa of bitmap fonts from string
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)
Steps to reproduce (with a Dockerfile):
FROM ubuntu
RUN apt-get update && apt-get install -y \
autoconf \
automake \
libtool \
autoconf-archive \
pkg-config \
libpng12-dev \
libjpeg8-dev \
libtiff5-dev \
zlib1g-dev \
libicu-dev \
libpango1.0-dev \
libcairo2-dev \
git \
curl && \
rm -rf /var/lib/apt/lists/*
RUN curl http://www.leptonica.org/source/leptonica-1.74.1.tar.gz -o leptonica-1.74.1.tar.gz && \
tar -zxvf leptonica-1.74.1.tar.gz && \
cd leptonica-1.74.1 && ./configure && make && make install && \
cd .. && rm -rf leptonica*
RUN git clone --depth 1 https://github.com/tesseract-ocr/tesseract.git && \
cd tesseract && \
./autogen.sh && \
./configure --enable-debug && \
LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make && \
make install && \
ldconfig && \
make training && \
make training-install && \
cd .. && rm -rf tesseract
# Get basic traineddata
RUN curl https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata > eng.traineddata && \
mv eng.traineddata /usr/local/share/tessdata/
RUN curl https://github.com/tesseract-ocr/tessdata/raw/master/fra.traineddata > fra.traineddata && \
mv fra.traineddata /usr/local/share/tessdata/
Then:
docker build -t tesseract4 .
docker run tesseract4
docker run -t -i tesseract4 /bin/bash
mkdir test
cd test
curl http://tleyden-misc.s3.amazonaws.com/blog_images/ocr_test.png > test.png
tesseract test.png out
Can someone explain to me what is happening?
For information, I have 2471 megabytes of memory remaining.
Thanks in advance
I did not build it with Ubuntu.
I read in the referenced issue that we should not use it in a Docker image. Do you know why?
I need to use it that way.
I do not know about docker images.
I thought @amitdo was referring to --enable-debug option of configure.
I will try it without the --enable-debug option and give you the output.
I tried with and without --enable-debug and neither works.
Still the same issue:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
My issue is not a build failure.
Build is going well. The issue is when I launch tesseract
EDIT: I tried outside of a Docker image by running the commands manually, and I get the same error with or without --enable-debug.
The error message "Info in bmfCreate: Generating pixa of bitmap fonts" is similar to #873.
That error/info message is from Leptonica
Please check where leptonica is installed. Do you have multiple versions?
Concerning multiple versions: I have only one installed.
I won't be able to check the installation directory tonight because I deleted my AWS instance. I will create a new one tomorrow. Can you tell me the normal installation directory so I can check then?
However, according to the "make" documentation, it should be in /usr/bin.
@speedfl No need to rebuild. I have not used docker so was just guessing.
@xlight #817 (comment) may be able to help.
Here is my configuration:
root@65369dfbb4d0:/# tesseract -v
tesseract 4.00.00alpha
leptonica-1.74.1
libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
Found AVX
Found SSE
And here is where I found tesseract packages
root@65369dfbb4d0:/# find / -name "*tesseract*"
/usr/local/include/tesseract
/usr/local/bin/tesseract
/usr/local/lib/libtesseract.so.4
/usr/local/lib/libtesseract.so
/usr/local/lib/pkgconfig/tesseract.pc
/usr/local/lib/libtesseract.la
/usr/local/lib/libtesseract.a
/usr/local/lib/libtesseract.so.4.0.0
Still the same issue...
I am going to try with leptonica-1.74
BTW, the info message from leptonica is probably not related to the terminate error.
Please try to build again with latest source of tesseract from github.
I just did it (restarted from scratch 5 minutes ago and same error)
Here is what I found:
root@1cd9578cac1d:/test/tesseract4# find / -name "*liblept*"
/usr/local/lib/liblept.so.5.0.1
/usr/local/lib/liblept.a
/usr/local/lib/liblept.la
/usr/local/lib/liblept.so.5
/usr/local/lib/liblept.so
root@1cd9578cac1d:/test/tesseract4# find / -name "*leptonica*"
/usr/local/include/leptonica
EDIT: Same error with leptonica 1.74 and 1.74.1 :(
What is the minimum resources configuration to run it?
What output do you get for
tesseract -v
Use GDB to get more info about the cause of the issue.
As in my previous comment, I had:
root@65369dfbb4d0:/# tesseract -v
tesseract 4.00.00alpha
leptonica-1.74.1
libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
Found AVX
Found SSE
And now
root@1cd9578cac1d:/test/tesseract4# tesseract -v
tesseract 4.00.00alpha
leptonica-1.74
libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
Found AVX
Found SSE
And same issue
If by GDB you mean --enable-debug, I ran with it both inside and outside a Docker container, and this is what I got:
Info in bmfCreate: Generating pixa of bitmap fonts from string
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)
Without the debug option I just get:
what(): std::bad_alloc
Aborted (core dumped)
If that is not what you mean, I have never tried it. How do I activate it?
How do you run GDB?
@Shreeshrii I just tested with JPG and TIFF images and it is still not working (same issue):
http://read.pudn.com/downloads196/sourcecode/app/924338/OCR/OCR/TEST_2.JPG
https://github.com/nam-leduc/positioning/raw/master/test1.tif
@amitdo
When I use gdb
Starting program: /usr/local/bin/tesseract test.png out
warning: Error disabling address space randomization: Operation not permitted
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
During startup program terminated with signal SIGABRT, Aborted.
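For reference, a minimal sketch of running a command under GDB non-interactively so that a backtrace is printed when it aborts. The wrapper name gdb_backtrace is hypothetical; -batch, -ex and --args are standard gdb options. Inside Docker, the "Error disabling address space randomization" warning usually means the container is not allowed to use ptrace; starting the container with --cap-add=SYS_PTRACE --security-opt seccomp=unconfined is a common workaround (an assumption here, not verified in this thread).

```shell
#!/bin/sh
# Hypothetical wrapper: run the given command under gdb in batch mode.
# "run" starts the program; "bt" prints the backtrace once it stops or crashes.
gdb_backtrace() {
    gdb -batch -ex run -ex bt --args "$@"
}

# Usage sketch (requires gdb to be installed):
# gdb_backtrace tesseract test.png out
```

A build with debug symbols (for example CXXFLAGS="-g") makes the backtrace much easier to read.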
curl https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata > eng.traineddata
does not get the expected data file, but gets a HTML redirection file:
<html><body>You are being <a href="https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/eng.traineddata">redirected</a>.</body></html>
Use curl -LO https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata
(and similar for other languages), then Tesseract with Docker works for me. With the bad data file, I get an error message:
# tesseract ocr_test.png out -l bad
Info in bmfCreate: Generating pixa of bitmap fonts from string
Error opening data file /usr/local/share/bad.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'bad'
Tesseract couldn't load any languages!
Could not initialize tesseract.
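For anyone hitting this later, a minimal sketch of a sanity check that catches this kind of bad download before Tesseract is started. The function name looks_like_html and the 512-byte window are assumptions for illustration, not anything provided by Tesseract or curl.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the file starts like an HTML page,
# which is what curl saves when it does not follow GitHub's redirect.
# Only the first 512 bytes are inspected.
looks_like_html() {
    head -c 512 "$1" | LC_ALL=C grep -qiE '<(html|!doctype)'
}

# Usage sketch (path taken from the Dockerfile in this thread):
# if looks_like_html /usr/local/share/tessdata/eng.traineddata; then
#     echo "eng.traineddata is an HTML page; download it again with curl -L" >&2
# fi
```

This only detects HTML responses; it does not prove that the file is valid traineddata.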
Man, it works perfectly!!!
Thanks. The issue is only due to the redirection.
With correct download it is working
You can close the issue. But maybe a little update to the Dockerfile with an example of the download would be great.
Here is the final Dockerfile (based on @xlight's first draft):
FROM ubuntu
RUN apt-get update && apt-get install -y \
autoconf \
automake \
libtool \
autoconf-archive \
pkg-config \
libpng12-dev \
libjpeg8-dev \
libtiff5-dev \
zlib1g-dev \
libicu-dev \
libpango1.0-dev \
libcairo2-dev \
git \
curl && \
rm -rf /var/lib/apt/lists/*
RUN curl http://www.leptonica.org/source/leptonica-1.74.1.tar.gz -o leptonica-1.74.1.tar.gz && \
tar -zxvf leptonica-1.74.1.tar.gz && \
cd leptonica-1.74.1 && ./configure && make && make install && \
cd .. && rm -rf leptonica*
RUN git clone --depth 1 https://github.com/tesseract-ocr/tesseract.git && \
cd tesseract && \
./autogen.sh && \
./configure && \
LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make && \
make install && \
ldconfig && \
make training && \
make training-install && \
cd .. && rm -rf tesseract
# Get basic traineddata
RUN curl -LO https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata && \
mv eng.traineddata /usr/local/share/tessdata/
RUN curl -LO https://github.com/tesseract-ocr/tessdata/raw/master/fra.traineddata && \
mv fra.traineddata /usr/local/share/tessdata/
Shouldn't it be curl -LO
instead of curl -Lo
(upper case O instead of lower case o)?
I'm also still surprised that my docker test produced a different kind of error with the wrong trained data files.
Sorry updated (typo issue :))
we can also use:
git clone https://github.com/tesseract-ocr/tessdata && \
mv -v tessdata/* /usr/local/share/tessdata/ && \
rm -rf tessdata
To have all the languages
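If only a few languages are needed, cloning the whole repository can be avoided. A minimal sketch, assuming the URL pattern used earlier in this thread; the helper names tessdata_url and fetch_tessdata are hypothetical.

```shell
#!/bin/sh
# Build the download URL for one language (URL pattern taken from this thread).
tessdata_url() {
    printf 'https://github.com/tesseract-ocr/tessdata/raw/master/%s.traineddata\n' "$1"
}

# Hypothetical helper: download each requested language into a directory.
# -f makes curl fail on HTTP errors, -L follows the GitHub redirect.
fetch_tessdata() {
    dest=$1
    shift
    for lang in "$@"; do
        curl -fL "$(tessdata_url "$lang")" -o "$dest/$lang.traineddata" || return 1
    done
}

# Usage sketch:
# fetch_tessdata /usr/local/share/tessdata eng fra
```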
@zdenop Issue can be closed. #893 (comment)
Added page to wiki - https://github.com/tesseract-ocr/tesseract/wiki/4.0-Dockerfile
Tesseract should verify that the tessdata file is a TIFF file.
My installer for Windows includes both files unconditionally. I think they should be in the docker container, too.
Thanks! I have updated https://github.com/tesseract-ocr/tesseract/wiki/4.0-Dockerfile
Please review and add any other required files (eg. configs etc.) to the docker container.
@amitdo Is the tessdata a tiff file???
I thought that it is a TIFF file without the tiff extension. I was wrong.
Definitions of Docker containers and scripts that help to compile and run Tesseract 4 are available at:
https://github.com/tesseract-shadow/tesseract-ocr-compilation