AWS Lambda function to run Tesseract OCR
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See the deployment steps below for notes on how to deploy the project to AWS.
The idea is to use a Docker container that simulates the AWS Lambda environment, which makes it possible to build binaries against the AWS Lambda Linux environment. In this example I have built Leptonica and the Tesseract Open Source OCR Engine.
The whole idea is adapted from here
To get started you need Docker. This is a very basic Lambda example and was tested on the AWS Lambda Python 3.8 runtime. AWS deployment is automated using the Serverless Framework.
# Install serverless globally
npm install serverless -g
Follow the AWS tutorial to create access keys for your user.
Follow the Serverless tutorial to configure those credentials for the Serverless Framework.
docker build -t tesseract .
mkdir build
docker run -v "$PWD/build":/tmp/build tesseract sh /tmp/build_tesseract.sh
mkdir layer
unzip build/tesseract.zip -d layer
mkdir -p layer/python/lib/python3.8/site-packages/
pip install pytesseract -t layer/python/lib/python3.8/site-packages/
ls layer
tesseract #compiled tesseract binary
tessdata #tesseract language package eng
lib #compiled lib dependencies
python #python dependencies
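When the layer is attached to a function, Lambda extracts its contents under /opt, so with the layout above the binary ends up at /opt/tesseract, the language data at /opt/tessdata, the shared libraries at /opt/lib (already on the runtime's library path) and the Python packages under /opt/python (already on the Python path). The following is a minimal handler sketch under several assumptions: the function sits behind an API Gateway proxy integration (so the JSON described below arrives as a string in event["body"]), the bundled Tesseract is version 4 or later (where TESSDATA_PREFIX points at the tessdata directory itself), and Pillow was installed into the layer as a dependency of pytesseract. The names handler and decode_image and the /opt paths are illustrative, not taken from this project.

```python
import base64
import io
import json
import os

# Assumed layer layout: Lambda extracts layer contents under /opt.
TESSERACT_CMD = "/opt/tesseract"
TESSDATA_DIR = "/opt/tessdata"


def decode_image(event):
    """Return the raw image bytes sent in the "image64" field of the request body."""
    body = event.get("body") or "{}"
    # API Gateway may base64-encode the whole request body itself.
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    payload = json.loads(body)
    return base64.b64decode(payload["image64"])


def handler(event, context):
    # Imported lazily so decode_image() can be used without the layer installed.
    import pytesseract
    from PIL import Image

    # The tesseract subprocess inherits os.environ, so it picks up TESSDATA_PREFIX.
    os.environ.setdefault("TESSDATA_PREFIX", TESSDATA_DIR)
    pytesseract.pytesseract.tesseract_cmd = TESSERACT_CMD

    try:
        image = Image.open(io.BytesIO(decode_image(event)))
    except (KeyError, ValueError, OSError) as exc:
        # Missing field, malformed JSON/base64, or bytes Pillow cannot read.
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}

    text = pytesseract.image_to_string(image, lang="eng")
    return {"statusCode": 200, "body": json.dumps({"text": text})}
```

Keeping the decoding in its own function means the request parsing can be tested locally without the Tesseract binary.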
serverless package
serverless deploy
The Lambda function accepts a JSON POST request. The URL is the endpoint printed by the serverless deploy command. The request body looks like this:
{
"image64": "base64 encoded image"
}
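A minimal client sketch using only the Python standard library, assuming the endpoint URL printed by serverless deploy is passed in by the caller. The helper names build_request and ocr, the 30-second timeout and the file name in the usage comment are hypothetical; the shape of the returned JSON depends on how the handler formats its response.

```python
import base64
import json
import urllib.request


def build_request(url, image_bytes):
    """Build a JSON POST request carrying the image as base64 in "image64"."""
    payload = {"image64": base64.b64encode(image_bytes).decode("ascii")}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ocr(url, path):
    """Send the image file at `path` to the deployed function and return the decoded JSON reply."""
    with open(path, "rb") as f:
        request = build_request(url, f.read())
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (placeholder URL; use the one printed by serverless deploy):
# result = ocr("<URL printed by serverless deploy>", "sample.png")
```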