An HTTP API backend for PDF_OCR, Voice_OCR, and VideoOCR, written in Python with PaddleOCR and PocketSphinx

Short Description

An HTTP API backend for PDF, voice, and video OCR, written in Python with the PaddleOCR module and FastAPI.

ENV init

Consider starting a container from the Docker image first:

$ docker run -itd -p <host_port>:8800 ech0potato/sansan:v0.7 /bin/bash /root/start.sh 

Then open the API documentation to try out the backend.

If you want to configure the environment yourself:

Ubuntu 18.04 with Python 3.7 is recommended; other versions may fail with unexplained PaddleOCR errors.

$ apt install libgl1-mesa-glx  libpulse-dev libasound2-dev python-all-dev build-essential swig 
$ pip install fastapi opencv-python paddlepaddle paddleocr wheel speechrecognition fitz pocketsphinx PyMuPDF filetype
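
After installing, a quick way to confirm the environment is complete is to check that each package can be found by Python. The snippet below is a minimal sketch, not part of this project: the helper `missing_modules` is hypothetical, and the import names in `REQUIRED_MODULES` are the usual ones for the pip packages listed above (for example `cv2` for opencv-python and `fitz` for PyMuPDF), which is an assumption about this setup.

```python
import importlib.util

# Import names assumed to correspond to the pip packages listed above.
REQUIRED_MODULES = [
    "fastapi",             # fastapi
    "cv2",                 # opencv-python
    "paddle",              # paddlepaddle
    "paddleocr",           # paddleocr
    "speech_recognition",  # speechrecognition
    "pocketsphinx",        # pocketsphinx
    "fitz",                # PyMuPDF
    "filetype",            # filetype
]


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules(REQUIRED_MODULES)` returns an empty list when every module is installed; any name it returns points at a package that still needs to be installed.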

Start (if you configured the environment yourself)

# change directory to the root of this project.
$ python3 main.py

The REST API backend listens on localhost:8800 by default.
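
To check that the backend has actually come up, one option is to test whether the port accepts TCP connections. This is only a sketch: the helper `is_listening` is hypothetical, and it verifies that something is listening on the port, not that the OCR models have finished loading.

```python
import socket


def is_listening(host="localhost", port=8800, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

If you started the Docker container instead, pass the `<host_port>` you mapped to 8800 as `port`.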

API Details



URL: http://domain:port/api/pdfocr
Content-Type: application/pdf

Input Example:
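
As a rough sketch, a request could be sent from Python as follows. Only the URL path and the Content-Type come from this document; the POST method, sending the PDF as the raw request body, the UTF-8 response decoding, and the helper names `build_pdfocr_request` and `send_pdf` are assumptions made for illustration.

```python
import urllib.request


def build_pdfocr_request(pdf_bytes, base_url="http://localhost:8800"):
    """Build a request for /api/pdfocr (assumes POST with the raw PDF as body)."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/pdfocr",
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )


def send_pdf(path, base_url="http://localhost:8800", timeout=60):
    """Read a PDF from disk, send it to the backend, and return the response text."""
    with open(path, "rb") as f:
        pdf_bytes = f.read()
    request = build_pdfocr_request(pdf_bytes, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumes the backend answers with UTF-8 encoded text.
        return response.read().decode("utf-8")
```

With the backend running, `send_pdf("sample.pdf")` would return the response body as a string, where `sample.pdf` stands for any local PDF file.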



Output Example:

"code": 2000,
"message": "success",
"page": 1,
    "words": "第一行文字",
    [10, 10],
    "words": "第二行文字",
    [10, 20],
"page": 2,
    "words": "第一行文字",
    [10, 10],
    "words": "第二行文字",
    [10, 20],