A Telegram bot that applies OCR (optical character recognition) to PDF documents.
Pytonisa is a Telegram bot available for free on Telegram at @mf_ocr_bot (for as long as I have student credits on DigitalOcean).
The name was chosen after Pitonisa, a Greek oracle.
Preview:
https://www.youtube.com/watch?v=to0HWlrMVNw
The project, as it is, was designed to run on a single machine, since I only rented one node on DigitalOcean. The more RAM and CPU the machine has, the better it performs (I recommend AT LEAST 2 GB of RAM).
Current architecture
You can use draw.io to view or edit the architecture.drawio file.
I may develop a cloud-oriented version in which the queue, the database, the ocrprocessor and the Telegram client are detached from each other, possibly with an API for accessing the database and the queue instead of accessing them directly.
To run the bot, first declare the following environment variables:
TELEGRAM_API_ID=1234567
TELEGRAM_API_HASH=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
TELEGRAM_BOT_TOKEN=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-bbbbbbb-ccccc
MONGO_INITDB_ROOT_USERNAME=admin
MONGO_INITDB_ROOT_PASSWORD=admin
To get TELEGRAM_API_ID and TELEGRAM_API_HASH, log in at https://my.telegram.org and open API Development.
To get TELEGRAM_BOT_TOKEN, talk to @BotFather on Telegram.
MONGO_INITDB_ROOT_USERNAME and MONGO_INITDB_ROOT_PASSWORD can be any values you choose; they define the MongoDB root user's credentials.
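Before starting the containers, it can help to confirm that every variable listed above is actually set. The following is a minimal sketch of a hypothetical pre-flight helper in Python (it is not part of Pytonisa). It assumes the variables are exported in the same shell you run it from, and it only checks that each one is non-empty and that TELEGRAM_API_ID is numeric (Telegram issues it as an integer).

```python
import os
from typing import List, Mapping

# Variable names come from this README; this checker is a hypothetical helper,
# not part of the Pytonisa code base.
REQUIRED_VARS = (
    "TELEGRAM_API_ID",
    "TELEGRAM_API_HASH",
    "TELEGRAM_BOT_TOKEN",
    "MONGO_INITDB_ROOT_USERNAME",
    "MONGO_INITDB_ROOT_PASSWORD",
)


def missing_env_vars(env: Mapping[str, str]) -> List[str]:
    """Return the required names that are unset or empty, in REQUIRED_VARS order."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def api_id_looks_valid(env: Mapping[str, str]) -> bool:
    """TELEGRAM_API_ID is an integer issued by my.telegram.org, so expect digits only."""
    return env.get("TELEGRAM_API_ID", "").isdigit()


if __name__ == "__main__":
    missing = missing_env_vars(os.environ)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    elif not api_id_looks_valid(os.environ):
        print("TELEGRAM_API_ID should contain only digits")
    else:
        print("All required environment variables are set.")
```

Saved as, say, check_env.py (a hypothetical file name), it can be run with python check_env.py right before docker-compose up --build. It only reports problems and does not stop docker-compose from starting.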
Then:
docker-compose up --build
You may want to add the -d flag (docker-compose up --build -d) to run the containers in the background.
NOTE: I have not secured MongoDB and RabbitMQ for production deployment. I may do this in the future.
- Luís Chaves - Development
- Maria Fernanda Melgaço - Feedback and tests
- Isabele - Pytonisa drawing - @izzy.m.f