Arabic-OCR

Arabic Image-to-text Converter

A user can input a whole PDF of Arabic text images, and the app outputs two files of digitized Arabic text that can be copied, searched, analyzed, and so on. One file is a PDF of the output and the other is a Word file of the output.
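A minimal sketch of the Word-output step, assuming the OCR stage has already produced one text string per PDF page and that the `python-docx` package is used. The helper names are hypothetical and this is not necessarily how `SL_ArabicOCR.py` does it. Each PDF page becomes a Word page, and each non-empty line becomes a paragraph marked right-to-left so Arabic displays in the correct direction. The PDF output is not covered here.

```python
from typing import List


def split_paragraphs(page_text: str) -> List[str]:
    """Split one page of OCR output into stripped, non-empty lines."""
    return [line.strip() for line in page_text.splitlines() if line.strip()]


def save_pages_as_docx(pages: List[str], out_path: str) -> None:
    """Hypothetical helper: write OCR'd pages to a .docx, one PDF page per Word page."""
    # Imported lazily so split_paragraphs() works without python-docx installed.
    from docx import Document
    from docx.oxml import OxmlElement
    from docx.oxml.ns import qn

    doc = Document()
    for index, page_text in enumerate(pages):
        if index > 0:
            doc.add_page_break()  # preserve the original page boundaries
        for line in split_paragraphs(page_text):
            paragraph = doc.add_paragraph(line)
            # Add <w:bidi w:val="1"/> so Word treats the paragraph as right-to-left.
            p_pr = paragraph._p.get_or_add_pPr()
            bidi = OxmlElement("w:bidi")
            bidi.set(qn("w:val"), "1")
            p_pr.append(bidi)
    doc.save(out_path)
```

Usage, under the same assumptions: `save_pages_as_docx(pages, "output.docx")`, where `pages` is the list of per-page OCR strings.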

This was done using the Tesseract OCR engine through its Python library. Its accuracy is far from 100%, but an example image is shown below.
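A minimal sketch of the OCR step, assuming the "Tesseract Python library" is `pytesseract` and that PDF pages are rendered to images with `pdf2image` (which needs the Poppler utilities). The function name `ocr_pdf` and the 300 DPI default are illustrative choices, not values taken from this project. The rendering and OCR callables can be swapped out, which also makes the logic testable without Tesseract installed.

```python
from typing import Any, Callable, Iterable, List, Optional


def ocr_pdf(
    pdf_path: str,
    dpi: int = 300,
    lang: str = "ara",
    render_pages: Optional[Callable[[str, int], Iterable[Any]]] = None,
    ocr_image: Optional[Callable[[Any, str], str]] = None,
) -> List[str]:
    """Hypothetical helper: return one OCR'd text string per page of pdf_path."""
    if render_pages is None:
        # pdf2image renders each PDF page to a PIL image (requires Poppler).
        from pdf2image import convert_from_path

        def render_pages(path: str, d: int) -> Iterable[Any]:
            return convert_from_path(path, dpi=d)

    if ocr_image is None:
        # pytesseract shells out to the Tesseract binary; "ara" needs Arabic traineddata.
        import pytesseract

        def ocr_image(image: Any, language: str) -> str:
            return pytesseract.image_to_string(image, lang=language)

    # OCR every rendered page and trim surrounding whitespace.
    return [ocr_image(page, lang).strip() for page in render_pages(pdf_path, dpi)]
```

Rendering at a higher DPI often helps Tesseract on small or low-quality scans, at the cost of speed and memory.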

Usage

  1. Install Python and Streamlit
  2. Install all required libraries and packages
  3. Run "streamlit run SL_ArabicOCR.py"
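Assuming the Python library is `pytesseract`, step 2 also requires the Tesseract engine itself, with the Arabic language data (`ara`), to be installed on the system, since the Python package only wraps the binary. The hypothetical check below is a sketch that looks for `tesseract` on the PATH and parses `tesseract --list-langs`; it reads both output streams in case a given Tesseract version writes the list to stderr.

```python
import shutil
import subprocess
from typing import List


def parse_list_langs(output: str) -> List[str]:
    """Parse `tesseract --list-langs` output into a list of language codes."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Drop the header line, e.g. 'List of available languages in "..." (3):'.
    return [ln for ln in lines if not ln.lower().startswith("list of available languages")]


def arabic_ocr_ready() -> bool:
    """Return True if a tesseract binary is on PATH and reports the 'ara' language."""
    exe = shutil.which("tesseract")
    if exe is None:
        return False
    result = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    return "ara" in parse_list_langs(result.stdout + "\n" + result.stderr)
```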

[Example image: OG_OCR_comp]

Hosted on Streamlit Cloud.