0AlphaZero0's Stars
tesseract-ocr/tesseract
Tesseract Open Source OCR Engine (main repository)
PaddlePaddle/PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)
JaidedAI/EasyOCR
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
andrewyng/aisuite
Simple, unified interface to multiple Generative AI providers
WongKinYiu/yolov9
Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
clovaai/donut
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
NMAC427/SwiftOCR
Fast and simple OCR library written in Swift
open-mmlab/mmocr
OpenMMLab Text Detection, Recognition and Understanding Toolbox
mindee/doctr
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
clovaai/deep-text-recognition-benchmark
Text recognition (optical character recognition) with deep learning methods, ICCV 2019
RapidAI/RapidOCR
📄 Awesome OCR toolkits for multiple programming languages, based on ONNXRuntime, OpenVINO, PaddlePaddle and PyTorch.
eragonruan/text-detection-ctpn
Text detection based mainly on the CTPN (Connectionist Text Proposal Network) model in TensorFlow, with ID card detection
aim-uofa/AdelaiDet
AdelaiDet is an open source toolbox for multiple instance-level detection and recognition tasks.
clovaai/CRAFT-pytorch
Official implementation of Character Region Awareness for Text Detection (CRAFT)
deepdoctection/deepdoctection
A Repo For Document AI
pikepdf/pikepdf
A Python library for reading and writing PDF, powered by QPDF
chezou/tabula-py
Simple wrapper of tabula-java: extract table from PDF into pandas DataFrame
Belval/pdf2image
A Python module that wraps the pdftoppm utility to convert a PDF to PIL Image objects
AlibabaResearch/AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
JonathanLink/PDFLayoutTextStripper
Converts a PDF file into a text file while keeping the layout of the original PDF. Useful, for instance, for extracting the content of a table in a PDF file. This is a subclass of the PDFTextStripper class (from the Apache PDFBox library).
faustomorales/keras-ocr
A packaged and flexible version of the CRAFT text detector and Keras CRNN recognition model.
spatie/pdf-to-image
Convert a pdf to an image
mlco2/codecarbon
Track emissions from compute and recommend ways to reduce their impact on the environment.
paulocoutinhox/pdfium-lib
PDFium - Project to compile PDFium library to multiple platforms.
VILA-Lab/ATLAS
A principled instruction benchmark on formulating effective queries and prompts for large language models (LLMs). Our paper: https://arxiv.org/abs/2312.16171
jalan/pdftotext
Simple PDF text extraction
spatie/pdf-to-text
Extract text from a pdf
Unstructured-IO/unstructured-api
ml-energy/zeus
Deep Learning Energy Measurement and Optimization
ja-mcm/OCRfixr
A context-based spellchecker for correcting OCR output.