document-ai

There are 69 repositories under the document-ai topic.

microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Language: Python 21.8k 302 1.4k2.7k
clovaai/donut
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Language: Python 6.6k 52 315543
deepdoctection/deepdoctection
A Repo For Document AI
Language: Python 3k 20 200172
tstanislawek/awesome-document-understanding
A curated list of resources on the Document Understanding (DU) topic
1.5k 36 2166
jpWang/LiLT
Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)
Language: Python 357 5 4841
SCUT-DLVCLab/Document-AI-Recommendations
Algorithms, papers, datasets, performance comparisons for Document AI. Continuously updating.
202 10 19
doc-analysis/ReadingBank
ReadingBank: A Benchmark Dataset for Reading Order Detection
113 1 114
clovaai/webvicob
Official Implementation of Web-based Visual Corpus Builder (Webvicob), ICDAR 2023
Language: Python 109 4 39
nttmdlab-nlp/SlideVQA
SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI2023)
Language: Python 100 1 58
ZeningLin/ViBERTgrid-PyTorch
An unofficial PyTorch implementation of "Lin et al. ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents. ICDAR, 2021"
Language: Python 53 4 115
whn09/table_structure_recognition
Table detection (TD) and table structure recognition (TSR) using YOLOv5/YOLOv8, achieving results comparable to, or even better than, Table Transformer (TATR) with smaller models.
Language: Jupyter Notebook 51 4 1315
DunnBC22/Vision_Audio_and_Multimodal_Projects
This repository includes all computer vision, audio, document AI, and multimodal projects.
Language: Jupyter Notebook 49 5 312
googleapis/python-documentai-toolbox
Document AI Toolbox is an SDK for Python that provides utility functions for managing, manipulating, and extracting information from the document response. It creates a "wrapped" document object from JSON files in Cloud Storage, local JSON files, or output directly from the Document AI API.
Language: Python 47 26 6919
nttmdlab-nlp/VDocRAG
[CVPR2025] VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
Language: Python 464
ZeningLin/PEneo
[MM'2024] PEneo, an effective algorithm for key-value pair extraction from form-like documents, designed for real-world applications.
Language: Python 37 4 86
Unstructured-IO/community
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
29 24 298
qyhou/curated-table-structure-recognition
A curated list of resources on Table Structure Recognition
28 1 02
SCUT-DLVCLab/RFUND
[MM'2024] Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction"
20 1 00
Shulk97/daniel
This repository contains the implementation of DANIEL (a fast Document Attention Network for Information Extraction and Labeling of handwritten documents).
Language: Python 20 4 21
chenxn2020/GOSE
[Paper] Code for the EMNLP2023 (Findings) paper "Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document"
Language: Python 17 1 41
NirmalNagaraj/DocGPT
A chatbot for document analysis.
Language: Python 12 2 00
conditionedstimulus/DocumentClassifier
FastAPI application for document classification using a multimodal LayoutLM model, designed to classify PDF documents into RVL-CDIP categories.
Language: Jupyter Notebook 9 1 00
qyhou/curated-document-layout-analysis
A curated list of resources on Document Layout Analysis
9 1 00
dhorvay/document-understanding-ebook
(WIP) ✨ A comprehensive resource for understanding the world of software used in the Document Understanding field. 🧙✨
Language: Markdown 6 1 00
bwnyasse/dart-documentai-samples
A hands-on CLI tool sample showcasing the integration of Dart with Google Cloud's DocumentAI.
Language: Dart 5 2 380
wintermi/ocr-runner
OCR Runner - Command Line Application for processing image files using Google Cloud Vision API and Google Cloud Document AI.
Language: Go 4 1 01
bhadreshpsavani/SmartOCR-with-LayoutLM
Exploring LayoutLM for Smart OCR Capabilities
3 1 0
devraftel/snapdoc-edge-ai
SnapDoc AI processes everything on-device, ensuring your sensitive information never leaves your control. It brings on-device voice and text processing to organizations.
Language: Python 3 2 00
gregorymulla/grepctl
BigQuery Semantic Search Orchestrator
Language: Python 3
Purushothaman-natarajan/Custom-NER-Model-using-Spacy-Fine-Tuning
spaCy for key-value pairs
Language: Jupyter Notebook 3 2 00
ajaycode/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Language: HTML 2 0 00
IonMich/batch-doc-vqa
Ask a question about a document collection and extract structured responses
Language: Python 2 1 30
masfaatanveer/Lease-Summarization-Model-NLP
This project uses OCR and a BART-based NLP pipeline to extract and summarize landlord, tenant, property, and contract details from scanned lease agreements. It combines Tesseract OCR, pdf2image, and HuggingFace Transformers to deliver structured legal summaries in JSON format.
Language: Python 2
rag-fish/RAGfish
RAGfish: the open-source standard for private, offline, multi-pack LLM RAG, with a unified RAGpack format, world-class pipeline, and reference macOS/iOS client. Your knowledge, your device, your rules.
20
smartloop-ai/smartloop
Smartloop is an open-source SLM platform to train and run models on an edge device
Language: Python 2 2 00
ayush2635/Invoiscope
Intelligent GST Invoice Information Extractor powered by YOLOv9c object detection and OCR technology. Extract 24+ invoice fields with 74.1% mAP accuracy through a modern Streamlit web interface.
Language: Python 1
