document-ai

There are 45 repositories under the document-ai topic.

  • microsoft/unilm

    Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

    Language: Python
  • clovaai/donut

    Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

    Language: Python
  • deepdoctection/deepdoctection

    A Repo For Document AI

    Language: Python
  • tstanislawek/awesome-document-understanding

    A curated list of resources on Document Understanding (DU)

  • jpWang/LiLT

    Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)

    Language: Python
  • SCUT-DLVCLab/Document-AI-Recommendations

    Algorithms, papers, datasets, performance comparisons for Document AI. Continuously updating.

  • doc-analysis/ReadingBank

    ReadingBank: A Benchmark Dataset for Reading Order Detection

  • clovaai/webvicob

    Official Implementation of Web-based Visual Corpus Builder (Webvicob), ICDAR 2023

    Language: Python
  • nttmdlab-nlp/SlideVQA

    SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI2023)

    Language: Python
  • ZeningLin/ViBERTgrid-PyTorch

    An unofficial PyTorch implementation of "Lin et al. ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents. ICDAR, 2021"

    Language: Python
  • whn09/table_structure_recognition

    Table detection (TD) and table structure recognition (TSR) using YOLOv5/YOLOv8, achieving the same or even better results than Table Transformer (TATR) with smaller models.

    Language: Jupyter Notebook
  • googleapis/python-documentai-toolbox

    Document AI Toolbox is an SDK for Python that provides utility functions for managing, manipulating, and extracting information from the document response. It creates a "wrapped" document object from JSON files in Cloud Storage, local JSON files, or output directly from the Document AI API.

    Language: Python
  • DunnBC22/Vision_Audio_and_Multimodal_Projects

    This repository includes all computer vision, audio, document AI, and multimodal projects.

    Language: Jupyter Notebook
  • ZeningLin/PEneo

    [MM'2024] PEneo, an effective algorithm for key-value pair extraction from form-like documents, designed for real-world applications.

    Language: Python
  • Unstructured-IO/community

    Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

  • qyhou/curated-table-structure-recognition

    A curated list of resources on Table Structure Recognition

  • SCUT-DLVCLab/RFUND

    [MM'2024] Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction"

  • chenxn2020/GOSE

    [Paper] Code for the EMNLP2023 (Findings) paper "Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document"

    Language: Python
  • NirmalNagaraj/DocGPT

    A chatbot for document analysis.

    Language: Python
  • conditionedstimulus/DocumentClassifier

    FastAPI application for document classification using a multimodal LayoutLM model, designed to classify PDF documents into RVL-CDIP categories.

    Language: Jupyter Notebook
  • bwnyasse/dart-documentai-samples

    A hands-on CLI tool sample showcasing the integration of Dart with Google Cloud's DocumentAI.

    Language: Dart
  • dhorvay/document-understanding-ebook

    (WIP) ✨ A comprehensive resource for understanding the world of software used in the Document Understanding field. 🧙✨

    Language: Markdown
  • wintermi/ocr-runner

    OCR Runner - Command Line Application for processing image files using Google Cloud Vision API and Google Cloud Document AI.

    Language: Go
  • bhadreshpsavani/SmartOCR-with-LayoutLM

    Exploring LayoutLM for Smart OCR Capabilities

  • devraftel/snapdoc-edge-ai

    SnapDoc AI processes everything on-device, ensuring your sensitive information never leaves your control. Use voice and text on-device processing in organizations.

    Language: Python
  • ajaycode/unstructured

    Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

    Language: HTML
  • Purushothaman-natarajan/Custom-NER-Model-using-Spacy-Fine-Tuning

    spaCy for key-value pairs

    Language: Jupyter Notebook
  • smartloop-ai/smartloop

    Smartloop is an open-source SLM platform to train and run models on an edge device

    Language: Python
  • AliAlWahayb/Arabic-PDF-OCR-Text-Extraction

    This repository implements a complete solution for processing Arabic text within PDF documents. Key features include: OCR using Google Cloud Document AI. Arabic text cleaning and formatting. Optional diacritization using Farasa. Asynchronous processing for faster performance.

    Language: Python
  • arakattack/ocr-transcript

    This Flask application uses Google Cloud Document AI to extract name, IPK (GPA), university details, etc.

    Language: Python
  • IonMich/batch-doc-vqa

    Ask a question about a document collection and extract structured responses

    Language: Python
  • jcaperella29/Document_cleaning_CLI

    🧠 AI-powered pipeline for cleaning scanned documents. Removes noise, enhances text, auto-tunes model weights, and returns OCR-optimized PDFs via CLI or cloud API.

    Language: MATLAB
  • masoudshab/Doc2Edi

    Extracting Data from Document PDF and Converting to EDI211 Files Using GCP and Google Document AI

    Language: Python
  • samkenxstream/SamKenX_documents-ai

    SamKenX applications and Document AI, the end-to-end document processing platform on Cloudstorage warehouse.

    Language: Python
  • HimanshuMohanty-Git24/KhataGPT

    Transform how you interact with documents! Simply upload receipts, invoices, or forms and instantly chat with them. Get answers, extract key information, and save hours of manual work. Your personal document assistant that understands what matters to you.

    Language: JavaScript
  • ozcanmiraay/opsbot

    AI-powered PDF extraction suite for structured insights from contracts, forms, and documents. Built with Streamlit, LangChain, GPT-4o, and PDFPlumber.

    Language: Python