document-image-processing

There are 18 repositories under the document-image-processing topic.

  • Unstructured-IO/unstructured

    Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking and embedding.

    Language: HTML
  • Layout-Parser/layout-parser

    A Unified Toolkit for Deep Learning Based Document Image Analysis

    Language: Python
  • fh2019ustc/Awesome-Document-Image-Rectification

    A comprehensive list of awesome document image rectification papers.

  • fh2019ustc/DocTr

    The official code for “DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction”, ACM MM, Oral Paper, 2021.

    Language: Python
  • fh2019ustc/DocScanner

    The official repo for “DocScanner: Robust Document Image Rectification with Progressive Learning”, IJCV, 2025.

    Language: Python
  • hpanwar08/detectron2

    Detectron2 for Document Layout Analysis

    Language: Python
  • jiangnanboy/Doc-Image-Tool

    Document image processing tool, including bleaching / text orientation correction / sharpening / handwritten-note denoising and beautification / shadow removal / dewarping / edge-trimming enhancement (DocBleach / TextOrientationCorrection / DocSharpening / HandwritingDenoisingBeautifying / DocShadowRemoval / document_image_dewarping / DocTrimmingEnhancement).

    Language: Python
  • fh2019ustc/DocGeoNet

    The official code for “Geometric Representation Learning for Document Image Rectification”, ECCV, 2022.

    Language: Python
  • Nomiluks/Handwritting-OCR

    Android App for English Handwritten Text Recognition

    Language: Java
  • caltechlibrary/documentarist

    Process Caltech Archives' digital documents and photos, and annotate each page or image with information about its contents

    Language: Python
  • jchazalon/smartdoc15-ch1-pywrapper

    Python wrapper to facilitate data manipulation for the SmartDoc 2015 - Challenge 1 Dataset.

    Language: Jupyter Notebook
  • Transkribus/competitions

    The ScriptNet / competitions site.

    Language: Python
  • jiangnanboy/docimg_tool

    Complex-background image bleaching, text orientation correction, clarity enhancement, handwritten-note denoising and beautification, shadow removal, distortion correction, black-spot removal, and edge-trimming enhancement.

  • tony-xlh/quality-evaluation-of-scanned-document-images

    A web app for evaluating the quality of scanned document images

    Language: HTML
  • ajaycode/unstructured

    Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

    Language: HTML
  • mx3123/Py-document-cropper

    This script automates extracting text from various file formats (images, PDFs, DOCX) using Optical Character Recognition (OCR) powered by Azure Cognitive Services. The script supports image preprocessing, text extraction, and uploading of the processed files to Google Cloud Storage (GCS).

    Language: Python
  • sfikas/sophia-trikoupi-handwritten-dataset

    Sophia Trikoupi dataset (Collection of 46 handwritten, annotated pages)

    Language: Python