document-intelligence

There are 46 repositories under the document-intelligence topic.

  • PaddlePaddle/PaddleNLP

    Easy-to-use and powerful LLM and SLM library with awesome model zoo.

    Language: Python
  • Goldziher/kreuzberg

    Document intelligence framework for Python - Extract text, metadata, and structured data from PDFs, images, Office documents, and more. Built on Pandoc, PDFium, and Tesseract.

    Language: Python
  • AlibabaResearch/AdvancedLiterateMachinery

    A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

    Language: C++
  • shcherbak-ai/contextgem

    ContextGem: Effortless LLM extraction from documents

    Language: Python
  • tstanislawek/awesome-document-understanding

    A curated list of resources on Document Understanding (DU)

  • enoch3712/ExtractThinker

    ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.

    Language: Python
  • Azure/AI-in-a-Box

    AI-in-a-Box leverages the expertise of Microsoft across the globe to develop and provide AI and ML solutions to the technical community. Our intent is to present a curated collection of solution accelerators that can help engineers establish their AI/ML environments and solutions rapidly and with minimal friction.

    Language: Jupyter Notebook
  • doc-analysis/ReadingBank

    ReadingBank: A Benchmark Dataset for Reading Order Detection

  • Azure-Samples/azure-ai-document-processing-samples

    A collection of samples demonstrating techniques for processing documents with Azure AI including AI Foundry, OpenAI, Document Intelligence, etc.

    Language: Bicep
  • Azure-Samples/doc-intelligence-in-a-box

    The Doc Intelligence in-a-Box project leverages Azure AI Document Intelligence to extract data from PDF forms and store the data in Azure Cosmos DB. This solution, part of the AI-in-a-Box framework by Microsoft Customer Engineers and Architects, ensures quality, efficiency, and rapid deployment of AI and ML solutions across various industries.

    Language: Bicep
  • jamesmcroft/azure-document-intelligence-markdown-to-openai-data-extraction-sample

    This sample demonstrates how to use Document Intelligence's Layout model to convert a PDF document, such as invoices, into Markdown, then use GPT-3.5 Turbo to extract structured JSON data using the Azure OpenAI Service.

    Language: Jupyter Notebook
  • qyhou/curated-table-structure-recognition

    A curated list of resources on Table Structure Recognition

  • ihdia/BoundaryNet

    BoundaryNet - A Semi-Automatic Layout Annotation Tool

    Language: Python
  • AI-Engineering-Study-Group/docugent

    AI-powered document intelligence platform for automated analysis, processing, and insights extraction from various document formats.

    Language: Python
  • jamesmcroft/document-intelligence-user-feedback-processor

    An experiment to provide the capabilities of Azure AI Document Intelligence Studio template training for a feedback loop

    Language: Python
  • qyhou/curated-document-layout-analysis

    A curated list of resources on Document Layout Analysis

  • codedbyasim/Generative-AI-Document-Intelligence-System

    Extract and summarise data from PDFs and images using OCR + LLMs. Built with Python, OpenCV, HuggingFace, and Flask.

    Language: Python
  • fenilsonani/rag-document-qa

    Enterprise-grade RAG system featuring dual online/offline operation, multi-modal document processing, and advanced AI capabilities including knowledge graph construction and hybrid search for intelligent document analysis.

    Language: Python
  • fri3erg/DataDig-AIExtractor

    App used to extract structured data from document photos or PDFs via custom templating and commercial LLMs (GPT and Azure Document Intelligence). Developed as a Computer Science thesis at the University of Bologna.

    Language: Python
  • igorcervac/AzureAiSamples

    Azure AI Samples

    Language: C#
  • joinalahmed/invoiceparsingwithAOAI

    Using Azure Document Intelligence and Azure OpenAI services to automatically extract data from invoices.

    Language: HTML
  • ks6088ts-labs/azure-ai-services-solutions

    A collection of solutions that leverage Azure AI services.

    Language: Python
  • Md-Emon-Hasan/AutoDocThinker

    🔍 Agentic AI system that lets users upload documents (PDFs, DOCX, etc.) and ask natural language questions. It uses LLM-based RAG to extract relevant information. The architecture includes multi-agent components such as document retrievers, summarizers, web searchers, and tool routers, enabling dynamic reasoning and accurate responses.

    Language: Jupyter Notebook
  • MSUSAzureAccelerators/SouthReusableAssets

    IP and use case assets for CSU

    Language: Ruby
  • sshashi7/AIRoadShow

    Hands-on labs and mini hackathon to build a Sales Buddy Agent using Copilot Studio and Azure AI

  • willermo/markdown-for-llms

    A comprehensive, production-ready Python pipeline for converting various document formats into clean, validated, and optimally chunked Markdown files ready for Large Language Model (LLM) consumption and NotebookLM notebooks.

    Language: Python
  • aihearticu/azure-ai-learning-hub

    Comprehensive learning hub for Azure AI services - 130+ labs and tutorials covering AI-102 certification

    Language: Python
  • colingalbraith/OpenRAGSearch

    Local RAG-powered document analysis platform with PDF QA, Ollama integration, and citation-aware search.

    Language: JavaScript
  • CSetters/Upgraded-Search-for-Accela

    Sample proposal using Azure AI Search + Document Intelligence to enrich digitized records and integrate with Accela for faster, transparent permit access.

  • jahnavi-j9/adobe-hackathon-Round1A

    An intelligent, offline-first PDF understanding system built for the Adobe India Hackathon 2025. Extracts structured outlines and semantically ranks sections based on persona-driven tasks, fully Dockerized and scoring-ready.

    Language: Python
  • jasjeev013/Neuroquery-Chroma-RAG

    NeuroQuery is an AI-powered PDF question-answering system that lets you upload and interact with documents using natural language. Built with LangChain, Gemini AI, and Chroma, it delivers fast, context-aware answers from your files.

    Language: Python
  • mfbcat/data-extraction-tool-with-AI-document-intelligence

    This Azure Document Intelligence extraction tool converts unstructured documents into structured data. It accurately extracts text and tables, making your information ready for LLMs and immediate use.

    Language: Python
  • Polymath-Saksh/document-intelligence

    Intelligent offline system for extracting and ranking the most relevant sections from PDFs based on user persona and task.

    Language: Python
  • sachinsingh45/Personal-Finance-Tracker

    A modern full-stack personal finance tracker with AI-powered receipt processing, interactive analytics, and a seamless user experience.

    Language: JavaScript
  • Sumanth1410-git/internal-docs-agent

    Enterprise AI assistant for intelligent document Q&A via Slack - Advanced RAG system with multi-language support.

    Language: Python
  • YugVarshney/Adobe_Hackathon_Round_1B

    Persona-Driven Document Intelligence – Offline system that extracts, ranks, and refines the most relevant PDF sections based on a persona and their job-to-be-done. Built for Adobe Hackathon 2025 (Round 1B) with heading-aware parsing, semantic embeddings, and batch inference.

    Language: Python