document-intelligence
There are 46 repositories under the document-intelligence topic.
PaddlePaddle/PaddleNLP
Easy-to-use and powerful LLM and SLM library with awesome model zoo.
Goldziher/kreuzberg
Document intelligence framework for Python - Extract text, metadata, and structured data from PDFs, images, Office documents, and more. Built on Pandoc, PDFium, and Tesseract.
AlibabaResearch/AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
shcherbak-ai/contextgem
ContextGem: Effortless LLM extraction from documents
tstanislawek/awesome-document-understanding
A curated list of resources on the topic of Document Understanding (DU)
enoch3712/ExtractThinker
ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.
Azure/AI-in-a-Box
AI-in-a-Box leverages the expertise of Microsoft across the globe to develop and provide AI and ML solutions to the technical community. Our intent is to present a curated collection of solution accelerators that can help engineers establish their AI/ML environments and solutions rapidly and with minimal friction.
doc-analysis/ReadingBank
ReadingBank: A Benchmark Dataset for Reading Order Detection
Azure-Samples/azure-ai-document-processing-samples
A collection of samples demonstrating techniques for processing documents with Azure AI including AI Foundry, OpenAI, Document Intelligence, etc.
Azure-Samples/doc-intelligence-in-a-box
The Doc Intelligence in-a-Box project leverages Azure AI Document Intelligence to extract data from PDF forms and store the data in Azure Cosmos DB. This solution, part of the AI-in-a-Box framework by Microsoft Customer Engineers and Architects, ensures quality, efficiency, and rapid deployment of AI and ML solutions across various industries.
jamesmcroft/azure-document-intelligence-markdown-to-openai-data-extraction-sample
This sample demonstrates how to use Document Intelligence's Layout model to convert a PDF document, such as an invoice, into Markdown, then use GPT-3.5 Turbo to extract structured JSON data using the Azure OpenAI Service.
qyhou/curated-table-structure-recognition
A curated list of resources on Table Structure Recognition
ihdia/BoundaryNet
BoundaryNet - A Semi-Automatic Layout Annotation Tool
AI-Engineering-Study-Group/docugent
AI-powered document intelligence platform for automated analysis, processing, and insights extraction from various document formats.
jamesmcroft/document-intelligence-user-feedback-processor
An experiment to provide the capabilities of Azure AI Document Intelligence Studio template training for a feedback loop
qyhou/curated-document-layout-analysis
A curated list of resources on Document Layout Analysis
codedbyasim/Generative-AI-Document-Intelligence-System
Extract and summarise data from PDFs and images using OCR + LLMs. Built with Python, OpenCV, HuggingFace, and Flask.
fenilsonani/rag-document-qa
Enterprise-grade RAG system featuring dual online/offline operation, multi-modal document processing, and advanced AI capabilities including knowledge graph construction and hybrid search for intelligent document analysis.
fri3erg/DataDig-AIExtractor
App for extracting structured data from document photos or PDFs via custom templating and commercial AI services (GPT and Azure Document Intelligence). Developed as a Computer Science thesis at the University of Bologna.
igorcervac/AzureAiSamples
Azure AI Samples
joinalahmed/invoiceparsingwithAOAI
Using Azure Document Intelligence and Azure OpenAI services to automatically extract data from invoices.
ks6088ts-labs/azure-ai-services-solutions
A collection of solutions that leverage Azure AI services.
Md-Emon-Hasan/AutoDocThinker
🔍 Agentic AI system that allows users to upload documents (PDF, DOCX, etc.) and ask natural-language questions. It uses LLM-based RAG to extract relevant information. The architecture includes multi-agent components such as document retrievers, summarizers, web searchers, and tool routers, enabling dynamic reasoning and accurate responses.
MSUSAzureAccelerators/SouthReusableAssets
IP and use case assets for CSU
sshashi7/AIRoadShow
Hands-on labs and mini hackathon to build a Sales Buddy Agent using Copilot Studio and Azure AI
willermo/markdown-for-llms
A comprehensive, production-ready Python pipeline for converting various document formats into clean, validated, and optimally chunked Markdown files ready for Large Language Model (LLM) consumption and NotebookLM notebooks.
aihearticu/azure-ai-learning-hub
Comprehensive learning hub for Azure AI services - 130+ labs and tutorials covering AI-102 certification
colingalbraith/OpenRAGSearch
Local RAG-powered document analysis platform with PDF QA, Ollama integration, and citation-aware search.
CSetters/Upgraded-Search-for-Accela
Sample proposal using Azure AI Search + Document Intelligence to enrich digitized records and integrate with Accela for faster, transparent permit access.
jahnavi-j9/adobe-hackathon-Round1A
An intelligent, offline-first PDF understanding system built for the Adobe India Hackathon 2025. Extracts structured outlines and semantically ranks sections based on persona-driven tasks, fully Dockerized and scoring-ready.
jasjeev013/Neuroquery-Chroma-RAG
NeuroQuery is an AI-powered PDF question-answering system that lets you upload and interact with documents using natural language. Built with LangChain, Gemini AI, and Chroma, it delivers fast, context-aware answers from your files.
mfbcat/data-extraction-tool-with-AI-document-intelligence
This Azure Document Intelligence extraction tool converts unstructured documents into structured data. It accurately extracts text and tables, making your information ready for LLMs and immediate use.
Polymath-Saksh/document-intelligence
Intelligent offline system for extracting and ranking the most relevant sections from PDFs based on user persona and task.
sachinsingh45/Personal-Finance-Tracker
A modern full-stack personal finance tracker with AI-powered receipt processing, interactive analytics, and a seamless user experience.
Sumanth1410-git/internal-docs-agent
Enterprise AI assistant for intelligent document Q&A via Slack - Advanced RAG system with multi-language support.
YugVarshney/Adobe_Hackathon_Round_1B
Persona-Driven Document Intelligence – Offline system that extracts, ranks, and refines the most relevant PDF sections based on a persona and their job-to-be-done. Built for Adobe Hackathon 2025 (Round 1B) with heading-aware parsing, semantic embeddings, and batch inference.