docling

There are 61 repositories under the docling topic.

  • shoryasethia/markdrop

    A Python package for converting PDFs to Markdown while extracting images and tables, and for generating text descriptions of the extracted tables/images using several LLM clients, among other functionality. Markdrop is available on PyPI.

    Language: Python
  • genieincodebottle/parsemypdf

    A collection of PDF parsing libraries, including AI-based options (Docling, Claude, OpenAI, Gemini, Meta's Llama Vision, unstructured-io) as well as pdfminer, PyMuPDF, pdfplumber, etc., for efficient snapshot, text, table, and metadata extraction.

    Language: Python
  • fahdmirza/doclingwithollama

    Docling with Ollama - RAG on Local Files with Local Models

    Language: Python
  • garyzava/chat-to-database-chatbot

    Chat with your database: a GenAI chatbot

    Language: Jupyter Notebook
  • ghodsizadeh/pdf2csv

    A Python library and CLI tool to convert PDF files to CSV files.

    Language: Python
  • versionHQ/multi-agent-system

    Autonomous agent networks for automating tasks that require multi-step reasoning

    Language: Python
  • docling-project/docling4j

    Docling4j brings Docling's document-understanding functionality to Java® projects.

    Language: Java
  • Slayer412/docling-bedrock-plugin

    Integrates AWS Bedrock's multimodal capabilities (Claude 3) into the Docling framework for generating image descriptions within document processing pipelines.

    Language: Python
  • ya0002/obsidian-assist

    Make Zettelkasten-style note-taking the foundation of interactions with Large Language Models (LLMs).

    Language: Python
  • felixdittrich92/docling-OCR-OnnxTR

    OnnxTR OCR plugin for Docling

    Language: Python
  • quarkiverse/quarkus-docling

    Docling simplifies document processing, parsing diverse formats (including advanced PDF understanding) and providing seamless integrations with the gen AI ecosystem.

    Language: Java
  • HaileyTQuach/docchat-docling

    DocChat is an AI-powered Multi-Agent RAG system using Docling for structured document parsing and BM25 + vector search retrievers to retrieve fact-checked answers from PDFs, DOCX, and text files, preventing hallucinations. 🚀

    Language: Python
  • btwld/docling-sdk

    A TypeScript SDK for Docling - Bridge between the Python Docling ecosystem and JavaScript/TypeScript.

    Language: TypeScript
  • aspose-cells-python/aspose-cells-python

    High-performance Python Excel processing library with advanced conversion capabilities

    Language: Python
  • bisonbet/open-health

    OpenHealth, AI Health Assistant | Powered by Your Data

    Language: TypeScript
  • NotYuSheng/OmniPDF

    OmniPDF is a PDF analyzer that supports translation, summarization, captioning, and conversational interaction through Retrieval-Augmented Generation (RAG).

    Language: Python
  • Rishang/deep-research

    Python SDK for Deep-Research

    Language: Python
  • thevladdo/rag-backend

    Retrieval-Augmented Generation server with Pinecone and OpenAI

    Language: HTML
  • ibm-granite-community/docling-workshop

    Source code for Docling Workshop

    Language: Jupyter Notebook
  • ramona1999/Contract-Risk-Assessment

    This project is an AI-powered Contract Risk Assessment and Legal Assistant designed to analyze legal documents, extract key clauses, assess risks, and provide actionable recommendations. Additionally, a fine-tuned conversational chatbot is integrated for interactive legal Q&A based on contract-specific knowledge.

    Language: Jupyter Notebook
  • serkanyasr/ntt_rag_project

    Scalable Agentic RAG system using Pydantic AI, FastAPI & pgvector. Modular, production-ready foundation for document-based AI apps

    Language: Python
  • TM9657/docling-binary

    Docling Binary Server.

    Language: Python
  • workloads/pathfinder-prism

    Pathfinder Prism - AI-powered Knowledge Base

    Language: HCL
  • amirkiarafiei/docling-processor

    A Docling extension for superior PDF/DOCX to Markdown conversion, featuring smart image understanding with Gemini VLM.

    Language: Python
  • Danitilahun/Document-processing-Pdf-Structured-Data-Extractor

    This project demonstrates how to extract structured information from PDF documents using a combination of LangChain, OpenAI models, and the Docling library. It provides a framework for parsing PDFs and leveraging LLMs to identify and format key data points.

    Language: Jupyter Notebook
  • kwame-mintah/python-langchain-chainlit-qdrant-ollama-stack-template

    📄 A project template for creating a Chainlit application that uses a locally run model via Ollama and a Qdrant vector database for document retrieval.

    Language: Python
  • ParthaPRay/docling_RAG_langchain_colab

    This repo contains code for RAG using Docling in a Colab notebook with LangChain, Milvus, a Hugging Face embedding model, and an LLM.

    Language: Jupyter Notebook
  • ParthaPRay/gradio_docling_rag_langchain

    This repo provides RAG using Docling, LangChain, Milvus, Sentence Transformers, and Hugging Face LLMs.

    Language: Jupyter Notebook
  • shrimantasatpati/Document_Parser_using_AI

    Parse documents using AI: converts any document to Markdown suitable for RAG applications.

    Language: Jupyter Notebook
  • AirtonLira/rag_rerank_n8n_minio

    This is a complete stack for workflow automation, document processing, and semantic search using open-source tools. The project integrates n8n for automation, Supabase as a backend with vector support, MinIO for object storage, and Ollama for generating local embeddings.

    Language: Python
  • hemanthkt/impactoverse-AI-mentor

    Developed an intelligent AI chatbot utilizing the DeepSeek LLM, designed for efficient interaction with large documents such as textbooks and study materials. Integrated Docling for parsing and processing large files, and implemented a Retrieval-Augmented Generation (RAG) pipeline using FAISS and Sentence Transformers to optimize context retrieval.

    Language: JavaScript
  • Jarus77/markdrop

    A Python package for converting PDFs to Markdown while extracting images and tables, and for generating text descriptions of the extracted tables/images using several LLM clients, among other functionality. Markdrop is available on PyPI.

    Language: Python
  • Mil100057/Myr-Ag-system

    Local RAG system with Docling, LEANN, and Ollama - 97% storage efficient

    Language: Python
  • PiMaV/Hybrid_Data_RAGfinery

    AS-IS hybrid RAG reference (ArangoDB + Qdrant + Docling + n8n + openwebui). Requires HCL Notes → JSON export. Archived; no maintenance.

    Language: Python
  • shijincai/fast360

    The industry's first "Open Source OCR Arena," a free, no-login utility for one-click benchmarking of 7 top-tier models (Marker, MinerU, MonkeyOCR, Docling, Dolphin, OCRFlux, PP-StructureV3) on your PDF/image files, specializing in PDF-to-Markdown conversion.

  • vane/audiobook

    A CLI to convert PDF documents to audiobooks

    Language: Python