document-parser

There are 42 repositories under the document-parser topic.

  • infiniflow/ragflow

    RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs

    Language: TypeScript 64.3k2935.2k6.7k
  • docling

    docling-project/docling

    Get your documents ready for gen AI

    Language: Python 38.7k1651.2k2.7k
  • Unstructured-IO/unstructured

    Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

    Language: HTML 12.7k681.2k1k
  • AutoRAG

    Marker-Inc-Korea/AutoRAG

    AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

    Language: Python 4.3k33627338
  • run-llama/llama_cloud_services

    Knowledge Agents and Management in the Cloud

    Language: TypeScript 4.1k26455449
  • open-parse

    Filimoa/open-parse

    Improved file parsing for LLMs

    Language: Python 3.1k2346130
  • deepdoctection/deepdoctection

    A Repo For Document AI

    Language: Python 3k20193169
  • liweiphys/layra

    LAYRA—an enterprise-ready, out-of-the-box solution—unlocks next-generation intelligent systems powered by visual RAG and limitless visual multi-step agent workflow orchestration.

    Language: TypeScript 808
  • iamarunbrahma/vision-parse

    Parse PDFs into markdown using Vision LLMs

    Language: Python 42841759
  • marieai/marie-ai

    Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pipelines (GenAI, LLM, VLLM) into your applications, supporting various tasks such as document cleanup, optical character recognition (OCR), classification, splitting, named entity recognition, and form processing

    Language: Python 7341057
  • papercast-dev/papercast

    A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, or PDF with GROBID and LangChain, and listen to them as podcasts. Customize your own pipelines.

    Language: Python 52191
  • JPLeoRX/opencv-text-deskew

    Tutorial on how to deskew (straighten) text images

    Language: Python 513216
  • Invoiceable

    InvoiceableAI/Invoiceable

    The invoice, document, and resume parser powered by AI.

    Language: Python 31133
  • urbanclap-engg/smart-docs-parser

    An OCR-based document parser to extract information from identity document images

    Language: TypeScript 21237
  • decisionfacts/semantic-ai

    An open-source framework for Retrieval-Augmented System (RAG) that uses semantic search to retrieve the expected results and generates human-readable conversational responses with the help of an LLM (Large Language Model).

    Language: Python 20201
  • brazilian-code/Resume_Parsing

    Resume Parsing app to extract information using AI

    Language: Jupyter Notebook 17019
  • graphlit/graphlit

    Graphlit Platform

  • docling-project/docling4j

    Docling4j brings the functionalities of Docling in document understanding to Java® projects

    Language: Java 161
  • decisionfacts/df-extract

    DF Extract Lib

    Language: Python 14100
  • Clearedge-AI/clearedge

    Build a RAG preprocessing pipeline

    Language: Jupyter Notebook 12200
  • graphlit/graphlit-client-python

    Python client library for Graphlit Platform

    Language: Python 11202
  • has-abi/docparser

    Extract text from your DOCX documents.

    Language: Python 10102
  • Gyanvir/DrParser

    Dr.Parser 🩸📊 – AI-powered blood report parser that extracts and analyzes medical data from images/PDFs. Built with React, FastAPI, EasyOCR, and Gemini AI. 🚀 🔹 Local Setup Available | 🔹 Future Enhancements Planned | 🔹 Hackathon Project 👉 Clone, run, and explore the future of AI-driven healthcare!

    Language: Python 4100
  • hrbrmstr/docparser

    🧰 Tools to Upload/Parse Documents to 'docparser' and Retrieve Extracted Results

    Language: R 430
  • coderosh/docpa

    A simple library that I use for web scraping. Uses htmlparser2 to parse dom.

    Language: TypeScript 310
  • lorenzbr/techStandards

    Download and parse technical standard documents

    Language: R 2100
  • shrimantasatpati/Document_Parser_using_AI

    Parse documents using AI - any document converted to markdown suitable for RAG applications

    Language: Jupyter Notebook 210
  • agent87/IhuguraChatBotUX

    Ihugure Chatbot Streamlit User Interface

    Language: Python 1200
  • anyparser/anyparser_crewai

    Supercharge your AI workflows by combining Anyparser’s advanced content extraction with Crew AI. With this integration, you can effortlessly leverage Anyparser’s document processing and data extraction tools within your Crew AI applications.

    Language: Python 1000
  • cr4yfish/docling-js

    Parsing documents into one data type (TypeScript port of Docling)

  • graphlit/graphlit-client-typescript

    TypeScript client for Graphlit Platform

    Language: TypeScript 1100
  • RevanKumarD/LlaMarker

    Your ultimate tool for effortlessly converting and parsing documents into clean, well-structured Markdown—fast, reliable, and 100% local! 💻✨

    Language: Python 10
  • MaineDSA/voter_participation_extractor_portland

    The City of Portland distributes voter participation info in PDF format. This makes it a CSV.

    Language: Python 0100
  • MidHunterX/Scholar-CAP

    🎓 Set of powerful tools designed to streamline the extraction, parsing, and clean-up of data from docx and pdf forms. Saves time and eliminates manual data entry by automating the processing of structured data.

    Language: Python 0130
  • AkandindaJunior/Cloud-Services

    If it’s not documented, it never happened. 📝 Please check my README.md for more details. 🔍

  • atbasu/document-content-extractor

    Python program that uses OpenAI APIs to parse user-specified content from text files

    Language: Python 112