document-parsing

There are 26 repositories under the document-parsing topic.

  • docling-project/docling

    Get your documents ready for gen AI

    Language: Python
  • Unstructured-IO/unstructured

    Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

    Language: HTML
  • run-llama/llama_cloud_services

    Knowledge Agents and Management in the Cloud

    Language: TypeScript
  • enoch3712/ExtractThinker

    ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.

    Language: Python
  • edenai/edenai-apis

    Eden AI: simplify the use and deployment of AI technologies by providing a unique API that connects to the best possible AI engines

    Language: Python
  • harishdeivanayagam/rowfill

    Open-source platform for processing unstructured data (PDFs, images, audio files), built for knowledge workers

    Language: TypeScript
  • papercast-dev/papercast

    A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, or PDF with GROBID and LangChain, then listen to them as a podcast. Customize your own pipelines.

    Language: Python
  • CycloneBoy/pdf_table

    A Unified Toolkit for Deep Learning-Based Table Extraction

    Language: Python
  • Unstructured-IO/community

    Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

  • docling-project/docling4j

    Docling4j brings Docling's document-understanding functionality to Java® projects

    Language: Java
  • J-sephB-lt-n/pdf-bank-statement-parser

    Tool for converting First National Bank (FNB) bank statement PDFs into useful structured data

    Language: Python
  • ziming/laravel-docparser

    Docparser OCR Package for PHP Laravel

    Language: PHP
  • acenji/ats

    Applicant Tracking System (ATS): A powerful platform leveraging generative AI and soft-match algorithms to analyze resumes against job descriptions. Built with React and Node.js, it streamlines hiring insights. Future plans include expanding to investor pitches and other structured documents.

    Language: JavaScript
  • ajaycode/unstructured

    Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

    Language: HTML
  • rithulkamesh/docproc

    Opinionated and Sophisticated Document Region Analyzer.

    Language: Python
  • anyparser/anyparser_crewai

    Supercharge your AI workflows by combining Anyparser’s advanced content extraction with Crew AI. With this integration, you can effortlessly leverage Anyparser’s document processing and data extraction tools within your Crew AI applications.

    Language: Python
  • augustweinbren/PhraseSpeaker

    PhraseSpeaker: Effortlessly dictate specific sections of text files with macOS's text-to-speech. Perfect for navigating and audibly extracting key content from large documents!

    Language: Shell
  • baughmann/tikara

    The metadata and text content extractor for almost every file type.

    Language: Python
  • cr4yfish/docling-js

    Parsing documents to one datatype (TypeScript port of Docling)

  • dsidlo/pyreparse

    Data Structure and Class to ease Parsing of Complex Documents.

    Language: Python
  • imnotamr/English-to-French-app-using-STREAMLIT-

    An interactive Streamlit app that translates English text and documents to French, featuring Google Translate API integration and text-to-speech functionality. Includes PDF and Word document translation.

    Language: Python
  • karthik-monkey/quantgpt

    AI-powered Financial Report Analysis Engine

    Language: Python
  • kevv1m/tikara

    The metadata and text content extractor for almost every file type.

  • arnabd64/Amazon-Textract-Guide

    Google Colab notebook for parsing a multipage document asynchronously with Amazon Textract

    Language: Jupyter Notebook
  • azzubair01/zubairhub

    ZubairHub is a Streamlit-based application that integrates various functionalities, including social graph visualization, object detection, document parsing, text extraction, generative AI interaction, and personal data transformation.

    Language: Python
  • qlfv/Docling-Testing

    Repository for testing and demonstrating the capabilities of Docling for document conversion.

    Language: HTML