document-parser
There are 30 repositories under the document-parser topic.
infiniflow/ragflow
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Unstructured-IO/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
deepdoctection/deepdoctection
A Repo For Document AI
Filimoa/open-parse
Improved file parsing for LLMs
marieai/marie-ai
Integrate AI-powered Document Analysis Pipelines
JPLeoRX/opencv-text-deskew
Tutorial on how to deskew (straighten) text images
papercast-dev/papercast
A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, Semantic Scholar, or PDF with GROBID and LangChain, and listen to them as a podcast. Customize your own pipelines.
urbanclap-engg/smart-docs-parser
An OCR-based document parser to extract information from identity document images
decisionfacts/semantic-ai
An open-source Retrieval-Augmented Generation (RAG) framework that uses semantic search to retrieve the expected results and generates human-readable conversational responses with the help of an LLM (Large Language Model).
InvoiceableAI/Invoiceable
The invoice, document, and résumé parser powered by AI.
decisionfacts/df-extract
DF Extract Lib
brazilian-code/Resume_Parsing
Resume Parsing app to extract information using AI
Clearedge-AI/clearedge
Build a RAG preprocessing pipeline
has-abi/docparser
Extract text from your DOCX documents.
graphlit/graphlit
Graphlit Platform
hrbrmstr/docparser
🧰 Tools to Upload/Parse Documents to 'docparser' and Retrieve Extracted Results
graphlit/graphlit-client-python
Python client library for Graphlit Platform
coderosh/docpa
A simple library that I use for web scraping. Uses htmlparser2 to parse the DOM.
lorenzbr/techStandards
Download and parse technical standard documents
agent87/IhuguraChatBotUX
Ihugura Chatbot Streamlit User Interface
dills122/ShamWow
Who likes lawyers? Me neither; scrub your PII with ShamWow
graphlit/graphlit-client-typescript
TypeScript client for Graphlit Platform
MaineDSA/voter_participation_extractor_portland
The City of Portland distributes voter participation info in PDF format. This tool converts it to CSV.
MidHunterX/Scholar-CAP
🎓 Set of powerful tools designed to streamline the extraction, parsing, and clean-up of data from DOCX and PDF forms. Saves time and eliminates manual data entry by automating the processing of structured data.
munenepeter/Case-Law-Search
A simple case-law parser and search tool
munenepeter/translate
A simple document uploader & parser
atbasu/document-content-extractor
Python program that uses OpenAI APIs to parse user-specified content from text files
buren/document_parser
Small Rails API app to parse documents.
JayLohokare/docX
Convert documents into quizzes! Built at HackNY (Android + NodeJS + Alexa skill)
JayLohokare/docX-REST-API
Shubham's REST APIs made at HackNY