pdf-parsing

There are 93 repositories under the pdf-parsing topic.

py-pdf/pypdf
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
Language:Python9.6k 143 1.3k1.5k
jsvine/pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
Language:Python9.1k 101 602825
galkahana/HummusJS
Node.js module for high performance creation, modification and parsing of PDF files and streams
Language:C1.2k 31 414169
adithya-s-k/marker-api
Easily deployable 🚀 API to convert PDF to markdown quickly with high accuracy.
Language:Python922 6 24116
drmingler/docling-api
Easily deployable and scalable backend server that efficiently converts various document formats (pdf, docx, pptx, html, images, etc.) into Markdown. With support for both CPU and GPU processing, it is ideal for large-scale workflows and offers text/table extraction, OCR, and batch processing with sync/async endpoints.
Language:Python722 6 1676
jstockwin/py-pdf-parser
A Python tool to help extract information from structured PDFs.
Language:Python422 7 7150
chunyenHuang/hummusRecipe
A powerful PDF tool for NodeJS based on HummusJS.
Language:JavaScript349 8 16590
thoqbk/traprange
(Java) A Method to Extract Tabular Content from PDF Files
Language:HTML335 33 18132
ck-unifr/pdf_parsing
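PDF parsing (text, sections, tables, images, references); PDF question answering, summarization, and information extraction based on large models (ChatGLM2-6B, RWKV) + langchain + streamlit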
Language:Python212 2 433
Unsiloed-AI/Unsiloed-Parser
Language:Python15742
ScientaNL/pdf-extractor
Node.js module for rendering pdf pages to images, svgs, html files, text files and json metadata
Language:JavaScript104 8 722
iamarunbrahma/pdf-to-markdown
Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced information retrieval and processing.
Language:Python101 3 49
rostrovsky/pdf-table
Java utility for parsing PDF tabular data using Apache PDFBox and OpenCV
Language:Java80 7 314
hellpanderrr/linkedin-pdf-parsing
Parsing resumes in PDF format from LinkedIn
Language:Python68 7 530
tuffstuff9/nextjs-pdf-parser
Next.js template for seamless PDF parsing using pdf2json and FilePond. Ideal for developers seeking a ready-to-use solution for PDF content extraction in Next.js projects.
Language:TypeScript65 1 29
dipietrantonio/pdf4py
A PDF parser written in Python 3 with no external dependencies.
Language:Python58 5 03
abdullahshafiq-20/ResumeTex
ResumeTex is an AI-powered tool that converts standard PDF resumes into professionally formatted LaTeX documents. This service helps you create elegant, structured resumes without needing to learn LaTeX syntax.
Language:JavaScript416
DQ-Zhang/refchaser
Written in Python for checking reference lists in systematic reviews and literature reviews. Helps with backward and forward reference list searching by extracting references and creating search queries, ranks articles by relevance to improve screening efficiency, and downloads full-text PDFs of research articles in batch.
Language:Python24 1 03
adrienjoly/npm-pdfreader-example
Example use of pdfreader: parsing a PDF résumé
Language:JavaScript16 1 211
malice-plugins/pdf
Malice PDF Plugin
Language:Python16 5 211
diandiancha/LittleAIBox
A privacy-focused AI chat platform built with Vite + Capacitor + Cloudflare. Runs locally or in the cloud, supports Gemini models, live web search, and intelligent key rotation. No sign-up required — fast, secure, and customizable.
Language:JavaScript15
aimaster-dev/chatbot-using-rag-and-langchain
Chat with your PDFs using AI! This Streamlit app uses RAG, LangChain, FAISS, and OpenAI to let you ask questions and get answers with page and file references.
Language:Python13 1 00
IQDM/IQDM-PDF
A collection of PDF data mining scripts for various IMRT QA vendors
Language:Python13 0 52
meldonization/depdf
An ultimate pdf file disintegration tool
Language:Python11 1 20
ishaangupta-YB/nextjs-pdf-parser
Next.js template for seamless PDF parsing using pdf2json and a custom drag-and-drop file uploader. Ideal for developers seeking a ready-to-use solution for PDF content extraction in their Next.js projects.
Language:TypeScript10 1 05
Remus-Hack-n-Roll-2019/job-matcher
Upload your resume and check out your best matching jobs!
Language:Python10 0 25
easonlai/chat_with_pdf_table
The contents of this repository showcase how to extract table data from a PDF file and preprocess it to facilitate word embedding. This preprocessing step enhances the readability of table data for language models and enables us to extract more contextual information from the tables.
Language:Jupyter Notebook9 1 14
anandubajith/nitc-hostel-dues
Hostel dues retriever of NIT Calicut
Language:HTML8 1 05
co-dev0909/chatbot-using-rag-and-langchain
Chat with your PDFs using AI! This Streamlit app uses RAG, LangChain, FAISS, and OpenAI to let you ask questions and get answers with page and file references.
Language:Python8 0 0
jaychempan/PDF-Master
PDF-Master: A Comprehensive Pipeline for PDF Parsing with Large Language Models (LLMs) ✨
Language:Python80
NahomAl/ethiobank_receipts
Fast and reliable Python library to extract and verify payment receipts from major Ethiopian banks (CBE, Dashen, Awash, BOA, Zemen, Telebirr).
Language:Python71
bkawan/pdf-parser
Language:Python5 1 00
J-sephB-lt-n/pdf-bank-statement-parser
Tool for converting First National Bank (FNB) bank statement PDFs into useful structured data
Language:Python5 1 14
sjsreehari/pragati
An AI-powered platform for automated Detailed Project Report (DPR) analysis, combining XGBoost-based feasibility classification, MDONER/NEC compliance validation, OCR text extraction, and an interactive React dashboard to deliver transparent, real-time project evaluation and risk assessment.
Language:Python5
sunnybedi990/RAG-with-LLM
A Retrieval-Augmented Generation (RAG) system for document query and summarization using vector-based search and language models.
Language:Python4 1 01
yep-yogesh/Dhruva.AI
Dhruva.ai - Your guiding star in the academic universe. A multilingual campus chatbot that answers student queries anytime, anywhere.
Language:JavaScript4
