pymupdf

There are 186 repositories under the pymupdf topic.

pymupdf/PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Language: Python 8.4k 62 2.4k661
ArtifexSoftware/pdf2docx
Open source Python library for converting PDF to DOCX.
Language: Python 3.2k 31 284459
CBIhalsen/PolyglotPDF
(eBook and PDF translation) A multilingual eBook processing tool supporting all eBook formats. Features online and offline translation while preserving original layouts. Compatible with both scanned and digital PDFs. Elegant user interface. The world's highest-performing open-source layout-preserving eBook translator.
Language: Python 2.1k 217 11290
Krasjet/pdf.tocgen
A CLI toolset to generate table of contents for PDF files automatically.
Language: Python 796 8 3327
pymupdf/PyMuPDF-Utilities
Demos, examples and utilities using PyMuPDF
Language: Jupyter Notebook 686 6 59175
lucasrla/remarks
Extract annotations (highlights and scribbles) from PDF, EPUB, and notebooks marked with reMarkable tablets. Export to Markdown, PDF, PNG, SVG
Language: Python 385 9 4325
genieincodebottle/parsemypdf
Collection of PDF parsing libraries such as AI-based docling, Claude, OpenAI, Gemini, Meta's llama-vision, unstructured-io, pdfminer, pymupdf, and pdfplumber for efficient snapshot, text, table, and metadata extraction.
Language: Python 152 3 331
vb64/markdown-pdf
Markdown to PDF renderer
Language: Python 118 3 2613
Zain-Bin-Arshad/pdf-viewer
A pure-Python PDF viewer that provides the same functionality as other well-known PDF viewers.
Language: Python 85 4 720
devxzh/PDFTools
A small tool built on PyQt5 and PyMuPDF for batch-adding table-of-contents bookmarks, enhancing PDFs, and splitting and merging PDFs.
Language: Python 63 0 210
shayanalibhatti/Designing-a-PDF-Audiobook-using-Python
A simple implementation of a PDF-to-audio converter.
Language: Python 50 1 123
benitomartin/multimodal-llm-pymupdf4llm
Multimodal RAG with PyMuPDF
Language: Jupyter Notebook 42 1 09
seehiong/pdfusion
A powerful PDF processing engine that deconstructs documents into their core elements—text, images, and tables—and seamlessly reconstructs them into pristine, structured Markdown. Built with a React frontend and a robust Python (PyMuPDF) backend on Appwrite.
Language: Python 37 0 00
TheWatcherMultiversal/pdfgui_tools
pdfgui_tools is a user interface tool developed in Qt and Python that integrates with poppler-utils and PyPDF2 for PDF document management. It's a simple and user-friendly tool that includes various utilities.
Language: Python 37 2 12
xxao/pero
Unified Python drawing API
Language: Python 36 3 24
pymupdf/PyMuPDF-Optional-Material
Help file downloads, early ZIP binaries, wheels for retired Python 2.7, 3.5.
17 8 35
legout/veritascribe
AI-Powered Thesis Review Tool
Language: Python 162
stroblme/UNote
Fills the lack of an open-source PDF Editor with the capability to draw and add notes
Language: Python 15 1 03
NhanPhamThanh-IT/Scan-PDF-Paper
Advanced document analysis platform that extracts text from PDF, DOCX, and TXT files with AI-powered topic classification using Sentence Transformers. Features keyword matching, real-time analysis, interactive Streamlit web interface, and multi-topic support.
Language: Python 14 0 0
lheredias/Luftmensch
Useful PDF-related productivity tool.
Language: Python 13 2 13
gautam132002/invoice-pdf-data-extraction
Automated extraction of specific information from invoices, achieving over 95% accuracy.
Language: Python 11 1 01
politikundbildung/kindle_to_pdf
Creates PDF annotations from Kindle clippings
Language: Python 10 1 00
DioCrafts/ai-book-summarizer
📚 AI-Powered Book EPUB Knowledge Extractor & Summarizer. Transform your PDF books into structured knowledge effortlessly! This tool leverages AI to analyze books page by page, extracting key insights, definitions, and concepts, and organizes them into Markdown summaries for easier study.
Language: Python 9 1 02
AhmedTrb/PDF_highlight_extractor
A Python application built with PySide6 and PyMuPDF that extracts highlighted text from PDF files and categorizes it by color, allowing users to save and organize highlighted content in a Markdown file.
Language: Python 70
kezb90/PDF_To_Word
A Python-based tool that converts PDF files into editable Word documents, preserving text, images, and layout. Uses PyPDF2, PyMuPDF (fitz), python-docx, and Pillow to accurately transfer content from PDF to .docx. Ideal for transforming complex PDFs into Word format for easy editing.
Language: Python 7 1 02
myogpatterns/layered-pdf-merge
Merges multiple PDFs into a combined PDF file while respecting layers (also known as Optional Content Groups).
Language: Python 6 2 02
renan-siqueira/python-pdf-tool
This project facilitates the extraction of text from PDF files using various Python libraries. It is designed to be flexible, allowing a choice among different text extraction libraries and supporting both a single PDF file and a directory containing multiple PDF files.
Language: Python 6 1 01
yte9pc/Internet-Archive-PDF-Capstone
UVA Data Science Capstone project for Internet Archive. This project aimed to classify PDFs as research or non-research documents using an image-based and a text-based approach. For the image-based models, we leveraged CNN transfer learning, and we used XGBoost for the text-based approach.
Language: Jupyter Notebook 6 1 02
jfriedlein/h2aFreeplane_pdf-highlightedText_to_Freeplane_synch
Freeplane script to organise highlighted text and notes from PDF files as a Freeplane mind map
Language: Tcl 4 1 00
elias-jhsph/scienceai
An AI-powered scientific literature search engine that uses OpenAI's language models to analyze research papers. It enables users to extract data, ask complex questions, and perform ad hoc literature reviews, handling hundreds of papers simultaneously without needing metadata.
Language: Python 3 2 00
Pancham1603/discord-pdf
View PDF files in Discord text channels without downloading
Language: Python 3 1 00
pawankumar94/graphscribe-table-extractor
Graphscribe is an intelligent, LLM-powered document understanding system designed to extract structured insights from complex visual content such as statistical diagrams, charts, and graphs.
Language: Python 30
towfique-elahe/pdf-compressor
A Python tool to compress PDF files by downscaling and compressing embedded images. Uses PyMuPDF and Pillow to optimize file size, making PDFs easier to store and share while preserving layout and quality.
Language: Jupyter Notebook 3 1 0
vickypandey14/Convert-PDF-into-Image-By-Python
This Python script converts each page of a PDF document into separate image files. It utilizes the PyMuPDF library (fitz) to handle PDF operations and the Python Imaging Library (PIL) for image processing.
Language: Python 3 1 01
alagarsamy-m/Document-Responder-Multi-AI-Agent
Document Responder - Multi Agent AI - Powered By OpenSource LLM (Llama3 - 8b)
Language: Python 2
mohandshamada/RAG-MCP
RAG MCP server for AI documentation
Language: Python 2
