pymupdf
There are 186 repositories under the pymupdf topic.
pymupdf/PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
ArtifexSoftware/pdf2docx
Open source Python library for converting PDF to DOCX.
CBIhalsen/PolyglotPDF
(eBook and PDF translation) A multilingual eBook processing tool supporting all eBook formats. Features online and offline translation while preserving original layouts. Compatible with both scanned and digital PDFs. Elegant user interface. The world's highest-performing open-source layout-preserving eBook translator.
Krasjet/pdf.tocgen
A CLI toolset to generate a table of contents for PDF files automatically.
pymupdf/PyMuPDF-Utilities
Demos, examples and utilities using PyMuPDF
lucasrla/remarks
Extract annotations (highlights and scribbles) from PDF, EPUB, and notebooks marked with reMarkable tablets. Export to Markdown, PDF, PNG, and SVG.
genieincodebottle/parsemypdf
A collection of PDF parsing libraries, including AI-based docling, Claude, OpenAI, Gemini, Meta's Llama Vision, and unstructured-io, as well as pdfminer, pymupdf, and pdfplumber, for efficient snapshot, text, table, and metadata extraction.
vb64/markdown-pdf
Markdown to PDF renderer
Zain-Bin-Arshad/pdf-viewer
A pure Python PDF viewer that provides the same functionality as other popular PDF viewers.
devxzh/PDFTools
A small tool built on PyQt5 and PyMuPDF for batch-adding table-of-contents bookmarks, enhancing PDFs, and splitting and merging PDFs.
shayanalibhatti/Designing-a-PDF-Audiobook-using-Python
A simple implementation of a PDF-to-audio converter.
benitomartin/multimodal-llm-pymupdf4llm
Multimodal RAG with PyMuPDF
seehiong/pdfusion
A powerful PDF processing engine that deconstructs documents into their core elements—text, images, and tables—and seamlessly reconstructs them into pristine, structured Markdown. Built with a React frontend and a robust Python (PyMuPDF) backend on Appwrite.
TheWatcherMultiversal/pdfgui_tools
pdfgui_tools is a user interface tool developed in Qt and Python that integrates with poppler-utils and PyPDF2 for PDF document management. It's a simple and user-friendly tool that includes various utilities.
xxao/pero
Unified Python drawing API
pymupdf/PyMuPDF-Optional-Material
Help file downloads, early ZIP binaries, wheels for retired Python 2.7, 3.5.
legout/veritascribe
AI-Powered Thesis Review Tool
stroblme/UNote
Fills the gap left by the lack of an open-source PDF editor that can draw and add notes.
NhanPhamThanh-IT/Scan-PDF-Paper
Advanced document analysis platform that extracts text from PDF, DOCX, and TXT files with AI-powered topic classification using Sentence Transformers. Features keyword matching, real-time analysis, interactive Streamlit web interface, and multi-topic support.
lheredias/Luftmensch
Useful PDF-related productivity tool.
gautam132002/invoice-pdf-data-extraction
Automated extraction of specific information from invoices, achieving over 95% accuracy.
politikundbildung/kindle_to_pdf
Creates PDF annotations from Kindle clippings
DioCrafts/ai-book-summarizer
📚 AI-Powered Book EPUB Knowledge Extractor & Summarizer. Transform your PDF books into structured knowledge effortlessly! This tool leverages AI to analyze books page by page, extracting key insights, definitions, and concepts, and organizes them into Markdown summaries for easier study.
AhmedTrb/PDF_highlight_extractor
A Python application built with PySide6 and PyMuPDF that extracts highlighted text from PDF files and categorizes it by color, allowing users to save and organize highlighted content in a Markdown file.
kezb90/PDF_To_Word
A Python-based tool that converts PDF files into editable Word documents, preserving text, images, and layout. Uses PyPDF2, PyMuPDF (fitz), python-docx, and Pillow to accurately transfer content from PDF to .docx. Ideal for transforming complex PDFs into Word format for easy editing.
myogpatterns/layered-pdf-merge
Merges multiple PDFs into a combined PDF file while respecting layers (also known as Optional Content Groups).
renan-siqueira/python-pdf-tool
This project facilitates the extraction of text from PDF files using various Python libraries. It is designed to be flexible, allowing a choice among different text extraction libraries and supporting both a single PDF file and a directory containing multiple PDF files.
yte9pc/Internet-Archive-PDF-Capstone
UVA Data Science Capstone project for Internet Archive. This project aimed to classify PDFs as research or non-research documents using image-based and text-based approaches. For the image-based models, we leveraged CNN transfer learning, and we used XGBoost for the text-based approach.
jfriedlein/h2aFreeplane_pdf-highlightedText_to_Freeplane_synch
Freeplane script to organise highlighted text and notes from PDF files as a Freeplane mind map
elias-jhsph/scienceai
An AI-powered scientific literature search engine that uses OpenAI's language models to analyze research papers. It enables users to extract data, ask complex questions, and perform ad hoc literature reviews, handling hundreds of papers simultaneously without needing metadata.
Pancham1603/discord-pdf
View PDF files in Discord text channels without downloading them
pawankumar94/graphscribe-table-extractor
Graphscribe is an intelligent, LLM-powered document understanding system designed to extract structured insights from complex visual content such as statistical diagrams, charts, and graphs.
towfique-elahe/pdf-compressor
A Python tool to compress PDF files by downscaling and compressing embedded images. Uses PyMuPDF and Pillow to optimize file size, making PDFs easier to store and share while preserving layout and quality.
vickypandey14/Convert-PDF-into-Image-By-Python
This Python script converts each page of a PDF document into separate image files. It utilizes the PyMuPDF library (fitz) to handle PDF operations and the Python Imaging Library (PIL) for image processing.
alagarsamy-m/Document-Responder-Multi-AI-Agent
Document Responder - Multi-Agent AI - Powered by an open-source LLM (Llama3 - 8b)
mohandshamada/RAG-MCP
RAG MCP server for AI documentation