pdf-document-processor
There are 346 repositories under the pdf-document-processor topic.
PDF2PPTGenerator
PDF2PPT Generator is a Python tool that creates PowerPoint presentations from PDF files using smart summarization techniques assisted by GPT-3.5-Turbo.
react-pdftotext
Lightweight, memory-safe client library for extracting plain text from PDF files.
Myanmar-Ebook-OCR
This repo contains a script that uses Tesseract OCR to digitize PDF ebooks into text format.
tcpdf
Persian and Arabic fonts for TCPDF (PHP)
floorplan
Evaluate various techniques to extract and organize information from a floor plan
pdfcook
Prepress tool and PDF editor
PDF-Dark-Mode
Converts PDFs to have a grey background, making them easier on the eyes
Convert2Docx
An R package for easily converting PDF to Word.
PDF_Merge_and_Edit
Python script to merge and edit sensitive PDF files you don't want to upload to random sites you find on Google
ScaleDP
ScaleDP is an open-source extension of Apache Spark for document processing
CV-JD-Matching
Extracting details from resumes (CVs) and matching them with job descriptions (JDs) using pretrained models like DistilBERT, then ranking them by cosine similarity.
Compress-PDFs
A Python CLI script to compress all the PDF files in a directory recursively using the iLovePDF technology
generate-insights-from-data-formats-with-watson
How do we process data in different formats, such as DOCX and PDF, and generate insights that can be linked with structured data in a database? This pattern helps establish relations between structured and unstructured data to generate recommendations using Watson NLU and Watson Studio.
pdf-utils
An Android app to perform various operations on PDF files
PdfEditor
PDF Editor (remove JS, find/replace, redact) based on iTextSharp
JPdfBookmarks
jpdfbookmarks: fixes JPdfBookmarks GUI mode so that opening a PDF whose bookmarks include CJK (Chinese, Japanese, Korean) characters no longer shows them as tofu characters (□); adds native installers (MSI, DEB, RPM)
PDFConverter
Converts PDFs to any format for easy analysis
pdf-forms-for-contact-form-7
Build Contact Form 7 forms from PDF forms. Get PDFs auto-filled and attached to email messages and/or website responses on form submission.
refinedoc
Python library for extracting headers, footers and body from PDF
PDFGadget
Batch PDF operations for Swift
burdoc
Advanced PDF parsing for Python
cv-data-extractor
Extract essential data (e.g. GPA, skills, education, age, ...) from resumes in PDF format (under development)
DynaMyTranslate
PDF literature translation (output in Markdown), supporting translation between multiple languages
RAG-PDF-CHATBOT
(PDF) Information and Inference: Retrieval-Augmented Generation (RAG)
pdfLayoutDet
pdfDet aims to simplify PDF layout detection tasks for users.
chat_with_pdf_table
The contents of this repository showcase how to extract table data from a PDF file and preprocess it to facilitate word embedding. This preprocessing step enhances the readability of table data for language models and enables us to extract more contextual information from the tables.
AI-DocumentQnA
Simple LLM-enabled document Q&A app built using LangChain and Streamlit
cambridge_core_downloader
Download and merge PDFs from Cambridge Core
Formal-stack-pdfs
Make PDFs from images and Markdown; more formats are coming.
PDF-Zensor
PDF-Zensor can be used to censor PDF files. It strips annotations and metadata, as well as textual and graphical content, from a PDF file. It can also partially censor PDF files and highlight certain text phrases.
PDFSDK
Python interface based on Foxit Quick PDF Library
PDF-Player
Simple python utilities to play around with PDF Files
PDF-Manipulator
A comprehensive PDF tool that allows you to effortlessly edit, merge, split, compress, and convert PDFs. It supports adding pages, extracting images, and viewing PDFs directly within the app. With a user-friendly drag-and-drop interface, it’s fully responsive across all devices, streamlining document management for everyone.
sign-pdf-with-transparent-background-signature
Sign PDFs: extract a signature from a picture and apply it to a PDF as a transparent-background signature.
automatic-pancake
Active-learning agent-based simulation for systematic reviews and other types of technology-assisted review (TAR). It will include PDF documents and other metadata, and it is based on both full-text screening decisions and title screening decisions.
pdfix_sdk_builds
PDFix SDK release builds