pdf-document-processor

There are 346 repositories under the pdf-document-processor topic.

PDF2PPTGenerator
PDF2PPT Generator is a Python tool that creates PowerPoint presentations from PDF files by using smart summarization techniques assisted by GPT-3.5-Turbo.
Language: Python
21
react-pdftotext
Lightweight, memory-safe client library for extracting plain text from PDF files.
Language: TypeScript
20
Myanmar-Ebook-OCR
This repo contains a script that uses Tesseract OCR to digitize PDF ebooks into text format.
Language: Python
20
tcpdf
Persian and Arabic fonts for TCPDF - PHP - Persian font for TCPDF
19
floorplan
Evaluate various techniques to extract and organize information from a floor plan
Language: HTML
18
pdfcook
Prepress preparation tool and PDF editor
Language: C++
18
PDF-Dark-Mode
Converts PDFs to have a grey background so they are easier on the eyes
Language: Python
18
Convert2Docx
An R package for easily converting PDF to Word.
Language: R
17
PDF_Merge_and_Edit
Python script to merge and edit sensitive PDF files you don't want to upload to random sites you find on Google
Language: Python
16
ScaleDP
ScaleDP is an Open-Source extension of Apache Spark for Document Processing
Language: Python
15
CV-JD-Matching
Extracts details from resumes (CVs), matches them against job descriptions (JDs) using a pretrained model such as DistilBERT, and ranks them by cosine similarity.
Language: Jupyter Notebook
15
Compress-PDFs
A Python CLI script to compress 📦 all the PDF files recursively in a directory using the iLovePDF technology 🥰
Language: Python
14
generate-insights-from-data-formats-with-watson
How do we process data in different formats such as docx and pdf, and generate insights that can be linked with structured data in a database? This pattern helps establish relations between structured and unstructured data to generate recommendations using Watson NLU and Watson Studio.
Language: Jupyter Notebook
14
pdf-utils
An Android app to perform different operations on PDF files
Language: Java
13
PdfEditor
PDF Editor (remove JS, find/replace, redact) based on iTextSharp
Language: C#
12
JPdfBookmarks
jpdfbookmarks - fixes the JPdfBookmarks GUI mode issue where opening a PDF whose bookmarks include CJK (Chinese, Japanese, Korean) characters shows them as tofu characters (□); adds native installers (msi, deb, rpm)
Language: Java
11
PDFConverter
Converts PDF to any format for easy analysis
Language: Python
11
pdf-forms-for-contact-form-7
Build Contact Form 7 forms from PDF forms. Get PDFs auto-filled and attached to email messages and/or website responses on form submission.
Language: PHP
11
refinedoc
Python library for extracting headers, footers and body from PDF
Language: Python
10
PDFGadget
Batch PDF operations for Swift
Language: Swift
10
burdoc
Advanced PDF parsing for Python
Language: HTML
10
cv-data-extractor
Extract essential data (e.g. GPA, skills, education, age, ...) from PDF-formatted work resume files (under development)
Language: Python
10
DynaMyTranslate
PDF literature translation (results in Markdown format); supports translation between multiple languages
Language: TypeScript
9
RAG-PDF-CHATBOT
(PDF) Information and Inference, Retrieval-Augmented Generation [RAG]
Language: Python
9
pdfLayoutDet
pdfDet aims to simplify PDF layout detection tasks for users.
Language: Python
9
chat_with_pdf_table
The contents of this repository showcase how to extract table data from a PDF file and preprocess it to facilitate word embedding. This preprocessing step enhances the readability of table data for language models and enables us to extract more contextual information from the tables.
Language: Jupyter Notebook
9
AI-DocumentQnA
Simple LLM-enabled document Q&A app built using LangChain and Streamlit
Language: Python
9
cambridge_core_downloader
Download and merge PDFs from Cambridge Core
Language: Python
9
Formal-stack-pdfs
Make PDFs from images and Markdown; more is coming...
Language: HTML
9
PDF-Zensor
PDF-Zensor can be used to censor PDF-files. As such it strips annotations and metadata as well as textual and graphical content from the PDF-file. It can also partially censor PDF-files and highlight certain text phrases.
Language: Java
9
PDFSDK
Python interface based on Foxit Quick PDF Library
Language: Python
9
PDF-Player
Simple Python utilities to play around with PDF files
Language: Python
9
PDF-Manipulator
A comprehensive PDF tool that allows you to effortlessly edit, merge, split, compress, and convert PDFs. It supports adding pages, extracting images, and viewing PDFs directly within the app. With a user-friendly drag-and-drop interface, it’s fully responsive across all devices, streamlining document management for everyone.
Language: JavaScript
8
sign-pdf-with-transparent-background-signature
Sign PDFs. Extracts a signature from a picture and applies it, with a transparent background, to a PDF.
Language: Python
8
automatic-pancake
Active-learning agent-based simulation for systematic reviews and other types of technology-assisted review (TAR). It will include PDF documents and other metadata, and it is based on both full-text screening decisions and title-screening decisions.
Language: Python
8
pdfix_sdk_builds
PDFix SDK release builds
8