pdf-document-processor

There are 346 repositories under the pdf-document-processor topic.

  • PDF2PPTGenerator

    PDF2PPT Generator is a Python tool that creates PowerPoint presentations from PDF files using smart summarization techniques assisted by GPT-3.5-Turbo.

    Language: Python (21)
  • react-pdftotext

    Lightweight, memory-safe client library for extracting plain text from PDF files.

    Language: TypeScript (20)
  • Myanmar-Ebook-OCR

    This repo contains a script that uses Tesseract OCR to digitize PDF ebooks into text format.

    Language: Python (20)
  • tcpdf

    Persian and Arabic fonts for TCPDF (PHP).

  • floorplan

    Evaluate various techniques to extract and organize information from a floor plan

    Language: HTML (18)
  • pdfcook

    Prepress preparation tool and PDF editor

    Language: C++ (18)
  • PDF-Dark-Mode

    Converts PDFs to have a grey background so they are easier on the eyes

    Language: Python (18)
  • Convert2Docx

    An R package for easily converting PDF to Word.

    Language: R (17)
  • PDF_Merge_and_Edit

    Python script to merge and edit sensitive PDF files you don't want to upload to random sites you find on Google

    Language: Python (16)
  • ScaleDP

    ScaleDP is an open-source extension of Apache Spark for document processing

    Language: Python (15)
  • CV-JD-Matching

    Extracts details from resumes (CVs), matches them against job descriptions (JDs) using a pretrained model such as DistilBERT, and ranks them by cosine similarity.

    Language: Jupyter Notebook (15)
  • Compress-PDFs

    A Python CLI script to compress all the PDF files recursively in a directory using the iLovePDF technology

    Language: Python (14)
  • generate-insights-from-data-formats-with-watson

    How do we process data in different formats such as docx and pdf, and generate insights to be linked with structured data in a database? This pattern helps establish relations between structured and unstructured data to generate recommendations using Watson NLU and Watson Studio.

    Language: Jupyter Notebook (14)
  • pdf-utils

    An Android app for performing various operations on PDF files

    Language: Java (13)
  • PdfEditor

    PDF Editor (remove JS, find/replace, redact) based on iTextSharp

    Language: C# (12)
  • JPdfBookmarks

    jpdfbookmarks: fixes the JPdfBookmarks GUI mode, where opening a PDF whose bookmarks include CJK (Chinese, Japanese, Korean) characters showed them as tofu characters (□), and adds native installers (msi, deb, rpm)

    Language: Java (11)
  • PDFConverter

    Converts PDFs to any format for easy analysis

    Language: Python (11)
  • pdf-forms-for-contact-form-7

    Build Contact Form 7 forms from PDF forms. Get PDFs auto-filled and attached to email messages and/or website responses on form submission.

    Language: PHP (11)
  • refinedoc

    Python library for extracting headers, footers and body from PDF

    Language: Python (10)
  • PDFGadget

    Batch PDF operations for Swift

    Language: Swift (10)
  • burdoc

    Advanced PDF parsing for Python

    Language: HTML (10)
  • cv-data-extractor

    Extract essential data (e.g. GPA, skills, education, age, ...) from PDF-formatted resume files (under development)

    Language: Python (10)
  • DynaMyTranslate

    PDF literature translation (results in Markdown format); supports translation between multiple languages

    Language: TypeScript (9)
  • RAG-PDF-CHATBOT

    (PDF) Information and Inference, Retrieval-Augmented Generation [RAG]

    Language: Python (9)
  • pdfLayoutDet

    pdfDet aims to simplify PDF layout detection tasks for users.

    Language: Python (9)
  • chat_with_pdf_table

    The contents of this repository showcase how to extract table data from a PDF file and preprocess it to facilitate word embedding. This preprocessing step enhances the readability of table data for language models and enables us to extract more contextual information from the tables.

    Language: Jupyter Notebook (9)
  • AI-DocumentQnA

    Simple LLM-enabled document Q&A app built using LangChain and Streamlit

    Language: Python (9)
  • cambridge_core_downloader

    Download and merge PDFs from Cambridge Core

    Language: Python (9)
  • Formal-stack-pdfs

    Make PDFs from images and Markdown; more is coming...

    Language: HTML (9)
  • PDF-Zensor

    PDF-Zensor can be used to censor PDF files: it strips annotations and metadata as well as textual and graphical content from the PDF file. It can also partially censor PDF files and highlight certain text phrases.

    Language: Java (9)
  • PDFSDK

    Python interface based on the Foxit Quick PDF Library

    Language: Python (9)
  • PDF-Player

    Simple Python utilities to play around with PDF files

    Language: Python (9)
  • PDF-Manipulator

    A comprehensive PDF tool that allows you to effortlessly edit, merge, split, compress, and convert PDFs. It supports adding pages, extracting images, and viewing PDFs directly within the app. With a user-friendly drag-and-drop interface, it’s fully responsive across all devices, streamlining document management for everyone.

    Language: JavaScript (8)
  • sign-pdf-with-transparent-background-signature

    Sign PDFs. Extracts a signature from a picture and applies the transparent-background signature to a PDF.

    Language: Python (8)
  • automatic-pancake

    Agent-based active-learning simulation for systematic reviews and other types of technology-assisted review (TAR). It incorporates PDF documents and other metadata, and is based on both full-text screening and title screening decisions.

    Language: Python (8)
  • pdfix_sdk_builds

    PDFix SDK release builds