pdf-document-processor

There are 247 repositories under the pdf-document-processor topic.

  • wmjordan/PDFPatcher

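    PDFPatcher (PDF补丁丁): a PDF toolbox that can edit bookmarks, crop and rotate pages, remove restrictions, extract or merge documents, inspect document structure, extract images, convert pages to images, and more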

    Language:C#9.1k971721.2k
  • pdf2htmlEX/pdf2htmlEX

    Convert PDF to HTML without losing text or format.

    Language:HTML3.7k56136375
  • qpdf/qpdf

    qpdf: A content-preserving PDF document transformer

    Language:C++3.4k69719269
  • run-llama/llama_parse

    Parse files for optimal RAG

    Language:Python2.7k22279260
  • unidoc/unipdf

    Golang PDF library for creating and processing PDF files (pure go)

    Language:Go2.6k31314251
  • UglyToad/PdfPig

    Read and extract text and other content from PDFs in C# (port of PDFBox)

    Language:C#1.7k47466238
  • GowenGit/docnet

    DocNET is a fast PDF editing and reading library for modern .NET applications

    Language:C#455246388
  • abarker/pdfCropMargins

    pdfCropMargins -- a program to crop the margins of PDF files

    Language:Python35275832
  • sailist/chatgpt-enhancement-extension

    An all-in-one plugin to improve your ChatGPT experience!

    Language:TypeScript33051128
  • hellerbarde/stapler

    A small utility making use of the pypdf library to provide a (somewhat) lighter alternative to pdftk

    Language:Python28395653
  • michaelrsweet/pdfio

    PDFio is a simple C library for reading and writing PDF files.

    Language:C185105944
  • Dtronix/PDFiumCore

    .NET Standard P/Invoke bindings for PDFium.

    Language:C#14171722
  • houking-can/CCKS2019-Task5

    CCKS 2019 evaluation Task 5: information extraction from public company announcements, 3rd place

    Language:Python1233226
  • svenssonaxel/pdf-sign

    A tool to sign PDF files. With Linux support.

    Language:Python1216133
  • uroesch/pdftools

    A collection of PDF command line tools and wrappers for Linux

    Language:Shell87734
  • naiveHobo/pdfviewer

    PDFViewer is a GUI tool, written using python3 and tkinter, which lets you view PDF documents.

    Language:Python752427
  • lovasoa/pagelabels-py

    Python library to manipulate PDF page labels

    Language:Python6741012
  • OnedocLabs/onedoc

    The first developer-oriented document platform. Generate, host and track PDFs with a single API, beautifully.

    Language:Python63110
  • GURPREETKAURJETHRA/Multi-PDFs_ChatApp_AI-Agent

    Meet MultiPDF 📚 Chat AI App! 🚀 Chat seamlessly with Multiple PDFs using Langchain, Google Gemini Pro & FAISS Vector DB with Seamless Streamlit Deployment. Get instant, accurate responses from Awesome Google Gemini OpenSource language Model. 📚💬 Transform your PDF experience now! 🔥✨

    Language:Python563334
  • sidphbot/Auto-Research

    Generate a custom, detailed survey paper with topic-clustered sections and proper citations from a single query in just under 30 minutes.

    Language:Python54126
  • pankajr141/pdf2jpg

    Utility to convert PDF into JPG files

    Language:Java5151421
  • KalyanM45/DocGenius-Revolutionizing-PDFs-with-AI

    This is a Python application that allows you to load a PDF and ask questions about it using natural language. The application uses an LLM to generate a response about your PDF. The LLM will not answer questions unrelated to the document.

    Language:Python43235
  • praj2408/Realtime-Document-Chat-System

    In this project, we used Langchain to create a ChatGPT for your PDF using Streamlit. We built an application that allows you to ask questions about a PDF document and get answers directly from an LLM (Large Language Model), like OpenAI's ChatGPT.

    Language:Jupyter Notebook433313
  • papercast-dev/papercast

    A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GROBID, LangChain, listen as podcast. Customize your own pipelines.

    Language:Python42191
  • SiddhantSadangi/pdf-workdesk

    A Streamlit-powered application that provides a user-friendly interface for editing PDF documents.

    Language:Python42127
  • taseikyo/backup-utils

    A batch of useful code/scripts: run commands automatically, finish repetitive stupid operations, perform format conversions, etc.

    Language:Python311115
  • opendocument-app/pdf2htmlEX-Android

    pdf2htmlEX library port for Android - Convert PDF to HTML without losing text or format

    Language:Java2963911
  • sfneal/pdfconduit

    Prepare documents for distribution

    Language:Python241471
  • BobLd/PdfPigMLNetBlockClassifier

    Proof of concept of training a simple Region Classifier using PdfPig and ML.NET (LightGBM). The objective is to classify each text block in a PDF document page as one of title, text, list, table, or image.

    Language:C#23406
  • datalogics/pdf-rest-api-samples

    pdfRest API Toolkit is a REST API service for processing PDF documents, made by developers, for developers. Rapidly integrate PDF workflows with your existing projects and applications, simply and seamlessly. Get started for free in seconds.

    Language:Java238110
  • eiceblue/Spire.PDF-for-Java

    Spire.PDF for Java is a PDF component that enables reading, writing, printing, and converting PDF documents in Java applications without using Adobe Acrobat.

  • ayushwattal/PDF-ChatGpt

    Create a ChatGPT for uploaded pdf using Langchain

    Language:Python21119
  • ptyadana/Python-Projects-Dojo

    Collection of Python projects including machine learning projects, image and PDF processing, password checkers, sending emails and SMS, web scraping, a Flask web app, Selenium automation testing, etc.

    Language:Jupyter Notebook213016
  • akoweb/tcpdf

    Persian and Arabic fonts for TCPDF (PHP)

  • pdflexer/pdflexer

    .NET PDF parsing library

    Language:C#183451