pdf-to-text

There are 100 repositories under the pdf-to-text topic.

  • pdf-to-text-python

    This code is designed to analyze a PDF document and determine the percentage of AI-generated content within the text. It utilizes the PyPDF2 library to extract the text from each page of the PDF and the NLTK library to check for AI-generated words.

    Language: Python · 4
  • story-to-video

    🎥 A command-line Python tool that converts a PDF story into a video.

    Language: Python · 3
  • scanned-pdf-text-extractor

    This is a Python application that converts non-readable PDF files, such as scanned documents, into readable Word documents. It first converts the PDF pages into images, then extracts the text from those images to create the Word documents. The application provides a user-friendly interface for this task.

    Language: Python · 3
  • pdf-tools

    A collection of PDF tools to convert, merge, and compress PDFs. Free, with no installation required.

  • apdfl-csharp-dotnet-framework-samples

    Sample code for the Datalogics .NET Framework interface of the Adobe PDF Library

    Language: C# · 3
  • amane

    Amane, a simple, free, open-source PDF-to-audiobook web app

    Language: HTML · 2
  • LeapRAG

    LeapRAG is an open-source platform that integrates advanced RAG technology with Google’s A2A protocol, enabling users to build context-aware, data-driven agents. These agents are automatically A2A-compliant and can be discovered and used by any compatible client without extra development.

    Language: Python · 2
  • MedXpert-FrontEnd

    MedXpert is an Android-based healthcare application that leverages OCR (Tesseract, pdfplumber) and LLMs (OpenAI GPT-3.5) to automate medical report extraction, abnormality detection, and natural language summarization. It features Firebase-powered user authentication, role-based access control, and real-time chatbot integration for medical queries.

    Language: Kotlin · 2
  • NOAA-Weather-Modification-Forms-LLM-Extractor

    Extract key information from thousands of NOAA Form 17-4 filings (Initial Report On Weather Modification Activities) using an LLM.

    Language: Python · 2
  • arxiv_extractor

    Converts PDF research papers to clean text files, skipping images and tables.

    Language: Python · 2
  • document-scanner

    Extract structured text and data from documents such as invoices, book pages, and tables, using OpenCV and Tesseract OCR

    Language: HTML · 2
  • unstructured

    Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

    Language: HTML · 2
  • PDFBox-get-Coordinates-of-text

    A PDFBox wrapper for extracting text and text coordinates from a printed PDF document (no OCR)

    Language: JavaScript · 2
  • pdf-tutorial

    C# demo for converting PDFs to images, extracting PDF text, adding digital signatures and watermarks to PDFs, and compressing PDFs

    Language: C# · 2
  • tactica.llama.pdftotext.net

    PDF-to-text .NET transcriber CLI app using Ollama

    Language: C# · 1
  • fastapi-ocr

    FastAPI OCR service using Tesseract or PaddleOCR

    Language: Python · 1
  • MedXpert-Backend-FastAPI

    AI-powered medical report analyzer that extracts text from PDFs/images, summarizes reports, detects abnormalities, and provides a chatbot for medical queries. Built with FastAPI, OCR (Tesseract, pdfplumber), OpenAI GPT-3.5, and deployed on Google Cloud. Future enhancements include medical image classification and predictions. Contributions Welcome!

    Language: Python · 1
  • textnomnom-py

    Extract text from PDFs, PPTs, & URLs (with OCR support). Converts PPT to PDF & handles files or folders. 🦍

    Language: Python · 1
  • docling-js

    Parses documents into a single data type (TypeScript port of Docling) (NOT STARTED!)

  • pdf-parser

    A PDF parser application built with Next.js that extracts and displays text from uploaded PDF files. Provides user feedback through a loading indicator and error handling for improved user experience.

    Language: TypeScript · 1
  • apdfl-vb-dotnet-samples

    Adobe PDF Library Samples in Visual Basic for .NET

    Language: Visual Basic .NET · 1
  • projects

    Repo for all projects

  • Versatile_Code_Hub

    VersatileCodeHub: Your one-stop repository for an array of coding projects. Explore diverse applications, from games like Flappy Bird to tools like QRCode Scanners. Expand your skills across various domains, all in one place.

    Language: Python · 1
  • nocodefunctions-io

    I/O for nocodefunctions: CSV, TXT, PDF, and XLSX so far

    Language: Java · 1
  • OCR-Django

    Implementing the concept of Optical Character Recognition in Django

    Language: Python · 1
  • selectpdf-api-nodejs-client

    Node.js client for SelectPdf Online REST API

    Language: JavaScript · 1
  • selectpdf-api-ruby-client

    Ruby client for SelectPdf Online REST API

    Language: Ruby · 1
  • selectpdf-api-perl-client

    Perl client for SelectPdf Online REST API

    Language: Perl · 1
  • Blind-EYE

    A book reader with voice control functionality for blind people

    Language: C# · 1
  • pcu_pdf

    PDF parser component (Apache Tika) for PCU project

    Language: Python · 1
  • tikago

    Apache Tika adapter in Go

    Language: Go · 1
  • pdf-to-audio-summary

    A Streamlit app to upload PDFs, extract text, convert it to speech (MP3), and generate AI-powered summaries.

    Language: Python
  • pdfium-parser

    CLI tool to extract text from PDF

    Language: C
  • Automated-Invoice-Processing

    Automated Invoice Processing Software - Transform your manual invoice processing with enterprise-grade AI automation.

  • resume-optimizer

    AI-based resume optimizer

    Language: Python
  • Resume-Analyser

    A Resume Analyzer that scores resumes, highlights issues, suggests improvements, generates LaTeX code for each section, and visualizes ATS scores through interactive performance graphs.

    Language: TypeScript