pdfminer

There are 73 repositories under the pdfminer topic.

  • cseas/ocr-table

    Extract tables from scanned image PDFs using Optical Character Recognition.

    Language: Python
  • jaks6/citation_map

    Create a Gephi Citation Graph based on Text Analysis of PDFs from Zotero

    Language: Python
  • ahmedkhemiri95/PDFs-TextExtract

    Multiple and Large PDF Documents Text Extraction.

    Language: Python
  • FFengIll/pdf-cut-white

    Automatically remove white borders in PDF figures.

    Language: Python
  • Cheereus/PdfSplitter

    Converts PDFs to TXT, then performs word segmentation and word-frequency counting.

    Language: Python
  • dsc-iiitdmk/Pick-Parser

    A tool that parses resumes and transforms them into our own templates.

    Language: Python
  • caputchinefrobles/doufinder

    DouFinder: a script for searching and alerting on terms in the Diário Oficial da União (DOU), Brazil's official federal gazette.

    Language: Python
  • elliotxx/paper_autotranslation

    An automatic translation tool for papers (PDF => TXT, English => Chinese).

    Language: Python
  • bhaveshk22/AI-Resume-Analyzer

    An AI Resume Analyzer that uses PDF parsing, database management, SQL-Python integration, and data extraction from PDFs. It offers skill recommendations and suggests videos and lectures for skill enhancement, aiming to improve resume quality and job prospects.

    Language: Python
  • soham-1/fastapi_pdfextractor

    An API built with FastAPI for extracting the text content of PDFs using pdfminer. It also supports scanned images in PDFs through Tesseract and OCRmyPDF.

    Language: Python
  • cutright/IMRT-QA-Data-Miner

    Scans a directory for IMRT QA results

    Language: Python
  • annacprice/pdf-scraper

    PDF parser using pdfminer and pytesseract for OCR support

    Language: Python
  • Trailblazer29/Resume-Scanner

    A resume scanner for Applicant Tracking Systems (ATS) to assess the similarity between applicants' resumes and job descriptions

    Language: Jupyter Notebook
  • gagangulyani/COVID-Text-Extractor

    OCR built for the specific use case of extracting COVID information from images, PDFs, and text.

    Language: Python
  • yoshihikoueno/pdfminer-layout-scanner

    A more complete example of programming with PDFMiner, which continues where the default documentation stops

    Language: Python
  • renan-siqueira/python-pdf-tool

    This project facilitates the extraction of text from PDF files using various Python libraries. It is designed to be flexible, allowing a choice among different text extraction libraries, and it supports both a single PDF file and a directory containing multiple PDF files.

    Language: Python
  • erikkastelec/PDFScraper

    CLI program for searching inside text and tables in PDF documents and displaying results in HTML.

    Language: Python
  • Shahabks/Converter-pdf-files-to-.txt-or-.html

    PDFs are notoriously difficult to scrape. This program converts them to *.txt or *.html format. It has been tested on Latin-alphabet and Japanese text.

    Language: CSS
  • suyashb95/autoindex

    A command line tool to automatically create a navigable index for e-books

    Language: Python
  • jwest951227/extractorChinese

    An NLP model for extracting Chinese data from documents.

    Language: Python
  • plain-jane-gray/parse-PDF-NLP-ML

    Parses apart a PDF file into separate documents and then uses Natural Language Processing, Machine Learning models, and statistics to rank the documents by similarity to a single document.

    Language: Jupyter Notebook
  • SriMathi-2705/Bulk_Resume_Parsing

    The project focuses on extracting key details from resumes, such as names, contact information, education, and work experience, and structuring them in a user-friendly interface for efficient review and management. It also supports bulk upload of resumes.

    Language: Python
  • gaazau/pdf2txt

    Based on pdfminer.six; converts PDF files into text or images.

    Language: Python
  • Sunil4423/Data-Extraction-

    Extracting information from resumes.

    Language: Jupyter Notebook
  • AindriyaBarua/PDF-mining

    Web-scraped data to create an Indian NER dataset, augmented CoNLL data with Indian data, fine-tuned BERT, used weighted fastText with unsupervised KNN to classify reports, and extracted data from PDFs.

    Language: Jupyter Notebook
  • codetronaut/doc_tag_test

    This tool searches for one or more keywords in a PDF file hierarchy and generates an HTML report of the results.

    Language: Python
  • jonix6/minepdf

    Pure-Python PDF extraction tool based on PDFMiner

    Language: Python
  • LyuLyn/linkedin-resume-parsing

    Parsing LinkedIn resume pdf files with pdfminer

    Language: Python
  • pradeepbatchu/paddleocr

    Image to Text with Flask application

    Language: Python
  • Shristirajpoot/Resume_analyser

    🧠 Smart Resume Analyser using Streamlit | Score resumes, suggest careers, skills, courses & videos 🎯 📄 PDF Parsing • 🧩 ML Classifier • 🎥 YouTube Recs

    Language: Python
  • Unrelenting/PDF-Classifier

    PDF Classifier for a Mortgage Company

    Language: Python
  • echoAbhinav/AdobeHackathonRound_1B

    Comprehensive solutions for Adobe's Document Intelligence Hackathon 2025, covering two distinct challenges focused on advanced PDF processing and persona-driven content analysis. Both implementations meet stringent performance requirements, including sub-60-second execution times and containerized deployment within 1GB resource constraints.

    Language: Python
  • edpomacedo/bdij-pdfminer

    A tool for extracting text from PDF documents.

    Language: Python
  • EL-Mehdiri/PDF-Text-Extraction-API

    A REST API to extract structured text, font details, and positioning from PDF files using Node.js and Python.

    Language: JavaScript
  • EzioDEVio/plantdeck_rag

    PlantDeck is an offline herbal RAG that indexes your PDF books and monographs, extracts text/images with OCR, and answers questions with page-level citations using a local LLM via Ollama. Runs on your machine; no cloud. Field guide only; not medical advice.

    Language: Python
  • KvaytG/fb2-converter

    Simple converter from TXT, PDF and EPUB to FB2 format

    Language: Python
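
Several projects in this list, such as Cheereus/PdfSplitter and codetronaut/doc_tag_test, begin with the same step: extract plain text from a PDF with pdfminer.six and then analyze that text. The following is a minimal sketch of that pipeline. It is not taken from any listed repository. It assumes pdfminer.six is installed and uses its `pdfminer.high_level.extract_text` function. The helper names `extract_pdf_text`, `word_frequencies`, and `main` are hypothetical. The regex tokenizer handles only ASCII letters, digits, and apostrophes, so Chinese text like PdfSplitter's would need a dedicated word segmenter instead.

```python
import re
import sys
from collections import Counter


def extract_pdf_text(path: str) -> str:
    """Return all text in the PDF at `path`, via pdfminer.six's high-level API."""
    # Imported inside the function so word_frequencies() can be used
    # (and tested) without pdfminer.six installed.
    from pdfminer.high_level import extract_text

    return extract_text(path)


def word_frequencies(text: str, top_n: int = 10) -> list[tuple[str, int]]:
    """Return up to `top_n` (word, count) pairs, most frequent first."""
    # Naive tokenizer (an assumption): lowercase ASCII letters, digits, apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words).most_common(top_n)


def main(paths: list[str]) -> None:
    # Print a small frequency table for each PDF path given on the command line.
    for path in paths:
        print(f"== {path}")
        for word, count in word_frequencies(extract_pdf_text(path)):
            print(f"{count:6d}  {word}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Pass one or more PDF paths as arguments to print the ten most common words in each file. Keeping text extraction separate from counting lets you swap in another extractor, such as an OCR step for scanned PDFs, without touching the analysis code.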