pdfminer

There are 73 repositories under the pdfminer topic.

cseas/ocr-table
Extract tables from scanned image PDFs using Optical Character Recognition.
Language:Python276 13 468
jaks6/citation_map
Create a Gephi Citation Graph based on Text Analysis of PDFs from Zotero
Language:Python151 4 719
ahmedkhemiri95/PDFs-TextExtract
Multiple and Large PDF Documents Text Extraction.
Language:Python131 7 165
FFengIll/pdf-cut-white
Automatically remove white borders in PDF figures.
Language:Python108 3 1413
Cheereus/PdfSplitter
Convert PDF to TXT, then perform word segmentation and word-frequency statistics.
Language:Python34 1 03
dsc-iiitdmk/Pick-Parser
This project creates a tool that parses resumes and transforms them into our own templates.
Language:Python21 1 07
caputchinefrobles/doufinder
DouFinder: Script for searching for and alerting on terms in the Diário Oficial da União (DOU).
Language:Python20 2 47
elliotxx/paper_autotranslation
An automatic translation tool for paper ( PDF => TXT, English => Chinese )
Language:Python19 3 04
bhaveshk22/AI-Resume-Analyzer
This repository contains an AI Resume Analyzer that uses PDF parsing, database management, SQL-Python integration, and data extraction from PDFs. It offers skill recommendations and suggests videos and lectures for skill enhancement, aiming to improve resume quality and job prospects.
Language:Python16 2 02
soham-1/fastapi_pdfextractor
An API built with FastAPI for extracting the text content of PDFs using pdfminer. It also supports scanned images in PDFs by using Tesseract and OCRmyPDF.
Language:Python15 1 05
cutright/IMRT-QA-Data-Miner
Scans a directory for IMRT QA results
Language:Python14 4 115
annacprice/pdf-scraper
PDF parser using pdfminer and pytesseract for OCR support
Language:Python11 1 01
Trailblazer29/Resume-Scanner
A resume scanner for Applicant Tracking Systems (ATS) to assess the similarity between applicants' resumes and job descriptions
Language:Jupyter Notebook11 2 03
gagangulyani/COVID-Text-Extractor
OCR made for the specific use case of extracting Covid Info from Images, PDFs and Texts
Language:Python10 1 101
yoshihikoueno/pdfminer-layout-scanner
A more complete example of programming with PDFMiner, which continues where the default documentation stops
Language:Python7 1 04
renan-siqueira/python-pdf-tool
This project facilitates the extraction of text from PDF files using various Python libraries. It is designed to be flexible, allowing a choice among different text extraction libraries and supporting both a single PDF file and a directory containing multiple PDF files.
Language:Python6 1 01
erikkastelec/PDFScraper
CLI program for searching inside text and tables in PDF documents and displaying results in HTML.
Language:Python5 1 01
Shahabks/Converter-pdf-files-to-.txt-or-.html
PDFs are notoriously difficult to scrape. This program converts them to *.txt or *.html formats. The program has been tested with Latin alphabets and Japanese.
Language:CSS5 1 04
suyashb95/autoindex
A command line tool to automatically create a navigable index for e-books
Language:Python5 1 41
jwest951227/extractorChinese
NLP model for extracting Chinese data from documents
Language:Python4 1 0
plain-jane-gray/parse-PDF-NLP-ML
Parses apart a PDF file into separate documents and then uses Natural Language Processing, Machine Learning models, and statistics to rank the documents by similarity to a single document.
Language:Jupyter Notebook4 1 00
SriMathi-2705/Bulk_Resume_Parsing
The project focuses on extracting key details from resumes, such as names, contact information, education, and work experience, and structuring them in a user-friendly interface for efficient review and management. Additionally, it supports bulk upload of resumes.
Language:Python4 1 00
gaazau/pdf2txt
Based on pdfminer.six; converts PDF files into text or images
Language:Python3 1 00
Sunil4423/Data-Extraction-
Extracting information from resume
Language:Jupyter Notebook3 1 11
AindriyaBarua/PDF-mining
Web-scraped to create an Indian NER dataset, injected CONLL data with Indian data, fine-tuned BERT, weighted Fasttext for unsupervised KNN to classify reports, extracted data from PDFs
Language:Jupyter Notebook2 2 00
codetronaut/doc_tag_test
This tool searches for one or more keywords in a PDF file hierarchy and generates an HTML report of the results.
Language:Python2 0 10
jonix6/minepdf
Pure-Python PDF extraction tool based on PDFMiner
Language:Python2 1 01
LyuLyn/linkedin-resume-parsing
Parsing LinkedIn resume pdf files with pdfminer
Language:Python2 1 03
pradeepbatchu/paddleocr
Image to Text with Flask application
Language:Python2 1 02
Shristirajpoot/Resume_analyser
🧠 Smart Resume Analyser using Streamlit | Score resumes, suggest careers, skills, courses & videos 🎯 📄 PDF Parsing • 🧩 ML Classifier • 🎥 YouTube Recs
Language:Python2
Unrelenting/PDF-Classifier
PDF Classifier for a Mortgage Company
Language:Python2 0 02
echoAbhinav/AdobeHackathonRound_1B
Comprehensive solutions for Adobe's Document Intelligence Hackathon 2025, encompassing two distinct challenges focused on advanced PDF processing and persona-driven content analysis. Both implementations adhere to stringent performance requirements, including sub-60-second execution times and containerized deployment within 1GB resource constraints.
Language:Python1
edpomacedo/bdij-pdfminer
Tool for extracting text from PDF documents.
Language:Python1 1 00
EL-Mehdiri/PDF-Text-Extraction-API
A REST API to extract structured text, font details, and positioning from PDF files using Node.js and Python.
Language:JavaScript1
EzioDEVio/plantdeck_rag
PlantDeck is an offline herbal RAG that indexes your PDF books and monographs, extracts text/images with OCR, and answers questions with page-level citations using a local LLM via Ollama. Runs on your machine; no cloud. Field guide only; not medical advice.
Language:Python1
KvaytG/fb2-converter
Simple converter from TXT, PDF and EPUB to FB2 format
Language:Python1
