pdfminer
There are 73 repositories under the pdfminer topic.
cseas/ocr-table
Extract tables from scanned image PDFs using Optical Character Recognition.
jaks6/citation_map
Create a Gephi Citation Graph based on Text Analysis of PDFs from Zotero
ahmedkhemiri95/PDFs-TextExtract
Text extraction from multiple and large PDF documents.
FFengIll/pdf-cut-white
Automatically crop white borders from PDF figures.
Cheereus/PdfSplitter
Converts PDFs to TXT, then performs word segmentation and word-frequency statistics.
dsc-iiitdmk/Pick-Parser
A tool that parses resumes and transforms them into the project's own templates.
caputchinefrobles/doufinder
DouFinder: a script for searching for, and alerting on, terms in the Diário Oficial da União (DOU).
elliotxx/paper_autotranslation
An automatic translation tool for papers (PDF => TXT, English => Chinese).
bhaveshk22/AI-Resume-Analyzer
This repository contains an AI Resume Analyzer that uses PDF parsing, database management, SQL-Python integration, and data extraction from PDFs. It offers skill recommendations and suggests videos and lectures for skill enhancement, aiming to improve resume quality and job prospects.
soham-1/fastapi_pdfextractor
An API built with FastAPI that extracts the text content of PDFs using pdfminer. It also supports scanned images in PDFs via Tesseract and OCRmyPDF.
cutright/IMRT-QA-Data-Miner
Scans a directory for IMRT QA results
annacprice/pdf-scraper
PDF parser using pdfminer and pytesseract for OCR support
Trailblazer29/Resume-Scanner
A resume scanner for Applicant Tracking Systems (ATS) to assess the similarity between applicants' resumes and job descriptions
gagangulyani/COVID-Text-Extractor
OCR built for the specific use case of extracting COVID information from images, PDFs, and text.
yoshihikoueno/pdfminer-layout-scanner
A more complete example of programming with PDFMiner, which continues where the default documentation stops
renan-siqueira/python-pdf-tool
This project facilitates the extraction of text from PDF files using various Python libraries. It is designed to be flexible, allowing a choice among different text-extraction libraries and supporting both a single PDF file and a directory containing multiple PDF files.
erikkastelec/PDFScraper
CLI program for searching inside text and tables in PDF documents and displaying results in HTML.
Shahabks/Converter-pdf-files-to-.txt-or-.html
PDFs are notoriously difficult to scrape. This program converts them to *.txt or *.html formats. The program has been tested with Latin alphabets and Japanese.
suyashb95/autoindex
A command line tool to automatically create a navigable index for e-books
jwest951227/extractorChinese
NLP model for extracting Chinese data from documents.
plain-jane-gray/parse-PDF-NLP-ML
Parses apart a PDF file into separate documents and then uses Natural Language Processing, Machine Learning models, and statistics to rank the documents by similarity to a single document.
SriMathi-2705/Bulk_Resume_Parsing
The project focuses on extracting and structuring key details from resumes, such as names, contact information, education, and work experience, and presents them in a user-friendly interface for efficient review and management. It also supports bulk upload of resumes.
gaazau/pdf2txt
Based on pdfminer.six; converts PDF files into text or images.
Sunil4423/Data-Extraction-
Extracting information from resumes.
AindriyaBarua/PDF-mining
Web-scraped data to create an Indian NER dataset, injected Indian data into CoNLL data, fine-tuned BERT, used weighted fastText for unsupervised KNN to classify reports, and extracted data from PDFs.
codetronaut/doc_tag_test
This tool searches a PDF file hierarchy for one or more given keywords and generates an HTML report of the results.
jonix6/minepdf
Pure-Python PDF extraction tool based on PDFMiner
LyuLyn/linkedin-resume-parsing
Parsing LinkedIn resume PDF files with pdfminer
pradeepbatchu/paddleocr
Image-to-text Flask application
Shristirajpoot/Resume_analyser
🧠 Smart Resume Analyser using Streamlit | Score resumes, suggest careers, skills, courses & videos 🎯 📄 PDF Parsing • 🧩 ML Classifier • 🎥 YouTube Recs
Unrelenting/PDF-Classifier
PDF Classifier for a Mortgage Company
echoAbhinav/AdobeHackathonRound_1B
Comprehensive solutions for Adobe's Document Intelligence Hackathon 2025, covering two distinct challenges focused on advanced PDF processing and persona-driven content analysis. Both implementations meet strict performance requirements, including sub-60-second execution times and containerized deployment within 1 GB resource constraints.
edpomacedo/bdij-pdfminer
A tool for extracting text from PDF documents.
EL-Mehdiri/PDF-Text-Extraction-API
A REST API to extract structured text, font details, and positioning from PDF files using Node.js and Python.
EzioDEVio/plantdeck_rag
PlantDeck is an offline herbal RAG that indexes your PDF books and monographs, extracts text/images with OCR, and answers questions with page-level citations using a local LLM via Ollama. Runs on your machine; no cloud. Field guide only; not medical advice.
KvaytG/fb2-converter
Simple converter from TXT, PDF and EPUB to FB2 format
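Several of the projects above share a post-extraction step: once pdfminer (or OCR) has turned a PDF into plain text, the text is counted (as in Cheereus/PdfSplitter's word-frequency statistics) or compared against another document (as in Trailblazer29/Resume-Scanner's resume/job-description similarity). The sketch below illustrates that step with the standard library only. It assumes the text has already been extracted, uses a simple regex tokenizer (Chinese text would need a real word segmenter), and uses bag-of-words cosine similarity as an illustrative stand-in, not the method of any listed repository. The function names `tokenize`, `term_frequencies`, and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens.

    Assumption: a simple regex tokenizer; languages written without
    spaces (e.g. Chinese) would need a dedicated word segmenter.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def term_frequencies(text: str) -> Counter:
    """Count how often each token occurs in the (already extracted) text."""
    return Counter(tokenize(text))


def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two texts, in [0, 1]."""
    tf_a, tf_b = term_frequencies(a), term_frequencies(b)
    # Dot product over the tokens the two texts have in common.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical sample texts standing in for extracted PDF content.
    resume = "Python developer with PDF parsing and SQL experience."
    job = "Looking for a Python developer with SQL skills."
    print(term_frequencies(resume).most_common(3))
    print(round(cosine_similarity(resume, job), 3))
```

Raw counts favor frequent filler words; a real ranking tool would typically add stop-word removal or TF-IDF weighting on top of this.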