nguaman's Stars
robvanderleek/create-issue-branch
Boost your GitHub workflow 🚀
spulec/freezegun
Let your Python tests travel through time
nektos/act
Run your GitHub Actions locally 🚀
snok/install-poetry
GitHub Action for installing and configuring Poetry
Nutlope/llama-ocr
Document to Markdown OCR library with Llama 3.2 vision
pypa/hatch
Modern, extensible Python project management
joblib/joblib
Computing with Python functions.
karpetrosyan/hishel
An elegant HTTP Cache implementation for HTTPX and HTTP Core.
arrow-py/arrow
🏹 Better dates & times for Python
opendatalab/MinerU
A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and multi-format e-book extraction.
whyhow-ai/knowledge-table
Knowledge Table is an open-source package designed to simplify extracting and exploring structured data from unstructured documents.
mtkennerly/poetry-dynamic-versioning
Plugin for Poetry to enable dynamic versioning based on VCS tags
pre-commit/pre-commit-hooks
Some out-of-the-box hooks for pre-commit
haydenbleasel/next-forge
Production-grade Turborepo template for Next.js apps.
prettytable/prettytable
Display tabular data in a visually appealing ASCII table format
jgm/pandoc
Universal markup converter
unoconv/unoserver
unoconv/unoconv
Universal Office Converter - Convert between any document format supported by LibreOffice/OpenOffice.
niteshsharmacodes/GenAI-powered-Invoice-Scanner
emcf/thepipe
Extract clean data from anywhere, powered by vision-language models ⚡
Senzing/libpostal-data
Information about libpostal work done by Senzing.
Azure-Samples/document-intelligence-code-samples
Sample site for Document Intelligence code samples and associated media.
Azure-Samples/azure-search-openai-demo
A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
apache/tika
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
dottxt-ai/outlines
Structured Text Generation
DS4SD/docling
Get your documents ready for gen AI
VikParuchuri/surya
OCR, layout analysis, reading order, table recognition in 90+ languages
VikParuchuri/marker
Convert PDF to Markdown quickly with high accuracy
opendatalab/PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
opendatalab/DocLayout-YOLO
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception