ZhangDaPwn

ZhangDaPwn's Stars

wyf3/llm_related
Notes on knowledge and methods related to large language models
Language:Jupyter Notebook49179
Unstructured-IO/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Language: HTML · 9.9k stars · 827 forks
getomni-ai/zerox
PDF to Markdown with vision models
Language: Python · 9.1k stars · 581 forks
deepset-ai/haystack
AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
Language: Python · 18.8k stars · 2k forks
DS4SD/docling
Get your documents ready for gen AI
Language: Python · 19.1k stars · 1k forks
opendatalab/magic-doc
Language:Python41835
opendatalab/MinerU
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool.
Language: Python · 25k stars · 1.9k forks
microsoft/markitdown
Python tool for converting files and office documents to Markdown.
Language: Python · 35.6k stars · 1.6k forks
facebookresearch/nougat
Implementation of Nougat Neural Optical Understanding for Academic Documents
Language: Python · 9.2k stars · 587 forks
Stirling-Tools/Stirling-PDF
#1 Locally hosted web application that allows you to perform various operations on PDF files
Language: Java · 49.1k stars · 4.1k forks
hiroi-sora/Umi-OCR
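Open-source, free, offline OCR software. Supports screenshot capture and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning and generating QR codes. Includes built-in multilingual libraries.
Language: Python · 29.1k stars · 2.9k forks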
RapidAI/TableStructureRec
A collection of the best currently available open-source table recognition models, with improved pre-processing and post-processing and the models converted to ONNX.
Language:Python48245
ArtifexSoftware/pdf2docx
Open source Python library for converting PDF to DOCX.
Language: Python · 2.7k stars · 393 forks
langgenius/dify
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
Language: TypeScript · 60k stars · 8.9k forks
Alir3z4/html2text
Convert HTML to Markdown-formatted text.
Language: Python · 1.9k stars · 285 forks
YaoFANGUK/video-subtitle-remover
AI-based tool for removing hard-coded subtitles and text-like watermarks from images and videos, producing subtitle-free and watermark-free files at the original resolution. Runs locally, with no third-party API required.
Language: Python · 5.2k stars · 678 forks
opendatalab/PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
Language: Python · 6.5k stars · 432 forks
NaiboWang/EasySpider
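A visual no-code web crawler/spider. EasySpider (易采集) is a visual tool for browser automation testing, data collection, and web crawling that lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service wrapping system for web applications.
Language: JavaScript · 37.1k stars · 4.6k forks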
crewAIInc/crewAI
Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
Language: Python · 25.4k stars · 3.4k forks
datawhalechina/so-large-lm
Foundations of Large Models: an introduction to the fundamentals of large models in one article
3.6k stars · 317 forks
infiniflow/ragflow
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Language: Python · 29.9k stars · 2.8k forks
OleehyO/TexTeller
TexTeller converts images to LaTeX formulas (image2latex, LaTeX OCR) with higher accuracy and superior generalization ability, enabling it to cover most usage scenarios.
Language:Python43250
lukas-blecher/LaTeX-OCR
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Language: Python · 13.3k stars · 1.1k forks
NoEdgeAI/doc2x-doc
doc2x docs
Language:Python424
ultrafunkamsterdam/undetected-chromedriver
Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)
Language: Python · 10.5k stars · 1.2k forks
HumanSignal/label-studio-ml-backend
Configs and boilerplates for Label Studio's Machine Learning backend
Language: Python · 629 stars · 276 forks
xai-org/grok-1
Grok open release
Language: Python · 49.9k stars · 8.3k forks
tkem/cachetools
Extensible memoizing collections and decorators
Language: Python · 2.4k stars · 164 forks
Ciphey/Ciphey
⚡ Automatically decrypt encryptions without knowing the key or cipher, decode encodings, and crack hashes ⚡
Language: Python · 18.6k stars · 1.2k forks
karpathy/nn-zero-to-hero
Neural Networks: Zero to Hero
Language: Jupyter Notebook · 13k stars · 1.8k forks