Pinned Repositories
mpmath
A formula-editing plugin for WeChat Official Accounts
MinerU
A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON.
opendatalab-datasets
Dataset resources
PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
WanJuan1.0
WanJuan 1.0 multimodal corpus
WanJuan2.0-WanJuan-CC
WanJuan-CC is a high-quality dataset derived from CommonCrawl through data extraction, rule-based cleaning, deduplication, safety filtering, and quality cleaning.
.github
MiniGemini
Official implementation for Mini-Gemini
MiraData
PaddleNLP
👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.
qiangqiang199's Repositories
qiangqiang199/.github
qiangqiang199/MiniGemini
Official implementation for Mini-Gemini
qiangqiang199/MiraData
qiangqiang199/PaddleNLP
👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.
qiangqiang199/MinerU
A one-stop, open-source, high-quality data extraction tool that supports extraction from PDFs, webpages, and multi-format e-books.
qiangqiang199/PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction