Pinned Repositories
DocLayout-YOLO
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
LabelLLM
The Open-Source Data Annotation Platform
labelU
A data annotation toolbox that supports image, audio, and video data.
magic-doc
magic-html
MinerU
A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON.
OmniDocBench
[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation
PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
UniMERNet
UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition
WanJuan1.0
WanJuan 1.0 multimodal corpus
OpenDataLab's Repositories
opendatalab/CRaFT
[AAAI25] Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning
opendatalab/Miner-PDF-Benchmark
MPB (Miner-PDF-Benchmark) is an end-to-end PDF document comprehension evaluation suite designed for large-scale model data scenarios.
opendatalab/CLIP-Parrot-Bias
[ECCV 2024] Parrot Captions Teach CLIP to Spot Text
opendatalab/CrossViewDiff
The official implementation of the paper "CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis"
opendatalab/CHARM
[ACL 2024 Main Conference] Chinese commonsense benchmark for LLMs
opendatalab/magic-doc
opendatalab/dsdl-sdk
opendatalab/dsdl-docs
Data Set Description Language Specification (DSDL, a next-generation description language for AI datasets)
opendatalab/MLLM-DataEngine
MLLM-DataEngine: An Iterative Refinement Approach for MLLM
opendatalab/image-downloader
opendatalab/WanJuan2.0-WanJuan-CC
WanJuan-CC is a high-quality dataset built from CommonCrawl through data extraction, rule-based cleaning, deduplication, safety filtering, and quality cleaning.
opendatalab/VIGC
[AAAI 2024] Visual Instruction Generation and Correction
opendatalab/HA-DPO
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
opendatalab/WanJuan1.0
WanJuan 1.0 multimodal corpus
opendatalab/laion5b-downloader
opendatalab/labelU-frontend
LabelU front-end library
opendatalab/allz
A universal command line tool for compression and decompression
opendatalab/labelU-ML
opendatalab/labelbee