Pinned Repositories
DocLayout-YOLO
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
LabelLLM
The Open-Source Data Annotation Platform
labelU
Data annotation toolbox that supports image, audio, and video data.
LOKI
The official implementation of the paper “LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models”
magic-doc
magic-html
MinerU
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON formats.
PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
UniMERNet
UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition
WanJuan1.0
WanJuan 1.0 multimodal corpus
OpenDataLab's Repositories
opendatalab/MinerU
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON formats.
opendatalab/PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
opendatalab/labelU
Data annotation toolbox that supports image, audio, and video data.
opendatalab/DocLayout-YOLO
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
opendatalab/LabelLLM
The Open-Source Data Annotation Platform
opendatalab/WanJuan1.0
WanJuan 1.0 multimodal corpus
opendatalab/magic-doc
opendatalab/magic-html
opendatalab/UniMERNet
UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition
opendatalab/LOKI
The official implementation of the paper “LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models”
opendatalab/laion5b-downloader
opendatalab/opendatalab-datasets
Dataset resources
opendatalab/VIGC
AAAI 2024: Visual Instruction Generation and Correction
opendatalab/labelU-Kit
Data annotation component library, provided as NPM packages
opendatalab/HA-DPO
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
opendatalab/CLIP-Parrot-Bias
[ECCV 2024] Parrot Captions Teach CLIP to Spot Text
opendatalab/opendatalab-python-sdk
SDK of OpenDataLab - https://opendatalab.org.cn
opendatalab/MLS-BRN
[CVPR 2024] 3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions
opendatalab/VHM
VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis
opendatalab/dsdl-docs
Data Set Description Language Specification (DSDL, a next-generation dataset description language for artificial intelligence)
opendatalab/skydiffusion
The official implementation of the paper “Street-to-Satellite Image Synthesis with Diffusion Models and BEV Paradigm”
opendatalab/MLLM-DataEngine
MLLM-DataEngine: An Iterative Refinement Approach for MLLM
opendatalab/image-downloader
opendatalab/CHARM
[ACL 2024 Main Conference] Chinese commonsense benchmark for LLMs
opendatalab/Miner-PDF-Benchmark
MPB (Miner-PDF-Benchmark) is an end-to-end PDF document comprehension evaluation suite designed for large-scale model data scenarios.
opendatalab/dsdl-sdk
opendatalab/CrossViewDiff
The official implementation of the paper "CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis"
opendatalab/WanJuan2.0-WanJuan-CC
WanJuan-CC is a high-quality dataset built from CommonCrawl through data extraction, rule-based cleaning, deduplication, safety filtering, and quality cleaning.
opendatalab/.github
opendatalab/UrBench