Pinned Repositories
CLIP-Parrot-Bias
ECCV 2024: Parrot Captions Teach CLIP to Spot Text
DocLayout-YOLO
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
MinerU
A one-stop, open-source, high-quality data extraction tool for converting PDF to Markdown and JSON.
PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
UniMERNet
UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition
VIGC
AAAI 2024: Visual Instruction Generation and Correction
Academic-project-page-template
A project page template for academic papers. Demo at https://eliahuhorwitz.github.io/Academic-project-page-template/
Automatic-Speech-Recognition-from-Scratch
A minimal Seq2Seq example of Automatic Speech Recognition (ASR) based on Transformer
Awesome-Multimodal-Large-Language-Models
Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.
BERT-pytorch
PyTorch implementation of Google AI's 2018 BERT
wangbinDL's Repositories
wangbinDL/Academic-project-page-template
A project page template for academic papers. Demo at https://eliahuhorwitz.github.io/Academic-project-page-template/
wangbinDL/Automatic-Speech-Recognition-from-Scratch
A minimal Seq2Seq example of Automatic Speech Recognition (ASR) based on Transformer
wangbinDL/Awesome-Multimodal-Large-Language-Models
Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.
wangbinDL/BERT-pytorch
PyTorch implementation of Google AI's 2018 BERT
wangbinDL/HA-DPO
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
wangbinDL/InternLM
Official release of InternLM2 7B and 20B base and chat models, with 200K context support
wangbinDL/InternLM-XComposer
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
wangbinDL/VIGC-demo
wangbinDL/wangbinDL.github.io
Homepage
wangbinDL/DocLayout-YOLO
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
wangbinDL/GOT-OCR2.0
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
wangbinDL/learn-rst
How hard is it to move from Markdown to reStructuredText?
wangbinDL/MinerU
MinerU is a one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and e-book extraction.
wangbinDL/streamlit_quick_start
wangbinDL/texify
Math OCR model that outputs LaTeX and Markdown
wangbinDL/thesisuestc
ThesisUESTC: thesis template for the University of Electronic Science and Technology of China (UESTC)
wangbinDL/xtuner
An efficient, flexible, and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)