Marcovaldon's Stars
AI4Math-ShanZhang/SVE-Math
Implementation of the paper Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs
zhentingqi/rStar
mbzuai-oryx/LlamaV-o1
Rethinking Step-by-step Visual Reasoning in LLMs
vsubramaniam851/multiagent-ft
Ucas-HaoranWei/Slow-Perception
Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step
RUCAIBox/Virgo
Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*
njucckevin/MM-Self-Improve
A Self-Training Framework for Vision-Language Reasoning
PRIME-RL/PRIME
Scalable RL solution for advanced reasoning of language models
qiwang067/awesome-visual-rl
A curated list of visual reinforcement learning resources
RUCAIBox/Slow_Thinking_with_LLMs
A series of technical reports on Slow Thinking with LLMs
OpenGVLab/TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
HJYao00/Mulberry
microsoft/markitdown
Python tool for converting files and office documents to Markdown.
euclid-multimodal/Euclid
bklieger-groq/g1
g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
casper-hansen/AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
openreasoner/openr
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
Quinn777/AtomThink
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
dle666/R-CoT
Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models
LeapLabTHU/GSVA
[CVPR2024] GSVA: Generalized Segmentation via Multimodal Large Language Models
QwenLM/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
opendatalab/DocLayout-YOLO
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
opendatalab/MinerU
A high-quality, one-stop open-source data extraction tool for converting PDF to Markdown and JSON.
chongzhangFDU/ROOR
This is the official implementation of the EMNLP 2024 paper: Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding.
NExT-GPT/NExT-GPT
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
VITA-MLLM/VITA
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
GAIR-NLP/O1-Journey
O1 Replication Journey
dailenson/One-DM
Official code for the ECCV 2024 paper: One-Shot Diffusion Mimicker for Handwritten Text Generation