Marcovaldon's Stars

AI4Math-ShanZhang/SVE-Math
Implementation of the paper Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs
Language:Python51
zhentingqi/rStar
Language:Python853100
mbzuai-oryx/LlamaV-o1
Rethinking Step-by-step Visual Reasoning in LLMs
Language:Python19412
vsubramaniam851/multiagent-ft
Language:Python13717
Ucas-HaoranWei/Slow-Perception
Official implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step
Language:Python804
RUCAIBox/Virgo
Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*
Language:Python77
njucckevin/MM-Self-Improve
A Self-Training Framework for Vision-Language Reasoning
Language:Python601
PRIME-RL/PRIME
Scalable RL solution for advanced reasoning of language models
Language:Python91060
qiwang067/awesome-visual-rl
A curated list of visual reinforcement learning resources
1699
RUCAIBox/Slow_Thinking_with_LLMs
A series of technical reports on slow thinking with LLMs
32011
OpenGVLab/TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
Language:Python381
HJYao00/Mulberry
Language:Python2425
microsoft/markitdown
Python tool for converting files and office documents to Markdown.
Language: Python · Stars: 35k · Forks: 1.6k
euclid-multimodal/Euclid
Language:Python121
bklieger-groq/g1
g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains
Language: Python · Stars: 4.2k · Forks: 375
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python · Stars: 34.1k · Forks: 5.2k
casper-hansen/AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
Language: Python · Stars: 1.9k · Forks: 234
openreasoner/openr
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
Language: Python · Stars: 1.5k · Forks: 115
Quinn777/AtomThink
Language:Python51
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Language: Python · Stars: 5.2k · Forks: 463
dle666/R-CoT
Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models
Language:Python16412
LeapLabTHU/GSVA
[CVPR2024] GSVA: Generalized Segmentation via Multimodal Large Language Models
Language:Python110
QwenLM/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
Language: Python · Stars: 4.2k · Forks: 261
opendatalab/DocLayout-YOLO
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
Language:Python77458
opendatalab/MinerU
A one-stop, open-source, high-quality data extraction tool for converting PDFs to Markdown and JSON.
Language: Python · Stars: 24.7k · Forks: 1.9k
chongzhangFDU/ROOR
Official implementation of the EMNLP 2024 paper: Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding.
Language:Python181
NExT-GPT/NExT-GPT
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
Language: Python · Stars: 3.4k · Forks: 344
VITA-MLLM/VITA
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Language: Python · Stars: 2k · Forks: 140
GAIR-NLP/O1-Journey
O1 Replication Journey
Stars: 1.9k · Forks: 58
dailenson/One-DM
Official code for the ECCV 2024 paper: One-Shot Diffusion Mimicker for Handwritten Text Generation
Language:Python33132