reasoning-language-models

There are 64 repositories under the reasoning-language-models topic.

zai-org/GLM-4.5
GLM-4.5: An open-source large language model designed for intelligent agents by Z.ai
Language: Python 2.6k 260
reasoning-survey/Awesome-Reasoning-Foundation-Models
✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models
622 7 657
mims-harvard/TxAgent
TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools
Language: Python 53379
dvlab-research/Seg-Zero
Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"
Language: Python 513 2 824
LightChen233/Awesome-Long-Chain-of-Thought-Reasoning
Latest Advances on Long Chain-of-Thought Reasoning
504 2 725
DavidZWZ/Awesome-RAG-Reasoning
[EMNLP 2025] Awesome RAG Reasoning Resources
29020
dvlab-research/VisionReasoner
Vision Manus: Your versatile Visual AI assistant
Language: Python 26915
krystalan/DRT
Deep Reasoning Translation (DRT) Project
231 5 49
mims-harvard/ToolUniverse
ToolUniverse is a collection of biomedical tools designed for AI agents
Language: Python 211 2 129
multimodal-art-projection/LatentCoT-Horizon
📖 This is a repository for organizing papers, codes, and other resources related to Latent Reasoning.
2065
a-m-team/a-m-models
a-m-team's exploration in large language modeling
1883
codelion/pts
Pivotal Token Search
Language: Python 1259
yihedeng9/OpenVLThinker
OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement
Language: Python 108 4 65
SalesforceAIResearch/MAS-Zero
Designing Multi-Agent Systems with Zero Supervision
Language: Python 9910
mims-harvard/CUREBench
CUREBench @ NeurIPS 2025: Benchmarking AI reasoning for therapeutic decision-making at scale
Language: Python 98
spcl/x1
Official Implementation of "Reasoning Language Models: A Blueprint"
Language: Python 75 11 112
Wild-Cooperation-Hub/Awesome-MLLM-Reasoning-Benchmarks
A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models.
686
The-FinAI/Fino1
This repository develops reasoning models for the financial domain, aiming to enhance models' capabilities in handling financial reasoning tasks.
Language: Jupyter Notebook 64 1 28
DolbyUUU/Logic-RL-Lite
Lightweight replication study of DeepSeek-R1-Zero. Interesting findings include "No Aha Moment", "Longer CoT ≠ Accuracy", and "Language Mixing in Instruct Models".
Language: Python 48 2 10
MozerWang/AMPO
[arXiv: 2505.02156] Adaptive Thinking via Mode Policy Optimization for Social Language Agents
Language: Python 42
AI4Phys/SeePhys
Official implementation for the paper "SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning"
Language: Python 39
MaxBelitsky/cache-steering
KV Cache Steering for Inducing Reasoning in Small Language Models
Language: Python 39
WisdomShell/RewardAnything
RewardAnything: Generalizable Principle-Following Reward Models
Language: Python 39
DolbyUUU/DeepEnlighten
Pure RL to post-train base models for social reasoning capabilities. Lightweight replication of DeepSeek-R1-Zero with Social IQa dataset.
Language: Python 380
zihao-ai/unthinking_vulnerability
To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models
Language: Python 32 2 20
tum-ai/number-token-loss
A regression-like loss to improve numerical reasoning in language models (ICML 2025)
Language: Jupyter Notebook 25 2 25
zonenoname/CharmBench
A preview version of CharmBench, a novel multimodal reasoning benchmark.
Language: Jupyter Notebook 23
linhaowei1/kumo
☁️ KUMO: Generative Evaluation of Complex Reasoning in Large Language Models
Language: Jupyter Notebook 190
Hyun-Ryu/clover
Official code for "Divide and Translate: Compositional First-Order Logic Translation and Verification for Complex Logical Reasoning", ICLR 2025.
Language: Python 17 1 01
thinkwee/NOVER
[EMNLP-2025] R1-Zero on ANY TASK
Language: Python 14
tomascupr/thinkthread
thinkthread SDK - Supercharge Your AI Applications with Human-Like Reasoning
Language: Python 141
Trustworthy-ML-Lab/ThinkEdit
An effective weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study uncovering how reasoning length is encoded in the model’s representation space.
Language: Python 141
sparkle-reasoning/sparkle
Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning
Language: Python 13
parameterlab/leaky_thoughts
Source code of "Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers"
Language: Python 12
NellyW8/VeriReason
GitHub repository for the paper "VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation"
Language: Python 11
AmanPriyanshu/GPT-OSS-MoE-ExpertFingerprinting
ExpertFingerprinting: Behavioral Pattern Analysis and Specialization Mapping of Experts in GPT-OSS-20B's Mixture-of-Experts Architecture
Language: HTML 10 0 01
