YX-S-Z's Stars
xai-org/grok-1
Grok open release
karpathy/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
karpathy/LLM101n
LLM101n: Let's build a Storyteller
huggingface/trl
Train transformer language models with reinforcement learning.
Farama-Foundation/Gymnasium
An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
thu-ml/tianshou
An elegant PyTorch deep reinforcement learning library.
vwxyzjn/cleanrl
High-quality single-file implementations of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
google-deepmind/alphageometry
xjdr-alt/entropix
Entropy Based Sampling and Parallel CoT Decoding
datamllab/rlcard
Reinforcement Learning / AI Bots in Card (Poker) Games - Blackjack, Leduc, Texas, DouDizhu, Mahjong, UNO.
pytorch/rl
A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.
Farama-Foundation/Arcade-Learning-Environment
The Arcade Learning Environment (ALE) -- a platform for AI research.
apple/axlearn
An Extensible Deep Learning Library
cambrian-mllm/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
xlang-ai/OSWorld
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Ma-Lab-Berkeley/CRATE
Code for CRATE (Coding RAte reduction TransformEr).
Genesis-Embodied-AI/RoboGen
A generative and self-guided robotic agent that endlessly proposes and masters new skills.
llava-rlhf/LLaVA-RLHF
Aligning LMMs with Factually Augmented RLHF
RL4VLM/RL4VLM
Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
nexusflowai/NexusRaven
NexusRaven-13B, a new SOTA Open-Source LLM for function calling. This repo contains everything for reproducing our evaluation on NexusRaven-13B and baselines.
tianyi-lab/HallusionBench
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
brentyi/egoallo
Estimating Body and Hand Motion in an Ego-sensed World
young-geng/scalax
A simple library for scaling up JAX programs
moka-manipulation/moka
MOKA: Open-World Robotic Manipulation through Mark-based Visual Prompting (RSS 2024)
rail-berkeley/fmb
young-geng/mintext
Minimal but scalable implementation of large language models in JAX
efrick2002/Starling
FengdiC/OTTD