xionghuichen's Stars
ollama/ollama
Get up and running with Llama 3.3, Mistral, Gemma 2, and other large language models.
All-Hands-AI/OpenHands
🙌 OpenHands: Code Less, Make More
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
DIYgod/RSSHub
🧡 Everything is RSSible
ml-explore/mlx
MLX: An array framework for Apple silicon
KindXiaoming/pykan
Kolmogorov Arnold Networks
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
idootop/mi-gpt
🏠 Connect your XiaoAi speaker to ChatGPT and Doubao, turning it into your own personal voice assistant.
heyform/heyform
Open-Source Form Builder
OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
huawei-noah/Pretrained-Language-Model
Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.
allenai/RL4LMs
A modular RL library to fine-tune language models to human preferences
Farama-Foundation/Metaworld
Collections of robotics environments geared towards benchmarking multi-task and meta reinforcement learning
google-deepmind/open_x_embodiment
franciszzj/Leffa
Learning Flow Fields in Attention for Controllable Person Image Generation
Cledersonbc/tic-tac-toe-minimax
Minimax is an AI algorithm.
Sentdex/Carla-RL
Reinforcement Learning codebase for self-driving car in Carla
llava-rlhf/LLaVA-RLHF
Aligning LMMs with Factually Augmented RLHF
DigiRL-agent/digirl
Official repo for the paper "DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning".
zjunlp/KnowAgent
KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents
Yuexuan9/Tinker
This project features an open-source small bipedal robot designed for research, education, and hobbyist experimentation.
mmrobotlab/DailyRobot
tinkoff-ai/katakomba
Data-Driven NetHack Tools: Datasets (30+) and recurrent-baselines (AWAC, BC, CQL, IQL, REM)
archersama/Uni-CTR
Source code for the TOIS paper "A Unified Framework for Multi-Domain CTR Prediction via Large Language Models"
FanmingL/Recurrent-Offpolicy-RL
Implementations of SAC and TD3 based on various RNN and Transformer architectures.
yixiaoer/mistral-jax
JAX implementation of the Mistral 7b v0.1 model
xionghuichen/policy-conditioned-model
Official code for "Effective Offline Environment Reconstruction when the Dataset is Collected from Diversified Behavior Policies"
LAMDA-RL/policy-conditioned-model
Official code for "Effective Offline Environment Reconstruction when the Dataset is Collected from Diversified Behavior Policies"
LAMDA-RL/WiseRL
PyTorch implementations for Offline Preference-Based RL (PbRL) algorithms
RobertTLange/mistral-jax
JAX implementation of the Mistral model