zsychina

鼠人

Dalian University of Technology, China; Guangzhou, China

zsychina's Stars

hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
Language: Python 38.7k 383 1.7k 4.3k
hiyouga/LLaMA-Factory
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
Language: Python 31.8k 204 4.9k 3.9k
ShiArthur03/ShiArthur03
Language: MATLAB 10.4k 32 1.4k 1.9k
huggingface/trl
Train transformer language models with reinforcement learning.
Language: Python 9.6k 74 1.1k 1.2k
lucidrains/PaLM-rlhf-pytorch
Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM
Language: Python 7.7k 143 47667
p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch
PyTorch implementations of deep reinforcement learning algorithms and environments
Language: Python 5.6k 106 711.2k
huggingface/alignment-handbook
Robust recipes to align language models with human and AI preferences
Language: Python 4.5k 108 134393
CarperAI/trlx
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
Language: Python 4.5k 50 290471
sweetice/Deep-reinforcement-learning-with-pytorch
PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....
Language: Python 3.9k 36 34844
pointfeev/CreamInstaller
Automatically finds all installed Steam, Epic and Ubisoft games with their respective DLC-related DLL locations on the user's computer, parses SteamCMD, Steam Store and Epic Games Store for user-selected games' DLCs, then provides a very simple graphical interface utilizing the gathered information for the maintenance of DLC unlockers.
Language: C# 3.9k 42 262187
opendilab/awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
3.3k 61 3203
wangshusen/RecommenderSystem
2.3k 27 6346
Doragd/Algorithm-Practice-in-Industry
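A collection of industry practice articles on search, recommendation, advertising, user growth, and more (sources: Zhihu, DataFunTalk, technical WeChat official accounts)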
Language: Python 2.3k 60 57288
OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
Language: Python 2.1k 21 251207
Curt-Park/rainbow-is-all-you-need
Rainbow is all you need! A step-by-step tutorial from DQN to Rainbow
Language: Jupyter Notebook 1.8k 26 32333
ysyisyourbrother/SYSU_Notebook
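This project shares course materials, notes, final exam papers, and other useful resources from the undergraduate and graduate programs of the School of Computer Science at Sun Yat-sen University. We hope it helps with your studies ❤️; if you like it, remember to give it a star 🌟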
Language: Python 1.5k 12 2243
PKU-Alignment/safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Language: Python 1.3k 18 84119
wnlen/clash-for-linux
clash-for-linux
Language: Shell 1.1k 6 35416
pranz24/pytorch-soft-actor-critic
PyTorch implementation of soft actor critic
Language: Python 803 9 37179
RLHFlow/RLHF-Reward-Modeling
Recipes to train reward model for RLHF.
Language: Python 720 19 2959
BlackSamorez/tensor_parallel
Automatically split your PyTorch models on multiple GPUs for training & inference
Language: Python 619 8 6638
MorvanZhou/pytorch-A3C
Simple A3C implementation with pytorch + multiprocessing
Language: Python 614 14 27142
WangRongsheng/MedQA-ChatGLM
🛰️ Fine-tuning ChatGLM with LoRA, P-Tuning V2, Freeze, RLHF, and other methods on real medical dialogue data; our vision goes beyond medical Q&A
Language: Python 295 5 1244
heleidsn/UAV_Navigation_DRL_AirSim
This is a new repo used for training UAV navigation (local path planning) policy using DRL methods.
Language: Python 190 4 4129
seuyh/stellaris-dlc-unlocker
Stellaris DLC Unlocker - tool to automatically unlock all dlc in Stellaris completely free
Language: Python 183 5 20 7
sunghoonhong/AirsimDRL
Autonomous UAV Navigation without Collision using Visual Information in Airsim
Language: Python 174 2 1543
chrisliu298/awesome-llm-unlearning
A resource repository for machine unlearning in large language models
156 6 1 7
liziniu/ReMax
Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)
Language: Python 144 2 313
jjccero/pbrl
A Population Based Reinforcement Learning Library based on PyTorch
Language: Python 24 1 0 2
fomorians/forward-models
A tutorial on forward models for model-based reinforcement learning.
Language: Jupyter Notebook 7 11 0 0