Issues
Does this codebase consider using "torch.compile"?
#309 opened by eyuansu62 - 0
Can support for SimPO be added?
#311 opened by victorShawFan - 1
Wrong action_log_probs returned?
#310 opened by thirteenflt - 2
Dummy token for prompts in HH datasets
#308 opened by louieworth - 2
Training DPO with Deepseek-lite raises: expected mat1 and mat2 to have the same type, but got: float != c10::BFloat16
#306 opened by victorShawFan - 1
Will 2 x GPU setups be supported?
#307 opened by llmlocal - 5
Strange Kill of Critic Model
#305 opened by Ricardokevins - 1
Claim your paper on HF
#299 opened by adeenayakup - 1
Suggestion on the configurations
#304 opened by Ricardokevins - 2
action_log_probs is computed twice
#301 opened by cdm114514 - 2
Incompatibility with Qwen
#303 opened by Ricardokevins - 1
Support Llama-3 models
#302 opened by wenlinyao - 2
[Question] EOS in reward model dataset
#300 opened by qwenzo - 2
RLHF for classification tasks
#291 opened by vinodrajendran001 - 1
Avoid monkey patching vLLM
#297 opened by Atry - 7
QLORA model loading error
#295 opened by karthik-nexusflow - 4
Custom ExperienceMaker
#285 opened by mgerstgrasser - 5
No response after enabling PPO with Ray
#292 opened by victorShawFan - 1
Possible data bug in the DPO trainer
#294 opened by none0663 - 1
Timeout error when running PPO with ZeRO stage 3
#293 opened by victorShawFan - 5
HTTPError when running train_ppo_llama_ray.sh
#290 opened by Zeyuan-Liu - 1
Model refuses to answer after PPO training
#287 opened by burger-pb - 2
Update NGC and vllm version.
#282 opened by THINK2TRY - 0
when import requests, class NewLineFormatter(logging.Formatter): AttributeError: partially initialized module 'logging' has no attribute 'Formatter' (most likely due to a circular import)
#284 opened by catqaq - 3
[Baseline] LLaMA2-7B RLHF training curves
#263 opened by hijkzzz - 3
AssertionError: size mismatch between output_state_dict (148) and state_dict (149) during SFT training
#274 opened by qwenzo - 5
The configuration for Llama-7b on 4 RTX4090
#269 opened by LinkyLiu - 2
NCCL broadcast error after first actor fit
#271 opened by karthik-nexusflow - 2
[For your information] Ways to build environment and run openrlhf codes on a slurm cluster
#251 opened by cangcn - 4
Using custom datasets and cache_dir
#259 opened by UbeCc - 3
Reward model dataset issue
#273 opened by burger-pb - 3
How long does tuning a single LLM take?
#262 opened by alphahumancoder - 1
PPO training configuration for train_ppo_llama.sh
#272 opened by MurrayTom - 1
Issue with models not using `position_ids`
#270 opened by kfertakis - 1
Inconsistent python version dependency
#268 opened by snailrowen1337 - 4
Documentation for using Kuberay
#266 opened by karthik-nexusflow - 0
Add test pipeline: use a small LLM and a small dataset
#267 opened by catqaq - 28
Debugging with Ray
#258 opened by mickel-liu - 1
How to train in fp16?
#255 opened by dshnightmare - 6
Hardware requirement
#254 opened by ridiculouz - 3
Support ORPO
#253 opened by paulcx