Issues
Does this codebase consider using "torch.compile"?
#309 opened by eyuansu62 - 0
Can support for SimPO be added?
#311 opened by victorShawFan - 1
Wrong action_log_probs returned?
#310 opened by thirteenflt - 2
Dummy token for prompts in HH datasets
#308 opened by louieworth - 2
Training DPO with Deepseek-lite raises: expected mat1 and mat2 to have the same type, but got: float != c10::BFloat16
#306 opened by victorShawFan - 1
Will 2 x GPU setups be supported?
#307 opened by llmlocal - 5
Strange Kill of Critic Model
#305 opened by Ricardokevins - 1
Claim your paper on HF
#299 opened by adeenayakup - 1
Suggestion on the configurations
#304 opened by Ricardokevins - 2
action_log_probs is computed twice
#301 opened by cdm114514 - 2
Incompatibility with Qwen
#303 opened by Ricardokevins - 1
Support Llama-3 models
#302 opened by wenlinyao - 2
[Question] EOS in reward model dataset
#300 opened by qwenzo - 2
RLHF for classification tasks
#291 opened by vinodrajendran001 - 1
Avoid monkey patching vLLM
#297 opened by Atry - 7
QLORA model loading error
#295 opened by karthik-nexusflow - 4
Custom ExperienceMaker
#285 opened by mgerstgrasser - 5
No response after enabling PPO with Ray
#292 opened by victorShawFan - 1
Possible data bug in the DPO trainer
#294 opened by none0663 - 1
Timeout error when running PPO with ZeRO stage 3
#293 opened by victorShawFan - 5
HTTPError when running train_ppo_llama_ray.sh
#290 opened by Zeyuan-Liu - 1
Model refuses to answer after PPO training
#287 opened by burger-pb - 2
Update NGC and vllm version.
#282 opened by THINK2TRY - 0
when import requests, class NewLineFormatter(logging.Formatter): AttributeError: partially initialized module 'logging' has no attribute 'Formatter' (most likely due to a circular import)
#284 opened by catqaq - 3
[Baseline] LLaMA2-7B RLHF training curves
#263 opened by hijkzzz - 3
AssertionError: size mismatch between output_state_dict (148) and state_dict (149) during SFT training
#274 opened by qwenzo - 5
The configuration for Llama-7b on 4 RTX4090
#269 opened by LinkyLiu - 2
NCCL broadcast error after first actor fit
#271 opened by karthik-nexusflow - 2
[For your information] Ways to build environment and run openrlhf codes on a slurm cluster
#251 opened by cangcn - 4
Using custom datasets and cache_dir
#259 opened by UbeCc - 3
Reward model dataset issue
#273 opened by burger-pb - 3
How long does tuning a single LLM take?
#262 opened by alphahumancoder - 1
PPO training configuration for train_ppo_llama.sh
#272 opened by MurrayTom - 1
Issue with models not using `position_ids`
#270 opened by kfertakis - 1
Inconsistent python version dependency
#268 opened by snailrowen1337 - 4
Documentation for using Kuberay
#266 opened by karthik-nexusflow - 0
Add test pipeline: use a small LLM and a small dataset
#267 opened by catqaq - 28
Debugging with Ray
#258 opened by mickel-liu - 1
How to train in fp16?
#255 opened by dshnightmare - 6
Hardware requirement
#254 opened by ridiculouz - 3
Support ORPO
#253 opened by paulcx