Issues
- 1
OOM error with PEFT LoRA on Llama2-7B
#601 opened by arpaiva - 0
Load the checkpoint fails
#600 opened by AfraAmini - 2
Issue of tensors share memory
#591 opened by heraldiclily - 1
Multi-GPU training errors with peft
#581 opened by AliengirlLiv - 0
- 1
maybe bug in prepare & load's order
#598 opened by daiwk - 1
- 1
Issue since most recent transformers update
#580 opened by siddharthverma314 - 0
Crash when using save_state with deepspeed: `model.state_dict` functions incompatible with new deepspeed.
#596 opened by JohannesAck - 0
- 0
TRLX Environment customization
#593 opened by heraldiclily - 9
Unable to load the trained model to do the inference
#545 opened by CSerxy - 1
When I use the trlx PPO trainer to train a LLaMA 13B model and save it in Hugging Face format, the saved model has some strange keys, and inference shows no result and no error; the output seems to disappear
#584 opened by ldh127 - 2
- 0
[New Feature Request] Add KTO
#590 opened by 1485840691-eng - 0
RLHF text summarization diverges
#589 opened by AlisonWen - 0
Integration of Self-Play Fine-Tuning (SPIN) Method for Enhancing Large Language Models
#588 opened by SeungyounShin - 0
MPT is not working
#585 opened by ouhenio - 0
Attention mask when calculating log ratio for PPO
#582 opened by kmy17518 - 3
multigpu support for summarization ppo example
#571 opened by sayan1101 - 1
resume_from_checkpoint doesn't work
#577 opened by andrewsiah - 0
Support parallel reward_fn in PPO training
#574 opened by Jingru - 3
`position_ids` error in accelerate PPO trainer
#564 opened by pbarragan - 1
- 2
Question about saving peft checkpoint
#565 opened by nhanph - 3
Problem with LLama training with LoRA
#567 opened by freQuensy23-coder - 0
How to generate reward-labeled dataset
#561 opened by mikkelmedm - 0
How to train LLaMA2 on the summarize_rlhf example?
#559 opened by missflash - 3
- 1
strange design
#501 opened by efengx - 2
Sanity check: SFT Model should be frozen (PPO)
#517 opened by Apsod - 2
Reward model negative numbers meaning
#521 opened by GenVr - 5
Model does not load in the expected dtype
#535 opened by AugustasMacijauskas - 4
- 1
Add support for Falcon 7B/40B
#532 opened by cvetanovskaa - 1
Memory occupy with multi GPUs Training
#548 opened by yuanyaaa - 5
- 2
ILQL training batch2 tensor dimensions error
#540 opened by GenVr - 4
Direct Policy Optimization
#504 opened by Reichenbachian - 1
Add support for LLaMA2
#533 opened by cvetanovskaa - 0
Implement Asynchronous PPO
#531 opened by Dahoas - 1
ppo using GLM2-6b as a backbone?
#523 opened by fanxinyun1991 - 0
- 0
8-bit inference
#512 opened by glerzing - 1
- 0
Add support for safetensors
#505 opened by glerzing - 1
About the weight of word embedding being nan
#503 opened by ItGirls - 0
Use tiny models for the tests
#502 opened by glerzing