Issues
SFT Trainer Packing Validation
#1672 opened by alex-jw-brooks
Error when Using 8-bit Quantization
#1616 opened by JhonDan1999
Why compute IPO loss using `average_log_prob=True`?
#1677 opened by AIR-hl
Why does the ORPO loss function compute the NLL loss separately rather than using chosen_logps?
#1676 opened by AIR-hl
The model's chat_template is not used when constructing data in DPOTrainer
#1687 opened by mst272
ImportError: cannot import name 'SFTConfig' from 'trl'
#1639 opened by brand17
ImportError: cannot import name 'DPOConfig' from 'trl'
#1642 opened by AswiniNLP
vsft_llava: ValueError: Expected input batch_size (78528) to match target batch_size (41728).
#1685 opened by kishan-character
Can we use SFTTrainer for pre-training?
#1657 opened by wennycooper
Training stops early
#1601 opened by Techinix
CLI utils class casing seems to be incorrect
#1600 opened by busycalibrating
Examples don't work
#1656 opened by yechenzhi
Seq2SeqTrainer with DataCollatorForCompletionOnlyLM: incorrect masking for evaluation
#1634 opened by adamamer20
Possible risks in xxxPOTrainer
#1679 opened by AIR-hl
SFT does not automatically add the generation prompt to the dataset when using apply_chat_template()
#1675 opened by DreRnc
When using GaLore with ORPO, the learning rate was set to 8e-6 but the actual training learning rate was 0.01
#1638 opened by Minami-su
ImportError: cannot import name 'SFTScriptArguments' from 'trl.commands.cli_utils' & --report_to flag
#1669 opened by alielfilali01
PPOv2 & RLOO trainers don't correctly compute the reward if the reward tokenizer differs from the policy tokenizer
#1674 opened by TheBlackCat22
Path toward supporting generative eval in DPOTrainer
#1671 opened by prompteus
OOM with DPO Trainer on A100 GPU
#1667 opened by JhonDan1999
Llama 3 Unsloth Fixes
#1668 opened by lhl
Error when using PPO with Gemma
#1663 opened by mostafamdy
Bug in DPO training example
#1666 opened by AIR-hl
DDPO cannot use SDXL
#1630 opened by mao-code
PPOTrainer ignores data_collator keyword argument and uses provided collator inconsistently
#1629 opened by codezakh
How to save v_head
#1650 opened by zyzhang1130
Adapter name for SFT trainer
#1649 opened by para-zhou
Set seed
#1648 opened by user799595
[enhancement] Implement a custom IRPO training loss
#1611 opened by TheGhoul21
Custom DPO Trainer CUDA OOM
#1626 opened by TheGhoul21
ConstantLengthDataset ignores some texts
#1621 opened by TianyiPeng
Long sequences cause CUDA out of memory during DPO training
#1619 opened by virt9
KTO finetuning: float division by zero
#1651 opened by jetlime
Learning to generate EOS tokens
#1623 opened by vwxyzjn
DPO loss remains 0.6931 and reward is stuck at 0.0
#1627 opened by virt9
ValueError when training DPO on a multi-GPU setup
#1645 opened by miosturu
How to do fp16 training with PPOTrainer?
#1614 opened by KwanWaiChung
Trouble with the PPO example
#1618 opened by Shiguang-Guo
Wrong prefix for logs in KTOTrainer
#1631 opened by bartoszzuk
How to save and resume a checkpoint from PPOTrainer
#1643 opened by paraGONG
How to use trl/trainer/kto_trainer.py
#1635 opened by mazhengyufreedom
KTO error when assigning a dataset to a device
#1620 opened by mostafamdy
Seq2seq model with ppo_trainer samples strange output!
#1633 opened by sajastu
Bug: `tests` are being included in the package
#1606 opened by jamesbraza