Issues
SFT Trainer Packing Validation
#1672 opened by alex-jw-brooks
Error when Using 8-bit Quantization
#1616 opened by JhonDan1999
Why compute IPO loss using `average_log_prob=True`?
#1677 opened by AIR-hl
Why does the ORPO loss function compute the NLL loss separately rather than using chosen_logps?
#1676 opened by AIR-hl
The model's chat_template is not used when constructing data in DPOTrainer
#1687 opened by mst272
ImportError: cannot import name 'SFTConfig' from 'trl'
#1639 opened by brand17
ImportError: cannot import name 'DPOConfig' from 'trl'
#1642 opened by AswiniNLP
vsft_llava: ValueError: Expected input batch_size (78528) to match target batch_size (41728).
#1685 opened by kishan-character
Can we use SFTTrainer for pre-training?
#1657 opened by wennycooper
Training stops early
#1601 opened by Techinix
CLI utils class casing seems to be incorrect
#1600 opened by busycalibrating
Examples don't work
#1656 opened by yechenzhi
Seq2SeqTrainer with DataCollatorForCompletionOnlyLM: incorrect masking for evaluation
#1634 opened by adamamer20
Possible risks in xxxPOTrainer
#1679 opened by AIR-hl
SFT does not automatically add the generation prompt to the dataset when using apply_chat_template()
#1675 opened by DreRnc
When using GaLore with ORPO, the learning rate was set to 8e-6 but the actual training learning rate was 0.01
#1638 opened by Minami-su
ImportError: cannot import name 'SFTScriptArguments' from 'trl.commands.cli_utils' & --report_to flag
#1669 opened by alielfilali01
PPOv2 & RLOO trainers don't correctly compute the reward if the reward tokenizer differs from the policy tokenizer
#1674 opened by TheBlackCat22
Path toward supporting generative eval in DPOTrainer
#1671 opened by prompteus
OOM with DPO Trainer on A100 GPU
#1667 opened by JhonDan1999
Llama 3 Unsloth Fixes
#1668 opened by lhl
Error when using PPO with Gemma
#1663 opened by mostafamdy
Bug in DPO training example
#1666 opened by AIR-hl
DDPO cannot use SDXL
#1630 opened by mao-code
PPOTrainer ignores data_collator keyword argument and uses provided collator inconsistently
#1629 opened by codezakh
How to save v_head
#1650 opened by zyzhang1130
Adapter name for SFT trainer
#1649 opened by para-zhou
Set seed
#1648 opened by user799595
[enhancement] Implement a custom IRPO training loss
#1611 opened by TheGhoul21
Custom DPO Trainer CUDA OOM
#1626 opened by TheGhoul21
ConstantLengthDataset ignores some texts
#1621 opened by TianyiPeng
Long sequences cause CUDA out of memory during DPO training
#1619 opened by virt9
KTO finetuning: float division by zero
#1651 opened by jetlime
Learning to generate EOS tokens
#1623 opened by vwxyzjn
DPO loss remains 0.6931 and reward is stuck at 0.0
#1627 opened by virt9
ValueError when training DPO on a multi-GPU setup
#1645 opened by miosturu
How to do fp16 training with PPOTrainer?
#1614 opened by KwanWaiChung
Trouble with the PPO example
#1618 opened by Shiguang-Guo
Wrong prefix for logs in KTOTrainer
#1631 opened by bartoszzuk
How to save and resume a checkpoint from PPOTrainer
#1643 opened by paraGONG
How to use trl/trainer/kto_trainer.py
#1635 opened by mazhengyufreedom
KTO error when assigning a dataset to a device
#1620 opened by mostafamdy
Seq2seq model with ppo_trainer samples strange output!
#1633 opened by sajastu
Bug: `tests` are being included in the package
#1606 opened by jamesbraza