Issues
reward base model missing
#27 opened by Ritz111 - 1
How was the llava_ppo50k-aokvqa12k-vqa10k.json data constructed?
#34 opened by Spring24ch - 2
Question about the reward model's score
#35 opened by DripNowhy - 1
Question about the optimization time
#33 opened by JulioZhao97 - 2
Question About the reward model
#32 opened by tyxiong23 - 1
How to use the reward model in isolation?
#28 opened by jxgu1016 - 12
Model testing
#26 opened by ernestoBocini - 1
Image Data for RM
#30 opened by ChencongZJU - 1
NotImplementedError in rl_trainer.py
#25 opened by janak11111 - 1
About 'hallucination' in preference dataset
#23 opened by davidluciolu - 1
The accuracy of the reward model seems to be low
#24 opened by Wizardcoast - 4
The performance of the released ckpt is much lower than the scores reported in the paper
#20 opened by Weiyun1025 - 1
evaluation images missing?
#19 opened by findalexli - 2
Training on RTX 4090
#18 opened by luohaowen2003 - 1
Question about instruction data
#17 opened by zhang-jr - 13
Cannot reproduce results
#8 opened by Haoye17 - 6
Detailed Results of models on MMHal-Bench
#13 opened by vateye - 1
When will the training codes be released?
#6 opened by feymanpriv - 1
Will the RM be released?
#14 opened by findalexli - 13
RuntimeError: The size of tensor a (577) must match the size of tensor b (257) at non-singleton dimension 1
#11 opened by HarrySSH - 2
Merge the models
#9 opened by ThierryDeruyttere - 0
Can you use this with 4bit?
#7 opened by ThierryDeruyttere - 2
Images for the SFT dataset
#4 opened by yuvalkirstain - 7
Error when calling the model
#3 opened by LiqiangJing - 1
Great work! Can I know if there is any implementation or script to call this model? Thanks.
#1 opened by WilTay1 - 1
how to use the model for testing
#2 opened by LiqiangJing