lucidrains/self-rewarding-lm-pytorch

usage demo is not working

Hi, thanks for sharing this work. I was trying to reproduce your results, starting from your usage demo. However, I found that the candidate responses all look something like this:
[Screenshot 2024-12-03 19 37 00]

I am thinking it might be because of the token decoder being used: the model's output vocabulary is 256, but the ASCII vocab only covers 128 characters. I am not sure what kind of decoder I should use; when I tried a model with num_tokens = 128, the SFT process broke (screenshots below, and a small sketch of the mismatch I mean after them):
[Screenshot 2024-12-03 19 41 21]
[Screenshot 2024-12-03 19 41 50]
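
For reference, here is a minimal sketch (my own hypothetical helpers, not the repo's actual code) of the mismatch I am describing: decoding ids straight to ASCII breaks for ids >= 128, whereas treating each id as a raw byte covers the full 0-255 range that a num_tokens = 256 model can emit.

```python
import torch

# Hypothetical helpers, just to illustrate the vocab mismatch described above.

def decode_tokens_ascii(tokens: torch.Tensor) -> str:
    # Mapping ids straight to characters: anything >= 128 falls outside ASCII
    # and renders as the garbled symbols in the first screenshot.
    return ''.join(chr(max(32, int(t))) for t in tokens)

def decode_tokens_bytes(tokens: torch.Tensor) -> str:
    # Treating every id as a raw byte instead keeps the full 0-255 range valid.
    return bytes(int(t) for t in tokens).decode('latin-1')

def encode_str_bytes(text: str) -> torch.Tensor:
    # Inverse mapping for prompts: characters -> byte values in [0, 256).
    return torch.tensor(list(text.encode('latin-1')), dtype=torch.long)

sampled = torch.randint(0, 256, (16,))  # what a num_tokens = 256 model can emit
print(decode_tokens_ascii(sampled))     # garbled once ids exceed 127
print(decode_tokens_bytes(sampled))     # well-defined for every id
```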

Could you please help me out with this?