[Discussion] about compute_logprobs
snowkcon opened this issue · 0 comments
snowkcon commented
- alpaca_farm implementation https://github.com/tatsu-lab/alpaca_farm/blob/94b02079b74af731b2671e3691a5080d5d340fd8/src/alpaca_farm/models/rl_models.py#L97C30-L97C46
- DeepSpeedExamples implementation https://github.com/microsoft/DeepSpeedExamples/blob/f9c3ae057102376388c3e416ab2f33392c56ec6d/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py#L40
Which of the two would be the better implementation?
The alpaca_farm implementation may return negative values.
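
For reference, here is a minimal, framework-free sketch of the two styles being compared. It assumes (this is not copied from either repository) that one style applies a log-softmax and then picks the label's entry at each position, and that the other negates a per-token cross-entropy while zeroing positions whose label equals an ignore index. The function names, toy logits, and `PAD_ID` are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one position's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def logprobs_via_gather(logits_seq, labels):
    """Style A (assumed): log-softmax, then pick the label's entry."""
    return [log_softmax(logits)[label] for logits, label in zip(logits_seq, labels)]


def logprobs_via_neg_cross_entropy(logits_seq, labels, ignore_index):
    """Style B (assumed): negated per-token cross-entropy.

    Positions whose label equals ignore_index are set to 0.0.
    """
    out = []
    for logits, label in zip(logits_seq, labels):
        if label == ignore_index:
            out.append(0.0)
            continue
        # cross-entropy = logsumexp(logits) - logits[label]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        out.append(-(log_z - logits[label]))
    return out


# Hypothetical toy input: 3 positions, vocabulary of 4, pad token id 0.
PAD_ID = 0
logits_seq = [
    [2.0, 0.5, -1.0, 0.1],
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 3.0, 0.2, -0.5],
]
labels = [1, 3, PAD_ID]

a = logprobs_via_gather(logits_seq, labels)
b = logprobs_via_neg_cross_entropy(logits_seq, labels, ignore_index=PAD_ID)

for pos, (x, y) in enumerate(zip(a, b)):
    print(f"position {pos}: gather={x:.4f}  neg_cross_entropy={y:.4f}")
```

Under these assumptions the two styles agree at every non-padded position and differ only where the label is the pad id, where style B returns 0.0. Both return values that are zero or negative, since the log of a probability cannot exceed 0.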