tatsu-lab/alpaca_farm

[Discussion] about compute_logprobs

snowkcon opened this issue 2 years ago · 0 comments

snowkcon commented 2 years ago

alpaca_farm implementation: https://github.com/tatsu-lab/alpaca_farm/blob/94b02079b74af731b2671e3691a5080d5d340fd8/src/alpaca_farm/models/rl_models.py#L97C30-L97C46
DeepSpeedExamples implementation: https://github.com/microsoft/DeepSpeedExamples/blob/f9c3ae057102376388c3e416ab2f33392c56ec6d/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py#L40

Which of these two is the better way to compute log-probabilities?
The alpaca_farm implementation may return negative values.
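
For comparison, here is a minimal sketch of the two common ways to get per-token log-probabilities from logits: negated cross-entropy, and `log_softmax` followed by `gather`. The function names, tensor shapes, and the `ignore_index` default are assumptions made for illustration, not code copied from either linked file. A log-probability is the log of a value in (0, 1], so under either approach the result is expected to be zero or negative.

```python
import torch
import torch.nn.functional as F


def logprobs_via_cross_entropy(logits, labels, ignore_index=-100):
    # Hypothetical helper: per-token log-prob as the negated cross-entropy.
    # F.cross_entropy expects (batch, vocab, seq), hence the permute.
    # Positions equal to ignore_index get a loss of 0, so their log-prob is 0.
    return -F.cross_entropy(
        logits.permute(0, 2, 1), labels, reduction="none", ignore_index=ignore_index
    )


def logprobs_via_gather(logits, labels):
    # Hypothetical helper: normalize with log_softmax, then pick the label's entry.
    # Every label must be a valid vocabulary index here.
    log_probs = F.log_softmax(logits, dim=-1)
    return log_probs.gather(dim=-1, index=labels.unsqueeze(-1)).squeeze(-1)


torch.manual_seed(0)
logits = torch.randn(2, 5, 11)          # (batch, seq, vocab), made-up sizes
labels = torch.randint(0, 11, (2, 5))   # (batch, seq), no ignored positions

a = logprobs_via_cross_entropy(logits, labels)
b = logprobs_via_gather(logits, labels)

print(torch.allclose(a, b, atol=1e-6))  # the two approaches should agree
print(bool((a <= 0).all()))             # log-probabilities are never positive
```

In this sketch the practical difference is in how padding is handled: the cross-entropy variant zeroes out positions marked with `ignore_index`, while the gather variant needs valid indices everywhere and leaves masking to the caller.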