LoRA for Reward Model Training
bugsz commented
Is your feature request related to a problem? Please describe.
Hi! I am trying to finetune a reward model the way HelpSteer2 did, but I am running into OOM issues.
I then found that LoRA is supported for SFT, but not for reward model training. Would it be possible to support LoRA for reward model training as well? It seems feasible, since the reward model is built on top of the base model (see the sketch below for what I have in mind).
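For context, here is a minimal sketch of the idea written against the Hugging Face PEFT API rather than this repo's trainer; the model name, target modules, and LoRA hyperparameters are just placeholders:

```python
# Sketch only (not this repo's API): wrap the base LM with a scalar reward
# head, then attach LoRA adapters so only the low-rank matrices and the head
# are trained.
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

# num_labels=1 gives a single scalar output, i.e. a reward head on the base model.
base = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-2-7b-hf", num_labels=1
)

lora_cfg = LoraConfig(
    task_type=TaskType.SEQ_CLS,           # sequence-level score
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections only
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only LoRA weights + reward head are trainable
```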
Also, I am using the VRAM estimation here, which states that a full finetune of a LLaMA 2-7B model in float16 takes roughly 60GB. However, when I tried 2×A6000 GPUs with 48GB of VRAM each, I got an OOM error. Does anyone have an accurate estimate of the memory usage for different model sizes?
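My own rough back-of-the-envelope estimate (assuming full fine-tuning with mixed-precision Adam and no optimizer-state sharding, activations not included) already comes out above 2×48GB, which may explain the OOM:

```python
# Rule-of-thumb memory for full fine-tuning with mixed-precision Adam.
# Per parameter: 2 B fp16 weights + 2 B fp16 grads + 4 B fp32 master weights
#                + 4 B Adam m + 4 B Adam v = 16 B (activations excluded).
params = 7e9
bytes_per_param = 2 + 2 + 4 + 4 + 4
print(f"~{params * bytes_per_param / 1e9:.0f} GB")  # ~112 GB for a 7B model
```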