eric-mitchell/direct-preference-optimization

Inference code example?

Closed this issue · 2 comments

Hey @eric-mitchell, thanks for the repo.

Could you provide any example code for loading/using the trained models for inference? I'm trying to reproduce your Dialogue GPT-4 win rate experiment.

Thanks in advance!

To load the models for inference, just instantiate whichever base model you fine-tuned from with model = transformers.AutoModelForCausalLM.from_pretrained(...) and then load the SFT or DPO weights with model.load_state_dict(torch.load(YOUR_ARCHIVE_PATH)).
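
A minimal sketch of those two steps plus a sample generation. The base model name, checkpoint path, and prompt below are placeholders, not values from the repo, and the note about unwrapping the state dict is an assumption about how your checkpoint happens to be saved:

```python
import torch
import transformers

# Instantiate the same base model you fine-tuned from (placeholder model name).
base_model = "EleutherAI/pythia-2.8b"
model = transformers.AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = transformers.AutoTokenizer.from_pretrained(base_model)

# Load the SFT or DPO weights (placeholder for YOUR_ARCHIVE_PATH).
state_dict = torch.load("path/to/your_archive.pt", map_location="cpu")
# If your archive wraps the weights in a larger dict, pull the state dict out
# first (e.g. state_dict = state_dict["state"]) -- inspect the file to check.
model.load_state_dict(state_dict)
model.eval()

# Generate a sample completion; format the prompt the same way as in training.
prompt = "Your prompt here"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```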

Re: GPT-4 evals, we'll see if we can share something shortly showing how we did things. The prompt construction will just depend on how you constructed prompts in your dataset for training.

What would the YOUR_ARCHIVE_PATH file be called? In the folder generated after training with DPO, I only see optimizer.pt, policy.pt, and scheduler.pt.