Update: This implementation is not finished. I plan to complete it once I have more time.
This is a chatbot project for the course D7058E at Luleå University of Technology. We try to implement something similar to InstructGPT or ChatGPT, based mostly on the papers and the RLHF blog post from Hugging Face.
- Implement PPO2 for faster RL fine-tuning.
- Finish the partially implemented website for gathering real human data.
- Upload the reward model and the fine-tuned model to Hugging Face for open-source use.