techandy42/LLM_Reward_Model
Developing an LLM response-ranking reward model using RLHF, except the feedback comes from GPT-3.5 instead of a human.
Jupyter Notebook