haoxing-du/compulsive-lm
Playing with RL fine-tuning of large language models. What if I fine-tune the model to always output a certain word?
Python