A crude RLHF layer on top of nanoGPT, using the Gumbel-Softmax trick.
Primary language: Python. License: MIT.
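For readers unfamiliar with the Gumbel-Softmax trick named in the description, here is a minimal sketch in plain Python. It is not taken from this repository: the function name `gumbel_softmax_sample` and its parameters are hypothetical, chosen only for illustration. The idea is to add Gumbel(0, 1) noise to the logits and apply a temperature-scaled softmax, which yields a "soft" one-hot sample. In a real training loop this would be done on tensors so that gradients can flow through the sample (PyTorch provides `torch.nn.functional.gumbel_softmax` for this); how the repository wires it into the RLHF layer is not shown here.

```python
import math
import random


def gumbel_softmax_sample(logits, tau=1.0, rng=random):
    """Draw one relaxed (soft) one-hot sample from the categorical given by `logits`.

    Hypothetical helper for illustration; not the repository's code.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    noisy = []
    for logit in logits:
        u = rng.random()
        while u == 0.0:  # random() may return 0.0; avoid log(0)
            u = rng.random()
        # Gumbel(0, 1) noise: g = -log(-log(u)), with u ~ Uniform(0, 1).
        g = -math.log(-math.log(u))
        noisy.append((logit + g) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    m = max(noisy)
    exps = [math.exp(x - m) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]
```

A lower `tau` pushes samples closer to a hard one-hot vector, while a higher `tau` makes them closer to uniform. The index of the largest entry is distributed according to `softmax(logits)`, which is what makes the relaxation a stand-in for discrete token sampling.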