A crude RLHF layer on top of nanoGPT, using the Gumbel-Softmax trick
Primary language: Python
License: MIT
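
For readers unfamiliar with the Gumbel-Softmax trick named in the description, here is a minimal standard-library sketch of the general technique. It is not taken from this repository: the function name `gumbel_softmax`, its parameters, and the example logits are all hypothetical, and the repository's own implementation (presumably built on PyTorch, as nanoGPT is) may differ. The idea is to add Gumbel(0, 1) noise to categorical logits and apply a temperature-scaled softmax, which gives a soft, differentiable stand-in for a sampled one-hot token.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw one relaxed (soft) one-hot sample from categorical logits.

    Hypothetical helper for illustration; not the repository's API.
    """
    eps = 1e-20  # guards against log(0)
    noisy = []
    for logit in logits:
        u = rng.random()
        # Gumbel(0, 1) noise by inverse transform: g = -log(-log(u))
        g = -math.log(-math.log(u + eps) + eps)
        noisy.append((logit + g) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits
    m = max(noisy)
    exps = [math.exp(x - m) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    random.seed(0)
    # Example logits over a toy 3-token vocabulary (made up for illustration)
    print(gumbel_softmax([2.0, 0.5, -1.0], tau=0.5))
```

Lower values of `tau` push the output toward a hard one-hot vector, while higher values make it smoother. In an RLHF-style setup, a relaxation like this is commonly used so that gradients from a reward signal can flow back through the token-sampling step.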