A crude RLHF layer on top of nanoGPT, using the Gumbel-Softmax trick.
Primary language: Python. License: MIT.
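
For readers unfamiliar with the trick named in the description, here is a minimal sketch in plain Python. It is not taken from this repository: the function name `gumbel_softmax_sample` and the `tau` and `rng` parameters are illustrative assumptions. The idea is to add Gumbel noise to the logits and apply a temperature-scaled softmax, which gives a "soft" one-hot sample instead of a hard token choice. In a PyTorch codebase such as nanoGPT, the differentiable equivalent would presumably be `torch.nn.functional.gumbel_softmax`.

```python
import math
import random


def gumbel_softmax_sample(logits, tau=1.0, rng=random):
    """Return a relaxed one-hot sample over the vocabulary (entries sum to 1).

    Hypothetical helper for illustration only; not the repository's code.
    """
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        # Gumbel(0, 1) noise is g = -log(-log(u)); add it, then divide by tau.
        noisy.append((logit - math.log(-math.log(u))) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: a lower tau pushes the sample closer to a hard one-hot vector.
rng = random.Random(0)
probs = gumbel_softmax_sample([2.0, 0.5, -1.0], tau=0.5, rng=rng)
token = max(range(len(probs)), key=probs.__getitem__)  # index of the largest entry
```

Taking the argmax of the perturbed logits is equivalent to sampling a token from the ordinary softmax distribution; the softmax relaxation keeps that sampling step smooth so that, in an autograd framework, gradients can flow through it.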