Pinned Repositories
Directional-Preference-Alignment
Directional Preference Alignment
Online-RLHF
A recipe for online RLHF and online iterative DPO.
Books
Some special ebooks
LaMo-2023
Official code for "Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning".
MOD
Official code for "Decoding-Time Language Model Alignment with Multiple Objectives".
On-convergence-of-GAN
Course project for the Game Theory course.
Prompting-goal-conditioned-agent
Course project for the Natural Language Processing course.
Samplers-in-Online-DPO
Official code for "The Crucial Role of Samplers in Online Direct Preference Optimization".
srzer.github.io
My blog
Video-SnapShot-based-on-ffmpeg
Course project for the Introduction to Programming course.