Pinned Repositories
Directional-Preference-Alignment
Directional Preference Alignment
Online-RLHF
A recipe for online RLHF and online iterative DPO.
Books
Some special ebooks
LaMo-2023
Official code for "Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning".
MOD
Official code for "Decoding-Time Language Model Alignment with Multiple Objectives".
On-convergence-of-GAN
Course project for the Game Theory course.
Prompting-goal-conditioned-agent
Course project for the Natural Language Processing course.
Samplers-in-Online-DPO
Official code for "The Crucial Role of Samplers in Online Direct Preference Optimization".
srzer.github.io
My blog
Video-SnapShot-based-on-ffmpeg
Course project for the Introduction to Programming course.