Pinned Repositories
bairblog.github.io
Arxiv-Recommender
SPAG
Self-playing Adversarial Language Game Enhances LLM Reasoning
thwu1's Repositories
thwu1/pairwise-proximal-policy-optimization
thwu1/trlx
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
thwu1/visual-tool
thwu1/alpha-zero-general
A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more
thwu1/llm2vec
thwu1/reward_exp
thwu1/spag
Self-playing Adversarial Language Game Enhances LLM Reasoning