Pinned Repositories
AccuracyParadox-RLHF
[EMNLP 2024 Main] Official implementation of the paper "The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models".
battam1111
Personal Page
FineGrainedRLHF
first-ku
First repository, used purely for testing
MAAC
Code for "Actor-Attention-Critic for Multi-Agent Reinforcement Learning" ICML 2019
MCTSV
mujoco-benchmark
MuJoCo benchmark for Deep Reinforcement Learning as provided by the Tianshou framework.
OpenHuFu
OpenHuFu is an open-source data federation system that supports collaborative queries over multiple databases with security guarantees.
YJ-MADDPG
YJ-SACR
Battam1111's Repositories
Battam1111/AccuracyParadox-RLHF
[EMNLP 2024 Main] Official implementation of the paper "The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models".
Battam1111/MCTSV
Battam1111/YJ-MADDPG
Battam1111/YJ-SACR
Battam1111/battam1111
Personal Page
Battam1111/FineGrainedRLHF
Battam1111/first-ku
First repository, used purely for testing
Battam1111/MAAC
Code for "Actor-Attention-Critic for Multi-Agent Reinforcement Learning" ICML 2019
Battam1111/mujoco-benchmark
MuJoCo benchmark for Deep Reinforcement Learning as provided by the Tianshou framework.
Battam1111/OpenHuFu
OpenHuFu is an open-source data federation system that supports collaborative queries over multiple databases with security guarantees.
Battam1111/Reinforcement-learning-with-tensorflow
Simple reinforcement learning tutorials; 莫烦Python (Morvan Python) Chinese-language AI tutorials