godka/Pensieve-PPO
The simplest implementation of Pensieve (SIGCOMM '17) via state-of-the-art RL algorithms, including PPO, DQN, and SAC, with support for both TensorFlow and PyTorch.
DIGITAL Command Language
BSD-2-Clause
Issues
What do the dimensions of the state returned by the environment mean? What are the corresponding parameters in the paper?
#21 opened by 945716994 - 1
Are there any recent papers on progress in this area? The Pensieve paper says there is not much room left for improvement.
#20 opened by 945716994 - 1
Does it run on Windows?
#19 opened by 945716994 - 4
a2c vs ppo NN architecture
#13 opened by ahmad-hl - 1
How to improve exploration?
#11 opened by ahmad-hl - 5
Monitor cross-validation curve
#8 opened by ahmad-hl - 2
SAC import error
#9 opened by ahmad-hl - 3
setting entropy to TD_loss summary vars
#10 opened by ahmad-hl - 8
a question about compute_v
#4 opened by linnaeushuang - 6
The time to train the model
#7 opened by SoonyangZhang - 5
TQL
#1 opened by xiaxiaxiahhh