godka/Pensieve-PPO

The simplest implementation of Pensieve (SIGCOMM '17) via state-of-the-art RL algorithms, including PPO, DQN, and SAC, with support for both TensorFlow and PyTorch.

DIGITAL Command Language
BSD-2-Clause

Issues

Way to replicate baseline results for different data sets
#26 opened 2 years ago by kaczor3213
0
What do the dimensions of the state returned by the environment mean? What are the corresponding parameters in the paper?
#21 opened 3 years ago by 945716994
1
Are there any recent papers on progress in this area? The Pensieve paper says there is not much room left for improvement.
#20 opened 3 years ago by 945716994
1
Does it run on Windows?
#19 opened 3 years ago by 945716994
1
a2c vs ppo NN architecture
#13 opened 3 years ago by ahmad-hl
4
A question about the comparison function "r" between the new and old policies
#16 opened 3 years ago by youngboy52
1
How to improve exploration?
#11 opened 3 years ago by ahmad-hl
2
Monitor cross-validation curve
#8 opened 3 years ago by ahmad-hl
5
SAC import error
#9 opened 3 years ago by ahmad-hl
2
setting entropy to TD_loss summary vars
#10 opened 3 years ago by ahmad-hl
3
a question about compute_v
#4 opened 4 years ago by linnaeushuang
8
Is this project based on Python 3, and which version of TF 2.x does it use?
#6 opened 4 years ago by youngboy52
6
The time to train the model
#7 opened 4 years ago by SoonyangZhang
1
Training with multiple videos with a random number of bitrates masked
#3 opened 4 years ago by manojsoni2
5
TQL
#1 opened 5 years ago by xiaxiaxiahhh
1