wadx2019/rpo

Problems about the OPF environment

Closed issue · 1 comment

Hi, I'm a little confused about the definition of the OPF environment. As pointed out in the paper, you take the voltage magnitudes and angles of the 14 buses as the action space. But I think these variables are calculated from the power flow and cannot be controlled directly by the agent. Looking forward to your answer, thanks!

That is actually the advantage of RPO. As you said, in previous works one first predicts pg and vg, and then computes the other decision variables by solving the power flow. In that case, the power flow calculation is isolated from the RL training, because it is not differentiable; this places an unnecessary burden on learning the Q-function and on managing the feasibility constraints. For example, if the cost includes a reactive power term, the Q-function must implicitly learn this latent power-flow mapping.
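To see why treating the bus voltages as actions avoids the non-differentiable solve, note that once all voltage magnitudes and angles are fixed, the power injections at every bus follow from the admittance matrix in closed form, with no iterative power-flow solver in the loop. Here is a minimal generic sketch (not code from this repository; the function name and the toy 2-bus network are made up for illustration):

```python
import numpy as np

def injections_from_voltages(V, theta, Y):
    """Active/reactive power injections (P, Q) at every bus, given
    voltage magnitudes V, angles theta (radians), and the complex bus
    admittance matrix Y.  S_i = V_i * conj((Y V)_i); this map is smooth
    in (V, theta), unlike an iterative Newton power-flow solve."""
    Vc = V * np.exp(1j * theta)   # complex bus voltages
    S = Vc * np.conj(Y @ Vc)      # complex power injections
    return S.real, S.imag         # P (active), Q (reactive)

# Toy 2-bus system: a single line with impedance 0.01 + 0.1j p.u.
y = 1.0 / (0.01 + 0.1j)
Y = np.array([[y, -y], [-y, y]])
V = np.array([1.0, 0.98])         # per-unit magnitudes
theta = np.array([0.0, -0.02])    # radians

P, Q = injections_from_voltages(V, theta, Y)
# Bus 1 injects power, bus 2 absorbs it; the small positive sum
# of P is the resistive line loss.
```

Because the agent outputs (V, theta) directly, any cost written in terms of P and Q (including a reactive power cost) is a differentiable function of the action, so the Q-function never has to learn the power-flow mapping itself.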

Actually, we compare RPO with the method you mentioned in the appendix; if you are interested, please refer to Section E.3.

Hope this helps. Thanks!