google/deluca

Implementation of DRC

FarnazAdib opened this issue · 4 comments

Hi

Thanks for providing this interesting package.

I am trying to test DRC on a simple setup, and I have noticed that the current implementation of DRC does not work. When I run it on a simple partially observable linear system with

A = np.array([[1.0, 0.95], [0.0, -0.9]]),
B = np.array([[0.0], [1.0]]),
C = np.array([[1.0, 0.0]]),
Q = R = I,

Gaussian process noise, and zero observation noise, which is open-loop stable, the controller acts like a zero controller. I tried to get a different response by tuning the hyperparameters, but the results are mostly the same.
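For concreteness, here is a minimal NumPy sketch of the setup I am testing. The rollout below applies the zero controller, which is effectively what the learned DRC policy produces; the simulation loop, seed, and cost bookkeeping are my own, only the system matrices come from above:

```python
import numpy as np

# Partially observable LDS from the description above.
A = np.array([[1.0, 0.95], [0.0, -0.9]])  # upper triangular: eigenvalues 1.0 and -0.9
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q, R = np.eye(2), np.eye(1)

rng = np.random.default_rng(0)
x = np.zeros((2, 1))
total_cost = 0.0
for t in range(100):
    u = np.zeros((1, 1))          # zero controller: what the learned DRC policy produces here
    w = rng.normal(size=(2, 1))   # Gaussian process noise
    x = A @ x + B @ u + w
    y = C @ x                     # zero observation noise
    total_cost += float(x.T @ Q @ x + u.T @ R @ u)  # assuming Q penalizes the state
```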
Then I looked at the implementation on the deluca GitHub, and I noticed that the counterfactual cost is not defined correctly (if I am not wrong). According to Algorithm 1 in [1], we need to use M_t to compute y_t, which depends on the previous controls (u) recomputed with the same M_t; in the implementation, however, the previously applied controls, which were produced by M_{t-i}, are used instead. In any case, I implemented the algorithm using M_t, but what I get after the simulation is either a control close to zero or an unstable one.
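To make the distinction concrete, here is a rough JAX sketch of the counterfactual loss as I read Algorithm 1 of [1]. The function name, the shapes, and the assumption that the Markov operator G is known are all my own, so this is only an illustration of where M_t enters, not deluca's actual API:

```python
import jax.numpy as jnp

def counterfactual_cost(M, G, y_nat_hist, cost_fn):
    """Counterfactual loss as I read Algorithm 1 of [1].

    M          : (m, d_u, d_y)  current DRC parameters M_t
    G          : (h, d_y, d_u)  Markov operator, G[j] ~ C A^j B  (assumed known)
    y_nat_hist : (T, d_y)       natural observations, newest last, T >= h + m
    cost_fn    : callable       c(y, u) -> scalar
    """
    m, h = M.shape[0], G.shape[0]
    t = y_nat_hist.shape[0] - 1

    def u_of(s):
        # Counterfactual control at step s: every lag uses the *current* M_t.
        return sum(M[i] @ y_nat_hist[s - i] for i in range(m))

    # Counterfactual observation: y^nat_t plus the effect of the
    # counterfactual controls through the Markov operator.
    y_t = y_nat_hist[t] + sum(G[j] @ u_of(t - 1 - j) for j in range(h))
    return cost_fn(y_t, u_of(t))
```

The key point is that u_of recomputes every lagged control with the current M_t, whereas the implementation substitutes the controls that were actually played under M_{t-i}.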

I was wondering if you have a working code example of the DRC algorithm?
[1] Simchowitz, M., Singh, K., and Hazan, E., "Improper learning for non-stochastic control," COLT 2020.

Thanks a lot,
Sincerely,
Farnaz

Hi Daniel,

Thank you very much for your response.

In your paper "Deluca -- A Differentiable Control Library: Environments, Methods, and Benchmarking," it is mentioned that DRC is an implementation of [1]. In any case, you said it is a "sample implementation" of DRC, so you have presumably tried your implementation on something. I was wondering if I could have that example?

Thank you very much for your time and help!
Best regards,
Farnaz

OK. I will not follow up on this point anymore.