baturaysaglam/RIS-MISO-Deep-Reinforcement-Learning

What's the difference between sum_rate and reward?

Closed this issue · 7 comments

I see that the reward is defined as the sum-rate capacity, but I am confused about why the sum rate doesn't equal the reward in the figures.

can you be more specific about that?

In Figure 4, the sum rate ranges from 5 to 35.

[screenshot of Fig. 4 attached]

However, in Figure 6 the reward is less than 10. In my opinion, the reward should be the sum rate, so the maximum reward should equal the maximum sum rate. Can you help me resolve this?

[screenshot of Fig. 6 attached]
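
For context, what I mean by "the reward should be the sum rate" is roughly the following (my own sketch, not necessarily the exact code in this repo): the reward at each step is the downlink sum rate, i.e. the sum over users of log2(1 + SINR).

```python
import numpy as np

# My own sketch of a sum-rate reward for a K-user MISO downlink (not necessarily
# the exact implementation in this repo). H holds the effective channels (K x M)
# and W the beamforming vectors (M x K), one column per user.
def sum_rate_reward(H, W, noise_power=1.0):
    K = H.shape[0]
    reward = 0.0
    for k in range(K):
        powers = np.abs(H[k] @ W) ** 2            # |h_k^H w_j|^2 for every beam j
        desired = powers[k]                       # power of user k's own beam
        interference = powers.sum() - desired     # leakage from the other users' beams
        sinr = desired / (interference + noise_power)
        reward += np.log2(1.0 + sinr)             # user k's achievable rate
    return reward
```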

yes, there's an inconsistency between the two figures. however, note that the hyperparameters used are different for these figures; otherwise, they would've produced the same results. the authors didn't provide any hyperparameter settings for these particular learning curves, and unfortunately I don't remember which values I used to produce Fig. 6. I've taken another look at the paper, but still couldn't find any information.

please let me know if there's anything else, and let me know if you find the hyperparameter values used for Fig. 6.

Thank you for your reply. However, there is something strange: in Figure 4, when I increase the number of RIS elements (N), the result gets worse, unlike what is shown in your Figure 4. The following is my current Figure 4, produced with your code.

[reproduced Fig. 4 attached]

I believe this is expected since you increased the number of users as well. increasing the number of users would degrade the performance.
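
as a rough toy example (made-up numbers, not taken from the repo): with a fixed total transmit power, every extra user both takes a share of that power and adds interference to the others, so the achievable sum rate can shrink as the number of users grows.

```python
import numpy as np

# Made-up numbers, purely to illustrate the trend: a fixed total power split over
# K users, with a fixed fraction of each beam's power leaking to the other users.
total_power, noise, leakage = 10.0, 1.0, 0.6
for K in (1, 2, 4, 8):
    per_user_power = total_power / K
    interference = leakage * per_user_power * (K - 1)
    sinr = per_user_power / (interference + noise)
    print(f"K={K}: sum rate ≈ {K * np.log2(1 + sinr):.2f} bits/s/Hz")
```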

Thanks for your explanation. I have changed the configuration so that the only difference is the number of RIS elements (N), as shown below. By the way, I think the sum rate in Figure 4 may be opt_reward rather than reward: opt_reward is based on the SNR rather than the SINR, so it yields a larger sum rate.

[configuration screenshot attached]
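
to make the SNR/SINR point concrete, this is roughly the distinction I mean (my own sketch; the variable names may not match the repo exactly): the SINR-based reward divides by interference plus noise, while an SNR-based opt_reward drops the interference term, so it always upper-bounds the SINR-based value.

```python
import numpy as np

# My own sketch of the two objectives (names may not match the repo exactly).
# Dropping the interference term can only increase each user's rate, so the
# SNR-based value is always >= the SINR-based one.
def sinr_and_snr_sum_rates(H, W, noise_power=1.0):
    K = H.shape[0]
    sinr_rate = snr_rate = 0.0
    for k in range(K):
        powers = np.abs(H[k] @ W) ** 2
        desired = powers[k]
        interference = powers.sum() - desired
        sinr_rate += np.log2(1 + desired / (interference + noise_power))  # like reward
        snr_rate += np.log2(1 + desired / noise_power)                    # like opt_reward
    return sinr_rate, snr_rate
```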

yes, this is what's expected. when you increase the number of RIS elements, you get a higher effective received power and, therefore, better performance.
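
as a back-of-the-envelope illustration (not the repo code): if the RIS phases are aligned to the cascaded channel, the N reflected paths add coherently, so the received amplitude grows roughly linearly in N and the received power roughly as N².

```python
import numpy as np

# Back-of-the-envelope illustration (not the repo code): with unit-modulus phase
# shifts aligned to the per-element cascaded channel, the reflected paths combine
# coherently and the effective received power grows roughly as N^2.
rng = np.random.default_rng(0)
for N in (4, 16, 64):
    h = rng.standard_normal(N) + 1j * rng.standard_normal(N)  # cascaded channel, one entry per element
    phi = np.exp(-1j * np.angle(h))                            # align each element's phase
    print(f"N={N}: |h^T phi|^2 = {np.abs(h @ phi) ** 2:.1f}")
```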

regarding the SNR/SINR, thank you for pointing this out, but I'm not the author of the paper, so I only tried to reproduce the figures more or less. the authors didn't provide much detail, such as which objective they used as the reward, the hyperparameter settings, etc.