HCDM/BanditLib

How is the reward of every epoch calculated? Does it make sense?


I have read the code and have the question described in the title. There is a variable called coTheta that is used to calculate the reward. Does anyone know why it is used instead of theta?
Any help is appreciated.

Hi,
If users are independent in the environment, theta should be used to calculate the reward. But in a collaborative environment, CoTheta, which takes the user connection matrix W into account, should be used to calculate the reward.

So, to keep it general, we decided to always use CoTheta to compute the reward and to control W to support both independent and collaborative environments. In other words, if you need an environment in which users are independent, you only need to set W to the identity matrix. In that case, it is equivalent to using theta to compute the reward.
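
Here is a minimal sketch of that equivalence. It is not the library's actual code: the function names, the array layout (one column per user), and the plain dot-product reward without noise are assumptions made for illustration.

```python
import numpy as np

def co_theta(theta, W):
    """Mix the users' preference vectors through the connection matrix W.

    Assumed layout (hypothetical, for illustration only):
      theta: shape (d, n_users), column u is user u's own theta
      W:     shape (n_users, n_users), column u holds the weights
             with which every user contributes to user u
    """
    return theta @ W

def expected_reward(theta_vec, x):
    """Noise-free linear reward: inner product of a theta vector and a feature vector."""
    return float(np.dot(theta_vec, x))

rng = np.random.default_rng(0)
d, n_users = 5, 3
theta = rng.normal(size=(d, n_users))   # each user's own theta
x = rng.normal(size=d)                  # feature vector of one article

# Independent users: W is the identity, so CoTheta equals theta.
W_independent = np.eye(n_users)
co_independent = co_theta(theta, W_independent)

# Collaborative users: each user is also influenced by the others.
W_collab = np.array([[0.8, 0.1, 0.1],
                     [0.1, 0.8, 0.1],
                     [0.1, 0.1, 0.8]])
co_collab = co_theta(theta, W_collab)

user = 0
r_theta = expected_reward(theta[:, user], x)
r_independent = expected_reward(co_independent[:, user], x)
r_collab = expected_reward(co_collab[:, user], x)
```

With `W_independent`, `r_independent` matches `r_theta`, so one reward routine based on CoTheta covers both settings. With `W_collab`, the reward for user 0 also reflects the other users' theta vectors.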

Feel free to add more comments if there is still confusion.

Thanks!