Rewards decrease in late training
Closed this issue · 11 comments
How do you solve the problem of the reward decreasing in the later stage of training with the DDPG reinforcement learning algorithm?
How do you solve the problem of making the sum of the squares of each phase shift's real and imaginary parts equal one? It has been bothering me for a long time.
How do you solve the problem of the reward decreasing in the later stage of training with the DDPG reinforcement learning algorithm?
You can cut off the training earlier, i.e., stop before the reward starts to drop (early stopping).
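For what it's worth, a minimal sketch of that idea (illustrative only, not code from this repo; train_step and evaluate stand in for the training and evaluation routines): track the average evaluation reward and stop once it has not improved for a while.

def train_with_early_stopping(train_step, evaluate, max_steps=100000, eval_freq=1000, patience=10):
    # Stop once the average evaluation reward has not improved for `patience`
    # consecutive evaluations, instead of always running the full step budget.
    best_avg, stale = float("-inf"), 0
    for step in range(1, max_steps + 1):
        train_step()                      # one environment interaction + DDPG update
        if step % eval_freq == 0:
            avg_reward = evaluate()       # average reward over a few evaluation episodes
            if avg_reward > best_avg:
                best_avg, stale = avg_reward, 0   # new best: reset the patience counter
            else:
                stale += 1
                if stale >= patience:     # reward has plateaued or started to decline
                    break
    return best_avg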
How do you solve the problem of making the sum of the squares of each phase shift's real and imaginary parts equal one? It has been bothering me for a long time.
what do you mean by that? can you elaborate more on that?
The sum of the squares of the real and imaginary parts is one, according to Euler's formula.
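For reference, a quick NumPy check of that identity: writing a phase shift as e^{j*theta} = cos(theta) + j*sin(theta), Euler's formula gives Re^2 + Im^2 = cos^2(theta) + sin^2(theta) = 1 for any theta.

import numpy as np

theta = np.random.uniform(0, 2 * np.pi, size=5)  # arbitrary phase angles
phi = np.exp(1j * theta)                         # unit-modulus reflection coefficients
print(phi.real ** 2 + phi.imag ** 2)             # -> [1. 1. 1. 1. 1.] (up to float error)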
How do you solve the problem of making the sum of the squares of each phase shift's real and imaginary parts equal one? It has been bothering me for a long time.
what do you mean by that? can you elaborate more on that?
Your algorithm cannot guarantee that the sum of the squares of the real and imaginary parts of a reflecting element is one, so the premise that the modulus is one is not satisfied.
How do you solve the problem of making the sum of the squares of each phase shift's real and imaginary parts equal one? It has been bothering me for a long time.
what do you mean by that? can you elaborate more on that?
Your algorithm cannot guarantee that the sum of the squares of the real and imaginary parts of a reflecting element is one, so the premise that the modulus is one is not satisfied.
are you sure? simply print the phase part of the normalized action before returning it in DDPG.py, i.e., right before line 74:
print(self.compute_phase((a / division_term).detach()))
I get unit modulus all the time. For example:
Time step: 105 Episode Num: 1 Reward: 1.291
(tensor([[1.]], device='cuda:0'), tensor([[1.]], device='cuda:0'))
(two 16×1 tensors, every entry 1.0000, device='cuda:0')
(two 16×1 tensors, every entry 1.0000, device='cuda:0')
Time step: 106 Episode Num: 1 Reward: 1.281
(tensor([[1.]], device='cuda:0'), tensor([[1.0000]], device='cuda:0'))
Phi [[-0.4658936-0.61511719j  0.       -0.j        ]
 [ 0.       -0.j        -0.2412132-0.09198956j]]

I tested the output and the Phi matrix is the one above. How do you make sure that 0.24 squared plus 0.09 squared is one? 0.46 squared plus 0.61 squared is not one either.
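As a quick check on the numbers printed above (plain NumPy, using the same matrix):

import numpy as np

Phi = np.array([[-0.4658936 - 0.61511719j, 0.0 + 0.0j],
                [0.0 + 0.0j, -0.2412132 - 0.09198956j]])

# Modulus of each diagonal (reflecting-element) entry; unit modulus would print 1.0 for each.
print(np.abs(np.diag(Phi)))   # ~= [0.77, 0.26]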
can you provide the code for calculating the results you got?
can you provide the code for calculating the results you got?
def step(self, action):
    self.episode_t += 1
    action = action.reshape(1, -1)

    G_real = action[:, :self.M ** 2]
    G_imag = action[:, self.M ** 2:2 * self.M ** 2]
    Phi_real = action[:, -2 * self.L:-self.L]
    Phi_imag = action[:, -self.L:]

    self.G = G_real.reshape(self.M, self.K) + 1j * G_imag.reshape(self.M, self.K)
    self.Phi = np.eye(self.L, dtype=complex) * (Phi_real + 1j * Phi_imag)
    print("Phi", self.Phi)

This is from the environment file (114 lines).
you should compute the norm of the phase shifts as follows:
def step(self, action):
    self.episode_t += 1
    action = action.reshape(1, -1)

    G_real = action[:, :self.M ** 2]
    G_imag = action[:, self.M ** 2:2 * self.M ** 2]
    Phi_real = action[:, -2 * self.L:-self.L]
    Phi_imag = action[:, -self.L:]

    modulus = (np.sum(np.abs(Phi_real)).reshape(-1, 1) * np.sqrt(2),
               np.sum(np.abs(Phi_imag)).reshape(-1, 1) * np.sqrt(2))
    print(modulus)
I described here why.
you should compute the norm of the phase shifts as follows:
def step(self, action):
    self.episode_t += 1
    action = action.reshape(1, -1)

    G_real = action[:, :self.M ** 2]
    G_imag = action[:, self.M ** 2:2 * self.M ** 2]
    Phi_real = action[:, -2 * self.L:-self.L]
    Phi_imag = action[:, -self.L:]

    modulus = (np.sum(np.abs(Phi_real)).reshape(-1, 1) * np.sqrt(2),
               np.sum(np.abs(Phi_imag)).reshape(-1, 1) * np.sqrt(2))
    print(modulus)
I described here why.
I'm sorry, I don't understand why you have to add everything up. Shouldn't each individual phase shift satisfy Euler's formula on its own?