Why DDPG always takes the same action

Question

Closed this issue 5 years ago · 4 comments

The project you did is so great!
But when I train with DDPG (not your implementation), the action collapses to some fixed value. I want to know how you solved this problem. I would be grateful if you could help me.

Answer 1 · 2019-10-12T19:13:49.000Z

Hmm, maybe that action simply gives the best return? What sort of action is it? Does it hit the upper bound of tanh (1)? How does the action distribution (histograms) hold up?
I would like to know more details about the dimensionality, the activation, and maybe the problem itself.
Anyway, I always debug things like:

Pairwise distances (Euclidean, cosine, inner product)
Means and STDs for each action dimension
Histograms for action dims
Kurtosis, skewness and other stats
This is easy with the PyTorch TensorBoard API:

import matplotlib.pyplot as plt
from scipy.spatial import distance
from torch.utils.tensorboard import SummaryWriter

def pairwise_distances_fig(embs):
    embs = embs.detach().cpu().numpy()
    similarity_matrix_cos = distance.cdist(embs, embs, 'cosine')
    similarity_matrix_euc = distance.cdist(embs, embs, 'euclidean')

    fig = plt.figure(figsize=(16,10))

    ax = fig.add_subplot(121)
    cax = ax.matshow(similarity_matrix_cos)
    fig.colorbar(cax)
    ax.set_title('Cosine')
    ax.axis('off')

    ax = fig.add_subplot(122)
    cax = ax.matshow(similarity_matrix_euc)
    fig.colorbar(cax)
    ax.set_title('Euclidean')
    ax.axis('off')

    fig.suptitle('Action pairwise distances')
    plt.close()
    return fig

writer = SummaryWriter(log_dir='./runs')

# pairwise similarities of the first 50 actions
writer.add_figure('next_action', pairwise_distances_fig(next_action[:50]), step)
# assumes next_action has shape (batch, action_dim);
# transpose so that each iteration covers one action dimension
for i, param in enumerate(next_action.detach().t()):
    writer.add_histogram(f'param_{i}', param, step)
    writer.add_scalar(f'param_{i}_mean', param.mean(), step)
    writer.add_scalar(f'param_{i}_std', param.std(), step)
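
The skewness and kurtosis items from the list are not covered by the snippet above, so here is a minimal sketch of how they could be computed per action dimension, together with the fraction of values sitting near the tanh bounds. The function name `column_stats` and the 0.99 threshold are my own assumptions, not something from the project. It uses plain Python lists so it runs without dependencies; with tensors the saturation check would be something like `(action.abs() > 0.99).float().mean()`.

```python
import math

def column_stats(actions, sat_threshold=0.99):
    """Per-dimension mean, std, skewness, excess kurtosis and saturation fraction.

    actions: list of rows, one action per row (hypothetical layout: batch x action_dim).
    sat_threshold: |value| above this counts as "saturated" (assumed cutoff).
    """
    stats = []
    for dim in zip(*actions):  # transpose: iterate over action dimensions
        n = len(dim)
        mean = sum(dim) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in dim) / n)
        if std > 0:
            skew = sum(((x - mean) / std) ** 3 for x in dim) / n
            kurt = sum(((x - mean) / std) ** 4 for x in dim) / n - 3.0
        else:
            # zero spread: this dimension has collapsed to a constant
            skew = kurt = float('nan')
        saturated = sum(abs(x) > sat_threshold for x in dim) / n
        stats.append({'mean': mean, 'std': std, 'skew': skew,
                      'kurtosis': kurt, 'saturated': saturated})
    return stats

# toy batch: dimension 0 is stuck at the upper bound, dimension 1 still varies
actions = [[1.0, -0.2], [1.0, 0.1], [1.0, 0.4], [1.0, -0.3]]
stats = column_stats(actions)
for i, s in enumerate(stats):
    print(i, s)
```

A std of zero together with a saturation fraction close to 1 for a dimension would point to the "always the same action" symptom described in the question.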

Answer 2 · 2019-10-13T02:14:31.000Z

Why does it hit the upper bound of tanh (1)?

Answer 3 · 2019-10-13T07:24:03.000Z

No idea. Try removing tanh altogether for learning,
or look at what the values are before the activation.
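
To illustrate why looking before the activation can help: one common reason (not necessarily the cause here) is that once the pre-activation grows large, tanh is pinned near 1 and its derivative, 1 - tanh(z)^2, is close to zero, so the actor gets almost no gradient to move away. A small sketch, where `tanh_report` is a hypothetical helper and the input values are made up:

```python
import math

def tanh_report(pre_activations):
    """For each pre-activation z, return (z, tanh(z), d tanh / dz)."""
    report = []
    for z in pre_activations:
        a = math.tanh(z)
        grad = 1.0 - a * a  # derivative of tanh at z
        report.append((z, a, grad))
    return report

# made-up pre-activation values, from moderate to very large
for z, a, grad in tanh_report([0.5, 2.0, 5.0, 10.0]):
    print(f'z={z:5.1f}  tanh={a:.6f}  grad={grad:.2e}')
```

If the logged pre-activations look like the last rows, the output is saturated and the fixed action may just be the activation clipping.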

Answer 4 · 2019-10-17T13:38:08.000Z

Any update?