awarebayes/RecNN

Why DDPG always takes the same action

Closed this issue · 4 comments

This project is great!
But when I train with a DDPG implementation (not yours), the action collapses to a fixed value. I would like to know how you solved this problem. I would be grateful for any help.

Hmm, maybe that action simply gives the best return? What sort of action is it? Does it hit the upper bound of tanh (1)? How does the action distribution (histograms) hold up?
I would like to know more about the dimensionality, the activation, and the problem itself.
Anyway, I always debug things like:

  1. Pairwise distances (Euclidean, cosine, inner product)
  2. Means and STDs for each action dimension
  3. Histograms for action dims
  4. Kurtosis, skewness and other stats

This is easy with the PyTorch TensorBoard API:
import matplotlib.pyplot as plt
from scipy.spatial import distance
from torch.utils.tensorboard import SummaryWriter

def pairwise_distances_fig(embs):
    embs = embs.detach().cpu().numpy()
    similarity_matrix_cos = distance.cdist(embs, embs, 'cosine')
    similarity_matrix_euc = distance.cdist(embs, embs, 'euclidean')

    fig = plt.figure(figsize=(16,10))

    ax = fig.add_subplot(121)
    cax = ax.matshow(similarity_matrix_cos)
    fig.colorbar(cax)
    ax.set_title('Cosine')
    ax.axis('off')

    ax = fig.add_subplot(122)
    cax = ax.matshow(similarity_matrix_euc)
    fig.colorbar(cax)
    ax.set_title('Euclidean')
    ax.axis('off')

    fig.suptitle('Action pairwise distances')
    plt.close()
    return fig

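writer = SummaryWriter(log_dir='./runs')

# pairwise similarities of the first 50 actions
# (assumes next_action has shape (batch, action_dim) and step is the current training step)
writer.add_figure('next_action', pairwise_distances_fig(next_action[:50]), step)
# iterate over action dimensions (columns), not over individual samples
for i, param in enumerate(next_action.T):
    writer.add_histogram(f'param_{i}', param, step)
    writer.add_scalar(f'param_{i}_mean', param.mean(), step)
    writer.add_scalar(f'param_{i}_std', param.std(), step)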
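
The snippet above covers items 1 to 3 of the list; item 4 (kurtosis, skewness and other stats) could be sketched roughly as follows. This is a minimal NumPy sketch, not code from the repository: the helper name `action_dim_stats` is hypothetical, and it assumes the actions arrive as a 2-D array of shape (batch, action_dim), for example from `next_action.detach().cpu().numpy()`.

```python
import numpy as np

def action_dim_stats(actions):
    """Per-dimension mean, std, skewness and excess kurtosis.

    Hypothetical helper: `actions` is assumed to be shaped (batch, action_dim).
    """
    actions = np.asarray(actions, dtype=np.float64)
    mean = actions.mean(axis=0)
    std = actions.std(axis=0)
    # Avoid dividing by zero when a dimension has collapsed to a constant.
    safe_std = np.where(std > 0, std, 1.0)
    z = (actions - mean) / safe_std
    # Skewness and kurtosis are undefined for a constant dimension, so report NaN there.
    skew = np.where(std > 0, (z ** 3).mean(axis=0), np.nan)
    kurt = np.where(std > 0, (z ** 4).mean(axis=0) - 3.0, np.nan)  # excess kurtosis
    return {"mean": mean, "std": std, "skew": skew, "kurtosis": kurt}

# Toy data: dimension 0 looks healthy, dimension 1 is stuck at tanh's upper bound.
rng = np.random.default_rng(0)
actions = np.stack([rng.normal(size=1000), np.ones(1000)], axis=1)
stats = action_dim_stats(actions)
print(stats["std"])  # a std of exactly 0 flags a collapsed dimension
```

Each value can then be logged per dimension with `writer.add_scalar`, the same way as the mean and std above. A std near zero together with a mean near 1 or -1 is the signature of an action stuck at the tanh bound.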

Why does it hit the upper bound of tanh (1)?

No idea. Try removing tanh altogether during learning,
or see what the output looks like before the activation.
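
One way to look at the values before the activation is a forward hook on the last linear layer. The sketch below is only an illustration under assumptions: the `Actor` class, its layer sizes, and the 0.99 saturation threshold are all hypothetical placeholders, not the network from this thread.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Hypothetical toy actor; layer sizes are placeholders."""
    def __init__(self, state_dim=8, action_dim=4):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU())
        self.head = nn.Linear(32, action_dim)  # its output is the pre-activation

    def forward(self, state):
        return torch.tanh(self.head(self.body(state)))

captured = {}

def save_pre_activation(module, inputs, output):
    # Forward hook: keep what the last linear layer produced before tanh.
    captured["pre_tanh"] = output.detach()

actor = Actor()
handle = actor.head.register_forward_hook(save_pre_activation)

state = torch.randn(64, 8)
action = actor(state)
handle.remove()  # detach the hook once the values are captured

pre = captured["pre_tanh"]
# Fraction of outputs where tanh is effectively saturated (assumed threshold: 0.99).
saturated = (action.abs() > 0.99).float().mean().item()
print(f"pre-tanh mean |x|: {pre.abs().mean().item():.3f}, saturated: {saturated:.1%}")
```

If the pre-tanh magnitudes keep growing during training, the gradient through tanh shrinks toward zero and the action tends to stay pinned at the bound, which would match the fixed-value behaviour described above.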

Any update?