pumpikano/tf-magnet-loss

Can't reproduce your results on MNIST using Lasagne

jiqiujia opened this issue · 4 comments

I reimplemented the magnet loss following your code and the parameter settings of mnistEncoder, but I can't reproduce your results using Lasagne. Is there a big difference between the detailed TensorFlow and Lasagne implementations?

I noticed that you didn't add a weight decay term to the loss. How can this network work without weight decay regularization?

For debugging, I recommend feeding some random data and centroids into just the loss function (i.e. skip the encoding network for now). You should be able to produce very close values for the loss on the same data in both implementations. The tricky part with this loss is making sure the right squared distances take part in the right terms of the loss expression. In particular, the numerator is an expression of intra-cluster distances only, and the denominator is an expression of inter-class distances.
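A minimal sketch of that check in Theano, assuming a loss with the signature magnet_loss(r, classes, clusters, cluster_classes, n_clusters) as in the code posted below, might look like this (the sizes and the class/cluster layout are just illustrative):

import numpy as np
import theano
import theano.tensor as T

# Fixed random inputs so both implementations see identical data
rng = np.random.RandomState(0)
n, dim, n_clusters = 64, 8, 4
reps = rng.randn(n, dim).astype('float32')
assigned_cluster = rng.randint(0, n_clusters, n).astype('int32')
sample_classes = (assigned_cluster // 2).astype('int32')        # 2 clusters per class
cluster_class_labels = (np.arange(n_clusters) // 2).astype('int32')

r = T.matrix('r')
classes = T.ivector('classes')
clusters = T.ivector('clusters')
cluster_classes = T.ivector('cluster_classes')

total_loss, _ = magnet_loss(r, classes, clusters, cluster_classes, n_clusters)
f = theano.function([r, classes, clusters, cluster_classes], total_loss)
print(f(reps, sample_classes, assigned_cluster, cluster_class_labels))
# Feed the exact same arrays to the TensorFlow version; the two
# scalars should agree to float32 precision.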

The network does not seem to require regularization, at least for MNIST, though it could certainly help. Regularization could be added as another term in the final loss.
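If you do want to try weight decay in Lasagne, it can be added with the built-in regularization helpers. A sketch, assuming network is the output layer of your encoder and 1e-4 is just an illustrative coefficient:

from lasagne.regularization import regularize_network_params, l2

# L2 penalty over all trainable parameters, added as an extra term
weight_decay = 1e-4 * regularize_network_params(network, l2)
loss = total_loss + weight_decay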

Hope this helps, and good luck!

I am sorry, I still can't figure out what's wrong with my implementation. My code is nearly the same as yours, except that I translated the TensorFlow operations into their Lasagne/Theano equivalents. Would you mind taking some time to see if there's something wrong with the following code? Thank you very much.

import numpy as np
import theano
import theano.tensor as T

def magnet_loss(r, classes, clusters, cluster_classes, n_clusters, alpha=1.0):
    def comparison_mask(a_labels, b_labels):
        return T.eq(a_labels.reshape((-1, 1)), b_labels.reshape((1, -1)))
    N = r.shape[0]
    # Take cluster means within the batch
    cluster_means, _ = theano.scan(
        lambda i, r, clusters: T.mean(r[T.eq(clusters, i)], 0),
        sequences=np.arange(n_clusters), non_sequences=[r, clusters])
    cluster_means = T.stack(cluster_means)

    # Compute squared distance of each example to each cluster centroid
    sample_costs = ((cluster_means - r.dimshuffle((0, 'x', 1)))**2).sum(2) 

    # Select distances of examples to their own centroid
    intra_cluster_mask = comparison_mask(clusters, np.arange(n_clusters))
    intra_cluster_costs = T.sum(intra_cluster_mask * sample_costs, 1)

    # Compute variance of intra-cluster distances
    variance = T.sum(intra_cluster_costs) / (N - 1.0)
    var_normalizer = -1.0 / (2 * variance**2)

    # Compute numerator
    numerator = T.exp(var_normalizer * intra_cluster_costs - alpha)

    # Compute denominator
    diff_class_mask = T.neq(classes.reshape((-1, 1)), cluster_classes.reshape((1, -1)))
    denom_sample_costs = T.exp(var_normalizer * sample_costs)
    denominator = T.sum(diff_class_mask * denom_sample_costs, 1)

    # Compute example losses and total loss
    epsilon = 1e-8
    losses = T.maximum((-T.log(numerator / (denominator + epsilon) + epsilon)), 0)
    total_loss = T.mean(losses)

    return total_loss, losses

I know what's wrong with the code now: the issue is with T.eq. Theano doesn't support a boolean type, so T.eq returns int8. As a result, the boolean indexing inside the scan doesn't work here.
Btw, debugging is really tough in Theano.
Anyway, thanks for your help.
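For reference, the fix is to convert the int8 mask to index positions with .nonzero() before indexing:

    # T.eq returns an int8 tensor, which Theano interprets as integer
    # indices rather than a boolean mask; .nonzero() yields the positions
    # of the matching examples, which indexes correctly.
    cluster_means, _ = theano.scan(
        lambda i, r, clusters: T.mean(r[T.eq(clusters, i).nonzero()], 0),
        sequences=np.arange(n_clusters), non_sequences=[r, clusters])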