lvdmaaten/bhtsne

Document the "gains"

Closed this issue · 2 comments

The computations involving the "gains" in tsne.cpp, line 72, carry the awe-inspiring comment

// Allocate some memory

This is not just "some memory": these arrays are part of a computation that is critical for the implementation to work properly. Neither the paper nor any of the copycat implementations explains what these "gains" are. Maybe it's obvious for those who are more deeply involved, but at some point it should be explained what these "gains" actually are.

The gains correspond to the following sentence in the t-SNE paper: "The learning rate η is initially set to 100 and it is updated after every iteration by means of the adaptive learning rate scheme described by Jacobs (1988)." The Jacobs paper is: R.A. Jacobs. Increased rates of convergence through learning rate adaptation. Neural Networks, 1:295–307, 1988.

The idea is to have parameter-dependent learning rates: if the sign of the gradient w.r.t. a parameter doesn't change between iterations, the learning rate for that parameter is slowly increased; if the sign flips, it is rapidly reduced. The gains store these parameter-dependent learning-rate corrections.
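
In code, the update looks roughly like the sketch below. The variable names (`Y`, `uY`, `gains`, `dY`) and the constants (0.2, 0.8, and the 0.01 floor) are meant to mirror tsne.cpp, but treat this as an illustrative sketch of the scheme rather than a verbatim excerpt:

```cpp
#include <cstddef>
#include <vector>

static inline double sign(double x) { return (x > 0.0) - (x < 0.0); }

// One gains-adjusted gradient-descent step with momentum.
//   Y     - embedding coordinates (the parameters being optimized)
//   uY    - previous step (momentum buffer)
//   gains - per-parameter learning-rate corrections
//   dY    - current gradient
void gradient_step(std::vector<double>& Y,
                   std::vector<double>& uY,
                   std::vector<double>& gains,
                   const std::vector<double>& dY,
                   double eta, double momentum) {
    for (std::size_t i = 0; i < Y.size(); ++i) {
        // uY holds the last step taken, which points *against* the gradient
        // when descending. Differing signs therefore mean the gradient still
        // agrees with the direction of travel: grow the gain slowly
        // (additive). Matching signs mean the direction flipped: shrink the
        // gain quickly (multiplicative).
        if (sign(dY[i]) != sign(uY[i])) gains[i] += 0.2;
        else                            gains[i] *= 0.8;
        if (gains[i] < 0.01) gains[i] = 0.01;  // keep gains bounded away from zero

        // The effective learning rate for parameter i is eta * gains[i].
        uY[i] = momentum * uY[i] - eta * gains[i] * dY[i];
        Y[i] += uY[i];
    }
}
```

Note the asymmetry, which comes from Jacobs: gains grow additively but shrink multiplicatively, so a parameter whose gradient keeps the same sign speeds up gradually, while one that oscillates is damped fast.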

Thanks, I somehow expected that. (The Jacobs paper does not seem to be available publicly - at least, I didn't find it - but maybe I can circumvent some of the paywalls from my office).