Something about the parameters (mu and sigma) of the Gaussian distribution
KangyaHe opened this issue · 6 comments
Hi ~
I have read your paper "DEEP GAUSSIAN EMBEDDING OF GRAPHS: UNSUPERVISED INDUCTIVE LEARNING VIA RANKING". To capture uncertainty, each node is embedded as a Gaussian distribution. I have some questions about how the parameters are computed in your code ./g2g/model.py:
"
.....
W_mu = tf.get_variable(name='W_mu', shape=[sizes[-1], self.L], dtype=tf.float32, initializer=w_init())
b_mu = tf.get_variable(name='b_mu', shape=[self.L], dtype=tf.float32, initializer=w_init())
self.mu = tf.matmul(encoded, W_mu) + b_mu
W_sigma = tf.get_variable(name='W_sigma', shape=[sizes[-1], self.L], dtype=tf.float32, initializer=w_init())
b_sigma = tf.get_variable(name='b_sigma', shape=[self.L], dtype=tf.float32, initializer=w_init())
log_sigma = tf.matmul(encoded, W_sigma) + b_sigma
self.sigma = tf.nn.elu(log_sigma) + 1 + 1e-14
.....
"
-
The first is about the parameter 'mu'. In your code, 'mu' is computed by adding an extra layer to the neural network. Is that right? If so, would you mind pointing me to a reference for this approach? I am just wondering why 'mu' can be computed like this.
-
The second is about the parameter 'sigma'. It is computed in the same way as 'mu' above. I thought 'sigma' should be a covariance matrix of dimension self.L x self.L, but in the code above it is a vector of dimension self.L.
I am a beginner in network representation learning, so if I have misunderstood your work, please point it out. I would appreciate it. Thanks a lot!
Hi,
Thank you for your interest in our paper.
-
That's right. I'm not sure I completely understand your question, so let me restate our goal. Essentially, for a node i we want to parametrize the mean of its Gaussian distribution mu_i as a function of its attributes, i.e. mu_i = f_theta(x_i), where f can be any function with trainable parameters theta. Any choice of f_theta is possible; in this work we let f_theta be a simple feedforward neural network.
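For concreteness, here is a minimal NumPy sketch of that idea; the dimensions, hidden size, single hidden layer, and random initialization are illustrative choices, not the exact architecture or values from the repository:

```python
import numpy as np

# Minimal sketch of mu_i = f_theta(x_i): a small feedforward network that maps
# node attributes to the mean of that node's Gaussian embedding.
D, H, L = 128, 64, 16            # attribute dim, hidden dim, embedding dim (illustrative)
rng = np.random.default_rng(0)

theta = {
    'W1':   rng.normal(scale=0.1, size=(D, H)),  'b1':   np.zeros(H),
    'W_mu': rng.normal(scale=0.1, size=(H, L)),  'b_mu': np.zeros(L),
}

def f_theta(x):
    """Map node attributes x of shape [N, D] to Gaussian means mu of shape [N, L]."""
    hidden = np.maximum(0.0, x @ theta['W1'] + theta['b1'])  # ReLU hidden layer
    return hidden @ theta['W_mu'] + theta['b_mu']            # final linear layer = mu

x = rng.normal(size=(5, D))      # attributes of 5 nodes
mu = f_theta(x)                  # one L-dimensional mean per node, shape (5, 16)
```

Training then amounts to adjusting theta so that the resulting Gaussians fit the ranking objective.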
-
It's true that in general the covariance matrix Sigma is an L x L matrix. However, for computational convenience we use a diagonal covariance matrix, i.e. potentially non-zero values only on the diagonal and zeros everywhere else. Therefore, we only need to parametrize the diagonal of Sigma, which is an L-dimensional vector. This has the added benefit of simplifying the computation of the KL divergence, since it's easy to invert a diagonal matrix.
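To make the computational point concrete, here is a small NumPy sketch of the KL divergence between two diagonal-covariance Gaussians; it illustrates the closed form and is not a copy of the repository's TensorFlow implementation:

```python
import numpy as np

def kl_diag_gaussians(mu_p, sigma_p, mu_q, sigma_q):
    """KL( N(mu_p, diag(sigma_p)) || N(mu_q, diag(sigma_q)) ).

    mu_* are L-dimensional means, sigma_* the L diagonal entries (variances).
    Because the covariances are diagonal, 'inverting' Sigma_q is just an
    elementwise division and the log-determinant is a sum of logs,
    so no L x L matrix operations are needed.
    """
    ratio = sigma_p / sigma_q                 # summands of tr(Sigma_q^{-1} Sigma_p)
    maha  = (mu_q - mu_p) ** 2 / sigma_q      # Mahalanobis term under Sigma_q
    return 0.5 * np.sum(ratio + maha - 1.0 - np.log(ratio))

# Example: two nodes embedded in L = 3 dimensions (made-up values)
kl = kl_diag_gaussians(np.array([0.0, 1.0, -1.0]), np.array([1.0, 0.5, 2.0]),
                       np.array([0.2, 0.9, -1.5]), np.array([1.2, 0.4, 1.0]))
```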
Let me know if you have any more questions.
Good idea. So in your model you assume that the parameters of the Gaussian distribution can be learned by the neural network, is that right? If there is no other information, how do you know that the final output is the correct mu and sigma of the Gaussian distribution, rather than just some 'latent representation' or something else? In other words, how can you show that mu_i, sigma_i = f_theta(x_i) are the parameters of a Gaussian distribution and not something else?
Exactly. We learn the parameters of the Gaussian distribution.
We want mu and Sigma to be valid parameters. For mu there is no restriction on what it can be. Sigma, on the other hand, has to be a positive semi-definite (PSD) matrix. For a diagonal Sigma we can easily enforce this by requiring the elements on the diagonal to be positive.
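This is exactly what the `tf.nn.elu(log_sigma) + 1 + 1e-14` line in the snippet above does: ELU maps the real line into (-1, inf), so ELU(x) + 1 is always positive. A tiny NumPy illustration of that transformation (the input values are made up):

```python
import numpy as np

def elu(x):
    """ELU: x for x > 0, exp(x) - 1 otherwise."""
    return np.where(x > 0, x, np.exp(x) - 1.0)

# The unconstrained network output can be any real number; ELU(x) + 1 maps it
# into (0, inf), so every diagonal entry of Sigma is strictly positive and the
# diagonal covariance matrix is positive definite (hence PSD).
raw = np.array([-5.0, -0.3, 0.0, 2.0])      # hypothetical pre-activation values
sigma_diag = elu(raw) + 1 + 1e-14           # same transformation as in model.py
assert np.all(sigma_diag > 0)
```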
OK
Let me try to summarize. The main idea of the work is that the uncertainty of each node can be represented by its own Gaussian distribution. Then, if we learn the right parameters of the distribution, they will satisfy the "pairwise constraints" in the dissimilarity part.
A:"the parameters of Gaussian model would satisfy the pairwise constraints "
B: "if the parameters satisfy the pairwise constraints, then they are the parameters of the Gaussian model "
It can be seen that the mu and sigma are learned by the constraints. That means the idea was B. As far as I know, B may not set up.
I'm not sure what you mean by "B may not hold".
Of course, it could be true that it is not even possible for all pairwise constraints to be satisfied (or we might need a very large dimensionality to do so), e.g. a scenario where, for any setting of the parameters, some constraints are violated. This is a very interesting research question, but we don't have any results on it.
We also don't claim that parameters satisfying pairwise constraints => parameters of a Gaussian. A Gaussian distribution is simply one convenient choice that captures uncertainty better than a point estimate.
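To make the discussion of "pairwise constraints" a bit more concrete, here is a rough, self-contained sketch of what checking one such ranking constraint could look like, using a diagonal-Gaussian KL as the energy. The exact energy definition and how triplets (i, j, k) are sampled are described in the paper, so treat this purely as an illustration:

```python
import numpy as np

def kl_diag(mu_p, sigma_p, mu_q, sigma_q):
    """KL between two diagonal-covariance Gaussians (same closed form as above)."""
    ratio = sigma_p / sigma_q
    return 0.5 * np.sum(ratio + (mu_q - mu_p) ** 2 / sigma_q - 1.0 - np.log(ratio))

def violated(mu, sigma, i, j, k):
    """One pairwise ranking constraint for anchor node i: a node j that is closer
    to i in the graph should get a lower energy than a farther node k.
    Returns True if this (i, j, k) triplet violates that ranking.
    mu[n], sigma[n] are node n's learned mean and diagonal variance; the KL
    direction used here is illustrative, see the paper for the exact definition."""
    e_ij = kl_diag(mu[j], sigma[j], mu[i], sigma[i])
    e_ik = kl_diag(mu[k], sigma[k], mu[i], sigma[i])
    return e_ij >= e_ik
```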
I got it. Thank you very much.