[Question] Tutorial 11, Variational dequantization and log jacobian calculation
mjack3 opened this issue · 6 comments
Tutorial: 11
Describe the bug
Could you explain where the terms in the log-Jacobian determinant for Dequantization and Variational Dequantization come from?
Cell 6:
```python
# Imports used by this cell
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class Dequantization(nn.Module):

    def __init__(self, alpha=1e-5, quants=256):
        """
        Inputs:
            alpha - small constant that is used to scale the original input.
                    Prevents dealing with values very close to 0 and 1 when inverting the sigmoid
            quants - Number of possible discrete values (usually 256 for 8-bit image)
        """
        super().__init__()
        self.alpha = alpha
        self.quants = quants

    def forward(self, z, ldj, reverse=False):
        if not reverse:
            z, ldj = self.dequant(z, ldj)
            z, ldj = self.sigmoid(z, ldj, reverse=True)
        else:
            z, ldj = self.sigmoid(z, ldj, reverse=False)
            z = z * self.quants
            ldj += np.log(self.quants) * np.prod(z.shape[1:])
            z = torch.floor(z).clamp(min=0, max=self.quants-1).to(torch.int32)
        return z, ldj

    def sigmoid(self, z, ldj, reverse=False):
        # Applies an invertible sigmoid transformation
        if not reverse:
            ldj += (-z-2*F.softplus(-z)).sum(dim=[1,2,3])
            z = torch.sigmoid(z)
            # Reversing scaling for numerical stability
            ldj -= np.log(1 - self.alpha) * np.prod(z.shape[1:])
            z = (z - 0.5 * self.alpha) / (1 - self.alpha)
        else:
            z = z * (1 - self.alpha) + 0.5 * self.alpha  # Scale to prevent boundaries 0 and 1
            ldj += np.log(1 - self.alpha) * np.prod(z.shape[1:])
            ldj += (-torch.log(z) - torch.log(1-z)).sum(dim=[1,2,3])
            z = torch.log(z) - torch.log(1-z)
        return z, ldj

    def dequant(self, z, ldj):
        # Transform discrete values to continuous volumes
        z = z.to(torch.float32)
        z = z + torch.rand_like(z).detach()
        z = z / self.quants
        ldj -= np.log(self.quants) * np.prod(z.shape[1:])
        return z, ldj
```
Especially this case:

```python
ldj += (-z-2*F.softplus(-z)).sum(dim=[1,2,3])
```

I have a basic knowledge of Jacobians (I studied computer science). The tutorial is great, but I would appreciate a little introduction to why the Jacobian is handled this way.

Thanks in advance
Hi, sure, happy to expand on it. The Jacobian you are looking at there comes from the sigmoid

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

The derivative of the sigmoid is commonly known as

$$\frac{d\sigma(z)}{dz} = \sigma(z)\cdot\left(1 - \sigma(z)\right)$$

(see e.g. https://towardsdatascience.com/derivative-of-the-sigmoid-function-536880cf918e for the steps). Now we can plug it into the ldj equation and solve the log:

$$\log\left|\frac{d\sigma(z)}{dz}\right| = \log\sigma(z) + \log\left(1 - \sigma(z)\right) = -\log\left(1 + e^{-z}\right) + \left(-z - \log\left(1 + e^{-z}\right)\right)$$

Combining them gives us:

$$\log\left|\frac{d\sigma(z)}{dz}\right| = -z - 2\log\left(1 + e^{-z}\right)$$

The second part is also known as the softplus function, $\mathrm{softplus}(x) = \log\left(1 + e^{x}\right)$ (https://pytorch.org/docs/stable/generated/torch.nn.Softplus.html), and PyTorch provides a numerically stable version of it. Thus, our final ldj becomes:

$$\mathrm{ldj} = -z - 2\cdot\mathrm{softplus}(-z)$$

This is what we implement, and we sum over all dimensions except the batch since we apply the sigmoid to all elements in the image.
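For anyone reading along, here is a small numerical check (my own sketch, not part of the tutorial) that the naive $\log\sigma'(z)$ matches the softplus form used in the code:

```python
import torch
import torch.nn.functional as F

# Compare log(sigmoid'(z)) computed naively against the stable softplus form
z = torch.linspace(-8, 8, steps=101, dtype=torch.float64)
sig = torch.sigmoid(z)

naive = torch.log(sig * (1 - sig))    # log of the sigmoid derivative
stable = -z - 2 * F.softplus(-z)      # the expression used in Dequantization.sigmoid

print(torch.allclose(naive, stable))  # True
```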
Hope that helps, let me know if something is unclear. :)
Oh, thanks! And could you explain this operation?

```python
ldj -= np.log(self.quants) * np.prod(z.shape[1:])
```

I know you're subtracting from the ldj because you're dividing `z` by 256, right? But why do you use the product?
Correct, the `np.log(self.quants)` is because of the division. We do this division for every element of each image in the batch, i.e. `height * width * channels` elements per image. This is why we take the product over these axes.

You can also imagine that we would have a tensor of size `[batch, channels, height, width]`, all with the value `np.log(self.quants)` for the division. Then we sum over the last three axes, as in the previous ldj calculation. This is equivalent to the product.
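To make this concrete, here is a quick sketch (with made-up shapes, not code from the tutorial) showing that summing the constant tensor over the non-batch axes gives exactly the `np.log(self.quants) * np.prod(z.shape[1:])` shortcut:

```python
import numpy as np
import torch

quants = 256
batch, channels, height, width = 4, 3, 32, 32   # hypothetical shape [batch, channels, height, width]

# One log(quants) contribution per element, summed over everything except the batch axis
per_element = torch.full((batch, channels, height, width), np.log(quants), dtype=torch.float64)
summed = per_element.sum(dim=[1, 2, 3])          # shape [batch], one value per image

# The shortcut used in the tutorial code
shortcut = np.log(quants) * np.prod([channels, height, width])

print(torch.allclose(summed, torch.full_like(summed, shortcut)))  # True
```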
Oh! I see, thank you very much! And to finish up these questions, could you please expand on these operations:

```python
z = z * (1 - self.alpha) + 0.5 * self.alpha  # Scale to prevent boundaries 0 and 1
ldj += np.log(1 - self.alpha) * np.prod(z.shape[1:])
```

I understand why we need to avoid the boundaries. But when calculating the Jacobian of such a transformation, I wonder why only the `(1 - self.alpha)` is considered.

Thank you so much!
The Jacobian is based on the derivative of the transformation. The term `0.5 * self.alpha` is an additive constant here, so it does not influence the derivative. In other words, you have:

$$\frac{\partial}{\partial z}\Big(z\cdot(1 - \alpha) + 0.5\cdot\alpha\Big) = 1 - \alpha$$

so every element contributes $\log(1 - \alpha)$ to the ldj, which again gives the factor `np.prod(z.shape[1:])`.
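A tiny autograd check (again my own sketch, not tutorial code) confirms that the element-wise derivative of the rescaling is `1 - alpha` everywhere:

```python
import torch

alpha = 1e-5
z = torch.rand(5, dtype=torch.float64, requires_grad=True)  # arbitrary inputs in (0, 1)

out = z * (1 - alpha) + 0.5 * alpha            # the rescaling from the tutorial
grad, = torch.autograd.grad(out.sum(), z)      # element-wise d(out)/dz

print(grad)                                    # every entry equals 1 - alpha
print(torch.log(grad).sum())                   # ldj contribution: 5 * log(1 - alpha)
```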
Thanks! You can close the issue =)