embed_mean grow to inf
yaojunr opened this issue · 2 comments
Hello, I have added CausalNormClassifier to my own project and recorded embed_mean. However, I ran into a problem: embed_mean grows to inf.
I print torch.sum(embed_mean):
tensor(1902.6863, device='cuda:0')
Train: 2%| | 1/45 [00:07<05:36, 7.65s/it]
tensor(8754.8096, device='cuda:0')
Train: 4%| | 2/45 [00:06<03:03, 4.27s/it]
tensor(33422.0312, device='cuda:0')
Train: 7%| | 3/45 [00:08<02:47, 3.99s/it]
tensor(122225.3984, device='cuda:0')
The embed_mean is updated as follows:
# initialization
self.embed_mean = torch.zeros(int(self.training_opt['feature_dim'])).numpy()
# per-batch update
self.embed_mean = self.mu * self.embed_mean + self.features.detach().mean(0).view(-1).cpu().numpy()
During training, the gradients become smaller and smaller, so the velocity in a momentum optimizer does not grow to inf. But the features generated by the model do not necessarily shrink, so embed_mean seems to grow to inf, and I can't store this variable. How can I solve this problem? Is there anything I missed?
The momentum form of the moving average scales the magnitude of embed_mean by a factor of 1 / (1 - mu); e.g., if your mu is 0.99, your embed_mean may be 100 times larger than the feature vectors. You can change the moving-average update to:
self.embed_mean = self.mu * self.embed_mean + (1 - self.mu) * self.features.detach().mean(0).view(-1).cpu().numpy()
This update won't scale up embed_mean and should work the same.
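To see why the momentum form inflates the magnitude, here is a minimal sketch (using numpy and a hypothetical constant feature vector, not the actual model features) comparing the two update rules over many iterations:

```python
import numpy as np

mu = 0.99
feature = np.ones(4)  # hypothetical constant batch-mean feature vector

momentum_mean = np.zeros(4)
ema_mean = np.zeros(4)

for _ in range(2000):
    # momentum form: new term is NOT scaled by (1 - mu),
    # so the running value converges to feature / (1 - mu)
    momentum_mean = mu * momentum_mean + feature
    # EMA form: new term scaled by (1 - mu),
    # so the running value converges to feature itself
    ema_mean = mu * ema_mean + (1 - mu) * feature

print(momentum_mean[0])  # -> approximately 100.0, i.e. 1 / (1 - mu)
print(ema_mean[0])       # -> approximately 1.0
```

With a constant input, the momentum form saturates at 1 / (1 - mu) times the feature magnitude rather than diverging; unbounded growth, as in the log above, points to the features themselves growing (or, as it turned out here, a DDP bug).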
With your help, I found that the bug lies in my DDP code. After fixing it, embed_mean is stable. Thank you.