initialization of PHM layers
dorooddorood606 opened this issue · 4 comments
Hi
I would like to initialize phm_rule and the weights so that the final weight matrix of a PHM layer is distributed as normal(mean=0, std=0.01). Could you kindly suggest how this can be achieved? Which initialization should I use for the phm_rule and weight variables?
thanks
Hi @dorooddorood606 , you can achieve such initialization with the following code:
from benchmarks.utils import set_seed_all
from phc.hypercomplex.layers import PHMLinear
import torch
# Initialize the final weight matrix following a certain distribution
device = "cuda:0" if torch.cuda.is_available() else "cpu"
set_seed_all(seed=43)
phm_lin1 = PHMLinear(in_features=128 // 2, out_features=256 // 2, phm_dim=4, w_init="phm", c_init="standard").to(device)
for w in phm_lin1.W:
    w.data.normal_(mean=0.0, std=0.01)
for w in phm_lin1.W:
    print(w.std())
# tensor(0.0100, device='cuda:0', grad_fn=<StdBackward0>)
# tensor(0.0101, device='cuda:0', grad_fn=<StdBackward0>)
# tensor(0.0099, device='cuda:0', grad_fn=<StdBackward0>)
# tensor(0.0099, device='cuda:0', grad_fn=<StdBackward0>)
If you want to modify the phm rules, you can iterate over phm_lin1.phm_rule and modify each parameter through its data attribute, like:
for w in phm_lin1.phm_rule:
    w.data.normal_(mean=0.5, std=0.1)
for w in phm_lin1.phm_rule:
    print(w)
# Parameter containing:
# tensor([[0.6034, 0.5514, 0.4601, 0.7307],
# [0.5802, 0.4613, 0.4960, 0.6374],
# [0.6922, 0.5066, 0.5063, 0.4360],
# [0.5713, 0.3694, 0.5513, 0.4803]], device='cuda:0', requires_grad=True)
# Parameter containing:
# tensor([[0.3592, 0.5751, 0.5850, 0.5287],
# [0.4716, 0.4622, 0.5230, 0.5109],
# [0.4808, 0.3467, 0.5735, 0.5904],
# [0.4408, 0.5532, 0.5885, 0.5192]], device='cuda:0', requires_grad=True)
# Parameter containing:
# tensor([[0.3816, 0.6542, 0.3359, 0.4211],
# [0.6865, 0.3759, 0.5291, 0.5276],
# [0.6018, 0.5565, 0.4768, 0.6355],
# [0.5029, 0.5969, 0.6655, 0.3873]], device='cuda:0', requires_grad=True)
# Parameter containing:
# tensor([[0.5919, 0.5583, 0.3676, 0.5180],
# [0.5897, 0.3686, 0.4941, 0.6941],
# [0.6832, 0.6234, 0.3679, 0.2792],
# [0.4790, 0.4572, 0.4511, 0.5616]], device='cuda:0', requires_grad=True)
Hi
Thank you for the response. Sorry for the misunderstanding. What I meant was whether we could initialize the components of phm_rule and W in PHM layers so that the final weight matrix, which approximates the linear layer, is close to a normal(mean=0, std=0.01) initialization. So let's assume we compute H = \sum_i (phm_i \otimes W_i): how can we get H initialized as normal by choosing the initialization of the phm_i and W_i elements? Thanks a lot in advance for any suggestions.
Hi @dorooddorood606 , I need to think more about how we can formulate this problem to get a precise initialization scheme, but you could start with the following code and test out different std values for the W tensor, i.e., the weight matrices.
import torch
from benchmarks.utils import set_seed_all
from phc.hypercomplex.layers import PHMLinear
from phc.hypercomplex.kronecker import kronecker_product_einsum_batched
set_seed_all(42)
phm_dim = 4
in_feats = 256
out_feats = 256
in_feats_axis = in_feats // phm_dim
out_feats_axis = out_feats // phm_dim
# fix this (corresponds to the phm-rules, i.e., the C_i in the paper)
C = torch.randn(phm_dim, phm_dim, phm_dim).normal_(0, 0.1)
# try out here
W = torch.randn(phm_dim, in_feats_axis, out_feats_axis).normal_(0, 0.05)
H = kronecker_product_einsum_batched(C, W)
HH = H.sum(0)
print(HH.mean())
print(HH.std())
# tensor(2.9075e-06)
# tensor(0.0087)
Once you have found a suitable std for initializing the W_i matrices, you can use the code I sent you earlier to initialize the W matrices. As of now, the standard deviation of the phm-rules (C_i) is held fixed at 0.1.
Generally, the standard deviation of the final H-matrix (the sum of Kronecker products, i.e., the HH object in the code) can be obtained by computing the standard deviation of the vectorized sum of Kronecker products. But I need to think more about it and write down the equations. I hope this helps: you can at least experiment with it, and perhaps even derive the exact answer from this hint.
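One possible way to make the hint concrete, as a sketch rather than the library's own scheme: if every entry of the C_i and W_i is drawn independently with zero mean, each entry of H is a sum of phm_dim products C_i[a, b] * W_i[c, d], so Var(H) = phm_dim * std_C^2 * std_W^2 and std_H = sqrt(phm_dim) * std_C * std_W. For phm_dim = 4, std_C = 0.1 and std_W = 0.05 this predicts 0.01; the 0.0087 measured above is plausibly within the spread expected from sampling only 64 values for C. The helper names below (w_std_for_target, sampled_h_std) are hypothetical and not part of phc, and the check is written in plain Python so it runs without the library.

```python
import math
import random


def w_std_for_target(target_std, c_std, phm_dim):
    """Std for the W_i entries so that H = sum_i kron(C_i, W_i) has target_std.

    Assumes independent, zero-mean entries, which gives
    Var(H) = phm_dim * c_std**2 * w_std**2.
    """
    return target_std / (math.sqrt(phm_dim) * c_std)


def sampled_h_std(c_std, w_std, phm_dim=4, w_rows=16, w_cols=16, seed=0):
    """Sample C_i and W_i, build H = sum_i kron(C_i, W_i), return the std of its entries."""
    rng = random.Random(seed)
    C = [[[rng.gauss(0.0, c_std) for _ in range(phm_dim)]
          for _ in range(phm_dim)] for _ in range(phm_dim)]
    W = [[[rng.gauss(0.0, w_std) for _ in range(w_cols)]
          for _ in range(w_rows)] for _ in range(phm_dim)]
    # Entry (a*w_rows + c, b*w_cols + d) of kron(C_i, W_i) is C_i[a][b] * W_i[c][d].
    entries = [
        sum(C[i][a][b] * W[i][c][d] for i in range(phm_dim))
        for a in range(phm_dim) for b in range(phm_dim)
        for c in range(w_rows) for d in range(w_cols)
    ]
    mean = sum(entries) / len(entries)
    return math.sqrt(sum((e - mean) ** 2 for e in entries) / (len(entries) - 1))


w_std = w_std_for_target(target_std=0.01, c_std=0.1, phm_dim=4)
print(w_std)                      # close to 0.05
print(sampled_h_std(0.1, w_std))  # roughly 0.01, varies with the seed
```

The resulting w_std could then be passed to w.data.normal_(mean=0.0, std=w_std) in the earlier loop over phm_lin1.W. Note that this only matches the mean and standard deviation: the entries of H are sums of products of Gaussians, so H is not exactly normally distributed.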
thanks a lot