Model initialized outside [w_min, w_max]
Zhaoxian-Wu opened this issue · 7 comments
Description
When training an analog component, I find that the weight of the analog layer can fall outside the range [w_min, w_max], where w_min and w_max are parameters of PulsedDevice.
How to reproduce
from aihwkit.nn import AnalogLinear
from aihwkit.simulator.configs import SingleRPUConfig
from aihwkit.simulator.configs.devices import SoftBoundsReferenceDevice

device = SoftBoundsReferenceDevice(
    construction_seed=10,
    w_max=0.1,
    w_min=-0.1,
)
rpu_config = SingleRPUConfig(device=device)
model = AnalogLinear(1, 1, False, rpu_config=rpu_config)
print(f'w_max: {rpu_config.device.w_max}, w_min: {rpu_config.device.w_min}, weight: {model.get_weights()[0].item()}')
The output is
(analog) zhaoxian@server:~/Desktop/$ python main.py
w_max: 0.1, w_min: -0.1, weight: 0.1632751077413559
Here the weight 0.16 is larger than w_max, which really confuses me. Am I missing something like a mapping operator? And what is the exact meaning of the parameters w_min and w_max?
Expected behavior
The weight should stay within the range [-0.1, 0.1].
Other information
- Pytorch version: 2.1.2+cu121
- Package version: 0.8.0
- OS: Ubuntu 20.04.2
- Python version: Python 3.10
- Conda version (or N/A): conda 23.10.0
You need to use get_weights(apply_weight_scaling=False) to get the restricted weights.
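For the snippet above, that looks as follows (get_weights returns a (weight, bias) tuple):

weight, bias = model.get_weights(apply_weight_scaling=False)
# Only these unscaled values are restricted to the device range;
# bias is None here, since the layer was built without one.
print(weight)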
Actually, apart from the weight scaling, which might change the effective maximal weight, even if no weight scaling is used the w_max parameter is only the mean maximal weight across devices. There is device-to-device variation, too, controlled by w_max_dtod. The actual maximal weight per synapse can be read out with analog_tile.get_hidden_parameters(). See also the API documentation for these parameters, where they are described in more detail.
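A minimal sketch of reading those per-synapse bounds for the model above (the hidden-parameter keys, such as 'max_bound', may differ between aihwkit versions):

# Inspect the per-synapse bounds of the first analog tile.
analog_tile = next(model.analog_tiles())
hidden = analog_tile.get_hidden_parameters()  # ordered dict of tensors
print(hidden['max_bound'], hidden['min_bound'])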
@kaoutar55 this issue should be converted to a discussion as it is not a bug.
I set both dtod parameters to zero and used apply_weight_scaling=False, but the same thing happens:
from aihwkit.nn import AnalogLinear
from aihwkit.simulator.configs import SingleRPUConfig
from aihwkit.simulator.configs.devices import SoftBoundsReferenceDevice

device = SoftBoundsReferenceDevice(
    construction_seed=10,
    w_max=0.1,
    w_min=-0.1,
    w_max_dtod=0,
    w_min_dtod=0,
)
rpu_config = SingleRPUConfig(device=device)
model = AnalogLinear(1, 1, False, rpu_config=rpu_config)
weight = model.get_weights(apply_weight_scaling=False)[0].item()
print(f'w_max: {rpu_config.device.w_max}, w_min: {rpu_config.device.w_min}, weight: {weight}')
I got the result
(analog) zhaoxian@server:~/Desktop/$ python tmain.py
w_max: 0.1, w_min: -0.1, weight: -0.10000000149011612
The weight magnitude is still slightly larger than 0.1: the value -0.10000000149011612 lies just below w_min = -0.1.
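Interestingly, this value is exactly the float32 representation of -0.1, since 0.1 is not exactly representable in binary floating point:

import torch
# The nearest float32 value to -0.1 is -0.100000001490116...,
# which matches the printed weight above.
print(torch.tensor(-0.1, dtype=torch.float32).item())  # -0.10000000149011612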
Still, I set all the parameters I know of to their ideal values and got the same behavior. The following is a more detailed version:
import math

import torch
import torch.nn as nn

from aihwkit.nn import AnalogLinear
from aihwkit.optim import AnalogSGD
from aihwkit.simulator.configs import (
    SingleRPUConfig,
    WeightNoiseType,
    NoiseManagementType,
    BoundManagementType,
)
from aihwkit.simulator.configs.devices import SoftBoundsReferenceDevice
from aihwkit.simulator.parameters import IOParameters

# ==================
INPUT_SIZE = 1


def get_loss(model, for_training=True, analog_exact=True):
    criterion = nn.MSELoss()
    outputs = model(torch.ones(INPUT_SIZE))
    loss = criterion(outputs.view(-1), torch.tensor([0.5]))
    return loss


def get_IO():
    io_param = IOParameters(
        is_perfect=True,
        inp_bound=10,
        out_bound=10,
        w_noise=0,
        w_noise_type=WeightNoiseType.NONE,
        inp_noise=0.,
        out_noise=0.,
        inp_res=0,
        out_res=0,
        ir_drop=0,
        ir_drop_g_ratio=0,
        noise_management=NoiseManagementType.NONE,
        bound_management=BoundManagementType.NONE,
        v_offset_w_min=0,
    )
    return io_param


device = SoftBoundsReferenceDevice(
    construction_seed=10,
    dw_min=1e-4,
    dw_min_dtod=0,
    dw_min_std=0,
    dw_min_dtod_log_normal=False,
    write_noise_std=0,
    corrupt_devices_prob=0,
    corrupt_devices_range=0,
    up_down=0,
    up_down_dtod=0,
    slope_up_dtod=0,
    slope_down_dtod=0,
    reference_mean=0,
    reference_std=0,
    w_max=0.1,
    w_min=-0.1,
    w_min_dtod=0,
    w_max_dtod=0,
    reset_std=0,
    subtract_symmetry_point=True,
    perfect_bias=True,
)
rpu_config = SingleRPUConfig(device=device)
rpu_config.forward = get_IO()
rpu_config.backward = get_IO()
rpu_config.update.desired_bl = 5000
rpu_config.update.sto_round = True

model = AnalogLinear(INPUT_SIZE, 1, False, rpu_config=rpu_config)
torch.manual_seed(618)
optimizer = AnalogSGD(model.parameters(), lr=0.1)

print(f'w_max: {rpu_config.device.w_max}, w_min: {rpu_config.device.w_min}')
for iter_idx in range(10):
    weight = model.get_weights(apply_weight_scaling=False)[0].item()
    print(f'iteration {iter_idx}, weight: {weight}')
    model.eval()
    loss = get_loss(model)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
Running it, I got the output
(analog) zhaoxian@server:~/Desktop/$ python main.py
w_max: 0.1, w_min: -0.1
iteration 0, weight: 0.2317507266998291
iteration 1, weight: 0.28406578302383423
iteration 2, weight: 0.3259482979774475
iteration 3, weight: 0.3595556616783142
iteration 4, weight: 0.38660740852355957
iteration 5, weight: 0.4084051549434662
iteration 6, weight: 0.42603760957717896
iteration 7, weight: 0.44019657373428345
iteration 8, weight: 0.45166146755218506
iteration 9, weight: 0.46091893315315247
The weight still falls outside the range [w_min, w_max].
Turn off subtract_symmetry_point, is_perfect, and perfect_bias, and also show the results of get_hidden_parameters().
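A minimal sketch of those changes against the script above (the device and IO parameters must be changed before the model is constructed):

device.subtract_symmetry_point = False
device.perfect_bias = False
# ... and is_perfect=False in get_IO(); then rebuild rpu_config and the model.

# Dump the per-synapse hidden parameters, including the actual bounds.
for analog_tile in model.analog_tiles():
    print(analog_tile.get_hidden_parameters())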
Why are you setting the model to eval? Also, you should make sure that you are not initially setting the weights too high. The soft-bounds synapse might not clip the weights if they are set out of range initially.
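For instance, a sketch that places the initial weight inside the device range before training (the 0.05 value is an arbitrary in-range choice):

import torch
# Explicitly set the 1x1 weight inside [w_min, w_max] = [-0.1, 0.1],
# so the soft-bounds device starts in range.
model.set_weights(torch.full((1, 1), 0.05))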
Also turn off weight scaling, that is, in the mapping parameters set weight scaling omega to 0 and disable out scaling.
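A minimal sketch, assuming the MappingParameter field names of aihwkit 0.8 (check your version's API docs):

# Disable weight scaling and learned output scaling on the config
# before constructing the model.
rpu_config.mapping.weight_scaling_omega = 0.0
rpu_config.mapping.learn_out_scaling = False
rpu_config.mapping.out_scaling_columnwise = False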