Guo-Jian-Wang/colfi

AttributeError: 'OneBranchMLP' object has no attribute 'use_multiGPU'

Closed this issue · 5 comments

Hi Guo-Jian_Wang!

I am Suresh Parekh. I recently discovered your library colfi for cosmological parameter estimation and have been trying it out. After installation, when I run the test code from colfi.readthedocs.io, the following block raises an error:

nde_type = 'MNN'

chain_n = 3
num_train = 500
epoch = 2000
num_vali = 100
base_N = 500

predictor = nde.NDEs(sim_data, model, param_names, params_dict=params_dict, cov_matrix=None,
init_chain=None, init_params=init_params, nde_type=nde_type,
num_train=num_train, num_vali=num_vali, local_samples=None, chain_n=chain_n)
predictor.base_N = base_N
predictor.epoch = epoch
predictor.fiducial_params = [a_fid, b_fid]
predictor.randn_num = randn_num
predictor.fast_training = True

predictor.train(path='test_linear')

The error I encountered is:
AttributeError: 'OneBranchMLP_G' object has no attribute 'use_multiGPU'


AttributeError Traceback (most recent call last)
in <cell line: 23>()
21 # predictor. = False
22
---> 23 predictor.train(path='test_linear')

2 frames
/usr/local/lib/python3.10/dist-packages/colfi/models_g.py in train(self, repeat_n, showEpoch_n, fast_training)
270 self.optimizer.param_groups[0]['lr'] = lrdc.exp()
271
--> 272 if self.use_multiGPU:
273 self.net = self.net.module.cpu()
274 else:

AttributeError: 'OneBranchMLP_G' object has no attribute 'use_multiGPU'

Later, in models_g.py, I added a line to the class OneBranchMLP_G to work around the error:

self.use_multiGPU = False

But that produced another error, and I am unable to figure out how to run this code.
Error

AttributeError Traceback (most recent call last)
Cell In[14], line 23
20 predictor.fast_training = False
21 # predictor. = False
---> 23 predictor.train(path='test_linear')

File ~/.local/lib/python3.10/site-packages/colfi/nde.py:769, in NDEs.train(self, path, save_items, showEpoch_n)
767 self._train = eval("self.train%s"%nde_type_i)
768 print('\nThe neural density estimator used here is %s'%nde_type_i)
--> 769 chain_1, losses = self._train(self.train_set, self.vali_set, step=step, burnInEnd=burnInEnd, burnInEnd_step=self.burnInEnd_step,
770 randn_num=randn_nums[step-1], save_items=save_items, showEpoch_n=showEpoch_n, params_space=prev_space)
771 #works for MDN in the case None ann chain obtained, see also models_ann --> auto_train
772 retrain_n = 0

File ~/.local/lib/python3.10/site-packages/colfi/nde.py:619, in NDEs._train_MNN(self, train_set, vali_set, step, burnInEnd, burnInEnd_step, randn_num, save_items, showEpoch_n, params_space)
616 exec('self.eco.%s = value'%(key))
618 if self.branch_n==1:
--> 619 self.eco.train(repeat_n=self.repeat_n, showEpoch_n=showEpoch_n, fast_training=self.fast_training)
620 # self.eco.train_AvgMultiNoise(repeat_n=self.repeat_n, showEpoch_n=showEpoch_n) #need further test
621 else:
622 self.eco.train(repeat_n=self.repeat_n, train_branch=self.train_branch, parallel=False, showEpoch_n=showEpoch_n, fast_training=self.fast_training) #reset parallel???

File ~/.local/lib/python3.10/site-packages/colfi/models_g.py:256, in OneBranchMLP_G.train(self, repeat_n, showEpoch_n, fast_training)
254 #vali_loss
...
--> 256 self.inputs_vali, self.target_valids.AddGaussianNoise(self.obs_vali,params=self.params_vali,obs_errors=self.obs_errors,cholesky_factor=self.cholesky_factor,noise_type=self.noise_type,factor_sigma=self.factor_sigma,multi_noise=self.multi_noise,use_GPU=self.use_GPU).multiNoisySample(reorder=True)
257 self.inputs_vali = self.normalize_obs(self.inputs_vali, self.obs_base_torch)
258 self.target_vali = self.normalize_params(self.target_vali, self.params_base_torch)

AttributeError: 'OneBranchMLP_G' object has no attribute 'inputs_vali'

Can you please help me debug and run this code?
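For readers following along, the first error can be reproduced in isolation. This is a minimal sketch, not colfi code: `TinyModel`, `train_defensive`, and `try_train` are hypothetical names used only to show that reading an attribute that was never assigned raises `AttributeError`, and that `getattr` with a default is one way to tolerate it.

```python
class TinyModel:
    """Hypothetical stand-in for a model class; this is not colfi code."""

    def __init__(self, declare_flag):
        # Only assign the flag when asked, to mimic a class that forgot it.
        if declare_flag:
            self.use_multiGPU = False

    def train(self):
        # Reading an attribute that was never assigned raises AttributeError.
        if self.use_multiGPU:
            return "multi-GPU path"
        return "single-device path"

    def train_defensive(self):
        # getattr with a default tolerates a missing attribute.
        if getattr(self, "use_multiGPU", False):
            return "multi-GPU path"
        return "single-device path"


def try_train(model):
    """Run train() and return either its result or the error text."""
    try:
        return model.train()
    except AttributeError as err:
        return f"AttributeError: {err}"
```

Patching a single attribute in an installed package, as tried above, may only expose the next missing piece, so installing a version where the fix is already in place is usually the safer route.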

Hi Suresh Parekh,

Thanks for using the colfi code. Did you install it with pip install colfi? If so, please re-install from the GitHub source, since the use_multiGPU issue is fixed in the GitHub version. You can do so with:

git clone https://github.com/Guo-Jian-Wang/colfi.git
cd colfi
sudo python setup.py install

The example at https://github.com/Guo-Jian-Wang/colfi/blob/master/examples/linear/train_linear.py works well on my side.

Best regards
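The tracebacks in this thread show colfi being loaded from two different locations (/usr/local/lib/python3.10/dist-packages and ~/.local/lib/python3.10/site-packages), so after re-installing it may help to check which copy Python actually picks up. The following is a minimal sketch using only the standard library; `describe_install` is a hypothetical helper name, not part of colfi.

```python
import importlib.util
from importlib import metadata


def describe_install(package_name):
    """Return (file location, installed version); None marks an unknown value."""
    # find_spec locates the module without importing it.
    spec = importlib.util.find_spec(package_name)
    location = spec.origin if spec is not None else None
    try:
        # The version comes from the installed distribution's metadata.
        version = metadata.version(package_name)
    except metadata.PackageNotFoundError:
        version = None
    return location, version
```

Calling `describe_install("colfi")` in the same environment that runs the example would show whether the source install or an older pip install is the one being imported.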

Hi Suresh Parekh, I have fixed this bug. Please re-install colfi and run the examples included in it. Thanks.

Hi Guo Jian Wang,

Thank you very much for your help! It worked! All 5 steps completed successfully. Later, during the plotting part, an error came from the coplot library. Since you are also the developer of that library, I thought I would raise it here.

While running the last block of the code,

predictor.get_steps()
predictor.get_contour()
predictor.get_losses()
plt.show()

predictor.get_steps() gives me the following error:

######################### step 5/5 #########################
Updating parameter space to be learned ...
Learning range of a: [-1.715130, 5.051654] ~ [-5.0\sigma, +5.0\sigma]
Learning range of b: [2.259633, 2.733966] ~ [-5.0\sigma, +5.0\sigma]

Selecting samples from the mock data of the previous step ...
600 sets of samples of previous step are added to the training set

The neural density estimator used here is ANN

Training the network using CPU
The batch size is set to 500
Fast training the network, the iteration per epoch is 3, the batch size is 2500!
randn_num: 5.1157
(epoch:100/2000; train_loss/vali_loss:0.06154/0.10735; lr:0.00504661)
(epoch:200/2000; train_loss/vali_loss:0.06315/0.06557; lr:0.00252930)
(epoch:300/2000; train_loss/vali_loss:0.06230/0.06236; lr:0.00126765)
(epoch:400/2000; train_loss/vali_loss:0.05717/0.06067; lr:0.00063533)
(epoch:500/2000; train_loss/vali_loss:0.05969/0.06169; lr:0.00031842)
(epoch:600/2000; train_loss/vali_loss:0.06049/0.06456; lr:0.00015959)
(epoch:700/2000; train_loss/vali_loss:0.05932/0.05931; lr:0.00007998)
(epoch:800/2000; train_loss/vali_loss:0.06138/0.06170; lr:0.00004009)
(epoch:900/2000; train_loss/vali_loss:0.06129/0.06237; lr:0.00002009)
(epoch:1000/2000; train_loss/vali_loss:0.05914/0.06034; lr:0.00001007)
(epoch:1100/2000; train_loss/vali_loss:0.06080/0.06008; lr:0.00000505)
(epoch:1200/2000; train_loss/vali_loss:0.06011/0.05646; lr:0.00000253)
(epoch:1300/2000; train_loss/vali_loss:0.05919/0.05886; lr:0.00000127)
(epoch:1400/2000; train_loss/vali_loss:0.06066/0.06446; lr:0.00000064)
(epoch:1500/2000; train_loss/vali_loss:0.06431/0.06513; lr:0.00000032)
(epoch:1600/2000; train_loss/vali_loss:0.06104/0.06275; lr:0.00000016)
(epoch:1700/2000; train_loss/vali_loss:0.06736/0.06881; lr:0.00000008)
(epoch:1800/2000; train_loss/vali_loss:0.06497/0.05629; lr:0.00000004)
(epoch:1900/2000; train_loss/vali_loss:0.06159/0.06288; lr:0.00000002)
(epoch:2000/2000; train_loss/vali_loss:0.06116/0.06004; lr:0.00000001)
The directory "test_linear/chain_ann" is successfully created !

Time elapsed for the training process: 6.385 minutes
Traceback (most recent call last):
File "/home/suresh/colfi/examples/linear/train_linear.py", line 83, in
predictor.get_steps()
File "/usr/local/lib/python3.10/dist-packages/colfi-0.6.0-py3.10.egg/colfi/plotter.py", line 311, in get_steps
File "/home/suresh/.local/lib/python3.10/site-packages/coplot/plots.py", line 315, in plot
pls.PlotSettings().setting(location=[self.lon_n, self.lat_n, i*self.lat_n+j+1], lims=data['lims'], labels=data['labels'],
File "/home/suresh/.local/lib/python3.10/site-packages/coplot/plot_settings.py", line 109, in setting
tick.label.set_rotation(rotation)
AttributeError: 'XTick' object has no attribute 'label'. Did you mean: '_label'?

Can you please suggest what changes I should make?

Hi Suresh Parekh,

This might be caused by your version of matplotlib. Please change your matplotlib version; it works well on my side with both matplotlib=3.4.3 and matplotlib=3.7.2.

By the way, please use coplot=0.1.3 if you are using a higher version.

Hi Guo Jian Wang,

Thanks a lot for the help! Yes, changing the version of matplotlib and coplot worked. Now it's working perfectly! Thank you very much.