Prinsphield/ELEGANT

Training on Google Colab


I am new to machine learning and am trying to train the model on Google Colab.
A Colab session is only available for 12 hours at a time, so how do I resume training?
For example, if the model was saved at 8000 iterations, how do I continue training from there?

For example, if you want to resume from the checkpoint saved at 8000 iterations, you can run:
python ELEGANT.py -a Bangs -m train -g 0 -r 8000
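
When a Colab session is cut off, it may not be obvious which iteration was saved last. The following is a minimal sketch, not part of ELEGANT, for finding the number to pass to -r. It assumes that checkpoints are stored as .pth files whose names contain the iteration count; the function name latest_checkpoint_step and that naming scheme are assumptions and should be checked against the actual checkpoint directory.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint_step(ckpt_dir: str) -> Optional[int]:
    """Return the largest iteration number found in checkpoint file names.

    Assumption: checkpoints are *.pth files and the iteration count is the
    last run of digits in the file name (before the extension).
    Returns None when no such file exists.
    """
    steps = []
    for path in Path(ckpt_dir).glob("*.pth"):
        # Collect every run of digits in the name and keep the last one.
        numbers = re.findall(r"\d+", path.stem)
        if numbers:
            steps.append(int(numbers[-1]))
    return max(steps) if steps else None
```

The returned value can then be used in place of 8000 in the command above.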

Thank you so much for your quick reply. I will try it.

After the 100th iteration I got the following error:
File "ELEGANT.py", line 261, in save_scalar_log
    'loss_D': self.loss_D.data.cpu().numpy()[0]
IndexError: too many indices for array

After searching on Google, I replaced the code above with
'loss_D': self.loss_D.data.cpu().item()
'loss_G': self.loss_G.data.cpu().item()

and
scalar_info['G_loss/' + key] = value.item()
scalar_info['D_loss/' + key] = value.item()
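
For context: since PyTorch 0.4 a loss is a zero-dimensional tensor, so .numpy() returns a zero-dimensional array and indexing it with [0] raises the IndexError shown above, while .item() returns a plain Python number. A small helper along these lines could keep the logging code working for both tensors and plain numbers. The name to_python_scalar is hypothetical and is not part of ELEGANT.

```python
def to_python_scalar(value):
    """Convert a single-element tensor/array-like, or a plain number, to float.

    Hypothetical helper: torch tensors and numpy arrays holding exactly one
    element expose .item(); anything else is passed to float() directly.
    """
    if hasattr(value, "item"):
        return float(value.item())
    return float(value)


# Intended use inside the logging code (names taken from the thread):
#   'loss_D': to_python_scalar(self.loss_D)
#   scalar_info['G_loss/' + key] = to_python_scalar(value)
```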

After this, I was able to train the model up to 10,000 iterations.

After that, when I tried to resume training from 10,000 iterations, it only printed the message
Finished training.

But max_iter in dataset.py is 200000.
So what's wrong?

I am not using all the images in the CelebA dataset.
I am using the first 10,000 images, and I edited the attribute and landmark files accordingly.
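
Editing the annotation files by hand is easy to get wrong, so here is a minimal sketch of trimming them programmatically. It assumes the standard CelebA annotation layout (a first line with the number of images, a header line, then one row per image); the function name truncate_annotation_lines is hypothetical, and whether ELEGANT relies on the count line has not been verified here.

```python
from typing import List


def truncate_annotation_lines(lines: List[str], keep: int) -> List[str]:
    """Keep the header and the first `keep` image rows of an annotation file.

    Assumed layout: line 0 is the row count, line 1 is the header,
    and every following non-empty line describes one image.
    """
    header = lines[1]
    rows = [ln for ln in lines[2:] if ln.strip()][:keep]
    # Rewrite the count line so it matches the rows that remain.
    return [str(len(rows)), header] + rows


# Example use with a file read into memory (file name is illustrative):
#   lines = open("list_attr_celeba.txt").read().splitlines()
#   trimmed = truncate_annotation_lines(lines, 10000)
```

The same function can be applied to both the attribute file and the landmark file so that they stay in step with each other.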

What command did you type?

I used the following command, as you suggested.
!python ELEGANT.py -m train -a bangs Smiling -g 0 -r 10000

Hello,
You mentioned that the image size should be 409*687, but the images in img_align_celeba are 178*218.
So I downloaded img_align.7z. I tried to manually unzip img_celeba.7z, but no luck: the file is corrupted.
I tried to open it with the 7z tool, but again I get the message "can't open archive".

Can you provide any other link to download the CelebA dataset?

The cropped and aligned images are generated by running preprocess.py. You should download the raw images and preprocess all images using that script.

The raw image archive is corrupted.
Is there another way?
Can I use the already cropped and aligned images of size 178*218?
In that case preprocessing is not required, right?

No. You have to process raw images using that file.

OK, thank you.

Hello,
As you said, I used the raw images and preprocessed them, but training is again stuck at 10000 iterations.
When I try to resume training from 10000 iterations, this message is printed:
Finished training.

I used the following command, as you said:
python ELEGANT.py -a Bangs -m train -g 0 -r 10000

The problem was due to a memory issue.
I have resolved it.

I just want to ask you:
When testing the model for the first time, there is no restore point.
So I should leave the -r argument as None, right?

Testing the model requires restoring a checkpoint, because by then you have already trained your model.

OK, I got it.
Thank you very much.

I trained the model on a single attribute, Smiling.
When I tested the model, I got the following warning:

UserWarning:
volatile was removed and now has no effect.
Use `with torch.no_grad():` instead.
var = torch.autograd.Variable(tensor, volatile=volatile)

So I replaced the volatile flag with
with torch.no_grad():
    var = torch.autograd.Variable(tensor)

Then I got a ValueError:

self.B, self.A = self.tensor2var(self.transform(Image.open(self.args.input), Image.open(self.args.target[0])))
ValueError: not enough values to unpack (expected 2, got 0)

Hello,

I have resolved that problem.

I changed the code like this:
def tensor2var(self, tensors, requires_grad=True):
    ....
    ....
    with torch.no_grad():
        var = torch.autograd.Variable(tensor)

And where this function is called, I replaced volatile with requires_grad=True.

Also, since I am testing this model in the cloud, I wasn't passing the -g argument, and that is what caused the ValueError.

So I passed the -g argument.

I want to ask you: can we deploy this model in an Android app?

If yes, can you shed some light on it?