tryolabs/luminoth

gcloud.py

Closed this issue · 4 comments

There appears to be a small error in the gcloud.py file in the luminoth/tools/cloud folder. When we do not give a bucket argument for storing logs, this error is triggered. Line 226 contains
bucket_name = 'luminoth-{}'.formata(account.client_id)
I adjusted that to
bucket_name = 'luminoth-{}'.format(account.client_id)
and it seems to work fine now.
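
For anyone hitting the same error, here is a minimal sketch that reproduces it outside Luminoth. The `account` object, its `client_id` value, and both helper functions are hypothetical stand-ins, not the actual code in gcloud.py; only the format string and the `formata` typo come from the report above.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the credentials object used in gcloud.py;
# only the client_id attribute matters for this illustration.
account = SimpleNamespace(client_id="1234567890")


def default_bucket_name(account):
    """Build the fallback bucket name with the corrected str.format call."""
    return 'luminoth-{}'.format(account.client_id)


def broken_bucket_name(account):
    """Same expression with the original typo; str has no 'formata' method."""
    return 'luminoth-{}'.formata(account.client_id)


bucket_name = default_bucket_name(account)
print(bucket_name)

try:
    broken_bucket_name(account)
except AttributeError as exc:
    # Same AttributeError as in the traceback from the original code.
    error_message = str(exc)
    print(error_message)
```

Because the typo sits on the code path that only runs when no bucket is passed, the error would not show up for anyone who supplies a bucket explicitly.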

When I run lumi cloud gc jobs, I get
Id: train_20181101_174123 Created: 2018-11-01T21:41:30Z State: FAILED
sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=6, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('192.168.1.4', 55354), raddr=('108.177.8.95', 443)>
sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=5, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('192.168.1.4', 59232), raddr=('172.217.195.95', 443)>

I am unsure if this happened because of the previous change. However, running the original code resulted in
File "/home/ace/luminoth/luminoth/tools/cloud/gcloud.py", line 226, in train
    bucket_name = 'luminoth-{}'.formata(account.client_id)
AttributeError: 'str' object has no attribute 'formata'

Also, Luminoth ran on my computer without any problems, so I am unsure why there is a failure here. The bucket where my dataset TFRecords were uploaded is in us-east1, while the bucket where the logs would have been stored is in US.

Hi @AshwinAce!

Thanks for your report. This is a legit typo, and I have fixed it here 4b81238.

As for the warning, is it only a warning? Does the job not get submitted to ML Engine?
What version of Python are you using?

The job gets submitted, runs for a while, and then fails. I tried executing it another time with the same results. I am using Python 3.6.5.

One thing I did that might be a problem was changing num_epochs inside the config.yml file. I'm not sure whether that change worked; however, it still runs on my laptop while failing in the cloud.

If the job gets submitted successfully and does start, it means it's not failing because of this warning, so lumi cloud gc worked :D
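
To illustrate why a ResourceWarning on its own does not abort anything, here is a small sketch using only the standard `warnings` module. The helper name and the warning text are made up for the example; this is not how lumi cloud gc handles warnings, just a demonstration that such a warning is reported (or silenced by a filter) while execution continues.

```python
import warnings


def count_resource_warnings(action):
    """Emit one illustrative ResourceWarning under the given filter action
    and return how many warnings were actually recorded."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" reports the warning, "ignore" silences it.
        warnings.simplefilter(action, ResourceWarning)
        warnings.warn("unclosed socket (illustrative message)", ResourceWarning)
        return len(caught)


shown = count_resource_warnings("always")
hidden = count_resource_warnings("ignore")

# Execution reaches this point in both cases: the warning never raised.
print("recorded with 'always':", shown)
print("recorded with 'ignore':", hidden)
```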

Now, you must investigate why the job itself fails. This is a different issue, so I'm closing this. It will be helpful to look at the logs of the job in ML Engine; you can use lumi cloud gc logs for that.