IBM/MAX-Object-Detector

Geo support other than US-South for training

hemaAI opened this issue · 11 comments

I tried your training readme.md with the us-south region for the ML instance and buckets. But when I change the region to eu-de I run into errors. Support for other regions can be very important, as some projects have geo restrictions.

Hi @HenrikMatzen, you can also download the files from our content delivery network. I've opened a PR to update the readme.

@HenrikMatzen The link should work now. Feel free to try again

I think this was a misunderstanding. When you follow your instructions at https://github.com/IBM/MAX-Object-Detector/tree/b1104edf14ee21f8bdfd8c1d02071a425da01939/training and set up an ML instance in geo us-south, the training starts successfully. But when you create an ML instance in eu-de you run into the following error:
watson_machine_learning_client.wml_client_error - WARNING - Failure during training. (POST https://eu-de.ml.cloud.ibm.com/v3/models) Status code: 500, body: { "trace": "_", "errors": [{ "code": "training_submit_error", "message": "0", "more_info": "http://watson-ml-api.mybluemix.net/" }] } Error. Model training could not be started: Failure during training. (POST https://eu-de.ml.cloud.ibm.com/v3/models) Status code: 500, body: { "trace": "_", "errors": [{ "code": "training_submit_error", "message": "0", "more_info": "http://watson-ml-api.mybluemix.net/" }] }

@HenrikMatzen thanks for reporting this, we will look into it and get back to you ASAP.

cc @ptitzler @SSaishruthi @kmh4321

I guess this is the faulty line:

endpoint: https://s3.us.cloud-object-storage.appdomain.cloud

I already changed it. To overcome the geo issue for buckets I first needed to change
https://github.com/IBM/MAX-Training-Framework/blob/master/max_training_framework/utils/cos.py#L39

This creates the .zip but then I get the error described in my post above.

Hmm... Any idea how we can remove the redundancy in storing these URLs?
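One possible way to avoid hard-coding the endpoint in several places is to derive it from a location code. The sketch below is only an illustration: `cos_endpoint` and `COS_ENDPOINT_TEMPLATE` are hypothetical names, not part of MAX-Training-Framework, and it assumes that public COS endpoints follow the same pattern as the URL quoted above, with only the location part changing (for example `us` for cross-regional or `eu-de` for regional buckets).

```python
# Hypothetical helper, not part of MAX-Training-Framework.
# Assumption: public COS endpoints share the pattern of the URL quoted in
# this thread and differ only in the location segment.
COS_ENDPOINT_TEMPLATE = "https://s3.{location}.cloud-object-storage.appdomain.cloud"


def cos_endpoint(location: str = "us") -> str:
    """Build a COS endpoint URL from a location code such as 'us' or 'eu-de'."""
    location = location.strip().lower()
    if not location:
        raise ValueError("location must not be empty")
    return COS_ENDPOINT_TEMPLATE.format(location=location)


if __name__ == "__main__":
    print(cos_endpoint("us"))     # cross-regional bucket location
    print(cos_endpoint("eu-de"))  # regional bucket location
```

With something like this, the YAML file and cos.py would only need to agree on the location code rather than each storing a full URL.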

@HenrikMatzen I'll take a look at what it'd take to support a) cross regional and b) regional buckets in different geographies. Trying to remember why we couldn't get regional buckets (which eu-de is) to work in the first place and therefore settled on cross regional support for the first release...

IBM/MAX-Training-Framework#12

Thanks a lot. Now we have confirmation that the machine learning service does not support DLaaS in EU-DE. Not sure how to deal with such an issue. Since you cannot change it, I think we should close it?

Short analysis:

  • The Watson Machine Learning Service on IBM Cloud only supports deep learning in the us-south and eu-gb locations. Therefore you will not be able to train this model in your desired location.

That said, there is still value in supporting model training in the eu-gb Watson Machine Learning location (and storing data in COS instances that are geographically close), so we are going to put the required fixes in, even though this does not solve your particular issue.

If you'd like you can close the issue. We'll make sure to document the limitation.
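Until the limitation is documented, a pre-flight check along these lines could fail early with a clearer message than the 500 response above. This is a rough sketch only: the function names are hypothetical, the set of locations is taken from the short analysis in this comment, and it assumes the location is the first label of the Watson Machine Learning host name, as in the URL from the error message.

```python
from urllib.parse import urlparse

# Locations that support deep learning, per the short analysis in this thread.
DEEP_LEARNING_LOCATIONS = {"us-south", "eu-gb"}


def wml_location(url: str) -> str:
    """Return the first host label of a WML URL.

    Assumption: the location is encoded as the first label, as in
    https://eu-de.ml.cloud.ibm.com from the error message above.
    """
    host = urlparse(url).hostname or ""
    return host.split(".")[0]


def supports_deep_learning(url: str) -> bool:
    """Check whether the WML instance URL points to a supported location."""
    return wml_location(url) in DEEP_LEARNING_LOCATIONS


if __name__ == "__main__":
    for url in ("https://us-south.ml.cloud.ibm.com",
                "https://eu-de.ml.cloud.ibm.com"):
        status = "ok" if supports_deep_learning(url) else "deep learning not supported"
        print(f"{url}: {status}")
```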

It is also possible to use a regional bucket like eu-de in the YAML. Packaging with the provided scripts inside the training folder won't work by default, but you can comment out the section that tries to read the bucket information.