Launch container on a GPU server
Hello,
I am trying to launch a Jupyter ZApp on a GPU server using labels. I use the docker-engine backend with this entry in the configuration file:
[gpu01]
docker_address: xx.yy.zz.tt.gg:2375
external_address: xx.yy.zz.tt.gg
use_tls: yes
labels: gpu
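For illustration, here is a minimal sketch of how such an entry might be read into a set of host labels. It assumes an INI-style file parsed with Python's standard configparser and a comma-separated labels value. The helper name read_host_labels is hypothetical, and this is not Zoe's own configuration parser.

```python
import configparser

# Hypothetical copy of the [gpu01] entry above (addresses masked as in the post).
CONFIG_TEXT = """
[gpu01]
docker_address: xx.yy.zz.tt.gg:2375
external_address: xx.yy.zz.tt.gg
use_tls: yes
labels: gpu
"""


def read_host_labels(config_text: str) -> dict:
    """Return {host name: set of labels}, assuming labels are comma-separated."""
    parser = configparser.ConfigParser()  # accepts both ':' and '=' as delimiters
    parser.read_string(config_text)
    hosts = {}
    for section in parser.sections():
        raw = parser.get(section, "labels", fallback="")
        # Split on commas, trim whitespace, and drop empty entries.
        hosts[section] = {label.strip() for label in raw.split(",") if label.strip()}
    return hosts


if __name__ == "__main__":
    print(read_host_labels(CONFIG_TEXT))  # {'gpu01': {'gpu'}}
```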
In the Zapp json we have:
.....
"services": [
  {
    "labels": [
      "gpu"
    ],
    "image": xxxxxxx
    ......
I have set scheduler-class = ZoeElasticScheduler
Node gpu01 is reported as online:
2017-12-29 13:32:37,101 INFO synchro_gpu01->zoe_master.backends.docker.threads: Node gpu01 is now online
The ZApp does not start, and I get this INFO message:
2017-12-29 13:33:37,227 INFO scheduler->zoe_master.scheduler.simulated_platform: Cannot fit essential service 7 anywhere, bailing out
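The message most likely means that no host had a label set covering the labels the service requests. Here is a simplified sketch of that kind of check. It assumes that a service fits a host only when every label it requests is present on the host. The names HostStats and hosts_for_service, and the engine label value, are hypothetical. This is not Zoe's actual scheduler code.

```python
from dataclasses import dataclass, field


@dataclass
class HostStats:
    """Hypothetical per-host record holding only the fields needed here."""
    name: str
    labels: set = field(default_factory=set)


def hosts_for_service(service_labels, hosts):
    """Return the names of hosts carrying every label the service requests."""
    required = set(service_labels)
    # Subset test: a service with no labels fits any host.
    return [h.name for h in hosts if required <= h.labels]


if __name__ == "__main__":
    service_labels = ["gpu"]
    # Host labels replaced by the engine's labels: "gpu" from the config is gone.
    replaced = [HostStats("gpu01", {"engine_label=x"})]
    # Config labels merged with the engine's labels.
    merged = [HostStats("gpu01", {"gpu", "engine_label=x"})]
    print(hosts_for_service(service_labels, replaced))  # []
    print(hosts_for_service(service_labels, merged))    # ['gpu01']
```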
An important point: I had to modify line 78 of zoe_master/backends/docker/threads.py to read
self.host_stats[host_config.name].labels = set(info['Labels'])
instead of
self.host_stats[host_config.name].labels += set(info['Labels'])
because I was getting this error:
TypeError: unsupported operand type(s) for +=: 'set' and 'set'
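Python sets do not implement +, so += raises exactly this TypeError. Set union is written with | (or |= for an in-place update). Here is a short standalone demonstration, using placeholder label values:

```python
config_labels = {"gpu"}            # labels from the conf file (placeholder)
engine_labels = {"storage=ssd"}    # labels reported by the Docker engine (placeholder)

try:
    config_labels += engine_labels  # sets have no + operator
except TypeError as exc:
    print(exc)  # unsupported operand type(s) for +=: 'set' and 'set'

config_labels |= engine_labels      # in-place union
print(sorted(config_labels))        # ['gpu', 'storage=ssd']
```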
The ZApp starts fine if I remove the labels entry from the JSON.
Can you give me some help? I'm stuck. Thanks.
Best regards,
Thomas
Hi,
I assumed that sets could be added with + to compute a union, but they cannot. Line 78 needs to be changed like this:
self.host_stats[host_config.name].labels |= set(info['Labels'])
By using =, you throw away the labels from the config file and keep only the ones defined by the Docker engine.
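The candidate lines behave differently, and the difference can be checked in isolation. The following minimal sketch uses placeholder labels that stand in for self.host_stats[host_config.name].labels and info['Labels']. The helper merged_labels is hypothetical. Plain assignment replaces the config labels, while |= (or set.update) merges them. Note that set.union() returns a new set and leaves the original untouched, so a bare call to it does not change the stored labels.

```python
def merged_labels(config_labels, engine_labels):
    """Return the config labels merged with the Docker engine's labels."""
    labels = set(config_labels)
    labels |= set(engine_labels)  # in-place union, equivalent to labels.update(...)
    return labels


config = {"gpu"}
engine = ["storage=ssd"]  # Docker reports engine labels as a list of strings

replaced = set(engine)        # what "=" does: the config labels are lost
unchanged = set(config)
unchanged.union(set(engine))  # result discarded: unchanged is not modified

print(sorted(replaced))                       # ['storage=ssd']
print(sorted(unchanged))                      # ['gpu']
print(sorted(merged_labels(config, engine)))  # ['gpu', 'storage=ssd']
```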
I will merge a fix soon.
Thanks!
Hi,
Thanks, it is working now.