Kaixhin/dockerfiles

Cannot checkout torch with cuda 8 using tag

mpeniak opened this issue · 8 comments

$ nvidia-docker run -it kaixhin/cuda-torch:8.0
Tag 8.0 not found in repository docker.io/kaixhin/cuda-torch
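A quick way to check which tags do exist (a sketch using Docker Hub's v2 tags endpoint; the exact URL scheme is an assumption and may change):

$ curl -s https://hub.docker.com/v2/repositories/kaixhin/cuda-torch/tags/ | python -m json.tool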

Hey Martin, actually the CUDA builds have been failing for a while because building cutorch causes the automated builds to time out. So now I've split the build into a "dependencies" image and a Torch-minus-CUDA image, and then try to luarocks install cutorch on top. It still times out on that one step though - if you know of a solution that would solve this issue it'd be really appreciated!
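For illustration, a minimal sketch of that split (the base image name, package list, and install commands below are assumptions, not the exact contents of the real Dockerfiles):

# Hedged sketch of the split build, not the actual Dockerfile
# Layer 1: CUDA base image plus build dependencies
FROM kaixhin/cuda:8.0
RUN apt-get update && apt-get install -y git build-essential cmake curl
# Layer 2: Torch itself (in the real images the CUDA rocks would be skipped here)
RUN git clone https://github.com/torch/distro.git /root/torch --recursive
RUN cd /root/torch && bash install-deps && ./install.sh -b
# Layer 3: the single step that keeps timing out on automated builds
RUN . /root/torch/install/bin/torch-activate && luarocks install cutorch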

I'm having the same issue with my torch-docker. Have you checked with the folks at Docker? The time-out is way too narrow, and it can't be what the good people at Docker originally intended.

@gforge It's a combination of the resource limits and cutorch having to build for every GPU architecture (since it doesn't know what kind of GPU the code will have to run on in the future). Docker Support is now paid-only, so you can't even raise a ticket with them directly without paying.

Is there a way to limit which architectures cutorch builds for under nvidia-docker?

I haven't had the time to investigate, but let me know if you find a solution! Best place to start is to see if there's anything relevant in the cutorch issues, and possibly raise an issue yourself.
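One thing that might be worth checking in those issues (an assumption on my part - I haven't verified that cutorch's build honours it in these images) is whether restricting the target compute capabilities via TORCH_CUDA_ARCH_LIST is enough to get under the time-out:

$ TORCH_CUDA_ARCH_LIST="3.5;5.2;6.1" luarocks install cutorch

If the build respects that list, compiling for a few architectures instead of all of them should cut the build time substantially, at the cost of images that only run on those GPUs.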

While waiting for a fix to the automated Docker build around the cutorch issue, couldn't you push a manually built cuda-torch:8.0 image to your repository?

@aurelien-coquard good point - I've done this now, and I will work on getting up-to-date builds of 7.5 online asap (deprecating 6.5 and 7.0). With decreasing service for free users on Docker, and even fewer resources on CI systems like Travis, automated builds (at least for Torch w/ CUDA) look to be out. I'd like a free, automated solution, but as a first step I'll just get all versions online manually.
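For reference, the manual equivalent is just a local build and push (the Dockerfile path below is an assumption about the repo layout):

$ docker build -t kaixhin/cuda-torch:8.0 cuda-torch/
$ docker push kaixhin/cuda-torch:8.0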

Thx @Kaixhin!

FYI, on our project, following the example of weldr/welder-web, we plan to build the Docker image within Travis CI, using the Docker repository as a cache (via the --cache-from docker build option), and push the updated image to the Docker repo.
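Roughly, the pattern looks like this (a sketch of the CI steps, not the exact weldr/welder-web configuration; docker build --cache-from needs Docker 1.13+):

$ docker pull kaixhin/cuda-torch:8.0 || true
$ docker build --cache-from kaixhin/cuda-torch:8.0 -t kaixhin/cuda-torch:8.0 .
$ docker push kaixhin/cuda-torch:8.0

Since unchanged layers are reused from the pulled image, each CI run only rebuilds the layers that actually changed, which keeps it well under the time-out.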

See PR !375 on the OpenNMT project (where we use your cuda-torch Docker image).