Silent failure if a custom image puts something into /opt/ml/code
njbrake commented
Hi, I was making a new Docker image for training:
```dockerfile
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-training:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04
COPY src/requirements.txt /opt/ml/code/requirements.txt
RUN pip install --no-cache-dir -r /opt/ml/code/requirements.txt
```
With this image, the training container could no longer find the source files that normally get copied in when the job starts. I traced it back to this line, which checks whether the /opt/ml/code folder exists. If it exists at all, even if it only holds requirements.txt, the step that copies in the sourcedir.tar.gz file from its URI is skipped.
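To make the failure mode concrete, here is a minimal model of the check as described above. It assumes the check is a plain existence test on the code directory. `needs_download` is a hypothetical name, and this is not the toolkit's actual code. It shows why an image that creates the directory at build time, even just to hold requirements.txt, ends up skipping the download.

```python
import os
import tempfile


def needs_download(code_dir: str) -> bool:
    # Hypothetical model of the reported check: the source archive is
    # fetched only when the code directory does not exist yet.
    return not os.path.exists(code_dir)


with tempfile.TemporaryDirectory() as root:
    code_dir = os.path.join(root, "opt", "ml", "code")
    print(needs_download(code_dir))  # True: fresh image, the archive would be fetched

    # Simulate the Dockerfile's COPY of requirements.txt into the directory.
    os.makedirs(code_dir)
    open(os.path.join(code_dir, "requirements.txt"), "w").close()
    print(needs_download(code_dir))  # False: the download is silently skipped
```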
Should the logic be changed so that it doesn't skip the download, or should it at least log a warning when it does skip it?
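A rough sketch of what the two options could look like. It is written against a hypothetical helper (`should_download` and its `force` flag are made up for illustration and are not part of the toolkit). `force=True` stands in for "always download", and the warning covers the case where the skip is kept.

```python
import logging
import os

logger = logging.getLogger(__name__)


def should_download(code_dir: str, force: bool = False) -> bool:
    """Decide whether to fetch sourcedir.tar.gz into code_dir (hypothetical helper)."""
    # Option 1: always download when forced, or when the directory is absent.
    if force or not os.path.exists(code_dir):
        return True
    # Option 2: keep the skip, but make it visible in the job logs.
    logger.warning(
        "%s already exists in the image; skipping download of sourcedir.tar.gz. "
        "Files placed there at build time will be used instead of the uploaded source.",
        code_dir,
    )
    return False
```

With the warning in place, the situation above would at least show up in the training logs instead of surfacing later as missing files.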