awslabs/ml-io

Error when using mlio.SageMakerPipe

Closed this issue · 3 comments

I'm launching a sklearn training container with this call:
sklearn_estimator.fit({'train': 's3://...'})

After reading the AWS doc and the ml-io doc a couple of times, I try to read the data inside the container with this code:

pipe = mlio.SageMakerPipe('/opt/ml/input/data/train')
...etc

However, it raises this error:
mlio.DataReaderError: The data store '/opt/ml/input/data/train' does not exist.

How are we supposed to pass the data channel details to SageMakerPipe?

The AWS doc you linked indicates /opt/ml/input/data/training - any chance that's the issue?

I don't think so? I'll give it a try, but basically those substrings are variable and defined by the name of the channel:

Calling model.fit({'mychannel': 's3://...'}) creates a local folder /opt/ml/input/data/mychannel in File mode. In Pipe mode it creates a series of pipes /opt/ml/input/data/mychannel_N, where N counts how many times the channel has been read so far (so the first pipe is named /opt/ml/input/data/mychannel_0). At least that's my understanding of the doc :) and this logic is definitely in place in File mode.
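To make the naming rule above concrete, here is a minimal sketch of how I understand the paths to be derived from the channel name. The helper names file_mode_dir and pipe_mode_path are hypothetical (they are not part of ml-io or the SageMaker SDK), and the root directory is just the one used in this thread:

```python
import posixpath

# Root directory for input channels, as used in this thread (assumed default).
DATA_ROOT = "/opt/ml/input/data"


def file_mode_dir(channel, root=DATA_ROOT):
    """Directory that holds the channel's files in File mode."""
    return posixpath.join(root, channel)


def pipe_mode_path(channel, epoch=0, root=DATA_ROOT):
    """Named pipe for the given read count (epoch) in Pipe mode.

    The epoch starts at 0 and increases each time the channel is read.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return posixpath.join(root, f"{channel}_{epoch}")


if __name__ == "__main__":
    print(file_mode_dir("mychannel"))
    print(pipe_mode_path("mychannel", 0))
```

If this understanding is right, the path passed to SageMakerPipe depends on the key used in the fit() call, so 'train' and 'training' are both valid as long as the two sides agree.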

Actually, Pipe mode may not work in local mode (which I was using). Now, on a remote c5.9xl training job, I get a different error :) so this issue is solved.
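For anyone hitting the same "does not exist" error, a quick way to see what the container actually provides is to list the input directory and check whether each entry is a directory (File mode) or a FIFO (Pipe mode). This is only a diagnostic sketch using the standard library; describe_entries is a hypothetical helper name and the default root is the path from this thread:

```python
import os
import stat


def describe_entries(root="/opt/ml/input/data"):
    """Map each entry under root to 'dir', 'fifo', 'file' or 'other'.

    Returns an empty dict if root itself is missing.
    """
    if not os.path.isdir(root):
        return {}
    kinds = {}
    for name in sorted(os.listdir(root)):
        # os.stat does not open the entry, so it will not block on a FIFO.
        mode = os.stat(os.path.join(root, name)).st_mode
        if stat.S_ISDIR(mode):
            kinds[name] = "dir"
        elif stat.S_ISFIFO(mode):
            kinds[name] = "fifo"
        elif stat.S_ISREG(mode):
            kinds[name] = "file"
        else:
            kinds[name] = "other"
    return kinds


if __name__ == "__main__":
    for name, kind in describe_entries().items():
        print(f"{name}: {kind}")
```

If only a 'dir' entry shows up for the channel and no 'fifo' entries, the job is not running in Pipe mode, which would match what I saw in local mode.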