aws-samples/mlops-e2e

**Failed MLPipeline**

franckess opened this issue · 5 comments

Hi @jessieweiyi

I tried to replicate your pipeline in my AWS environment; however, it fails at the MLPipeline step (see screenshots below).

Looking at the logs via CloudWatch, I can see this error message:

/miniconda3/bin/python _repack_model.py --dependencies  --inference_script transform.py --model_archive s3://mlopsinfrastracturestack-sagemakerconstructsagema-gq7lhy69zuk9/PreprocessData-d7bd2a0ff50809ca886dd3b12220b78a/output/model --source_dir 

Traceback (most recent call last):
  File "_repack_model.py", line 109, in <module>
    model_archive=args.model_archive,
  File "_repack_model.py", line 55, in repack
    shutil.copy2(model_path, local_path)
  File "/miniconda3/lib/python3.7/shutil.py", line 266, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/miniconda3/lib/python3.7/shutil.py", line 120, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/input/data/training/model'

2022-05-02 04:49:20,281 sagemaker-containers ERROR    Reporting training FAILURE

What am I missing here?

Thank you for your help.

(Screenshot attached: Screen Shot 2022-05-02 at 3.18.53 pm)

(Screenshot attached: Screen Shot 2022-05-02 at 3.08.02 pm)

Hi @franckess, thank you for reporting this. I will have a look today or tomorrow.

@jessieweiyi thanks for the prompt reply.

FYI, I am using VS Code as my development tool and Bitbucket as my repo for CI/CD.

Thanks

rdkls commented

Hi Jessie, thanks for putting this together. I'm working with Rene on it and we both get the same result; it looks like a problem in the sklearn repack step.

Hi @rdkls , @franckess,

Thank you for the update.

I confirmed that I can reproduce the same error on my side. I am working on triaging the issue.

That's exactly what we found while debugging the error message.

model_data=Join(
    on="/",
    values=[
        step_process.properties.ProcessingOutputConfig.Outputs["model"].S3Output.S3Uri,
        "model.tar.gz",
    ],
),

@jessieweiyi thanks for fixing the issue.

Have a good one!