A simple end-to-end image classification workflow (preprocessing, training, deployment) using AWS SageMaker.
NOTE: The SageMaker Pipelines version lives in a different repo.
- Upload the data file as a .zip (data source: https://www.kaggle.com/datasets/puneet6060/intel-image-classification) to S3:
  s3://mlops-tutorials/sagemaker-mlops1/intel/
- Create a repo in AWS CodeCommit for DVC.
- Run the `setup-git-dvc` notebook.
- Run the `data-preprocessing` notebook.
- Run the `train` notebook.
- Run the `deployAndPredict` notebook.
- Save the eval metrics as a JSON file in the `SM_MODEL_DIR` directory:
```python
# save eval metrics in a json file inside SM_MODEL_DIR
trainer.test(model, datamodule)
eval_metrics = {k: v.tolist() for k, v in trainer.callback_metrics.items()}
print(eval_metrics)
with open(sm_model_dir / "eval_metrics.json", "w") as jfile:
    json.dump(eval_metrics, jfile)
```
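When the training job finishes, SageMaker archives everything in `SM_MODEL_DIR` (typically `/opt/ml/model`) into `model.tar.gz` in S3, so the metrics file travels with the model artifact. A self-contained sketch of the save/load round trip, with a temp directory standing in for `SM_MODEL_DIR` and made-up metric values:

```python
import json
import tempfile
from pathlib import Path

# stand-in for SM_MODEL_DIR (/opt/ml/model inside the training container)
sm_model_dir = Path(tempfile.mkdtemp())

# hypothetical metrics, shaped like the tolist()-converted callback_metrics
eval_metrics = {"test/acc": 0.91, "test/loss": 0.27}

with open(sm_model_dir / "eval_metrics.json", "w") as jfile:
    json.dump(eval_metrics, jfile)

# after the job, the file ships inside model.tar.gz; reading it back:
restored = json.loads((sm_model_dir / "eval_metrics.json").read_text())
print(restored)
```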
- Check the CloudWatch logs for the top-5 predictions for each image.
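A minimal sketch of how the top-5 predictions per image could be derived from the six Intel dataset class scores (the scores below are made up for illustration; a real model would emit logits or softmax probabilities):

```python
# The six classes in the Intel image classification dataset
CLASSES = ["buildings", "forest", "glacier", "mountain", "sea", "street"]

def top_k(scores, k=5):
    """Return the k highest-scoring (class, score) pairs, best first."""
    ranked = sorted(zip(CLASSES, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# made-up softmax scores for one image
scores = [0.05, 0.62, 0.10, 0.15, 0.05, 0.03]
for name, score in top_k(scores):
    print(f"{name}: {score:.2f}")
```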
Create a custom Docker image for SageMaker jobs, using AWS Deep Learning Containers as the base, to:
- Install additional dependencies (e.g. a specific Python library that the current SageMaker containers don't install).
- Configure your environment (e.g. add an environment variable to the container).

Attach the `AmazonSageMakerFullAccess` and `AmazonEC2ContainerRegistryFullAccess` policies to create the custom Docker image.

By extending the SageMaker PyTorch container instead of packaging one from scratch, we can reuse the existing training and hosting solution already built to work on SageMaker.
- AWS DLC (Deep Learning Containers): choose one of these as the base image: https://github.com/aws/deep-learning-containers/blob/master/available_images.md
  e.g. 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.12.1-cpu-py38-ubuntu20.04-sagemaker
- Spin up an EC2 instance.
- Create a requirements.txt with the required packages and versions.
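For example, a requirements.txt for this project might pin the extra training-side libraries (the package choices and versions here are illustrative assumptions, not taken from the repo):

```
pytorch-lightning==1.8.6
timm==0.6.12
```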
- Create a Dockerfile with the contents below:
```dockerfile
# Use the AWS DLC container as the base image
FROM 763104351884.dkr.ecr.ap-southeast-2.amazonaws.com/pytorch-training:1.12.1-cpu-py38-ubuntu20.04-sagemaker
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt \
    && rm -rf /root/.cache/pip
```
- Create a repo in ECR and copy the push commands.
- From the EC2 terminal, log in to the AWS DLC registry:

```shell
aws ecr get-login-password --region ap-southeast-2 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.ap-southeast-2.amazonaws.com
```
- Build the Docker image:

```shell
docker build -t custom-pytorch-cpu-sagemaker .
```
- After the build completes, tag your image so you can push it to your repository:

```shell
docker tag custom-pytorch-cpu-sagemaker:latest 536176424191.dkr.ecr.ap-southeast-2.amazonaws.com/custom-pytorch-cpu-sagemaker:latest
```
- Log in again, this time to your own ECR registry:

```shell
aws ecr get-login-password --region ap-southeast-2 | docker login --username AWS --password-stdin 536176424191.dkr.ecr.ap-southeast-2.amazonaws.com
```
- Push the image to your newly created ECR repository:

```shell
docker push 536176424191.dkr.ecr.ap-southeast-2.amazonaws.com/custom-pytorch-cpu-sagemaker:latest
```
Refer to "Building AWS Deep Learning Containers Custom Images".
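Once pushed, the image is referenced by its full ECR URI when launching SageMaker jobs (the SageMaker Python SDK estimators accept it via the `image_uri` parameter). A small sketch that composes the URI from the account, region, repo, and tag used above:

```python
# Compose the ECR image URI for the custom image built above.
def ecr_image_uri(account: str, region: str, repo: str, tag: str = "latest") -> str:
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"

# values from the push commands in this README
image_uri = ecr_image_uri("536176424191", "ap-southeast-2", "custom-pytorch-cpu-sagemaker")
print(image_uri)
```

This URI is what you would pass as `image_uri` when constructing a SageMaker estimator for training with the custom container.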
- Open a terminal and check for TensorBoard; if it is not present, install it:

```shell
pip install tensorboard
```

- Run TensorBoard against the S3 log location:

```shell
tensorboard --logdir <s3 URI of logs>
```

e.g.

```shell
tensorboard --logdir s3://sagemaker-ap-southeast-2-536176424191/sagemaker-intel-classification-logs/training-intel-dataset-2022-12-15-15-44-51-564/tensorboard-output/
```
- Upload to TensorBoard.dev:
  - Uploading the TensorBoard logs gives you a URL that can be shared with anyone.
  - Uploaded TensorBoards are public, so do not upload sensitive data.
  - The uploader exits when the entire logdir has been uploaded (this is what the --one_shot flag specifies).

```shell
tensorboard dev upload --logdir s3://sagemaker-ap-southeast-2-536176424191/sagemaker-intel-classification-logs/training-intel-dataset-2022-12-15-15-44-51-564/tensorboard-output/ \
    --name "Intel image Classification" \
    --description "git link here" \
    --one_shot
```

Result: https://tensorboard.dev/experiment/Ry8FY2MvTr2CaM15k2ikwA/#scalars
- Build Your Own Processing Container: https://docs.aws.amazon.com/sagemaker/latest/dg/build-your-own-processing-container.html
- AWS DLC (Deep Learning Containers) available base images: https://github.com/aws/deep-learning-containers/blob/master/available_images.md
- Custom Docker Images: https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-custom-images.html
- Setup for HTTPS users using Git credentials: https://docs.aws.amazon.com/codecommit/latest/userguide/setting-up-gc.html?icmpid=docs_acc_console_connect_np