A simple end-to-end image classification project (preprocessing, training, deployment) using AWS SageMaker.
NOTE: The pipeline-based version is in a different repo.
Upload the data file as a .zip (data source https://www.kaggle.com/datasets/puneet6060/intel-image-classification) to S3.
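The upload step can also be scripted. Below is a minimal sketch, assuming a boto3 S3 client is passed in; the function name `upload_dataset` and the bucket/key values in the comment are placeholders, not names used by this repo.

```python
from pathlib import Path


def upload_dataset(s3_client, zip_path, bucket, key):
    """Upload a local .zip to S3 and return its s3:// URI.

    `s3_client` is expected to behave like boto3.client("s3"), whose
    upload_file(Filename, Bucket, Key) method performs the transfer.
    """
    zip_path = Path(zip_path)
    if zip_path.suffix != ".zip":
        raise ValueError(f"expected a .zip file, got {zip_path.name}")
    s3_client.upload_file(str(zip_path), bucket, key)
    return f"s3://{bucket}/{key}"


# Typical call (needs AWS credentials; bucket and key are placeholders):
#   import boto3
#   upload_dataset(boto3.client("s3"), "intel.zip", "my-bucket", "data/intel.zip")
```

Passing the client in keeps the helper easy to try with a stub before pointing it at a real bucket.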
Create a repo in AWS CodeCommit for DVC
notebook -
notebook -
notebook -
notebook -
Save JSON file in
# Save evaluation metrics to a JSON file
import json

trainer.test(model, datamodule)
# callback_metrics holds tensors; convert them to plain lists/numbers for JSON
eval_metrics = {k: v.tolist() for k, v in trainer.callback_metrics.items()}
with open(sm_model_dir / "eval_metrics.json", "w") as jfile:
    json.dump(eval_metrics, jfile)
- Check CloudWatch logs
Top 5 predictions for each image.
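As a rough illustration of what "top 5" means here, the sketch below turns one image's raw scores into probabilities and keeps the five most likely classes. It is plain Python, not the repo's inference code; the logits are made up, and the class order is an assumption (alphabetical folder names of the Intel dataset).

```python
import math

# Class names of the Intel Image Classification dataset (assumed order).
CLASSES = ["buildings", "forest", "glacier", "mountain", "sea", "street"]


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, classes=CLASSES, k=5):
    """Return the k most likely (class, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(classes, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical logits for a single image
for name, prob in top_k([2.0, 0.1, -1.0, 0.5, 3.2, 1.1]):
    print(f"{name}: {prob:.3f}")
```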
Create Custom Docker Image
for SageMaker jobs, using AWS Deep Learning Containers as the base.
- Install additional dependencies. (E.g. I want to install a specific Python library that the current SageMaker containers don't include.)
- Configure your environment. (E.g. I want to add an environment variable to my container.)
and AmazonEC2ContainerRegistryFullAccess
to create the custom Docker image
This shows how to package a PyTorch container by extending the SageMaker PyTorch container. By extending it, we can reuse the existing training and hosting solution built to work on SageMaker.
AWS DLC (Deep Learning Containers), choose one of these as base https://github.com/aws/deep-learning-containers/blob/master/available_images.md
e.g. 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.12.1-cpu-py38-ubuntu20.04-sagemaker
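The image URI above follows the usual ECR pattern of account, region, repository and tag. A small sketch of composing it is shown below; the helper name `ecr_image_uri` is illustrative only, and the available_images page remains the source of truth for which account and tags exist in a given region.

```python
def ecr_image_uri(account_id, region, repository, tag):
    """Compose an Amazon ECR image URI from its four parts."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Values taken from the example in this README
DLC_ACCOUNT = "763104351884"
TAG = "1.12.1-cpu-py38-ubuntu20.04-sagemaker"

print(ecr_image_uri(DLC_ACCOUNT, "us-east-1", "pytorch-training", TAG))
print(ecr_image_uri(DLC_ACCOUNT, "ap-southeast-2", "pytorch-training", TAG))
```

Only the region segment changes between the two printed URIs, which is why the Dockerfile below can use the same tag with a different region.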
- Spin up an EC2 instance
- Create a requirements.txt with the required packages and versions
- Create a Dockerfile with the contents below
# Take the base AWS DLC container
FROM 763104351884.dkr.ecr.ap-southeast-2.amazonaws.com/pytorch-training:1.12.1-cpu-py38-ubuntu20.04-sagemaker
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt \
    && rm -rf /root/.cache/pip
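The requirements.txt copied into the image needs explicit versions. One way to get them, sketched below under the assumption that the packages are already installed in the environment you developed in, is to read the installed versions with the standard library. The function name and the package names in the comment are hypothetical.

```python
from importlib import metadata


def pinned_requirements(package_names):
    """Return ('name==version' lines, missing names) for the given packages.

    Versions are read from the current Python environment; packages that
    are not installed are reported instead of being guessed.
    """
    lines, missing = [], []
    for name in package_names:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            missing.append(name)
    return lines, missing


# Example (package names are placeholders):
#   lines, missing = pinned_requirements(["pytorch-lightning", "timm"])
#   open("requirements.txt", "w").write("\n".join(lines) + "\n")
```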
Create a repo in ECR, copy build commands
From the EC2 terminal, log in to the ECR registry that hosts the base image:
aws ecr get-login-password --region ap-southeast-2 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.ap-southeast-2.amazonaws.com
Then build the Docker image:
docker build -t custom-pytorch-cpu-sagemaker .
After the build is completed, tag your image so you can push the image to this repository:
docker tag custom-pytorch-cpu-sagemaker:latest 536176424191.dkr.ecr.ap-southeast-2.amazonaws.com/custom-pytorch-cpu-sagemaker:latest
Then log in again, this time to your own ECR registry:
aws ecr get-login-password --region ap-southeast-2 | docker login --username AWS --password-stdin 536176424191.dkr.ecr.ap-southeast-2.amazonaws.com
Run the following command to push this image to your newly created AWS repository:
docker push 536176424191.dkr.ecr.ap-southeast-2.amazonaws.com/custom-pytorch-cpu-sagemaker:latest
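The login, build, tag, login, push sequence above differs only in account IDs, region and image name. The sketch below just assembles those command strings so they can be reviewed before being run; it does not execute anything, and the function names are illustrative.

```python
def ecr_registry(account_id, region):
    """Registry host name for an account/region pair."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def login_command(account_id, region):
    """Shell command that logs Docker in to the given ECR registry."""
    return (
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {ecr_registry(account_id, region)}"
    )


def build_and_push_commands(base_account, own_account, region, image):
    """Return the five commands from this README, in order."""
    target = f"{ecr_registry(own_account, region)}/{image}:latest"
    return [
        login_command(base_account, region),  # needed to pull the base image
        f"docker build -t {image} .",
        f"docker tag {image}:latest {target}",
        login_command(own_account, region),   # needed to push to your own repo
        f"docker push {target}",
    ]


for cmd in build_and_push_commands(
    "763104351884", "536176424191", "ap-southeast-2", "custom-pytorch-cpu-sagemaker"
):
    print(cmd)
```

The two logins are separate because the base image and the pushed image live in different accounts' registries.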
Refer to "Building AWS Deep Learning Containers Custom Images".
Open a terminal and check whether TensorBoard is installed; if it is not, install it:
pip install tensorboard
tensorboard --logdir <s3 URI of logs>
tensorboard --logdir s3://sagemaker-ap-southeast-2-536176424191/sagemaker-intel-classification-logs/training-intel-dataset-2022-12-15-15-44-51-564/tensorboard-output/
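The log URI above is made of a bucket, a log prefix, the training job name and a `tensorboard-output/` folder. A sketch of composing it is below. The layout is inferred from the example URI in this README rather than from a documented guarantee, and the helper names are hypothetical; the bucket name follows the SageMaker default-bucket pattern `sagemaker-<region>-<account>`.

```python
def default_bucket(region, account_id):
    """Name pattern of the SageMaker default bucket."""
    return f"sagemaker-{region}-{account_id}"


def tensorboard_logdir(bucket, log_prefix, job_name):
    """S3 URI of a training job's TensorBoard logs (layout assumed from this README)."""
    return f"s3://{bucket}/{log_prefix}/{job_name}/tensorboard-output/"


def tensorboard_command(logdir):
    """Argument list for launching TensorBoard on that log directory."""
    return ["tensorboard", "--logdir", logdir]


logdir = tensorboard_logdir(
    default_bucket("ap-southeast-2", "536176424191"),
    "sagemaker-intel-classification-logs",
    "training-intel-dataset-2022-12-15-15-44-51-564",
)
print(" ".join(tensorboard_command(logdir)))
```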
- Upload to TensorBoard.dev
- Uploading the TensorBoard logs will give you a URL that can be shared with anyone.
- Uploaded TensorBoards are public, so do not upload sensitive data.
- The uploader will exit when the entire logdir has uploaded. (This is what the --one_shot flag specifies.)
tensorboard dev upload --logdir s3://sagemaker-ap-southeast-2-536176424191/sagemaker-intel-classification-logs/training-intel-dataset-2022-12-15-15-44-51-564/tensorboard-output/ \
    --name "Intel image Classification" \
    --description "git link here" \
    --one_shot
AWS DLC (Deep Learning Containers), choose one of these as base https://github.com/aws/deep-learning-containers/blob/master/available_images.md
Custom Docker Image https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-custom-images.html
Setup for HTTPS users using Git credentials https://docs.aws.amazon.com/codecommit/latest/userguide/setting-up-gc.html?icmpid=docs_acc_console_connect_np