Use AWS SageMaker to train a pretrained model that can perform image classification, using SageMaker profiling, the debugger, hyperparameter tuning, and other good ML engineering practices. This can be done on either the provided dog breed classification dataset or one of your choice.
Enter AWS through the gateway in the course and open SageMaker Studio. Download the starter files. Download/Make the dataset available.
The provided dataset is the dog breed classification dataset, which can be found in the classroom. The project is designed to be dataset-independent, so if there is a dataset that is more interesting or relevant to your work, you are welcome to use it to complete the project.
For this project I used the provided dog breed dataset.
Upload the data to an S3 bucket through the AWS Gateway so that SageMaker has access to the data.
What kind of model did you choose for this experiment and why? Give an overview of the types of parameters and their ranges used for the hyperparameter search.
I used a pretrained ResNet-18 model for this project. Starting from a pretrained model saves training time because the network has already learned general image features. I chose ResNet-18 specifically because it is a popular convolutional neural network architecture with weights pretrained on ImageNet. ImageNet is a large-scale image dataset with over 14 million labeled images; the standard pretrained weights come from its 1,000-class subset, which is used in the ImageNet Large Scale Visual Recognition Challenge. Another advantage of ResNet is that its residual (skip) connections are designed to mitigate vanishing gradients, a common drawback of deep networks.
I tuned three hyperparameters:
- Learning rate
- Batch size
- Epochs
Here are their ranges: hyperparameter_ranges = { "lr": ContinuousParameter(0.001, 0.1), "batch-size": CategoricalParameter([16, 32]), "epochs": IntegerParameter(2, 4) }
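SageMaker passes each tuned hyperparameter to the entry-point script as a command-line argument named after its key, so the training script has to accept `--lr`, `--batch-size`, and `--epochs`. Below is a minimal sketch of that parsing step, assuming the script uses `argparse`. The function name and the default values are placeholders for illustration, not the ones used in this project.

```python
import argparse


def parse_hyperparameters(argv=None):
    """Parse the three tuned hyperparameters from command-line arguments."""
    parser = argparse.ArgumentParser(description="Dog breed classifier training")
    # Defaults are illustrative placeholders, not values from this project.
    parser.add_argument("--lr", type=float, default=0.01,
                        help="learning rate, tuned in [0.001, 0.1]")
    parser.add_argument("--batch-size", type=int, default=32,
                        help="batch size, tuned over {16, 32}")
    parser.add_argument("--epochs", type=int, default=2,
                        help="number of epochs, tuned in [2, 4]")
    # parse_known_args ignores any extra arguments the caller may pass.
    args, _unknown = parser.parse_known_args(argv)
    return args
```

Note that `argparse` exposes `--batch-size` as `args.batch_size`, so the hyphenated key from the tuner maps to an underscore attribute inside the script.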
Remember that your README should:
- Tune at least two hyperparameters
- Retrieve the best hyperparameters from all your training jobs
{'_tuning_objective_metric': '"average test loss"', 'batch-size': '"32"', 'epochs': '4', 'lr': '0.03305079691259119', 'sagemaker_container_log_level': '20', 'sagemaker_estimator_class_name': '"PyTorch"', 'sagemaker_estimator_module': '"sagemaker.pytorch.estimator"', 'sagemaker_job_name': '"pytorch-training-2023-03-15-13-37-17-692"', 'sagemaker_program': '"hpo.py"', 'sagemaker_region': '"us-east-1"', 'sagemaker_submit_directory': '"s3://sagemaker-us-east-1-706723451900/pytorch-training-2023-03-15-13-37-17-692/source/sourcedir.tar.gz"'}
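The values in the dictionary above are all strings, and some are JSON-encoded a second time (for example `'"32"'`), so they need decoding before they can be reused for a final training job. Below is a minimal sketch of that cleanup, assuming only the three tuned keys are needed. The helper name `clean_hyperparameters` is hypothetical, and the input is an abbreviated copy of the dictionary above.

```python
import json

# Abbreviated copy of the best-hyperparameter dictionary shown above.
raw_best = {
    "_tuning_objective_metric": '"average test loss"',
    "batch-size": '"32"',
    "epochs": "4",
    "lr": "0.03305079691259119",
    "sagemaker_program": '"hpo.py"',
}


def clean_hyperparameters(raw):
    """Drop SageMaker bookkeeping keys and decode the JSON-encoded values."""
    cleaned = {}
    for key, value in raw.items():
        if key.startswith("_") or key.startswith("sagemaker_"):
            continue  # internal entries, not tuned hyperparameters
        cleaned[key] = json.loads(value)
    # The categorical batch size decodes to the string "32", so cast it.
    cleaned["batch-size"] = int(cleaned["batch-size"])
    return cleaned


best = clean_hyperparameters(raw_best)
print(best)
```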
TODO: Give an overview of how you performed model debugging and profiling in SageMaker
To perform debugging, import the smdebug library and create a hook. Then add the hook to the training and test functions at the appropriate places (the beginning of each function).
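The steps above might look like the sketch below in a PyTorch training script. This is not the project's actual script: the function names are hypothetical, and it assumes that `smdebug` and `torch` are installed (as they are in SageMaker PyTorch training containers) and that `criterion` is an `nn.Module` loss such as `nn.CrossEntropyLoss()`.

```python
def create_hook(model, criterion):
    """Create an smdebug hook and attach it to the model and the loss."""
    import smdebug.pytorch as smd  # imported lazily: needed only when debugging

    # Reads the debugger configuration that SageMaker places in the container.
    hook = smd.Hook.create_from_json_file()
    hook.register_module(model)    # capture tensors from the model's layers
    hook.register_loss(criterion)  # capture the loss values
    return hook


def train_one_epoch(model, loader, criterion, optimizer, hook=None):
    """Run one training epoch and return the average batch loss."""
    if hook is not None:
        import smdebug.pytorch as smd
        hook.set_mode(smd.modes.TRAIN)  # tag saved tensors as training data
    model.train()
    total_loss, batches = 0.0, 0
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        batches += 1
    return total_loss / max(batches, 1)


def evaluate(model, loader, criterion, hook=None):
    """Run one evaluation pass and return the average batch loss."""
    import torch
    if hook is not None:
        import smdebug.pytorch as smd
        hook.set_mode(smd.modes.EVAL)  # tag saved tensors as evaluation data
    model.eval()
    total_loss, batches = 0.0, 0
    with torch.no_grad():
        for inputs, targets in loader:
            total_loss += criterion(model(inputs), targets).item()
            batches += 1
    return total_loss / max(batches, 1)
```

Setting the mode at the start of each function is what lets the debugger separate training tensors from evaluation tensors. The estimator that launches the job also has to be given a debugger and profiler configuration for the hook's output to be collected.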
TODO: What results/insights did you get by profiling/debugging your model?
This chart shows that the training loss decreased while the validation loss increased, which is a clear indication of overfitting. The hyperparameter tuning needs to be improved; for example, I could give the tuning job a larger range of values.
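The pattern described above, training loss falling while validation loss rises, can also be checked programmatically from per-epoch losses. Below is a minimal sketch. The function name is hypothetical, and the loss values in the usage example are made up for illustration; they are not results from this project.

```python
def is_overfitting(train_losses, val_losses, patience=2):
    """Return True if, over the last `patience` epoch-to-epoch steps,
    training loss kept decreasing while validation loss kept increasing."""
    if len(train_losses) != len(val_losses):
        raise ValueError("loss histories must have the same length")
    if len(train_losses) < patience + 1:
        return False  # not enough epochs to judge
    recent_train = train_losses[-(patience + 1):]
    recent_val = val_losses[-(patience + 1):]
    train_falling = all(b < a for a, b in zip(recent_train, recent_train[1:]))
    val_rising = all(b > a for a, b in zip(recent_val, recent_val[1:]))
    return train_falling and val_rising


# Hypothetical per-epoch losses, for illustration only.
train_history = [2.1, 1.4, 0.9, 0.6]
val_history = [1.9, 1.6, 1.7, 1.9]
print(is_overfitting(train_history, val_history))
```

A check like this could be used to stop training early instead of running every epoch in the tuned range.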
TODO Remember to provide the profiler html/pdf file in your submission.
TODO: Give an overview of the deployed model and instructions on how to query the endpoint with a sample input.
The model is deployed on an ml.m5.large instance. It takes an image URL as input.
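Querying the endpoint might look like the sketch below. Several details are assumptions, because this README does not state them: that the inference script accepts a JSON body of the form `{"url": ...}`, that it returns a JSON list of per-class scores, and all function names. The sketch uses the `sagemaker-runtime` client from boto3, which requires AWS credentials.

```python
import json


def build_payload(image_url):
    """Serialize the image URL as a JSON request body (assumed input format)."""
    return json.dumps({"url": image_url}).encode("utf-8")


def predicted_class(scores):
    """Return the index of the highest score in a flat list of class scores."""
    return max(range(len(scores)), key=lambda i: scores[i])


def query_endpoint(endpoint_name, image_url):
    """Send the image URL to the endpoint and return the predicted class index."""
    import boto3  # imported lazily so the helpers above work without AWS access

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_payload(image_url),
    )
    # Assumes a JSON list of scores, possibly wrapped in a batch of one.
    scores = json.loads(response["Body"].read())
    if scores and isinstance(scores[0], list):
        scores = scores[0]
    return predicted_class(scores)
```

A call would then be `query_endpoint("<your-endpoint-name>", "<image-url>")`, with both arguments replaced by real values. The returned index still has to be mapped to a breed name using the class order of the training data.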
TODO Remember to provide a screenshot of the deployed active endpoint in SageMaker.
TODO (Optional): This is where you can provide information about any standout suggestions that you have attempted.