aws/sagemaker-training-toolkit
Train machine learning models within a 🐳 Docker container using 🧠 Amazon SageMaker.
Python · Apache-2.0
Issues
Build failure on MacOS
#225 opened by DRKolev-code - 4
Unable to install sagemaker-training on Windows
#110 opened by martinlyra - 1
SageMaker training toolkit reorders hyperparameters
#221 opened by vsimkus - 0
Extend documentation regarding distributed training for own Docker containers.
#218 opened by marseller - 9
No support for Python 3.10
#129 opened by peter-wimsey - 6
Failed to parse string hyperparameter
#66 opened by uwaisiqbal - 1
Get region with ENV var
#207 opened by austinmw - 0
Invalid dash-separated options for description-file
#206 opened by wickeat - 0
ModuleNotFoundError: Sagemaker only copies entry_point file to /opt/ml/code/ instead of the holy-cloned source code
#200 opened by celsofranssa - 0
P5 instance support
#191 opened by haozhx23 - 0
Deepspeed Launcher
#184 opened by anupam-dewan - 8
Publish wheels to PyPI
#174 opened by hajapy - 0
Passing SIGTERM to entrypoint to be able to handle SPOT failures gracefully in user-code
#173 opened by croth1 - 0
Python 3.6 unsupported [bug/question]
#161 opened by adamwrobel-ext-gd - 0
Mpi mode sets all nodes to the same SM_CURRENT_HOST
#158 opened by verdimrc - 3
Custom MPI options doesn't override the flags
#70 opened by ChaiBapchya - 2
Hyperparameters not shell escaped
#128 opened by bstriner - 0
Pass SIGTERM to training script to stop training
#125 opened by bstriner - 3
Hyperparameters and other cmd arguments are not passed to shell entrypoint in tensorflow > 2.4
#115 opened by unoebauer - 3
Add Custom_Overrides flag.
#113 opened by mathephysicist - 0
Custom_Overrides
#112 opened by mathephysicist - 2
Add gcc package requirement
#94 opened by timorkal - 0
SageMaker Endpoint stuck at “Creating”
#92 opened by vas610 - 0
Sagemaker Fails to download code from S3
#85 opened by uwaisiqbal - 0
Which aws service is the most suitable/used to launch a scheduled training job?
#84 opened by david-fortini - 0
Enable functional test for mpi
#83 opened by ChaiBapchya - 0
Enhance UX for training
#77 opened by ehsanmok - 1
Bash script mode support across all estimators
#75 opened by ehsanmok - 3
Unable to run a Tensorflow Estimator.
#72 opened by nectario - 3