aws/sagemaker-tensorflow-training-toolkit
Toolkit for running TensorFlow training scripts on SageMaker. Dockerfiles used for building SageMaker TensorFlow Containers are at https://github.com/aws/deep-learning-containers.
Python · Apache-2.0
Issues
- 8
RuntimeError: Failed to run: ['docker-compose', '-f', '/tmp/tmp93Sn5U/docker-compose.yaml', 'up', '--build', '--abort-on-container-exit'], Process exited with code: 127
#263 opened by starrylive - 14
Support different tf.distribute.Strategies for distributed training on SageMaker
#391 opened by anirudhacharya - 5
Model deployment is failing with the error "The primary container for production variant AllTraffic did not pass the ping health check."
#401 opened by vishwath96 - 5
How to get evaluation metrics in output logs
#392 opened by MelissaKR - 4
- 3
Multiple inputs failing with different types
#188 opened by mhwilder - 4
[Minor] script-mode branch lacks Dockerfile patch to deal with S3 response timeout configurability
#153 opened by zmjjmz - 6
Support Distributed Training Strategies
#62 opened by andrewortman - 4
pytest test/integration error
#379 opened by ChaiBapchya - 2
Restore from checkpoints Tensorflow eager execution
#344 opened by jenishah - 4
pytest unit test error
#271 opened by xzy0223 - 2
Custom CUDA Operations
#260 opened by JKurzer - 6
Where are the container images for script mode?
#202 opened by keerath - 5
- 2
- 4
- 1
- 1
- 2
Incorrect usage: pytest tests/functional
#378 opened by ChaiBapchya - 5
- 1
Parameter Server entrypoint
#386 opened by ChaiBapchya - 4
Submodule Error
#345 opened by larsll - 0
Cannot find EIA images
#294 opened by ruodingt - 8
Support for bleeding edge versions of Tensorflow
#246 opened by durandg12 - 7
Support for tensorflow 1.14?
#226 opened by panfeng-hover - 2
botocore version error
#244 opened by Elizaaaaa - 5
- 8
- 2
Looking for TF 1.14 ScriptMode Py3 GPU image
#240 opened by ragavvenkatesan - 2
Script mode with code build?
#225 opened by leolorenzoluis - 2
Unable to save assets
#72 opened by zjost - 1
- 0
Is there any documentation to build the SageMaker Elastic Inference TensorFlow serving container?
#205 opened - 1
1.9.0 docker build fails
#162 opened by sermolin - 11
missing value for `framework_support_installable`?
#182 opened by domino14 - 2
Add python-dev to the image
#172 opened by mindlace - 6
Error in tensorflow_model_server
#154 opened by juanilarregui - 1
[Question] TF 1.12 timeline
#136 opened by zmjjmz - 2
Where can the official images be found
#123 opened by devTechi - 1
Error saving the model artifact.
#115 opened by PedroCardoso - 2
- 1
- 3
Add Instructions for Contributing to the project
#85 opened by tlelson - 1
Getting error using the image from Java
#84 opened by PedroCardoso - 1
README not correct for tensorflow >= 1.9.0
#73 opened by zjost - 8
- 1
- 3
Error on entry.py
#60 opened by PedroCardoso - 3
- 2