This repo hosts a library and examples for writing meta training entrypoint scripts for Amazon SageMaker. The library also contains additional utilities that reduce the boilerplate code in those scripts.
The acronym smepu stands for SageMaker entry point
utilities.
Main features:
- Support the writing of meta entrypoint scripts for SageMaker training jobs. To achieve this, smepu automatically deserializes hyperparameters from CLI args to Python datatypes, then passes the deserialized CLI args through to a wrapped estimator. Hence, entrypoint authors do not have to write the boilerplate code that "parses those 10+ CLI args, and calls another estimator with those args. Then, rinse-and-repeat in 5 more scripts for 5 more different ML algorithms."
  Please go to examples/ and look at the various README.md files for more details.
  Implementation note: this is made possible thanks to the gluonts.core.serde.decode() function.
- Configure the logger to consistently send logs to Amazon CloudWatch log streams.
- Automatically disable fancy outputs when running as Amazon SageMaker training jobs.
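To make the first feature concrete, here is a minimal sketch of the idea using only the Python standard library. It is not smepu's API: smepu relies on gluonts.core.serde.decode(), whereas this sketch substitutes ast.literal_eval, and ToyEstimator, decode and parse_hyperparameters are hypothetical names introduced purely for illustration.

```python
import argparse
import ast
from typing import Any, Dict, List


class ToyEstimator:
    """Hypothetical stand-in for the estimator that a meta entrypoint wraps."""

    def __init__(self, epochs: int = 1, lr: float = 0.1, layers=(8,)):
        self.epochs = epochs
        self.lr = lr
        self.layers = layers


def decode(value: str) -> Any:
    """Turn a CLI string into a Python value; keep the raw string on failure.

    Assumption: ast.literal_eval is used here as a simple substitute for
    gluonts.core.serde.decode().
    """
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return value


def parse_hyperparameters(argv: List[str]) -> Dict[str, Any]:
    """Collect the `--key value` pairs the entrypoint does not own itself."""
    parser = argparse.ArgumentParser(allow_abbrev=False)
    # An argument owned by the entrypoint; everything else is passed through.
    parser.add_argument("--model-dir", default="model")
    _, unknown = parser.parse_known_args(argv)
    if len(unknown) % 2 != 0:
        raise ValueError(f"expected --key value pairs, got: {unknown}")
    hyperparameters = {}
    for key, value in zip(unknown[::2], unknown[1::2]):
        name = key.lstrip("-").replace("-", "_")
        hyperparameters[name] = decode(value)
    return hyperparameters


if __name__ == "__main__":
    hp = parse_hyperparameters(
        ["--model-dir", "out", "--epochs", "5", "--lr", "0.01", "--layers", "[16, 8]"]
    )
    estimator = ToyEstimator(**hp)  # pass the decoded values straight through
    print(hp)
```

The entrypoint never names epochs, lr or layers, so swapping ToyEstimator for another estimator with different hyperparameters would not require touching the parsing code.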
With proper care, the meta entrypoint script can run on either a SageMaker container (either as training jobs or in SageMaker local mode), or on your own Python (virtual) environment.
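As a rough illustration of what "proper care" can look like, the sketch below guesses the runtime environment and sets up a logger that writes to stdout, which SageMaker training jobs forward to CloudWatch. It is an assumption that checking for the SM_HOSTS environment variable is a sufficient test; on_sagemaker and make_logger are hypothetical helpers, not functions taken from smepu.

```python
import logging
import os
import sys
from typing import Mapping, Optional


def on_sagemaker(env: Optional[Mapping[str, str]] = None) -> bool:
    """Guess whether the script runs inside a SageMaker training container.

    Assumption: the presence of SM_HOSTS marks a SageMaker run.
    """
    env = os.environ if env is None else env
    return "SM_HOSTS" in env


def make_logger(name: str = "entrypoint") -> logging.Logger:
    """Return a logger that writes plain, single-line records to stdout."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s [%(levelname)s] %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records from being emitted twice via root
    return logger


if __name__ == "__main__":
    logger = make_logger()
    logger.info("Fancy outputs enabled: %s", not on_sagemaker())
```

A progress bar could then be switched off with something like tqdm(items, disable=on_sagemaker()), since plain log lines read better than progress bars in CloudWatch log streams.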
pip install \
'git+https://github.com/aws-samples/amazon-sagemaker-entrypoint-utilities@main#egg=smepu'

or:

git clone \
https://github.com/aws-samples/amazon-sagemaker-entrypoint-utilities.git
cd amazon-sagemaker-entrypoint-utilities
pip install -e .

Pre-requisite: know how to write an Amazon SageMaker training entrypoint.
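For readers who need a refresher on that pre-requisite, a conventional hand-written entrypoint looks roughly like the sketch below. SM_MODEL_DIR and SM_CHANNEL_TRAIN are environment variables set inside SageMaker training containers; the fallback paths, the epochs and lr hyperparameters, and the parse_args helper are assumptions chosen for illustration. This is the per-hyperparameter boilerplate that a meta entrypoint aims to avoid.

```python
import argparse
import os
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # Locations provided by the SageMaker container; the fallbacks are
    # assumed defaults so the script also runs in a local Python environment.
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "model"))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "data/train"))
    # Hyperparameters reach the script as `--key value` CLI args, so each one
    # needs its own hand-written declaration and type.
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--lr", type=float, default=0.1)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"Training for {args.epochs} epochs, saving the model to {args.model_dir}")
```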
A working hello-world example is provided under examples/00-hello-world, which
contains two kinds of meta-scripts:
- entrypoint.py uses argparse to parse hyperparameters.
- entrypoint-click.py uses click to parse hyperparameters.
Use examples/00-hello-world/entrypoint.sh to quickly observe the behavior of
those train entrypoints when they run directly on your Python environment in
your machine.
[NOTE: not to be confused with "Amazon SageMaker local mode" which refers to running the script on a SageMaker container running on a SageMaker notebook instance.]
Running the train script directly in your Python environment is a useful trick to speed up your "dev + functional-test" cycle. Typically this stage uses a tiny synthetic dataset, and you heavily leverage your favorite dev tools (e.g., unit-test frameworks, code debuggers, etc.).
After this, you can perform a "compatibility test" by running your train script on an Amazon SageMaker training container (whether in "Amazon SageMaker local mode" or on a training instance), to iron out compatibility issues.
When your scripts have been fully tested, you can start your actual, large-scale model training & experimentation on Amazon SageMaker training instances.
Sample runs:
# Run entrypoint script outside of SageMaker.
examples/00-hello-world/entrypoint.sh
# Mimic running on Amazon SageMaker: automatically turns off tqdm.
SM_HOSTS=abcd examples/00-hello-world/entrypoint.sh
# Run click-version of entrypoint
examples/00-hello-world/entrypoint.sh -click

To experiment with different hyperparameters, see DummyEstimator in
examples/00-hello-world/dummyest.py and, if necessary, modify complex_args in
examples/00-hello-world/entrypoint.sh accordingly.
Feel free to further explore other sample scripts under examples/.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.