This repo hosts a library and examples for writing meta entrypoint scripts for Amazon SageMaker training jobs. The library also contains additional utilities to reduce the boilerplate code in those scripts.
The acronym smepu stands for SageMaker entry point utilities.
Main features:
- Support the writing of meta entrypoint scripts for SageMaker training jobs. To achieve this, smepu automatically deserializes hyperparameters from CLI args to Python datatypes, then passes the deserialized args through to a wrapped estimator. Hence, entrypoint authors do not have to write the boilerplate code that "parses those 10+ CLI args, and calls another estimator with those args. Then, rinse-and-repeat in 5 more scripts for 5 more different ML algorithms."

  Please go to examples/ and look at the various README.md files for more details. For a more complete, sophisticated example, please also look at this AWS blog post and its companion gluonts example.

  Implementation note: this is made possible thanks to the gluonts.core.serde.decode() function.
- Configure the logger to consistently send logs to Amazon CloudWatch log streams.
- Automatically disable fancy outputs (e.g., tqdm progress bars) when running as Amazon SageMaker training jobs.
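To make the first feature concrete, here is a minimal sketch of the "deserialize, then pass through" idea using only the Python standard library. It is not smepu's implementation (which relies on gluonts.core.serde.decode()). The helper parse_hyperparameters and the class ToyEstimator are hypothetical names introduced purely for illustration, and ast.literal_eval stands in for the real decoder.

```python
import ast
from typing import Any, Dict, List


def parse_hyperparameters(argv: List[str]) -> Dict[str, Any]:
    """Turn ['--epochs', '5', '--layers', '[32, 16]'] into typed kwargs."""
    kwargs: Dict[str, Any] = {}
    # Walk the list pairwise: a '--name' flag followed by its raw string value.
    for flag, raw in zip(argv[::2], argv[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"Expected a --flag, got {flag!r}")
        name = flag[2:].replace("-", "_")
        try:
            # literal_eval handles ints, floats, bools, None, lists, dicts, tuples.
            kwargs[name] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything that is not a Python literal stays a plain string.
            kwargs[name] = raw
    return kwargs


class ToyEstimator:
    """Hypothetical stand-in for the wrapped estimator."""

    def __init__(self, epochs: int = 1, lr: float = 0.1, layers=(8,), name: str = "toy"):
        self.epochs, self.lr, self.layers, self.name = epochs, lr, layers, name


# SageMaker hands hyperparameters to the entrypoint as string CLI args.
hp = parse_hyperparameters(
    ["--epochs", "5", "--lr", "1e-3", "--layers", "[32, 16]", "--name", "demo"]
)
estimator = ToyEstimator(**hp)  # pass-through: no per-argument boilerplate
```

The point of the sketch is that the entrypoint never names individual hyperparameters, so the same script could wrap a different estimator without changes.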
With proper care, the meta entrypoint script can run on either a SageMaker container (either as training jobs or in SageMaker local mode), or on your own Python (virtual) environment.
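One example of such care, sketched under assumptions: read input and output locations from the environment variables that SageMaker training containers set, and fall back to local paths otherwise. The function resolve_dirs, the channel name "train", and the fallback paths are hypothetical choices for illustration, not part of smepu.

```python
import os
from pathlib import Path
from typing import Dict, Mapping


def resolve_dirs(env: Mapping[str, str] = os.environ) -> Dict[str, Path]:
    """Return train/model directories, with local fallbacks outside SageMaker."""
    return {
        # SM_CHANNEL_TRAIN is set when the job has an input channel named "train".
        "train": Path(env.get("SM_CHANNEL_TRAIN", "./data/train")),
        # SM_MODEL_DIR is where the container expects model artifacts.
        "model": Path(env.get("SM_MODEL_DIR", "./output/model")),
    }


# On a laptop (no SM_* variables), the local fallbacks are used.
local_dirs = resolve_dirs({})
# Inside a training container, the SM_* values take precedence.
sm_dirs = resolve_dirs(
    {"SM_CHANNEL_TRAIN": "/opt/ml/input/data/train", "SM_MODEL_DIR": "/opt/ml/model"}
)
```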
pip install \
'git+https://github.com/aws-samples/amazon-sagemaker-entrypoint-utilities@main#egg=smepu'
or:
git clone \
https://github.com/aws-samples/amazon-sagemaker-entrypoint-utilities.git
cd amazon-sagemaker-entrypoint-utilities
pip install -e .
Pre-requisite: know how to write an Amazon SageMaker training entrypoint.
A working hello-world example is provided under examples/00-hello-world, which contains two kinds of meta-scripts:
- entrypoint.py uses argparse to parse hyperparameters.
- entrypoint-click.py uses click to parse hyperparameters.
Use examples/00-hello-world/entrypoint.sh to quickly observe the behavior of those train entrypoints when they run directly in the Python environment on your machine.
[NOTE: not to be confused with "Amazon SageMaker local mode" which refers to running the script on a SageMaker container running on a SageMaker notebook instance.]
Running the train script directly in your Python environment is a useful trick to speed up your "dev + functional-test" cycle. Typically this stage uses a tiny synthetic dataset, and you rely heavily on your favorite dev tools (e.g., unit-test frameworks, code debuggers, etc.).
After this, you can perform a "compatibility test" by running your train script on an Amazon SageMaker training container (whether in "Amazon SageMaker local mode" or on a training instance), to iron out compatibility issues.
When your scripts have been fully tested, you can start your actual, large-scale model training and experimentation on Amazon SageMaker training instances.
Sample runs:
# Run entrypoint script outside of SageMaker.
examples/00-hello-world/entrypoint.sh
# Mimic running on Amazon SageMaker: tqdm is turned off automatically.
SM_HOSTS=abcd examples/00-hello-world/entrypoint.sh
# Run click-version of entrypoint
examples/00-hello-world/entrypoint.sh -click
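The SM_HOSTS=abcd run above suggests that the presence of a SageMaker environment variable is enough to switch off progress bars. A minimal sketch of that kind of check follows. The function names are hypothetical, and whether smepu tests exactly this variable in exactly this way is an assumption.

```python
import os
from typing import Any, Dict, Mapping


def on_sagemaker(env: Mapping[str, str] = os.environ) -> bool:
    """Guess whether we run as a SageMaker training job (assumed heuristic)."""
    return "SM_HOSTS" in env


def progress_bar_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Keyword args for tqdm: progress bars are disabled on SageMaker."""
    return {"disable": on_sagemaker(env)}


# Plain local run: progress bars stay on.
local_kwargs = progress_bar_kwargs({})
# Mimicked SageMaker run, as with `SM_HOSTS=abcd entrypoint.sh`: bars are off.
sagemaker_kwargs = progress_bar_kwargs({"SM_HOSTS": "abcd"})
```

With tqdm installed, the result could be used as `tqdm(iterable, **progress_bar_kwargs())`, since `disable` is a standard tqdm parameter.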
To experiment with different hyperparameters, see DummyEstimator in examples/00-hello-world/dummyest.py and, if necessary, modify complex_args in examples/00-hello-world/entrypoint.sh accordingly.
Feel free to further explore other sample scripts under examples/.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.