
Sample utilities to shorten or simplify Amazon SageMaker training entrypoints: logging handlers, silenced tqdm, hyperparameter parsing for writing meta entrypoint scripts, etc.


Utilities for Amazon SageMaker Training Entrypoint


1. Overview

This repo hosts a library and examples for writing meta training entrypoint scripts for Amazon SageMaker. The library also contains additional utilities that cut down the boilerplate code in those scripts.

The acronym smepu stands for SageMaker entry point utilities.

Main features:

  1. Support the writing of meta entrypoint scripts for SageMaker training jobs. To achieve this, smepu automatically deserializes hyperparameters from CLI args to Python datatypes, then passes the deserialized arguments through to a wrapped estimator.

    Hence, entrypoint authors do not have to write boilerplate code that "parses those 10+ CLI args, and calls another estimator with those args. Then, rinse-and-repeat in 5 more scripts for 5 more different ML algorithms."

    Please go to examples/ and look at the various README.md files for more details. For a more complete, sophisticated example, please also look at this AWS blog post and its companion gluonts example.

    Implementation note: this is made possible thanks to the gluonts.core.serde.decode() function.

  2. Configure logger to consistently send logs to Amazon CloudWatch log streams.

  3. Automatically disable fancy outputs when running as Amazon SageMaker training jobs.

    • Silence tqdm when training on Amazon SageMaker, to reduce the noise of your Amazon CloudWatch logs.

    • Plain output (i.e., no colors, no fancy formatting) for wasabi and the spacy CLI (e.g., train or convert).
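The first feature above can be illustrated with a minimal sketch. It does not use smepu or gluonts.core.serde.decode(); it assumes ast.literal_eval from the Python standard library as a simplified stand-in deserializer. The names deserialize, parse_hyperparameters and ToyEstimator are hypothetical and exist only for this illustration.

```python
import ast
from typing import Any, Dict, List


def deserialize(value: str) -> Any:
    """Turn a CLI string into a Python value; fall back to the raw string."""
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # Not a Python literal (e.g., a plain word or an S3 URI): keep as str.
        return value


def parse_hyperparameters(argv: List[str]) -> Dict[str, Any]:
    """Convert ['--name', 'value', ...] pairs into keyword arguments."""
    if len(argv) % 2 != 0:
        raise ValueError("expected '--name value' pairs")
    kwargs = {}
    for flag, value in zip(argv[::2], argv[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"not a flag: {flag!r}")
        kwargs[flag[2:].replace("-", "_")] = deserialize(value)
    return kwargs


class ToyEstimator:
    """Hypothetical stand-in for the wrapped estimator."""

    def __init__(self, epochs: int = 1, lr: float = 0.1, layers=(8,)):
        self.epochs, self.lr, self.layers = epochs, lr, layers


if __name__ == "__main__":
    # SageMaker passes hyperparameters to the entrypoint as CLI strings.
    args = ["--epochs", "5", "--lr", "0.01", "--layers", "[32, 16]"]
    est = ToyEstimator(**parse_hyperparameters(args))
    print(est.epochs, est.lr, est.layers)
```

Because the parsed arguments are passed through as keyword arguments, the same script can wrap a different estimator without any change to the parsing code.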

With proper care, the meta entrypoint script can run on either a SageMaker container (either as training jobs or in SageMaker local mode), or on your own Python (virtual) environment.
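One way to take that care is to read SageMaker's environment variables when they are present and fall back to local defaults otherwise. The following is a minimal sketch, assuming the SM_MODEL_DIR and SM_CHANNEL_TRAIN variables that SageMaker training containers set (the latter only when a channel named "train" is configured). The function name build_parser and the local default paths are hypothetical and not part of smepu.

```python
import argparse
import os
from typing import Mapping, Optional


def build_parser(environ: Optional[Mapping[str, str]] = None) -> argparse.ArgumentParser:
    """Build a parser whose defaults work inside and outside SageMaker."""
    env = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # Inside a SageMaker container these variables are set by the platform;
    # elsewhere, the hypothetical local defaults below are used instead.
    parser.add_argument("--model-dir", default=env.get("SM_MODEL_DIR", "./model"))
    parser.add_argument("--train", default=env.get("SM_CHANNEL_TRAIN", "./data/train"))
    return parser


if __name__ == "__main__":
    # parse_known_args() leaves unrecognized hyperparameters untouched,
    # so they can be forwarded to the wrapped estimator.
    args, hyperparameters = build_parser().parse_known_args()
    print(args.model_dir, args.train, hyperparameters)
```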

2. Installation

pip install \
    'git+https://github.com/aws-samples/amazon-sagemaker-entrypoint-utilities@main#egg=smepu'

or:

git clone \
    https://github.com/aws-samples/amazon-sagemaker-entrypoint-utilities.git

cd amazon-sagemaker-entrypoint-utilities
pip install -e .

3. Usage

Pre-requisite: know how to write an Amazon SageMaker training entrypoint.

A working hello-world example is provided under examples/00-hello-world which contains two kinds of meta-scripts:

  1. entrypoint.py uses argparse to parse hyperparameters.
  2. entrypoint-click.py uses click to parse hyperparameters.

Use examples/00-hello-world/entrypoint.sh to quickly observe the behavior of those train entrypoints when they run directly on your Python environment in your machine.

[NOTE: not to be confused with "Amazon SageMaker local mode" which refers to running the script on a SageMaker container running on a SageMaker notebook instance.]

Running the train script directly in your Python environment is a useful trick to speed up your "dev + functional-test" cycle. Typically, this stage uses a tiny synthetic dataset and relies heavily on your favorite dev tools (e.g., unit-test frameworks, code debuggers, etc.).

After this, you can perform a "compatibility test" by running your train script on an Amazon SageMaker training container (whether in "Amazon SageMaker local mode" or on a training instance), to iron out compatibility issues.

Once your scripts have been fully tested, you can start your actual, large-scale model training and experimentation on Amazon SageMaker training instances.

Sample runs:

# Run entrypoint script outside of SageMaker.
examples/00-hello-world/entrypoint.sh

# Mimic running on Amazon SageMaker: tqdm is automatically turned off.
SM_HOSTS=abcd examples/00-hello-world/entrypoint.sh

# Run click-version of entrypoint
examples/00-hello-world/entrypoint.sh -click
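The second sample run suggests that the presence of the SM_HOSTS environment variable is enough to make the script behave as if it were on SageMaker. A minimal sketch of that kind of check follows, under the assumption that detection is keyed on SM_HOSTS alone. The names on_sagemaker and progress_kwargs are hypothetical, and this is not necessarily how smepu implements it.

```python
import os
from typing import Mapping, Optional


def on_sagemaker(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when SM_HOSTS is set (assumed marker of a SageMaker job)."""
    env = os.environ if environ is None else environ
    return "SM_HOSTS" in env


def progress_kwargs(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Keyword arguments for a progress bar: disabled on SageMaker."""
    return {"disable": on_sagemaker(environ)}


if __name__ == "__main__":
    print(on_sagemaker(), progress_kwargs())
```

With tqdm installed, the result can be passed straight to its constructor, e.g. tqdm(range(100), **progress_kwargs()), since tqdm accepts a disable keyword argument.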

To experiment with different hyperparameters, see DummyEstimator in examples/00-hello-world/dummyest.py and, if necessary, modify complex_args in examples/00-hello-world/entrypoint.sh accordingly.

Feel free to further explore other sample scripts under examples/.

4. Security

See CONTRIBUTING for more information.

5. License

This library is licensed under the MIT-0 License. See the LICENSE file.