layerai-archive/sdk

logging experiments without a decorator and without returning model file

fcakyon opened this issue · 6 comments

Summary

  1. When logging anything to layer.ai, a dataset or model decorator (considering only these two decorators for simplicity) is needed to wrap a function. However, I need a way to log parameters/metrics more dynamically.

For instance, given the LayerTracker class below (a modified version of the huggingface/accelerate wandb tracker), the user cannot change the model decorator's argument name dynamically because it is hardcoded:

import logging
from typing import Optional

import layer
from layer.decorators import model  # import path assumed for the decorator
from accelerate.tracking import GeneralTracker

logger = logging.getLogger(__name__)

class LayerTracker(GeneralTracker):
    """
    A `Tracker` class that supports `layer`. Should be initialized at the start of your script.
    Args:
        run_name (`str`):
            The name of the experiment run.
        kwargs:
            Additional key word arguments passed along to the `layer.init` method.
    """

    name = "layer"
    requires_logging_directory = False

    def __init__(self, run_name: str, **kwargs):
        self.run_name = run_name

        access_token = kwargs.get("access_token", None)
        if access_token is not None:
            layer.login_with_access_token(access_token)
        self.project: layer.contracts.projects.Project = layer.init(project_name=self.run_name, **kwargs)
        logger.info(f"Initialized Layer project {self.run_name}")
        logger.info("Make sure to log any initial configurations with `self.store_init_configuration` before training!")

    def store_init_configuration(self, values: dict):
        """
        Logs `values` as hyperparameters for the run. Should be run at the beginning of your experiment.
        Args:
            values (Dictionary `str` to `bool`, `str`, `float` or `int`):
                Values to be stored as initial hyperparameters as key-value pairs. The values need to have type `bool`,
                `str`, `float`, `int`, or `None`.
        """
        layer.log({"config": values})
        logger.info("Stored initial configuration hyperparameters to Layer")

    @model("model")
    def log(self, values: dict, step: Optional[int] = None, **kwargs):
        """
        Logs `values` to the current run.
        Args:
            values (Dictionary `str` to `str`, `float`, `int` or `dict` of `str` to `float`/`int`):
                Values to be logged as key-value pairs. The values need to have type `str`, `float`, `int` or `dict` of
                `str` to `float`/`int`.
            step (`int`, *optional*):
                The run step. If included, the log will be affiliated with this step.
            kwargs:
                Additional key word arguments passed along to the `layer.log` method.
        """
        layer.log(values, step=step, **kwargs)
        logger.info("Successfully logged to Layer")
  2. To log anything to layer.ai, the decorated function has to return a model object (an nn.Module for PyTorch). However, logger functions/classes in the existing frameworks cannot return a model object; check the most popular PyTorch training frameworks: hf/accelerate logger classes, mmdet/mmcv logger classes, and pytorch lightning logger classes.

Motivation

It would be easier to add layer.ai logger support to other frameworks such as mmdetection, PyTorch Lightning, and huggingface/accelerate if there were an alternative layer.ai logging API.

Additional context

If I am not clear enough or am missing anything, feel free to let me know.

Hi @fcakyon,

Layer is built to tightly couple models and their metadata (parameters, metrics, etc.). That's why we expect a model-decorated function to return an ML model. So here you can:

  • Remove the @model decorator from the log function above.
  • Initiate the accelerate training from a model-decorated function like the one below:
@model("my_model")
def train():
  accelerator = Accelerator()
  device = accelerator.device

  model = torch.nn.Transformer().to(device)
  model.train()
  return model

layer.run([train])

Thanks for your answer @mecevit. I understand your example, but I am not sure whether this kind of implementation is possible in my case.

  1. In my use case, users don't necessarily have layer.ai installed; it is an optional dependency. Users may prefer neptune.ai or other loggers.

So I am checking which trackers are installed in the environment and enabling the related logger classes based on availability (mmdet/torch lightning/huggingface also implement trackers this way). A sketch of this availability check appears after point 2 below.

With the given code snippet, I would have to hard-code the layer decorator in my main script, which makes it a core dependency.

  2. Another issue is that in my implementation there is no train function that returns a trained model. My Trainer class is implemented similarly to Keras/TorchLightning: a private training loop is hidden from the user, and the training/eval steps and metric/loss calculations are separate class methods.

Instead of this:

def train(config):
    # perform training
    return model

I have this:

class Trainer:
    def __init__(self, config):
        # init accelerator, model and dataloaders, log config

    def _training_loop(self):
        # private training loop

    def _validation_loop(self):
        # private val loop

    def training_step(self, batch):
        # training step to be implemented:
        # perform forward pass
        # calculate/log loss
        # calculate/log metrics

    def validation_step(self, batch):
        # validation step to be implemented:
        # perform forward pass
        # calculate/log loss
        # calculate/log metrics

    def fit(self):
        # start training

So the metrics/parameters would be logged to layer.ai inside the __init__, training_step, and validation_step methods, but none of these methods can return a PyTorch model object.
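Concretely, a minimal sketch of the pattern I am describing, assuming programmatic layer.init/layer.log calls can run outside a decorated function; the is_available helper, the _forward method, and the metric keys are illustrative, not from any framework:

import importlib.util

def is_available(package: str) -> bool:
    # True if the optional tracker package can be imported in this environment
    return importlib.util.find_spec(package) is not None

class Trainer:
    def __init__(self, config: dict):
        # enable the Layer tracker only when the optional dependency exists
        self.use_layer = is_available("layer")
        if self.use_layer:
            import layer
            layer.init(project_name=config["project_name"])
            layer.log({"config": config})

    def training_step(self, batch):
        loss = self._forward(batch)  # hypothetical forward/loss computation
        if self.use_layer:
            import layer
            layer.log({"train/loss": float(loss)})
        # note: none of these methods returns a model object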

Hi @fcakyon, thanks so much for the feedback on this one.

With #308 merged, we can now log to Layer programmatically, as you probably already know.

You mentioned that users don't necessarily have layer.ai installed (it is an optional dependency) and may prefer neptune.ai or other loggers. Is it now easier to do this?

Example:
A simple XOR model experiment with an optional LayerCallback:

import layer
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import Callback

class LayerCallback(Callback):

    def on_epoch_end(self, epoch, logs=None):
        # log per-epoch training metrics to Layer, using the epoch as the step
        layer.log({"Training accuracy over epoch": logs["binary_accuracy"],
                   "Training loss over epoch": logs["loss"]}, epoch)


def train(with_callback_logging):
  layer.log({'with_callback_logging': with_callback_logging})

  # the four different states of the XOR gate
  training_data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], "float32")

  # the four expected results in the same order
  target_data = np.array([[0], [1], [1], [0]], "float32")

  model = Sequential()
  model.add(Dense(16, input_dim=2, activation='relu'))
  model.add(Dense(1, activation='sigmoid'))

  model.compile(loss='mean_squared_error',
                optimizer='adam',
                metrics=['binary_accuracy'])
  
  callbacks = [LayerCallback()] if with_callback_logging else []

  model.fit(training_data, target_data, 
            epochs=200, 
            verbose=2, 
            callbacks=callbacks,
            )
  
  test_loss, test_acc = model.evaluate(training_data, target_data)
  test_metrics = {"Test loss": test_loss, "Test accuracy": test_acc}
  layer.log(test_metrics)

and then, as you know, we can do:

layer.model("xor")(train)(True)

Which results in:

[Screenshot of the resulting run in the Layer UI, 2022-08-29]

You also mentioned wanting to use a Trainer class; is it now easier?

Yes, it is easier now, thank you!

Great! By the way, @fcakyon, do you use callbacks when training? Have you noticed any performance issues when using Layer logs that you would like us to address?

I will investigate further and see if my previous performance issues are gone @adrien-layer 👍🏻