shenweichen/DeepCTR-Torch

Any way to log the step artifacts?

gajagajago opened this issue · 2 comments

Hello. I am using the DeepFM implementation and trying to log the batch time after each step.
I want to do something like the snippet below and find out how long it took to process each batch.

```python
# Should log batch time here
batchtime_log_callback = LambdaCallback(
    on_batch_begin=lambda batch, logs: batchtime_log.write(str(batch)),
    on_batch_end=lambda batch, logs: batchtime_log.write(str(batch)))

model.fit(
    train_model_input,
    train[target].values,
    callbacks=[batchtime_log_callback],
    batch_size=batch_size,
    epochs=num_epoch,
    verbose=2,
    validation_split=val_ratio)
```

The desired output would look like the lines below, but it is okay if other artifacts are printed alongside; I can post-process the log. Is there any way to do this?

```
xxx ms
yyy ms
...
```
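One way to realize the idea above is a small class-based callback instead of two lambdas: record `time.time()` in `on_batch_begin` and emit the elapsed milliseconds in `on_batch_end`. This is only a sketch; the `on_batch_begin`/`on_batch_end` hook names follow the Keras-style callback interface assumed in the snippet, and the class name is made up.

```python
import time

class BatchTimeLogger:
    """Sketch of a Keras-style callback that records per-batch wall time.
    Assumes the framework invokes on_batch_begin / on_batch_end around
    each training step, like the LambdaCallback hooks above."""

    def __init__(self):
        self.times_ms = []      # one entry per finished batch
        self._start = None

    def on_batch_begin(self, batch, logs=None):
        self._start = time.time()

    def on_batch_end(self, batch, logs=None):
        elapsed = (time.time() - self._start) * 1000.0
        self.times_ms.append(elapsed)
        print(f"{elapsed:.1f} ms")   # the "xxx ms" lines described above
```

An instance could then be passed as `callbacks=[BatchTimeLogger()]` to `model.fit()`, provided the fit loop actually dispatches these hooks.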
  1. Modify basemodel.py like this:

(screenshot of the modified basemodel.py)

Result:

(screenshot of the resulting per-batch timing output)
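Since the screenshot is not reproduced here, a minimal sketch of what such a modification to the per-batch training loop could look like follows. The loop shape, the `train_step` callable, and the `log_file` handle are assumptions for illustration, not the library's actual code:

```python
import time

def fit_with_batch_timing(batches, train_step, log_file):
    """Hypothetical instrumentation of a fit() batch loop: wrap each
    training step with time.time() calls and write the elapsed time."""
    for index, batch in enumerate(batches):
        start = time.time()
        train_step(batch)                      # forward/backward/optimizer step
        elapsed_ms = (time.time() - start) * 1000.0
        log_file.write(f"{elapsed_ms:.1f} ms\n")
```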

  2. Set verbose=1 in model.fit(); then you can read the time of each epoch from the tqdm log (each iteration is one batch of data).

(screenshot of the tqdm progress log)

First of all, thanks for the reply. One thing to add: I think we should call torch.cuda.synchronize() before calling time.time() when training on GPU (including distributed training). This ensures that all streams on each CUDA device have fully finished before the time is logged. Thanks again!

Ref: https://pytorch.org/docs/stable/generated/torch.cuda.synchronize.html
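The point matters because CUDA kernels are launched asynchronously: without synchronization, the measured interval covers only kernel *launches*, not their execution. A small sketch of a synchronization-aware timer follows; `timed_step` and its zero-argument `step_fn` are hypothetical names, and in real GPU use one would pass `synchronize=torch.cuda.synchronize`:

```python
import time

def timed_step(step_fn, synchronize=None):
    """Time one training step.  Pass synchronize=torch.cuda.synchronize
    on GPU so every queued kernel finishes before each clock read."""
    if synchronize is not None:
        synchronize()                       # drain work queued earlier
    start = time.time()
    result = step_fn()
    if synchronize is not None:
        synchronize()                       # wait for this step's kernels
    return result, (time.time() - start) * 1000.0
```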