google-research/timesfm

Error reported during load_from_checkpoint


First, I load the model from a checkpoint:

model.load_from_checkpoint(repo_id="google/timesfm-1.0-200m")

While loading, it reports the following error:

ERROR:absl:For checkpoint version > 1.0, we require users to provide
          `train_state_unpadded_shape_dtype_struct` during checkpoint
          saving/restoring, to avoid potential silent bugs when loading
          checkpoints to incompatible unpadded shapes of TrainState.

Me too.

This error is not blocking. Can you wait and see if the jitting succeeds?

2024-09-11 14:50:46.515527: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
Constructing model weights.
Constructed model weights in 2.43 seconds.
Restoring checkpoint from /disk01/timesfm_repo/timesfm-1.0-200m/checkpoints.
WARNING:absl:No registered CheckpointArgs found for handler type: <class 'paxml.checkpoints.FlaxCheckpointHandler'>
WARNING:absl:Configured `CheckpointManager` using deprecated legacy API. Please follow the instructions at https://orbax.readthedocs.io/en/latest/api_refactor.html to migrate by May 1st, 2024.
WARNING:absl:train_state_unpadded_shape_dtype_struct is not provided. We assume `train_state` is unpadded.
ERROR:absl:For checkpoint version > 1.0, we require users to provide
          `train_state_unpadded_shape_dtype_struct` during checkpoint
          saving/restoring, to avoid potential silent bugs when loading
          checkpoints to incompatible unpadded shapes of TrainState.
Restored checkpoint in 3.91 seconds.
Jitting decoding.
Jitted decoding in 21.69 seconds.
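
Whether the error was blocking can be read directly from a captured log like the one above: if the "Restored checkpoint" and "Jitted decoding" completion lines both appear after the ERROR line, loading carried on past it. A minimal sketch of that check follows. The function name `error_was_non_blocking` is hypothetical, and the marker strings are copied from this log rather than from any documented TimesFM or Pax output format, so they may differ in other versions.

```python
import re

# Marker strings copied from the log in this thread (an assumption: other
# versions of the libraries may word these lines differently).
ERROR_MARKER = "ERROR:absl:For checkpoint version > 1.0"
COMPLETION_PATTERNS = (
    re.compile(r"^Restored checkpoint in [\d.]+ seconds\.$"),
    re.compile(r"^Jitted decoding in [\d.]+ seconds\.$"),
)


def error_was_non_blocking(log_text: str) -> bool:
    """Return True if the absl ERROR appears and both completion lines follow it."""
    lines = [line.strip() for line in log_text.splitlines()]
    error_index = next(
        (i for i, line in enumerate(lines) if line.startswith(ERROR_MARKER)),
        None,
    )
    if error_index is None:
        return False  # The error never appeared, so there is nothing to judge.
    tail = lines[error_index + 1:]
    # Every completion pattern must match at least one line after the error.
    return all(
        any(pattern.match(line) for line in tail)
        for pattern in COMPLETION_PATTERNS
    )
```

Applied to the log above, the check passes because both completion lines come after the ERROR line. A log that stops right after the error would fail it.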

How can I fix the ERROR message about train_state_unpadded_shape_dtype_struct?
By the way, could you also release PyTorch versions of the checkpoints?
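
Since the message was described earlier in the thread as non-blocking, one workaround is to hide it rather than fix it. The sketch below uses only Python's standard `logging` module. It assumes the message is emitted through the logger named "absl", which the `ERROR:absl:` prefix in the log suggests but which is not confirmed here. The class name `DropUnpaddedShapeMessages` is hypothetical. Note that this only silences the output and does not change how the checkpoint is restored. It also drops the related WARNING line that mentions the same identifier.

```python
import logging


class DropUnpaddedShapeMessages(logging.Filter):
    """Drop log records that mention train_state_unpadded_shape_dtype_struct."""

    NEEDLE = "train_state_unpadded_shape_dtype_struct"

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False discards the record; all other records pass through.
        return self.NEEDLE not in record.getMessage()


# Assumption: the messages go through the logger named "absl".
logging.getLogger("absl").addFilter(DropUnpaddedShapeMessages())
```

The filter would need to be installed before load_from_checkpoint is called. Other warnings and errors from the same logger remain visible.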