Error reported during load_from_checkpoint
JackeyLee007 commented
First, I load the model from a checkpoint:
model.load_from_checkpoint(repo_id="google/timesfm-1.0-200m")
While loading, it reports the following error:
ERROR:absl:For checkpoint version > 1.0, we require users to provide
`train_state_unpadded_shape_dtype_struct` during checkpoint
saving/restoring, to avoid potential silent bugs when loading
checkpoints to incompatible unpadded shapes of TrainState.
godcrying commented
Me too.
siriuz42 commented
This error is not blocking. Can you wait and see if the jitting succeeds?
guiyang882 commented
2024-09-11 14:50:46.515527: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
Constructing model weights.
Constructed model weights in 2.43 seconds.
Restoring checkpoint from /disk01/timesfm_repo/timesfm-1.0-200m/checkpoints.
WARNING:absl:No registered CheckpointArgs found for handler type: <class 'paxml.checkpoints.FlaxCheckpointHandler'>
WARNING:absl:Configured `CheckpointManager` using deprecated legacy API. Please follow the instructions at https://orbax.readthedocs.io/en/latest/api_refactor.html to migrate by May 1st, 2024.
WARNING:absl:train_state_unpadded_shape_dtype_struct is not provided. We assume `train_state` is unpadded.
ERROR:absl:For checkpoint version > 1.0, we require users to provide
`train_state_unpadded_shape_dtype_struct` during checkpoint
saving/restoring, to avoid potential silent bugs when loading
checkpoints to incompatible unpadded shapes of TrainState.
Restored checkpoint in 3.91 seconds.
Jitting decoding.
Jitted decoding in 21.69 seconds.
How can I fix the ERROR message about train_state_unpadded_shape_dtype_struct?
By the way, could you provide PyTorch versions of the checkpoints?
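Since the train_state_unpadded_shape_dtype_struct message is reported as non-blocking (the checkpoint still restores and decoding still gets jitted in the log above), one option is simply to hide it. The sketch below uses only the standard-library logging module and assumes the message is emitted through a Python logger named "absl", which is what the `ERROR:absl:` prefix suggests. The class and helper names are hypothetical. This does not change how the checkpoint is loaded. It only drops log records mentioning the identifier, which also hides the related "train_state_unpadded_shape_dtype_struct is not provided" warning.

```python
import logging

# Assumption: the noisy checkpoint messages all mention this identifier.
NOISY_SUBSTRING = "train_state_unpadded_shape_dtype_struct"


class DropUnpaddedShapeMessages(logging.Filter):
    """Hypothetical filter: discard records that mention the unpadded-shape struct."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False tells the logging module to discard the record.
        return NOISY_SUBSTRING not in record.getMessage()


def install_filter(logger_name: str = "absl") -> DropUnpaddedShapeMessages:
    """Attach the filter to the logger the messages appear to come from.

    Assumption: "absl" is the logger name, inferred from the "ERROR:absl:" prefix.
    A logger-level filter only sees records created on that logger itself.
    """
    log_filter = DropUnpaddedShapeMessages()
    logging.getLogger(logger_name).addFilter(log_filter)
    return log_filter
```

Usage: call install_filter() once, before model.load_from_checkpoint(repo_id="google/timesfm-1.0-200m"). Filtering on the message text rather than raising the whole "absl" logger's level keeps other absl warnings and errors visible.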