mhamilton723/STEGO

Model checkpointing only saving epoch 0

Holmes2002 opened this issue · 3 comments

When I run the model, I only get a checkpoint for epoch 0. Where are the others?

I have the following checkpointing logic set up::

    ModelCheckpoint(
        dirpath=join(checkpoint_dir, name),
        every_n_train_steps=400,
        save_top_k=2,
        monitor="test/cluster/mIoU",
        mode="max",
    )

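Perhaps your dataset is so large that training hasn't left epoch 0 yet. With ``every_n_train_steps=400``, checkpoints are written every 400 steps, so several of them can land inside the same epoch.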
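A rough way to check this is to estimate how many training steps fit in one epoch. The sketch below is only an illustration: the dataset size and batch size are made-up numbers, ``epoch_at_step`` is a hypothetical helper rather than anything from STEGO or PyTorch Lightning, and it ignores gradient accumulation, multi-GPU sharding, and ``drop_last``.

```python
import math


def epoch_at_step(global_step, dataset_size, batch_size):
    """Approximate zero-based epoch index that a given training step falls in."""
    steps_per_epoch = math.ceil(dataset_size / batch_size)
    return global_step // steps_per_epoch


# Hypothetical values; substitute your own dataset size and batch size.
dataset_size = 100_000
batch_size = 16
every_n_train_steps = 400  # taken from the ModelCheckpoint config above

steps_per_epoch = math.ceil(dataset_size / batch_size)
print(f"steps per epoch: {steps_per_epoch}")

# Epoch index at each of the first five checkpoint steps.
for step in range(every_n_train_steps, 5 * every_n_train_steps + 1, every_n_train_steps):
    print(f"step {step}: epoch {epoch_at_step(step, dataset_size, batch_size)}")
```

With these assumed numbers an epoch is 6250 steps long, so every checkpoint saved in the first few thousand steps is still labelled epoch 0. If your own numbers give a similarly large step count, the other epochs will only show up after training has run for longer.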

I solved it. Thanks for your comment!