Recover previous glue model
amachadolopez opened this issue · 8 comments
Hi,
First of all, thanks for the amazing tool.
I would like to know if it is possible to recover a previously trained SCGLUE model (i.e. from the checkpoint.pt files that I have in tmp). Yesterday I generated a model which was interesting, and I actually spent quite a long time understanding it and even labeling cell populations and all that, but, stupidly, I did not save it. Now I would like to compare this model with some new models I have been generating today (this time setting a specific seed so that this does not happen again), but I do not know how to recover it.
Thank you very much,
Alba
Hi Alba,
Thanks for your interest in GLUE! As long as you can construct a model with the same architecture (same hyperparameters), you can restore the weights from a checkpoint file as follows:
import torch
import scglue

glue = scglue.models.fit_SCGLUE(...)  # Setting fit_kws={"max_epochs": 0} can skip training
loaded = torch.load("checkpoint.pt")  # Load the saved checkpoint dict
glue.net.load_state_dict(loaded["net"])  # Restore the trained weights
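If you want a quick sanity check that the restore worked, you can embed one of your datasets with the restored model; for example, assuming a modality key "rna" matching the one used when constructing the model, as in the tutorial:

# Embeddings from the restored model; the cells should land where
# they did in your original run
rna.obsm["X_glue"] = glue.encode_data("rna", rna)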
Let me know if there are further issues!
Hi!
Thank you very much for your answer. Apologies if my questions are a bit silly; I am new to pretty much everything (Python, GLUE, models...), so I have a couple of follow-up questions:
- Just to be sure, hyperparameters are set manually when creating the model, right? So if I constructed the first model with default arguments, running the same code should produce a model with the same architecture, is that correct? If hyperparameters change dynamically with each model built, how could I retrieve them?
- I tried to use your code, but I got the error "AttributeError: 'SCGLUEModel' object has no attribute 'load_state_dict'" with this code:
glue = scglue.models.fit_SCGLUE(
    {"rna": rna, "prot": adata},
    graph, fit_kws={"max_epochs": 0}
)
loaded = torch.load("checkpoint_31.pt")
glue.load_state_dict(loaded["net"])
I also tried with a GLUE model that I had stored from other attempts, but I get the same error. My scglue version is 0.3.2 (up to date, I think), but I mention it just in case that might be the issue.
Again, apologies and thank you so much for your help :)
Best,
Alba
- Yes, the hyperparameters do not change dynamically; as long as the data and keyword arguments remain the same, the constructed model is the same.
- My bad, it should be glue.net.load_state_dict(loaded["net"]) instead; see the full snippet below. Can you try if this works?
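Putting both points together, a minimal end-to-end sketch, using the placeholder names from your snippet:

import torch
import scglue

# Same data and keyword arguments as the original run, so the
# architecture matches the checkpoint; max_epochs=0 skips training
glue = scglue.models.fit_SCGLUE(
    {"rna": rna, "prot": adata},
    graph, fit_kws={"max_epochs": 0}
)

# The trained weights live in the inner network, hence glue.net
loaded = torch.load("checkpoint_31.pt")
glue.net.load_state_dict(loaded["net"])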
Hi!
Yes, it worked! Sadly the model is not the same as the one I had; I guess the checkpoint is not recent enough to have reached the state that I was happy with 😞 As a last resort, is there any way I could find out which seed was used, so that I can rebuild the model from scratch?
Thank you again
Alba
The random seed is 0 by default, unless you changed it manually, in which case there is unfortunately no way to recover the seed. Another thing worth noting is that even when the seed is the same, model training is not 100% reproducible on the GPU, because some operations like scatter_add are non-deterministic on the GPU.
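For your new runs, you can also pin the seed explicitly when constructing the model; a minimal sketch, assuming the random_seed keyword is forwarded to the model constructor through init_kws:

# Pin the seed explicitly (0 is already the default)
glue = scglue.models.fit_SCGLUE(
    {"rna": rna, "prot": adata}, graph,
    init_kws={"random_seed": 0},
)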
Typically the model should be robust to random seed and GPU non-determinism. Did you observe a drastic difference between seeds?
I have not modified the seed, and yet every time I run the model I get pretty different results, which is why I was so interested in recovering my original model. These are a few examples:
The first model showed the best integration between protein and RNA, while for the others the overlap between domains seems to be much worse, especially looking at the small cluster that would be the "immune cells". Perhaps I am being a bit dramatic here and the differences between models are not so drastic?
Well, that is indeed more different than I would expect. In this case, rather than trying to recover the original model that happened to be good, I would suggest tweaking the gene selection and how the graph is constructed between protein and RNA, so that the model can reliably produce a good alignment like the first one.
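To make that concrete, a rough sketch of the kind of tweaking I mean; the gene count and flavor below are illustrative choices, not fixed recommendations:

import scanpy as sc
import scglue

# Try a different highly variable gene selection for the RNA modality
sc.pp.highly_variable_genes(rna, n_top_genes=4000, flavor="seurat_v3")

# After rebuilding the protein-RNA guidance graph, sanity-check it
# against the configured datasets before refitting
scglue.graph.check_graph(graph, [rna, adata])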
Alright, then I'll definitely try to adjust everything to get a better and more consistent model 😄
I'll close this issue and open a new one if anything else comes up
Again, thank you very very much for your help!