snap-stanford/UCE

is there training code available?

szalata opened this issue · 4 comments

I couldn't find a script used for training the model. Thanks!

Yanay1 commented

We're not planning to release a training script for now. We don't necessarily want people to fine-tune the model on individual datasets, since we intend for the embeddings to remain universally shareable (zero-shot). Our training implementation is also very specific to our own hardware setup.

@Yanay1 I'd like to stress that science needs to be reproducible; otherwise, no hypothesis can be confirmed. Training scripts are one small component of that.

yhr91 commented

Technically, all the model-side code needed to reproduce the results in the paper is available in this repository. For instance, you can take a look at some of the analysis scripts shared here, which only use embeddings produced by a pre-trained model.

The challenge you would face at the moment is not the code, but that most of our results were produced using the Tabula Sapiens v2 dataset, which has not yet been published.

By reproducibility, I mean being able to reach your results from code alone. If we start from downloaded weights, we have to rely on the written description of the training, and we cannot run the training ourselves or evaluate the model after training it on another dataset.