snap-stanford/SATURN

New Data Integration

cadyyuheng opened this issue · 3 comments

Dear Saturn team,

Say we have some in-house mouse datasets that we'd like to integrate with your mammalian cell atlas under the same embedding. Without re-training on our datasets, is there a quick way to find the macrogene values for each of our cells? How can we leverage the genes_to_macrogenes.pkl file of the mammalian cell atlas together with the count matrix of our own data?

Thanks

You could use the centroids to take a weighted average of expression.

However, I would recommend retraining.

Could you please elaborate on how exactly we can "use the centroids to take a weighted average of expression", in particular the weighted average part? In the manuscript, the macrogene expression values $e_{c}$ are defined by $e_{c}=\mathrm{ReLU}(\mathrm{LayerNorm}(W_{s}^{T}\log(X^{s}_{c}+1)))$. Is this what you mean by "weighted average"?

Thanks!

Yes. Since you are not using these as inputs to a neural network, you can just ignore the ReLU and LayerNorm parts.
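
For anyone reading later, here is a minimal sketch of that calculation, i.e. $W_{s}^{T}\log(X^{s}_{c}+1)$ with the ReLU and LayerNorm dropped. It rests on assumptions that are not confirmed in this thread: that genes_to_macrogenes.pkl unpickles to a dict mapping gene names to 1-D weight vectors (one entry per macrogene), and that those keys may carry a species prefix. The function name `macrogene_values` and the `key_prefix` parameter are hypothetical, chosen only for illustration. Genes missing from the dict are skipped.

```python
import numpy as np


def macrogene_values(counts, gene_names, gene_to_weights, key_prefix=""):
    """Weighted sum of log1p expression per macrogene (no ReLU, no LayerNorm).

    counts          : dense array of shape (cells, genes) with raw counts
    gene_names      : gene names, one per column of `counts`
    gene_to_weights : ASSUMED layout: dict of gene name -> 1-D weight vector
    key_prefix      : HYPOTHETICAL prefix prepended to each gene name
                      before looking it up in `gene_to_weights`
    """
    # Keep only the genes that have a weight vector in the dict.
    keep = [i for i, g in enumerate(gene_names) if key_prefix + g in gene_to_weights]
    if not keep:
        raise ValueError("no gene names matched the keys of gene_to_weights")

    # W has shape (kept genes, macrogenes).
    W = np.stack(
        [np.asarray(gene_to_weights[key_prefix + gene_names[i]], dtype=float) for i in keep]
    )

    # X has shape (cells, kept genes); a sparse matrix must be densified first.
    X = np.asarray(counts, dtype=float)[:, keep]

    # log(X + 1) @ W gives an array of shape (cells, macrogenes).
    return np.log1p(X) @ W
```

To use it with the atlas weights, you would load the dict with `pickle.load` from genes_to_macrogenes.pkl and pass it as `gene_to_weights`, after checking how its keys are actually named. The resulting values come from weights that were never trained on your data, so retraining remains the recommended route.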