New Data Integration

Question


cadyyuheng opened this issue 9 months ago · 3 comments

Dear Saturn team,

Say we have some in-house mouse datasets that we'd like to integrate with your mammalian cell atlas in the same embedding. Without retraining on our datasets, is there a quick way to find the macrogene values for each of our cells? How can we leverage the genes_to_macrogenes.pkl file from the mammalian cell atlas together with the count matrix of our own data?

Thanks

Answer 1 · 2024-04-03T03:28:48.000Z

You could use the centroids to take a weighted average of expression.

However, I would recommend retraining.

Answer 2 · 2024-04-05T16:50:32.000Z

Could you please elaborate on how exactly we can "use the centroids to take a weighted average of expression", in particular the weighted-average part? In the manuscript, the macrogene expression values $e_{c}$ seem to be defined as $e_{c}=\mathrm{ReLU}(\mathrm{LayerNorm}(W^{T}_{s}\log(X^{s}_{c}+1)))$. Is this what you mean by "weighted average"?
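Written out per macrogene (one reading of the notation above, assuming $W_s$ is the genes-by-macrogenes weight matrix for species $s$ and $X^{s}_{c}$ is the vector of raw counts of cell $c$ over the genes of species $s$), the argument of LayerNorm is a weighted sum of log-expression over genes:

$$
z_{c,k} = \left(W_s^{T}\log(X^{s}_{c}+1)\right)_k = \sum_{g} (W_s)_{g,k}\,\log\!\left(X^{s}_{c,g}+1\right), \qquad e_c = \mathrm{ReLU}\!\left(\mathrm{LayerNorm}(z_c)\right).
$$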

Thanks!

Answer 3 · 2024-04-05T18:36:32.000Z

Yes. Since you are not using these as inputs to a neural network, you can just ignore the ReLU and LayerNorm parts.
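A minimal Python sketch of this simplified computation, $e_c \approx W_s^{T}\log(X^{s}_{c}+1)$ with ReLU and LayerNorm dropped. It assumes that genes_to_macrogenes.pkl unpickles to a dict mapping a species-prefixed gene name (for example `mouse_Actb`) to a 1-D array of per-macrogene weights, and that your gene symbols match the ones used to build the atlas. Both are assumptions, so inspect a few keys of your copy of the file first. The helper names `load_gene_weights` and `macrogene_expression` are hypothetical, not part of SATURN's API.

```python
import pickle

import numpy as np


def load_gene_weights(path):
    """Load the pickled gene-to-macrogene mapping (dict layout is an assumption)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def macrogene_expression(counts, gene_names, genes_to_macrogenes, species="mouse"):
    """Approximate e_c = W_s^T log(X_c + 1), skipping ReLU and LayerNorm.

    counts: (n_cells, n_genes) raw count matrix.
    gene_names: gene symbols for the columns of `counts`.
    genes_to_macrogenes: dict "<species>_<gene>" -> weight vector (assumed layout).
    Returns an (n_cells, n_macrogenes) array and the list of genes actually used.
    """
    counts = np.asarray(counts, dtype=float)
    # Hypothetical key convention: species prefix + "_" + gene symbol.
    keys = [f"{species}_{g}" for g in gene_names]
    # Keep only genes that the atlas assigned macrogene weights to.
    cols = [i for i, k in enumerate(keys) if k in genes_to_macrogenes]
    if not cols:
        raise ValueError("no genes overlap with the gene-to-macrogene mapping")
    # Stack per-gene weight vectors into W: (n_shared_genes, n_macrogenes).
    W = np.stack([np.asarray(genes_to_macrogenes[keys[i]], dtype=float) for i in cols])
    # log(X + 1) restricted to the shared genes: (n_cells, n_shared_genes).
    log_x = np.log1p(counts[:, cols])
    # Each macrogene value is a weighted sum of log-expression over shared genes.
    return log_x @ W, [gene_names[i] for i in cols]


if __name__ == "__main__":
    # Toy mapping with two macrogenes; replace with load_gene_weights(...) on real data.
    toy_map = {"mouse_Actb": [1.0, 0.0], "mouse_Gapdh": [0.5, 2.0]}
    X = [[0, 1, 3], [1, 0, 0]]  # columns: Actb, Gapdh, Unmapped
    e, used = macrogene_expression(X, ["Actb", "Gapdh", "Unmapped"], toy_map)
    print(used)
    print(e)
```

Genes missing from the mapping are silently dropped, so cells whose expression is concentrated in unmapped genes will get smaller macrogene values. The result is an unnormalized weighted sum, following the formula quoted in the question rather than a per-macrogene average.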