New Data Integration
cadyyuheng opened this issue · 3 comments
Dear Saturn team,
Say we have some in-house mouse datasets that we'd like to integrate with your mammalian cell atlas under the same embedding. Without retraining on our datasets, is there a quick way to find the macrogene values for each of our cells? How can we leverage the genes_to_macrogenes.pkl file of the mammalian cell atlas together with the count matrix of our own data?
You could use the centroids to take a weighted average of expression.
However, I would recommend retraining.
Could you please elaborate on how exactly we can "use the centroids to take a weighted average of expression", in particular the weighted average part? It seems in the manuscript that the macrogene expression values
Yes. Since you are not using these as inputs to a neural network, you can just ignore the ReLU and LayerNorm parts.
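For anyone reading later, here is a minimal sketch of what "a weighted average of expression" could look like in practice. It is not taken from the SATURN codebase. It assumes the loaded genes_to_macrogenes.pkl behaves like a dict mapping each gene name to a 1-D vector of weights over macrogenes, that those weights are non-negative, and that the gene names in your count matrix match the dict keys; check all three against your own file. The function name macrogene_values and the toy data are hypothetical. As discussed above, ReLU and LayerNorm are skipped.

```python
import numpy as np

def macrogene_values(counts, gene_names, gene_to_weights, normalize=True):
    """Project a cells x genes matrix onto macrogenes.

    counts          : (n_cells, n_genes) array of expression values
    gene_names      : list of gene names, one per column of counts
    gene_to_weights : dict, gene name -> 1-D weight vector over macrogenes
                      (ASSUMED layout of the loaded pickle)
    normalize       : divide by the summed weights so the result is a
                      weighted average rather than a weighted sum
    """
    # Keep only genes for which weights are available.
    keep = [i for i, g in enumerate(gene_names) if g in gene_to_weights]
    if not keep:
        raise ValueError("no gene in gene_names has macrogene weights")

    X = np.asarray(counts, dtype=float)[:, keep]           # cells x kept genes
    W = np.stack([np.asarray(gene_to_weights[gene_names[i]], dtype=float)
                  for i in keep])                          # kept genes x macrogenes

    scores = X @ W                                         # weighted sum per macrogene
    if normalize:
        totals = W.sum(axis=0)
        # Avoid dividing by zero for macrogenes that received no weight.
        scores = scores / np.where(totals > 0, totals, 1.0)
    return scores

# Toy data: 2 cells, 3 genes, 2 macrogenes; "Xist" has no weights and is dropped.
toy_counts = np.array([[1, 0, 2],
                       [0, 3, 1]])
toy_genes = ["Actb", "Gapdh", "Xist"]
toy_weights = {"Actb": [1.0, 0.0], "Gapdh": [0.0, 2.0]}

toy_scores = macrogene_values(toy_counts, toy_genes, toy_weights)
```

With real data you would load the pickle with the standard pickle module and pass your own count matrix and gene names. Values computed this way only approximate what retraining would give, which is why retraining is still the recommended route.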