atarashansky/SAMap

UMAP shows line-like clusters when neighborhood size is determined by cell type labels

Evenlyeven opened this issue · 5 comments

Thanks for the useful tool!

I noticed that in my results, some areas of the UMAP look like solid lines (for example, the cluster at the top of the screenshot below). I wonder whether this is because I set the SAMap run to determine neighborhood size from cell type labels I provided. Does this look normal to you?
[Screenshot: integrated SAMap UMAP]

When I check the individual UMAPs before SAMap stitches them together, both look "normal" to me.
sam1:
[Screenshot: UMAP of sam1]

sam2:
[Screenshot: UMAP of sam2]

Also, in a test run where I didn't use cell type labels to determine neighborhood size (so neighborhoods were determined by hopping along each cell's outgoing edges instead), the UMAP looks more "normal" to me.
[Screenshot: integrated UMAP from the test run without cell type labels]

Any comments or suggestions will be highly appreciated!

The script I used is attached below (paths were replaced by ...):

```python
from samap.mapping import SAMAP
from samap.analysis import (get_mapping_scores, GenePairFinder,
                            sankey_plot, chord_plot, CellTypeTriangles,
                            ParalogSubstitutions, FunctionalEnrichment,
                            convert_eggnog_to_homologs, GeneTriangles)
from samalg import SAM
import pandas as pd
import anndata
from joblib import dump, load

zf_data = anndata.read_h5ad('....')
pf_data = anndata.read_h5ad('....')

# Run SAM on each species separately
sam1 = SAM(counts=zf_data)
sam1.preprocess_data(filter_genes=False)
sam1.run(batch_key='orig.ident',
         npcs=30)

sam2 = SAM(counts=pf_data)
sam2.preprocess_data(filter_genes=False)
sam2.run(npcs=20)

sams = {'zf': sam1, 'pf': sam2}

# keys: per-species obs column holding the cell type labels
sm = SAMAP(sams,
           keys={'zf': 'cell_type', 'pf': 'cell_type'},
           f_maps='...',
           save_processed=True)
```

Thanks very much in advance!

Di

Can you give me a sense of how large the cell type clusters are? It would be great if you could show me the number of cells assigned to each label.
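For reference, per-label counts like these can be tabulated in a few lines. This minimal sketch assumes the labels are available as a plain sequence (for an AnnData object, something like `list(zf_data.obs['cell_type'])`); the helper name `cells_per_label` and the toy labels are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def cells_per_label(labels: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (label, n_cells) pairs, largest label first (hypothetical helper)."""
    # Counter.most_common() sorts by count in descending order
    return Counter(labels).most_common()


# Toy example with made-up labels
labels = ["neuron", "neuron", "glia", "neuron", "muscle", "glia"]
for label, n in cells_per_label(labels):
    print(f"{label}\t{n}")
```

With pandas, `zf_data.obs['cell_type'].value_counts()` produces the same table directly.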

Here are tables showing the number of cells assigned to each label.

Species zf:
[Table: number of cells per label, species zf]

Species pf:
[Table: number of cells per label, species pf]

Another question: is it best if the input cell numbers of the two species are comparable? I am working with 200 cells from one species and 8,000 cells from the other, and was thinking about downsampling the larger dataset.

Thank you!!
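A minimal sketch of the downsampling discussed here, assuming one wants to cap each cell type at a fixed number of cells so that rare types in the 8,000-cell dataset are not lost. The cap, the seed, and the helper name `downsample_indices` are illustrative assumptions, not part of SAMap. The returned positions could then be used to subset an AnnData object, e.g. `pf_data[idx].copy()`.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def downsample_indices(labels: Sequence[str], max_per_label: int, seed: int = 0) -> List[int]:
    """Keep at most `max_per_label` randomly chosen cells per label.

    Returns sorted integer positions into `labels` (hypothetical helper).
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    by_label: Dict[str, List[int]] = defaultdict(list)
    for i, lab in enumerate(labels):
        by_label[lab].append(i)
    keep: List[int] = []
    for idx in by_label.values():
        if len(idx) <= max_per_label:
            keep.extend(idx)  # small labels are kept whole
        else:
            keep.extend(rng.sample(idx, max_per_label))
    return sorted(keep)


labels = ["A"] * 10 + ["B"] * 3
idx = downsample_indices(labels, max_per_label=4)
print(len(idx))  # 7: four "A" cells plus all three "B" cells
```

Capping per label rather than sampling uniformly is a design choice: a uniform subsample of 8,000 cells could drop small cell types entirely.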

I think SAMap can be robust to dataset size disparities, but I would encourage you to try downsampling and check if the results change. I would also encourage changing the (poorly documented) NHS parameter in SAMAP.run like so:

```python
NHS = {'small_dataset_id': 2, 'big_dataset_id': 3}
```

NHS controls neighborhood size. 3 means that a cell's neighborhood includes cells up to 3 edges away. 2 decreases the neighborhood size, which is probably good for smaller datasets.
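To illustrate what "up to 3 edges away" means, here is a toy sketch that expands a cell's neighborhood by breadth-first search over a nearest-neighbor graph stored as an adjacency list of outgoing edges. This illustrates the idea only and is not SAMap's implementation; the graph and the function name `k_hop_neighborhood` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def k_hop_neighborhood(adj: Dict[int, List[int]], start: int, k: int) -> Set[int]:
    """Nodes reachable from `start` in at most `k` outgoing edges (start excluded)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nbr in adj.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    seen.discard(start)
    return seen


# Toy chain graph: 0 -> 1 -> 2 -> 3 -> 4
adj = {0: [1], 1: [2], 2: [3], 3: [4]}
print(sorted(k_hop_neighborhood(adj, 0, 2)))  # [1, 2]
print(sorted(k_hop_neighborhood(adj, 0, 3)))  # [1, 2, 3]
```

Raising k can only grow the neighborhood, which is consistent with the suggestion to use 2 for the smaller dataset.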

Instead of using keys in SAMAP(...), can you try using neigh_from_keys in SAMAP.run(...)? You can pass it exactly the same value you're passing to keys.

If you use neigh_from_keys, then NHS is not needed.
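For contrast with the edge-hopping neighborhoods, here is a rough sketch of a label-based neighborhood under the simplifying assumption that each cell's neighborhood is the set of other cells sharing its annotation. This assumption is made for illustration only; SAMap's actual handling of neigh_from_keys may differ, and the function name `label_neighborhoods` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set


def label_neighborhoods(labels: Sequence[str]) -> List[Set[int]]:
    """For each cell, the indices of the other cells with the same label (illustrative)."""
    members: Dict[str, Set[int]] = defaultdict(set)
    for i, lab in enumerate(labels):
        members[lab].add(i)
    # Each cell's neighborhood is its label group minus itself
    return [members[lab] - {i} for i, lab in enumerate(labels)]


labels = ["neuron", "glia", "neuron", "neuron"]
hoods = label_neighborhoods(labels)
print([len(h) for h in hoods])  # [2, 0, 2, 2]
```

Under this assumption, neighborhood size is fixed by label size (a singleton label gives an empty neighborhood), which is one reason the number of cells per label matters here.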

Thanks a lot for your suggestions, I will try it.