Fail to filter loom by CellID extracted from Seurat

Question

rtoddler opened this issue 4 years ago · 9 comments

Hi,
I have a problem when trying to filter a loom file with CellIDs extracted from a Seurat object.

When I import the loom file with sample = anndata.read_loom("sample.loom"), I get this warning message: Variable names are not unique. To make them unique, call .var_names_make_unique. Do I need to run sample.var_names_make_unique()?
If I ignore that warning, load CellID_obs.csv, and filter with sample = sample[sample[np.isin(sample.obs.index,cellID_obs[0])]], I get the following error:

KeyError Traceback (most recent call last)
~/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2890 try:
-> 2891 return self._engine.get_loc(casted_key)
2892 except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call last)
in
----> 1 sample = sample[sample[np.isin(sample.obs.index,cellID_obs[0])]]

~/.local/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
2900 if self.columns.nlevels > 1:
2901 return self._getitem_multilevel(key)
-> 2902 indexer = self.columns.get_loc(key)
2903 if is_integer(indexer):
2904 indexer = [indexer]

~/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2891 return self._engine.get_loc(casted_key)
2892 except KeyError as err:
-> 2893 raise KeyError(key) from err
2894
2895 if tolerance is not None:

KeyError: 0

Could you help me with this?
Thank you so much!

Answer 1 · 2020-10-20T10:15:28.000Z

I met the same problem.

After running sample.var_names_make_unique(), the warning "Variable names are not unique" no longer appeared.
However, when I ran sample = sample[sample[np.isin(sample.obs.index,cellID_obs[0])]], the same error occurred. It seems that "sample_one.obs.index" in the loom file cannot be matched against "cellID_obs". I got the cellID_obs.csv for one sample from a Seurat object composed of multiple single-cell samples.
Thank you. @basilkhuder

Answer 2 · 2020-10-20T16:14:42.000Z

Hi to both of you!

If you could, please open up your cell observation file (either in excel or python) and look to see the name of the column that has the ids. Use this name for subsetting the loom file (so if the column is named "x"):

sample[np.isin(sample.obs.index,cellID_obs["x"])]

Thanks!
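
A note on the original KeyError: 0. When the CSV is read with a header row, pandas labels the column by its header text, so cellID_obs[0] looks for a column literally labeled 0 and fails. The lines below are a minimal sketch of that behaviour, assuming the file has a single column with the header "x"; the inline CSV text and the barcodes are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for cellID_obs.csv: one header line, then one ID per row.
csv_text = "x\nAAACCCAAGTATGGCG-1\nAAACCCAAGTTTGACG-1\n"
cellID_obs = pd.read_csv(io.StringIO(csv_text))

# Column labels come from the header row, so they are strings, not positions.
print(list(cellID_obs.columns))

try:
    cellID_obs[0]  # asks for a column labeled 0, which does not exist
except KeyError as err:
    print("KeyError:", err)

ids_by_name = cellID_obs["x"]            # select the column by its header
ids_by_position = cellID_obs.iloc[:, 0]  # or select the first column by position
```

Either of the last two selections can be passed to np.isin in place of cellID_obs[0].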

Answer 3 · 2020-10-20T16:44:18.000Z

When I import the loom file by sample = anndata.read_loom("sample.loom"), I get this warning message: Variable names are not unique. To make them unique, call .var_names_make_unique. Do I need to run sample.var_names_make_unique()?

Go ahead and make them unique. Check this out for more information.
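
To see what the warning refers to before changing anything, it may help to list the repeated names. The variable names label the columns of the matrix (one label per gene), so a repeat means two columns share the same label. A small sketch using a pandas Index as a stand-in for sample.var_names; the gene symbols are made up.

```python
import pandas as pd

# Made-up gene symbols standing in for sample.var_names.
var_names = pd.Index(["Gm1234", "Actb", "Gm1234", "Gapdh"])

# False here, because "Gm1234" labels two different columns.
print(var_names.is_unique)

# Collect every symbol that appears more than once.
duplicated_symbols = var_names[var_names.duplicated(keep=False)].unique().tolist()
print(duplicated_symbols)
```

The same two checks should work directly on sample.var_names, which is also a pandas Index.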

Answer 4 · 2020-10-21T02:21:28.000Z

Hi to both of you!

If you could, please open up your cell observation file (either in excel or python) and look to see the name of the column that has the ids. Use this name for subsetting the loom file (so if the column is named "x"):
sample[sample[np.isin(sample.obs.index,cellID_obs["x"])]]
Thanks!

Hi,
I thought about that as well, but I got another error (see below). I wonder if it is because the format of the IDs differs between sample.obs.index and cellID_obs.
For example, in cellID_obs an ID is listed as "AAACCCAAGTATGGCG-1", whereas in sample.obs.index it is "AAACCCAAGTATGGCGx".

IndexError Traceback (most recent call last)
in
----> 1 sample = sample[sample[np.isin(sample.obs.index,cellID_obs["x"])]]

~/.local/lib/python3.6/site-packages/anndata/_core/anndata.py in __getitem__(self, index)
1085 def __getitem__(self, index: Index) -> "AnnData":
1086 """Returns a sliced view of the object."""
-> 1087 oidx, vidx = self._normalize_indices(index)
1088 return AnnData(self, oidx=oidx, vidx=vidx, asview=True)
1089

~/.local/lib/python3.6/site-packages/anndata/_core/anndata.py in _normalize_indices(self, index)
1066
1067 def _normalize_indices(self, index: Optional[Index]) -> Tuple[slice, slice]:
-> 1068 return _normalize_indices(index, self.obs_names, self.var_names)
1069
1070 # TODO: this is not quite complete...

~/.local/lib/python3.6/site-packages/anndata/_core/index.py in _normalize_indices(index, names0, names1)
32 index = index[0].values, index[1]
33 ax0, ax1 = unpack_index(index)
---> 34 ax0 = _normalize_index(ax0, names0)
35 ax1 = _normalize_index(ax1, names1)
36 return ax0, ax1

~/.local/lib/python3.6/site-packages/anndata/_core/index.py in _normalize_index(indexer, index)
104 return positions # np.ndarray[int]
105 else:
--> 106 raise IndexError(f"Unknown indexer {indexer!r} of type {type(indexer)}")
107
108

IndexError: Unknown indexer View of AnnData object with n_obs × n_vars = 0 × 32285
var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
layers: 'matrix', 'ambiguous', 'spliced', 'unspliced' of type <class 'anndata._core.anndata.AnnData'>
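
The "0 × 32285" in the message above suggests that the inner selection matched no cells at all, which is consistent with the ID formats differing. One way to line them up, instead of editing the CSV by hand, is to rewrite the Seurat-style IDs into the loom style before comparing. This is only a sketch for the two formats quoted in this post; the function name and the prefix argument are hypothetical, and other datasets may use a different convention.

```python
import re


def seurat_to_loom_id(cell_id, prefix=""):
    """Turn e.g. 'AAACCCAAGTATGGCG-1' into 'AAACCCAAGTATGGCGx'.

    prefix is a hypothetical extra for loom IDs that carry a sample
    prefix; leave it empty if they do not.
    """
    barcode = re.sub(r"-\d+$", "", cell_id)  # drop a trailing '-1', '-2', ...
    return f"{prefix}{barcode}x"


seurat_ids = ["AAACCCAAGTATGGCG-1", "AAACCCAAGTTTGACG-1"]
converted = [seurat_to_loom_id(cell_id) for cell_id in seurat_ids]
print(converted)
```

The converted list can then be compared against sample.obs.index in place of the raw CSV column.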

Answer 5 · 2020-10-21T03:30:50.000Z

Hi, thanks very much for all the replies.
First, I modified the cellID_obs manually, and the modified cellID_obs is shown below.

And my sample_one.obs.index is shown below.

Then I tried different ways for the script "sample_one = sample_one[sample_one[np.isin(sample_one.obs.index,sample_obs["x"])]]".

sample_one = sample_one[sample_one[np.isin(sample_one.obs.index,cellID_obs["x"])]]
sample_one = sample_one[sample_one[np.isin(sample_one.obs.index,cellID_obs["y"])]]
sample_one = sample_one[sample_one[np.isin(sample_one.obs.index,cellID_obs["z"])]]

It seemed the filtering worked when the IDs in cellID_obs were exactly the same as those in sample_one.obs.index. However, an index error still occurred. Have I missed something, or is something in the script wrong?

Answer 6 · 2020-10-21T04:23:51.000Z

Just realized the typo. It should be:

sample[np.isin(sample.obs.index,cellID_obs["x"])]

Edit: In your case, I'm guessing "z" would be the column name.
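
For context on why the nested form fails: the inner sample[...] already returns the filtered object, and passing that object back in as an index is what raises "Unknown indexer". A minimal sketch of the single-index pattern, using a NumPy array as a stand-in for the AnnData object so it runs without a loom file; all IDs and values are made up.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins: obs_index plays the role of sample.obs.index,
# wanted the role of cellID_obs["x"], and X the role of the data matrix.
obs_index = pd.Index(["AAACCCAAGTATGGCGx", "AAACGCTCAGGTTCATx", "AAAGAACGTCCTACAAx"])
wanted = pd.Series(["AAACCCAAGTATGGCGx", "AAAGAACGTCCTACAAx", "TTTGTTGTCAAGCTACx"])
X = np.arange(6).reshape(3, 2)  # 3 cells x 2 genes

# One boolean per cell: True where the cell's ID is in the wanted list.
mask = np.isin(obs_index, wanted)

# Sanity check: a count of 0 usually means the ID formats do not match.
print(mask.sum(), "of", len(wanted), "requested IDs found")

# Index once with the mask; do not wrap the result in a second sample[...].
subset = X[mask]
print(subset.shape)
```

Checking mask.sum() before subsetting makes a format mismatch visible immediately, rather than ending up with an empty object.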

Answer 7 · 2020-10-21T05:06:50.000Z

Just realized the typo. It should be:
sample[np.isin(sample.obs.index,cellID_obs["x"])]
Edit: In your case, I'm guessing "z" would be the column name.

Yes, sample_one = sample_one[np.isin(sample_one.obs.index,cellID_obs["z"])] worked.
Thank you.

Answer 8 · 2020-10-21T15:41:21.000Z

Hi all,
Thank you so much for the suggestions. I modified the CellID.csv as JingleW suggested and ran sample[np.isin(sample.obs.index,cellID_obs["x"])]. The problem has been resolved!

Answer 9 · 2021-08-13T10:42:51.000Z

When I import the loom file by sample = anndata.read_loom("sample.loom"), I get this warning message: Variable names are not unique. To make them unique, call .var_names_make_unique. Do I need to run sample.var_names_make_unique()?

Go ahead and make them unique. Check this out for more information.

Hello everyone! First-time user of bioinformatics tools here, and I am curious as to why we would want the variable names to be unique. According to this AnnData page, the variable names are genes ... if more than one cell is expressing a gene, would we not expect repeats? And would it not be more relevant to make the obs_names unique, since that holds our CellIDs?

I am having similar issues to the OP. My post #13 outlines everything I have tried. Any clarification would be appreciated!
Thanks!!