Tutorial question on filtering out by mitochondrial content
Closed this issue · 5 comments
Describe the issue
Hello. In my python programming end, the code to filter out the cells with high mitochondrial contents cannot run successfully. Look forward to learning from your insights in this regard. Thanks!
What is the exact command that was run?
# for each cell compute fraction of counts in mito genes vs. all genes
# the `.A1` is only necessary as X is sparse (to transform to a dense array after summing)
adata.obs['percent_mito'] = np.sum(
adata[:, mito_genes].X, axis=1).A1 / np.sum(adata.X, axis=1).A1
Command output (with --verbose
flag)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[60], line 3
1 # For each cell, compute fraction of counts in mito genes vs. all genes
2 # the `.A1` is only necessary as X is sparse (to transform to a dense array after summing)
----> 3 adata.obs['percent_mito'] = np.sum(adata[:, mito_genes].X, axis=1).A1 / np.sum(adata.X, axis=1).A1
File /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/anndata/_core/anndata.py:1100, in AnnData.__getitem__(self, index)
1098 def __getitem__(self, index: Index) -> "AnnData":
1099 """Returns a sliced view of the object."""
-> 1100 oidx, vidx = self._normalize_indices(index)
1101 return AnnData(self, oidx=oidx, vidx=vidx, asview=True)
File /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/anndata/_core/anndata.py:1081, in AnnData._normalize_indices(self, index)
1080 def _normalize_indices(self, index: Optional[Index]) -> Tuple[slice, slice]:
-> 1081 return _normalize_indices(index, self.obs_names, self.var_names)
File /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/anndata/_core/index.py:33, in _normalize_indices(index, names0, names1)
31 ax0, ax1 = unpack_index(index)
32 ax0 = _normalize_index(ax0, names0)
---> 33 ax1 = _normalize_index(ax1, names1)
34 return ax0, ax1
File /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/anndata/_core/index.py:98, in _normalize_index(indexer, index)
96 if np.any(positions < 0):
97 not_found = indexer[positions < 0]
---> 98 raise KeyError(
99 f"Values {list(not_found)}, from {list(indexer)}, "
100 "are not valid obs/ var names or indices."
101 )
102 return positions # np.ndarray[int]
103 else:
KeyError: "Values ['ENSMUSG00000064336', 'ENSMUSG00000064337', 'ENSMUSG00000064338', 'ENSMUSG00000064339', 'ENSMUSG00000064340', 'ENSMUSG00000064341', 'ENSMUSG00000064342', 'ENSMUSG00000064343', 'ENSMUSG00000064344', 'ENSMUSG00000064345', 'ENSMUSG00000064346', 'ENSMUSG00000064347', 'ENSMUSG00000064348', 'ENSMUSG00000064349', 'ENSMUSG00000064350', 'ENSMUSG00000064351', 'ENSMUSG00000064352', 'ENSMUSG00000064353', 'ENSMUSG00000064354', 'ENSMUSG00000064355', 'ENSMUSG00000064356', 'ENSMUSG00000064357', 'ENSMUSG00000064358', 'ENSMUSG00000064359', 'ENSMUSG00000064360', 'ENSMUSG00000064361', 'ENSMUSG00000065947', 'ENSMUSG00000064363', 'ENSMUSG00000064364', 'ENSMUSG00000064365', 'ENSMUSG00000064366', 'ENSMUSG00000064367', 'ENSMUSG00000064368', 'ENSMUSG00000064369', 'ENSMUSG00000064370', 'ENSMUSG00000064371', 'ENSMUSG00000064372'], from ['ENSMUSG00000064336', 'ENSMUSG00000064337', 'ENSMUSG00000064338', 'ENSMUSG00000064339', 'ENSMUSG00000064340', 'ENSMUSG00000064341', 'ENSMUSG00000064342', 'ENSMUSG00000064343', 'ENSMUSG00000064344', 'ENSMUSG00000064345', 'ENSMUSG00000064346', 'ENSMUSG00000064347', 'ENSMUSG00000064348', 'ENSMUSG00000064349', 'ENSMUSG00000064350', 'ENSMUSG00000064351', 'ENSMUSG00000064352', 'ENSMUSG00000064353', 'ENSMUSG00000064354', 'ENSMUSG00000064355', 'ENSMUSG00000064356', 'ENSMUSG00000064357', 'ENSMUSG00000064358', 'ENSMUSG00000064359', 'ENSMUSG00000064360', 'ENSMUSG00000064361', 'ENSMUSG00000065947', 'ENSMUSG00000064363', 'ENSMUSG00000064364', 'ENSMUSG00000064365', 'ENSMUSG00000064366', 'ENSMUSG00000064367', 'ENSMUSG00000064368', 'ENSMUSG00000064369', 'ENSMUSG00000064370', 'ENSMUSG00000064371', 'ENSMUSG00000064372'], are not valid obs/ var names or indices."
What are the mito_genes that you're using?
Print out the mito_genes
What are the mito_genes that you're using?
Print out the mito_genes
mito_ensembl_ids = sc.queries.mitochondrial_genes("mmusculus", attrname="ensembl_gene_id")
mito_genes = mito_ensembl_ids["ensembl_gene_id"].values
mito_genes
array(['ENSMUSG00000064336', 'ENSMUSG00000064337', 'ENSMUSG00000064338',
'ENSMUSG00000064339', 'ENSMUSG00000064340', 'ENSMUSG00000064341',
'ENSMUSG00000064342', 'ENSMUSG00000064343', 'ENSMUSG00000064344',
'ENSMUSG00000064345', 'ENSMUSG00000064346', 'ENSMUSG00000064347',
'ENSMUSG00000064348', 'ENSMUSG00000064349', 'ENSMUSG00000064350',
'ENSMUSG00000064351', 'ENSMUSG00000064352', 'ENSMUSG00000064353',
'ENSMUSG00000064354', 'ENSMUSG00000064355', 'ENSMUSG00000064356',
'ENSMUSG00000064357', 'ENSMUSG00000064358', 'ENSMUSG00000064359',
'ENSMUSG00000064360', 'ENSMUSG00000064361', 'ENSMUSG00000065947',
'ENSMUSG00000064363', 'ENSMUSG00000064364', 'ENSMUSG00000064365',
'ENSMUSG00000064366', 'ENSMUSG00000064367', 'ENSMUSG00000064368',
'ENSMUSG00000064369', 'ENSMUSG00000064370', 'ENSMUSG00000064371',
'ENSMUSG00000064372'], dtype=object)
Appreciate it!
OK, but those IDs don't exist in your anndata object. You'll need to convert them to whatever form is in your anndata object (maybe they have a different id or maybe gene names are used).
OK, but those IDs don't exist in your anndata object. You'll need to convert them to whatever form is in your anndata object (maybe they have a different id or maybe gene names are used).
Many thanks for your insights!
You are right. I write the codes below and no error occurs.
import re
strings = adata.var_names
adata.var_names = [re.sub(r"\.\d+", "", x) for x in strings]
print("\nDisplay 30 modified gene names.")
print(adata.var_names[:30])
Display 30 modified gene names.
Index(['ENSMUSG00000102693', 'ENSMUSG00000064842', 'ENSMUSG00000051951',
'ENSMUSG00000102851', 'ENSMUSG00000103377', 'ENSMUSG00000104017',
'ENSMUSG00000103025', 'ENSMUSG00000089699', 'ENSMUSG00000103201',
'ENSMUSG00000103147', 'ENSMUSG00000103161', 'ENSMUSG00000102331',
'ENSMUSG00000102348', 'ENSMUSG00000102592', 'ENSMUSG00000088333',
'ENSMUSG00000102343', 'ENSMUSG00000025900', 'ENSMUSG00000102948',
'ENSMUSG00000104123', 'ENSMUSG00000025902', 'ENSMUSG00000104238',
'ENSMUSG00000102269', 'ENSMUSG00000096126', 'ENSMUSG00000103003',
'ENSMUSG00000104328', 'ENSMUSG00000102735', 'ENSMUSG00000098104',
'ENSMUSG00000102175', 'ENSMUSG00000088000', 'ENSMUSG00000103265'],
dtype='object')
It would be nice if the tutorial will add an ID conversion step, so that when learning and following your python codes, we won't meet any mismatch between the IDs in AnnData object and the queried IDs in the database.
Great! And unfortunately, those tutorials haven't been updated in a long time but, because we just released the new version of kallisto (0.50.0) and bustools (0.43.0), they will undergo complete reconstruction soon.