basilkhuder/Seurat-to-RNA-Velocity

Multiple-Sample Integration for filtering cell ID based off Seurat

cfayx1996 opened this issue · 6 comments

Hello,

Thank you for the detailed instructions; they are very helpful. I am rather new to Python and am having a hard time filtering the loom files to match my Seurat object. My Seurat object consists of 3 individual samples that are integrated together. I have three separate loom files that were made using Velocyto. I have followed all the instructions in your tutorial up to the filtering step for the loom files. After reading in all the CSV files for the cell IDs, UMAP, and cluster IDs, I moved on to the Multiple-Sample Integration step, as my CellID_Obs file combines 3 samples just like your example table. I used the code:

cellID_obs_sample_one = cellID_obs[cellID_obs_sample_one[0].str.contrains("sample1_")]
cellID_obs_sample_two = cellID_obs[cellID_obs_sample_two[0].str.contrains("sample2_")]
cellID_obs_sample_three = cellID_obs[cellID_obs_sample_three[0].str.contrains("sample3_")]

sample_one = sample_one[np.isin(sample_one.obs.index, cellID_obs_sample_one)]
sample_two = sample_one[np.isin(sample_two.obs.index, cellID_obs_sample_two)]
sample_two = sample_one[np.isin(sample_two.obs.index, cellID_obs_sample_two)]

When I run the first line it errors out with:

cellID_obs_sample_one = sample_obs[cellID_obs_sample_one[0].str.contrains("sample1_")]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'cellID_obs_sample_one' is not defined

If I split the cellID_obs from Seurat into 3 separate lists, one per sample, and run it, I still get an error:

cellID_obs_sample1 = pd.read_csv("/home/cfay/Documents/cellID_obs_sample1.csv")

sample_one = sample_one[np.isin(sample_one.obs.index,cellID_obs_sample1["x"])]
cellID_obs_sample2 = pd.read_csv("/home/cfay/Documents/cellID_obs_sample2csv")
sample_two = sample_two[np.isin(sample_two.obs.index,cellID_obs_sample2["x"])]
cellID_obs_sample3 = pd.read_csv("/home/cfay/Documents/cellID_obs_sample3.csv")
sample_three = sample_three[np.isin(sample_three.obs.index,cellID_obs_sample3["x"])]
sample_one = sample_one.concatenate(sample_two, sample_three)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/cfay/anaconda3/lib/python3.8/site-packages/anndata/_core/anndata.py", line 1710, in concatenate
out.obs = concat(
File "/home/cfay/anaconda3/lib/python3.8/site-packages/anndata/_core/anndata.py", line 834, in obs
self._set_dim_df(value, "obs")
File "/home/cfay/anaconda3/lib/python3.8/site-packages/anndata/_core/anndata.py", line 783, in _set_dim_df
value_idx = self._prep_dim_index(value.index, attr)
File "/home/cfay/anaconda3/lib/python3.8/site-packages/anndata/_core/anndata.py", line 810, in _prep_dim_index
value[0], (str, bytes)
File "/home/cfay/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 4101, in __getitem__
return getitem(key)
IndexError: index 0 is out of bounds for axis 0 with size 0

I figure that I am doing some part of this wrong, and I wanted to know if you would be able to help me pinpoint the issue, as I want to calculate RNA velocity and use my Seurat UMAP.
Thank you for your help and consideration!
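
A note on the second traceback: the final IndexError is raised while indexing an obs index of size 0, which would be consistent with the np.isin filters matching no cells at all. A minimal sketch of checking the overlap before concatenating is below. The barcodes are made up, and both the loom-style "sample1:...x" format and the renaming step are assumptions for illustration; the real formats should be checked by printing sample_one.obs.index[:5] and the first rows of the Seurat CSV.

```python
import numpy as np
import pandas as pd

# Made-up IDs for illustration only (assumed formats, not taken from the post).
loom_ids = pd.Index(["sample1:AAACCTGx", "sample1:AAAGTAGx"])    # assumed loom-style IDs
seurat_ids = pd.Series(["sample1_AAACCTG", "sample1_AAAGTAG"])   # assumed Seurat-style IDs


def count_matches(obs_index, wanted):
    """Return how many entries of obs_index also appear in wanted."""
    return int(np.isin(obs_index, wanted).sum())


# With mismatched formats nothing matches, so the filtered object would be empty.
before = count_matches(loom_ids, seurat_ids)

# Hypothetical renaming so both sides share one format:
# swap ":" for "_" and drop the trailing "x".
renamed = (
    loom_ids.str.replace(":", "_", regex=False)
            .str.replace("x$", "", regex=True)
)
after = count_matches(renamed, seurat_ids)

print("matches before renaming:", before)
print("matches after renaming:", after)
```

If the count is zero for a real sample, the filter for that sample will return an empty object, so it is worth checking each sample before calling concatenate.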

Hi @cfayx1996,
I'm a user like you, but I think I can help.

I think you need to check your cell IDs first, especially their naming pattern.

cellID_obs_sample_one = cellID_obs[cellID_obs_sample_one[0].str.contrains("sample1_")]

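In this line, the pandas str.contains() method looks for the given string pattern ("sample1_") in the object it is called on (cellID_obs_sample_one[0]). That object is not defined yet at this point, which is what the NameError is reporting.
It's also possible that your cell ID pattern is not "sampleX_".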

And I think you need to modify the code like this:
cellID_obs_sample_one = cellID_obs[cellID_obs[0].str.contains("sample1_")]

Also, I think the method name should be contains, not contrains. @basilkhuder
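
The suggested fix can be sketched end to end as below. This is only an illustration under stated assumptions: the cell IDs are made up, the CSV is simulated in memory, and the column name "x" follows the second attempt in the original post (if the file were read with header=None, the column would be 0 instead).

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the combined cell ID CSV exported from Seurat.
# With a real file this would be pd.read_csv("cellID_obs.csv").
csv_text = "x\nsample1_AAAC\nsample1_AAAG\nsample2_AAAC\nsample3_TTTG\n"
cellID_obs = pd.read_csv(io.StringIO(csv_text))

# Build each mask from the combined table itself (cellID_obs), not from the
# variable that is being defined on the left-hand side.
cellID_obs_sample_one = cellID_obs[cellID_obs["x"].str.contains("sample1_")]
cellID_obs_sample_two = cellID_obs[cellID_obs["x"].str.contains("sample2_")]
cellID_obs_sample_three = cellID_obs[cellID_obs["x"].str.contains("sample3_")]

# Hypothetical loom barcodes for sample one, already in the Seurat naming scheme.
loom_index = pd.Index(["sample1_AAAC", "sample1_CCCC", "sample1_AAAG"])

# Boolean mask of loom cells that are also present in the Seurat object.
keep = np.isin(loom_index, cellID_obs_sample_one["x"])
print(keep.sum(), "of", len(loom_index), "cells kept")
```

With a real AnnData object, the same mask would be applied as sample_one[keep], which is the pattern the tutorial code in the original post uses.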

AAA-3 commented

Hello! I tried this solution (see #13) but it did not work for me and produced a long traceback. @cfayx1996 did you have any luck?

Hi @AAA-3,

I presume you are trying to filter your data for RNA Velocity?

I tried to use this tutorial for sorting in Python, but found it was a lot easier to sort and create the object in R since I was analyzing the data with Seurat v4.

If you are using R and Seurat I would be happy to share what I did if that would help!

AAA-3 commented

Hi @cfayx1996 Yes I am :) I’d be happy to try your method out as well!! You can email or message through the forum, whichever is convenient: Ali.a.ali@fau.de

Hi @cfayx1996 I am having similar trouble - is there a solution using R you could post here? Thank you!

Hi! Would you be able to share this with me, too? michael.simoni@pennmedicine.upenn.edu if you'd like to email. Thanks!