basilkhuder/Seurat-to-RNA-Velocity

Variable names are not unique and 'cellID_obs' is not defined

AAA-3 opened this issue · 2 comments

AAA-3 commented

Thanks @basilkhuder for this amazing tutorial!!

I ran the Python code (see below) and got the following error output:

Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Traceback (most recent call last):
  File "/home/ali/Dokumente/RPractise/Velocity/PythonCodes/Step_2_scVelo.py", line 34, in <module>
    cellID_obs_WT3 = cellID_obs[cellID_obs_WT3[0].str.contains("WT3_WT3_")]
NameError: name 'cellID_obs_WT3' is not defined
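The `NameError` comes from the right-hand side of that line. Python evaluates `cellID_obs[cellID_obs_WT3[0]...]` before the assignment happens, so `cellID_obs_WT3` does not exist yet when it is read. Below is a minimal sketch of the failing pattern and of a form that only references the table that is already loaded. The two-row table is an illustrative stand-in, and its column name `x` is an assumption taken from the cell ID examples later in this issue.

```python
import pandas as pd

# Illustrative stand-in for the table loaded from cellID_obs.csv (assumed column name "x")
cellID_obs = pd.DataFrame({"x": ["WT3_WT3_AACCATGCAGCCTTGG", "WT4_TGCGGGTAGTCCGGTC"]})

try:
    # Fails: cellID_obs_WT3 is read on the right-hand side before it has been assigned
    cellID_obs_WT3 = cellID_obs[cellID_obs_WT3[0].str.contains("WT3_WT3_")]
except NameError as err:
    print("NameError:", err)

# Works: the boolean mask is built from the table that already exists
cellID_obs_WT3 = cellID_obs[cellID_obs["x"].str.contains("WT3_WT3_")]
print(cellID_obs_WT3)
```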

Following #4, I changed WT3 = WT3[np.isin(WT3.obs.index, cellID_obs_WT3)] to WT3 = WT3[np.isin(WT3.obs.index, cellID_obs_WT3[x])], which did not work and gave me pandas errors (KeyError: 0). I implemented this potential solution in my code, but I do not know whether Python is applying it properly, since my WT3.var_names line produces no output. Does anyone have any suggestions?
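The missing `WT3.var_names` output is most likely because bare expressions are only echoed in an interactive session or a notebook. When a file is run with `python script.py`, the value is computed and discarded. The sketch below uses a small `pandas.Index` as an assumed stand-in for `WT3.var_names` (AnnData stores `var_names` as a pandas Index) and prints it explicitly, along with a uniqueness check.

```python
import pandas as pd

# Illustrative gene names standing in for WT3.var_names
var_names = pd.Index(["Gapdh", "Actb", "Gapdh"])

var_names  # bare expression: evaluated and discarded when run as a script

# print() produces output in scripts as well as in notebooks
print(var_names)
print("unique:", var_names.is_unique)
print("duplicated:", var_names[var_names.duplicated()].tolist())
```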

I also do not know how to address the NameError: name 'cellID_obs_WT3' is not defined error. Following #9, I changed cellID_obs_WT3 = cellID_obs[cellID_obs_WT3[0].str.contains("WT3_WT3_")] to cellID_obs_WT3 = cellID_obs[cellID_obs[0].str.contains("WT3_WT3_")] and got the same pandas errors.
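A likely source of the `KeyError: 0` is that `pd.read_csv` uses the first line of the file as the header by default. With a file shaped like the cell ID examples further down, the single column is therefore labelled `"x"`, not `0`, and `cellID_obs[0]` asks for a column named `0`. Here is a sketch with an in-memory stand-in for `cellID_obs.csv` that uses two IDs copied from this issue. It assumes the header really is `x`.

```python
import io
import pandas as pd

# In-memory stand-in for cellID_obs.csv; the header line "x" is assumed from the examples in this issue
csv_text = "x\nWT4_TGCGGGTAGTCCGGTC\nWT3_WT3_AACCATGCAGCCTTGG\n"
cellID_obs = pd.read_csv(io.StringIO(csv_text))

print(cellID_obs.columns.tolist())  # the only column label is "x"

try:
    cellID_obs[0]  # looks up a column *labelled* 0, which does not exist
except KeyError as err:
    print("KeyError:", err)

# Select the column either by its label or by its position
by_label = cellID_obs["x"]
by_position = cellID_obs.iloc[:, 0]
cellID_obs_WT3 = cellID_obs[by_label.str.startswith("WT3_WT3_")]
print(cellID_obs_WT3)
```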

Anyone got any tips?

Python script

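import anndata
import numpy as np
import pandas as pd

# load the velocyto loom files and make gene names unique
WT3 = anndata.read_loom("/home/ali/Dokumente/RPractise/E18.5_rawdata/E18.5_raw_outputs/221929_WT3/velocyto/221929_WT3.loom")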
WT3.var_names_make_unique()
WT4 = anndata.read_loom("/home/ali/Dokumente/RPractise/E18.5_rawdata/E18.5_raw_outputs/222863_WT4/velocyto/222863_WT4.loom")
WT4.var_names_make_unique()
KO4 = anndata.read_loom("/home/ali/Dokumente/RPractise/E18.5_rawdata/E18.5_raw_outputs/222862_KO4/velocyto/222862_KO4.loom")
KO4.var_names_make_unique()
KO5 = anndata.read_loom("/home/ali/Dokumente/RPractise/E18.5_rawdata/E18.5_raw_outputs/222864_KO5/velocyto/222864_KO5.loom")
KO5.var_names_make_unique()

WT3.var_names
WT4.var_names
KO4.var_names
KO5.var_names

cellID_obs = pd.read_csv("/home/ali/Dokumente/RPractise/E18.5/MTW Expected E18.5/cellID_obs.csv")
TSNE_cord = pd.read_csv("/home/ali/Dokumente/RPractise/E18.5/MTW Expected E18.5/cell_embeddings.csv")
cell_clusters = pd.read_csv("/home/ali/Dokumente/RPractise/E18.5/MTW Expected E18.5/clusters.csv")

#integration
cellID_obs_WT3 = cellID_obs[cellID_obs_WT3[0].str.contains("WT3_WT3_")]
cellID_obs_WT4 = cellID_obs[cellID_obs_WT4[0].str.contains("WT4_")]
cellID_obs_KO4 = cellID_obs[cellID_obs_KO4[0].str.contains("KO4_")]
cellID_obs_KO5 = cellID_obs[cellID_obs_KO5[0].str.contains("KO5_")]

WT3 = WT3[np.isin(WT3.obs.index, cellID_obs_WT3)]
WT4 = WT4[np.isin(WT4.obs.index, cellID_obs_WT4)]
KO4 = KO4[np.isin(KO4.obs.index, cellID_obs_KO4)]
KO5 = KO5[np.isin(KO5.obs.index, cellID_obs_KO5)]

sample_one = WT3.concatenate(WT4, KO4, KO5)
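A corrected `#integration` step could look roughly like the following sketch. It assumes the following:

* The column is named `x`.
* The IDs in `cellID_obs.csv` already match the loom index format (the update below does this by hand).
* Plain pandas/NumPy stand-ins replace the loom files so that the example runs on its own.

With real data, `obs_names` would be `WT3.obs.index` and the mask would be applied as `WT3 = WT3[mask]`. `cell_mask` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def cell_mask(obs_names, cell_ids, prefix):
    """Hypothetical helper: boolean mask over obs_names for IDs in cell_ids that start with prefix."""
    # Select the matching IDs as a 1-D array of strings, not a whole DataFrame
    wanted = cell_ids[cell_ids.str.startswith(prefix)].to_numpy()
    return np.isin(obs_names, wanted)

# Stand-ins: cell IDs already rewritten to loom style, and an illustrative loom obs index
cellID_obs = pd.DataFrame({"x": ["221929_WT3:AACCATGCAGCCTTGGx", "222863_WT4:TGCGGGTAGTCCGGTCx"]})
wt3_obs_names = pd.Index(["221929_WT3:AACCATGCAGCCTTGGx", "221929_WT3:GGGGGGGGGGGGGGGGx"])

mask = cell_mask(wt3_obs_names, cellID_obs["x"], "221929_WT3:")
print(mask)  # with real data: WT3 = WT3[mask]
```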

Cell ID examples (contents of cellID_obs.csv; the first line, x, is the column header):

x
WT4_TGCGGGTAGTCCGGTC
KO4_GTTAAGCCATACCATG
KO5_TTTCCTCAGATCCCAT
WT3_WT3_AACCATGCAGCCTTGG
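These Seurat IDs use different prefixes from the loom files (WT3 even has a doubled `WT3_WT3_`), so `np.isin` finds no overlap unless the IDs are rewritten first. The sketch below maps the Seurat prefixes used in the script above to the loom prefixes used in the update below. The trailing `x` is an assumption based on the usual velocyto `sample:BARCODEx` naming. Check your loom index and pass `suffix=""` if it has no `x`. `PREFIX_MAP` and `seurat_to_loom` are hypothetical names.

```python
# Hypothetical mapping from Seurat prefixes (script above) to loom prefixes (update below)
PREFIX_MAP = {
    "WT3_WT3_": "221929_WT3:",
    "WT4_": "222863_WT4:",
    "KO4_": "222862_KO4:",
    "KO5_": "222864_KO5:",
}

def seurat_to_loom(cell_id, suffix="x"):
    """Rewrite one Seurat cell ID into an assumed velocyto-style loom ID."""
    for seurat_prefix, loom_prefix in PREFIX_MAP.items():
        if cell_id.startswith(seurat_prefix):
            barcode = cell_id[len(seurat_prefix):]
            return loom_prefix + barcode + suffix
    raise ValueError(f"unrecognised sample prefix in {cell_id!r}")

ids = ["WT4_TGCGGGTAGTCCGGTC", "WT3_WT3_AACCATGCAGCCTTGG"]
print([seurat_to_loom(i) for i in ids])
```

On a loaded table, the same function can be applied column-wide with `cellID_obs["x"].map(seurat_to_loom)`.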

AAA-3 commented

UPDATE: I ran my updated code in a Jupyter notebook with the solutions listed above, and here is the outcome. I do not know why we need to make the gene names unique, since we are only looking at cell IDs, but I did it anyway (not shown).
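On the gene-name question: AnnData indexes genes by `var_names`, and objects are matched gene by gene on those names when they are combined. A duplicated name therefore no longer identifies a single gene. The pandas sketch below shows the ambiguity: a label lookup returns two rows instead of one. It then applies a hypothetical `make_unique` that appends `-1`, `-2`, ... to repeats. This is similar in spirit to `var_names_make_unique()`, but it is not anndata's implementation, and the counts are illustrative.

```python
import pandas as pd

# Toy expression table with a duplicated gene name (illustrative values)
counts = pd.DataFrame({"cell1": [5, 3, 7]}, index=["Gapdh", "Actb", "Gapdh"])
print(counts.loc["Gapdh"])  # two rows come back: the label does not identify one gene

def make_unique(names, sep="-"):
    """Hypothetical helper: keep the first occurrence and suffix later repeats with -1, -2, ..."""
    seen = {}
    result = []
    for name in names:
        if name in seen:
            seen[name] += 1
            result.append(f"{name}{sep}{seen[name]}")
        else:
            seen[name] = 0
            result.append(name)
    return result

counts.index = make_unique(counts.index)
print(counts.index.tolist())
```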

Loading loom files and packages: [screenshot]

Loom file index: [screenshot]

Loading Seurat files (NOTE: I manually changed the cell IDs) and attempting to merge: [screenshot]

Error:
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/tmp/ipykernel_34687/274049558.py in <module>
      1 #integration...originally cellID_obs[cellID_obs_WT3[0]
----> 2 cellID_obs_WT3 = cellID_obs[cellID_obs[z].str.contains("221929_WT3:")]
      3 cellID_obs_WT4 = cellID_obs[cellID_obs[z].str.contains("222863_WT4:")]
      4 cellID_obs_KO4 = cellID_obs[cellID_obs[z].str.contains("222862_KO4:")]
      5 cellID_obs_KO5 = cellID_obs[cellID_obs[z].str.contains("222864_KO5:")]

NameError: name 'z' is not defined
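`z` was never assigned, so Python raises `NameError` before pandas is involved. The column label most likely needs to be the quoted string `"x"`. Because these prefixes are fixed strings, `str.startswith` (or `str.contains(..., regex=False)`) also avoids `str.contains` treating the pattern as a regular expression, which it does by default. The sketch below loops over the four loom prefixes from the update. It runs on a two-row stand-in table whose trailing `x` on each ID is an assumption.

```python
import pandas as pd

# Stand-in for the manually edited cellID_obs.csv (column "x", loom-style IDs)
cellID_obs = pd.DataFrame({"x": ["221929_WT3:AACCATGCAGCCTTGGx", "222862_KO4:GTTAAGCCATACCATGx"]})

prefixes = {
    "WT3": "221929_WT3:",
    "WT4": "222863_WT4:",
    "KO4": "222862_KO4:",
    "KO5": "222864_KO5:",
}

# One filtered table per sample, keyed by sample name
per_sample = {
    name: cellID_obs[cellID_obs["x"].str.startswith(prefix)]
    for name, prefix in prefixes.items()
}
for name, table in per_sample.items():
    print(name, len(table))
```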

Changing the code to the suggestion in #9 produced the following error: [screenshot]

Anyone know how to proceed?

AAA-3 commented

In case anyone else runs into this issue: I manually created a new .csv file containing just the one column. That way, even if Python is selecting the whole DataFrame, there is nothing else for it to select. It is a roundabout fix, since the code is supposed to do this for me, but I couldn't figure out why it was not cooperating.
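The manual one-column CSV can probably also be produced in code. Select the column as a `Series` instead of passing the whole `DataFrame`, and count how many loom cells match so that an empty result is caught early. Below is a minimal sketch that writes to an in-memory buffer instead of a file. The stand-in IDs, the illustrative second barcode, and the `matched_cells` name are assumptions. With real data, `obs_names` would be `WT3.obs.index`.

```python
import io
import numpy as np
import pandas as pd

csv_text = "x\n221929_WT3:AACCATGCAGCCTTGGx\n222863_WT4:TGCGGGTAGTCCGGTCx\n"
ids = pd.read_csv(io.StringIO(csv_text))["x"]  # a single column as a Series

# Write the one-column file that was previously made by hand (to a buffer here)
out = io.StringIO()
ids.to_csv(out, index=False)

obs_names = pd.Index(["221929_WT3:AACCATGCAGCCTTGGx", "221929_WT3:CCCCCCCCCCCCCCCCx"])
mask = np.isin(obs_names, ids.to_numpy())
matched_cells = int(mask.sum())
print("matched cells:", matched_cells)
if matched_cells == 0:
    print("No overlap: check that the ID format matches the loom index")
```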