imsb-uke/scGAN

Indexing error in Preprocessing step

Closed this issue · 6 comments

I am currently trying to run the vanilla implementation of scGAN on the Fresh 68k PBMCs (Donor A) dataset. I have changed the data format to .h5ad but am facing errors in the preprocessing step:

```
reading single cell data from ../exp_dir/name_your_experiment/data.h5ad
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/users/btech/varshney/singeCell/scGAN-master/preprocessing/write_tfrecords.py", line 141, in read_and_serialize
    sc_data.apply_preprocessing()
  File "/users/btech/varshney/singeCell/scGAN-master/preprocessing/process_raw.py", line 373, in apply_preprocessing
    self.clustering()
  File "/users/btech/varshney/singeCell/scGAN-master/preprocessing/process_raw.py", line 153, in clustering
    sc.pp.recipe_zheng17(clustered)
  File "/users/btech/varshney/.local/lib/python3.5/site-packages/scanpy/preprocessing/recipes.py", line 107, in recipe_zheng17
    adata.X, flavor='cell_ranger', n_top_genes=n_top_genes, log=False)
  File "/users/btech/varshney/.local/lib/python3.5/site-packages/scanpy/preprocessing/simple.py", line 348, in filter_genes_dispersion
    - disp_median_bin[df['mean_bin']].values))
  File "/users/btech/varshney/.local/lib/python3.5/site-packages/pandas/core/series.py", line 911, in __getitem__
    return self._get_with(key)
  File "/users/btech/varshney/.local/lib/python3.5/site-packages/pandas/core/series.py", line 953, in _get_with
    return self.reindex(key)
  File "/users/btech/varshney/.local/lib/python3.5/site-packages/pandas/core/series.py", line 3738, in reindex
    return super(Series, self).reindex(index=index, **kwargs)
  File "/users/btech/varshney/.local/lib/python3.5/site-packages/pandas/core/generic.py", line 4356, in reindex
    fill_value, copy).__finalize__(self)
  File "/users/btech/varshney/.local/lib/python3.5/site-packages/pandas/core/generic.py", line 4369, in _reindex_axes
    tolerance=tolerance, method=method)
  File "/users/btech/varshney/.local/lib/python3.5/site-packages/pandas/core/indexes/category.py", line 503, in reindex
    raise ValueError("cannot reindex with a non-unique indexer")
ValueError: cannot reindex with a non-unique indexer
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "main.py", line 94, in <module>
    process_files(exp_folders)
  File "/users/btech/varshney/singeCell/scGAN-master/preprocessing/write_tfrecords.py", line 175, in process_files
    for res in results:
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 695, in next
    raise value
ValueError: cannot reindex with a non-unique indexer
```
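For context, the conversion to .h5ad looked roughly like this (a rough sketch rather than the exact script I used; the `read_10x_mtx` loader and the paths are placeholders):

```python
import scanpy as sc

# Read the 10x Genomics matrix directory for Fresh 68k PBMCs (Donor A);
# the path is a placeholder for wherever the download was extracted.
adata = sc.read_10x_mtx('filtered_matrices_mex/hg19/')

# Make gene and cell identifiers unique before writing the file.
adata.var_names_make_unique()
adata.obs_names_make_unique()

adata.write('../exp_dir/name_your_experiment/data.h5ad')
```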

For debugging, I also replaced the Zheng17 preprocessing with the equivalent code from scanpy's documentation, but the error persists, now raised on the line `filter_result = sc.pp.highly_variable_genes(clustered.X, flavor='cell_ranger', n_top_genes=1000)`.
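For reference, in more recent scanpy releases `sc.pp.highly_variable_genes` expects the AnnData object itself rather than the raw `.X` matrix. A minimal sketch of that call, assuming `clustered` is the AnnData object used in `process_raw.py`:

```python
import scanpy as sc

# Newer scanpy API: pass the AnnData object, not clustered.X.
sc.pp.highly_variable_genes(clustered, flavor='cell_ranger', n_top_genes=1000)

# With inplace=True (the default), the result is stored as a boolean
# column on .var rather than returned.
hvg_mask = clustered.var['highly_variable']
```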

@marouf-git @pierremac @zeehio Kindly help me resolve this error.

Hello!
Thanks for your interest!
I'm currently at a conference, so I'm not really able to look into this too closely, but I suspect the issue is that the variable or observation names in your data object are not unique.
You can try calling the methods var_names_make_unique() and obs_names_make_unique() on the data object after loading it from the h5ad file.
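A minimal sketch of that, with the path taken from your log (`sc.read` dispatches on the .h5ad extension):

```python
import scanpy as sc

# Load the converted dataset (path taken from the log above).
adata = sc.read('../exp_dir/name_your_experiment/data.h5ad')

# De-duplicate gene and cell identifiers before any further preprocessing.
adata.var_names_make_unique()
adata.obs_names_make_unique()
```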
Please let me know if that helps.

I apologise for the inconvenience.

  1. The dataset is the Fresh 68k PBMCs (Donor A) dataset (the same one referenced in your README), since I am still stuck on the vanilla version itself. I have also tried other datasets (such as Frozen PBMCs (Donor A)), but the error persists.
  2. The read_raw_file function in process_raw.py already invokes var_names_make_unique(). I also tried your suggestion of calling obs_names_make_unique() there, but it didn't resolve the issue. (Note that we already call var_names_make_unique() and obs_names_make_unique() when converting the data to h5ad format; a quick uniqueness check is sketched below.)
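For reference, this is the kind of check that confirms the identifiers really are unique, where `adata` is a placeholder for the object right after loading:

```python
# var_names and obs_names are pandas Index objects, so .is_unique applies.
print('gene names unique:', adata.var_names.is_unique)
print('cell barcodes unique:', adata.obs_names.is_unique)
```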

If possible, could you (or one of the other maintainers) run the vanilla code at your end and verify whether the error is reproducible?

Ok, thank you.
I'm sorry I'm not really able to give you more support at the moment as I will be away from the office until the middle of next week.
However, I notice that you're probably not using the Docker container we provided.
You don't strictly have to use it, but be careful with the package versions you are using.
Specifically, Scanpy is constantly evolving and not always backward-compatible.
As mentioned in the README, we only tested the code with these package versions: Python 3.6, TensorFlow 1.8, Scanpy 1.2.2, anndata 0.6.5, pandas 0.22.0, NumPy 1.14.3, SciPy 1.1.0.

Please make sure that you are using those, and if that does not fix the issue, let me know and I'll try to reproduce and find a fix as soon as I am back in the office.
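If you are not using the container, one way to pin those versions is a requirements file along these lines (a sketch; use tensorflow-gpu instead of tensorflow if you need GPU support, and Python 3.6 itself has to come from your environment):

```
tensorflow==1.8.0
scanpy==1.2.2
anndata==0.6.5
pandas==0.22.0
numpy==1.14.3
scipy==1.1.0
```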

@pvarshney1729 did running with those specific package versions (or with the provided dockerfile) fix the issue?

Without further info, I'm closing this for now.
Please re-open if the problem persists.

@pierremac I was able to fix the issue by using the package versions you specified. The error still appears with the latest available versions of the packages, though, so maybe it is time to update the code to support them?