imsb-uke/scGAN

working environment and anndata issue: 'dict' object has no attribute 'dtype'

EperLuo opened this issue · 5 comments

Hello, I'm having some issues when implementing scGAN.

First I tried to build a TensorFlow working environment according to requirements.txt. But after I generated the h5ad file and ran python main.py --param parameters.json --process --train, I got the following error:

reading single cell data from /data1/lep/Workspace/scGAN/output/scgan_test/68kPBMCs.h5ad
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/data1/lep/anaconda3/envs/scGAN/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/data1/lep/Workspace/scGAN/preprocessing/write_tfrecords.py", line 139, in read_and_serialize
sc_data = GeneMatrix(job_path)
File "/data1/lep/Workspace/scGAN/preprocessing/process_raw.py", line 49, in init
self.read_raw_file()
File "/data1/lep/Workspace/scGAN/preprocessing/process_raw.py", line 107, in read_raw_file
andata = sc.read(self.raw_file)
File "/data1/lep/anaconda3/envs/scGAN/lib/python3.6/site-packages/scanpy/readwrite.py", line 75, in read
backup_url=backup_url, cache=cache)
File "/data1/lep/anaconda3/envs/scGAN/lib/python3.6/site-packages/scanpy/readwrite.py", line 276, in _read
return read_h5ad(filename, backed=backed)
File "/data1/lep/anaconda3/envs/scGAN/lib/python3.6/site-packages/anndata/readwrite/read.py", line 444, in read_h5ad
return AnnData(
_read_args_from_h5ad(filename=filename, chunk_size=chunk_size))
File "/data1/lep/anaconda3/envs/scGAN/lib/python3.6/site-packages/anndata/readwrite/read.py", line 494, in _read_args_from_h5ad
return AnnData._args_from_dict(d)
File "/data1/lep/anaconda3/envs/scGAN/lib/python3.6/site-packages/anndata/base.py", line 2144, in _args_from_dict
if key in d_true_keys[true_key].dtype.names:
AttributeError: 'dict' object has no attribute 'dtype'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "main.py", line 97, in <module>
process_files(exp_folders)
File "/data1/lep/Workspace/scGAN/preprocessing/write_tfrecords.py", line 175, in process_files
for res in results:
File "/data1/lep/anaconda3/envs/scGAN/lib/python3.6/multiprocessing/pool.py", line 735, in next
raise value
AttributeError: 'dict' object has no attribute 'dtype'

I generated the h5ad file and ran main.py with anndata 0.6.18 and scanpy 1.2.2 here.

Then I tried to use the Dockerfile in this repository. After pulling the scgan image from https://hub.docker.com/r/fhausmann/scgan, I ran docker container run -it --rm --gpus all -v /data6/lep/Workspace/scGAN:/scGAN fhausmann/scgan python main.py --param parameters.json --process --train, and the same error happened again:

reading single cell data from /scGAN/output/scGAN_test/68kPBMCs.h5ad
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/scGAN/preprocessing/write_tfrecords.py", line 139, in read_and_serialize
sc_data = GeneMatrix(job_path)
File "/scGAN/preprocessing/process_raw.py", line 49, in init
self.read_raw_file()
File "/scGAN/preprocessing/process_raw.py", line 107, in read_raw_file
andata = sc.read(self.raw_file)
File "/usr/local/lib/python3.5/dist-packages/scanpy/readwrite.py", line 75, in read
backup_url=backup_url, cache=cache)
File "/usr/local/lib/python3.5/dist-packages/scanpy/readwrite.py", line 276, in _read
return read_h5ad(filename, backed=backed)
File "/usr/local/lib/python3.5/dist-packages/anndata/readwrite/read.py", line 444, in read_h5ad
return AnnData(
_read_args_from_h5ad(filename=filename, chunk_size=chunk_size))
File "/usr/local/lib/python3.5/dist-packages/anndata/readwrite/read.py", line 494, in _read_args_from_h5ad
return AnnData._args_from_dict(d)
File "/usr/local/lib/python3.5/dist-packages/anndata/base.py", line 2140, in _args_from_dict
if key in d_true_keys[true_key].dtype.names:
AttributeError: 'dict' object has no attribute 'dtype'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "main.py", line 97, in <module>
process_files(exp_folders)
File "/scGAN/preprocessing/write_tfrecords.py", line 175, in process_files
for res in results:
File "/usr/lib/python3.5/multiprocessing/pool.py", line 695, in next
raise value
AttributeError: 'dict' object has no attribute 'dtype'

I wonder whether the problem is with my machine or with anndata and scanpy. Any suggestions would be greatly appreciated!

Hello @EperLuo ,
This looks to me as if the h5ad file was generated with a newer AnnData version. Could that be possible?
If not, would it be possible to provide me with a (dummy) h5ad file so I can investigate the issue in more depth?
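In the meantime, you can inspect the file's on-disk layout directly. anndata ≥ 0.7 writes obs/var as HDF5 groups, which the 0.6.x reader receives as plain dicts — exactly the .dtype error above — while 0.6.x stores them as compound datasets. A quick check (a sketch using h5py, which anndata already depends on):

```python
import h5py

def obs_layout(path):
    """Return 'group' if obs uses the anndata >= 0.7 on-disk layout
    (unreadable by anndata 0.6.x) or 'dataset' for the legacy
    compound-dataset layout that 0.6.x expects."""
    with h5py.File(path, "r") as f:
        return "group" if isinstance(f["obs"], h5py.Group) else "dataset"

# e.g. obs_layout("68kPBMCs.h5ad")
```

If this returns 'group', the file needs to be re-exported with the anndata version pinned in requirements.txt before scGAN can read it.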

@fhausmann thanks a lot for the quick reply!

As you said, it seems to be an h5ad file issue, so I switched to a new dataset to generate the h5ad file, and the new file works well in Docker. I guess there is something wrong with the original data I used.

But after the data processing, another error comes up, this time from TensorFlow:

2023-01-31 16:35:52.683229: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22726 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:18:00.0, compute capability: 8.6)
Parameter Count is [ 37841033 ].
2023-01-31 16:37:12.560904: E tensorflow/stream_executor/cuda/cuda_blas.cc:654] failed to run cuBLAS routine cublasSgemmBatched: CUBLAS_STATUS_EXECUTION_FAILED
2023-01-31 16:37:12.560982: E tensorflow/stream_executor/cuda/cuda_blas.cc:2413] Internal: failed BLAS call, see log for details
2023-01-31 16:37:12.561064: I tensorflow/stream_executor/stream.cc:1951] stream 0x12502e20 did not wait for stream: 0x124fd070
2023-01-31 16:37:12.561119: I tensorflow/stream_executor/stream.cc:4724] stream 0x12502e20 did not memcpy device-to-host; source: 0x7f3992aba800
2023-01-31 16:37:12.561159: F tensorflow/core/common_runtime/gpu/gpu_util.cc:296] GPU->CPU Memcpy failed

My CUDA version is 11.4 and the TensorFlow version I used is 1.8.0. Should I change the TensorFlow version (or downgrade the CUDA version)? Thanks again for your kind help!

This looks like a memory issue to me. Could it be that several processes are running on the same GPU, or that you don't have enough RAM available? Maybe you can monitor RAM usage with htop or something similar.

The way I made things work was to process the data in the Docker environment and train the model in my own Anaconda environment. I guess the problem is related to my CUDA version or GPU type, since the TensorFlow version here is relatively old. (Maybe you could consider rebuilding the project with TensorFlow 2.x to make it easier to reproduce now.) I can't tell exactly where the problem comes from, but it runs now anyway.
And thanks a lot for your response!
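For anyone who wants to reproduce the conda environment I used, the standard export commands (run inside the activated environment) are:

```shell
# Dump exact package versions from the active conda environment.
# --no-builds drops platform-specific build strings for portability.
conda env export --no-builds > environment.yml

# Or a pip-style listing, matching the repo's requirements.txt format.
pip freeze > requirements_conda.txt
```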

@EperLuo
Hello, your discussion has solved many of my problems. Now I have run into the same problem you had during training. You mentioned executing the code in a conda environment instead of Docker, but many of the required packages cannot be found for the conda environment. If possible, could you please provide the requirements.txt file for your current conda environment? Thank you very much.