[BUG] rsc.pp.scrublet: ValueError: Number of components should not be greater thanthe number of columns in the data
fuzh25 opened this issue · 2 comments
Describe the bug
Running the code "rsc.pp.scrublet(adata, n_prin_comps=30, random_state=1000, batch_key='batch')" raises the following error: ValueError: Number of components should not be greater thanthe number of columns in the data
Steps/Code to reproduce bug
rsc.pp.scrublet(adata,n_prin_comps=30, random_state=1000,batch_key='batch')
Expected behavior
The error seems to occur when the sample size is large.
Environment details:
Python 3.10
scanpy version: 1.10.1
rapids_singlecell version: 0.10.4
Additional context
The full output is as follows:
rsc.pp.scrublet(adata,n_prin_comps=30, random_state=1000,batch_key='batch')
Running Scrublet
Embedding transcriptomes using PCA...
Automatically set threshold at doublet score = 0.02
Detected doublet rate = 82.2%
Estimated detectable doublet fraction = 96.5%
Overall doublet rate:
Expected = 5.0%
Estimated = 85.1%
Embedding transcriptomes using PCA...
Automatically set threshold at doublet score = 0.03
Detected doublet rate = 77.6%
Estimated detectable doublet fraction = 92.7%
Overall doublet rate:
Expected = 5.0%
Estimated = 83.7%
Embedding transcriptomes using PCA...
Traceback (most recent call last):
File "/home/fuzh25/anaconda3/envs/omicverse_gpu/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3577, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
rsc.pp.scrublet(adata,n_prin_comps=30, random_state=1000,batch_key='batch')
File "/home/fuzh25/anaconda3/envs/omicverse_gpu/lib/python3.10/site-packages/legacy_api_wrap/__init__.py", line 80, in fn_compatible
return fn(*args_all, **kw)
File "/home/fuzh25/anaconda3/envs/omicverse_gpu/lib/python3.10/site-packages/rapids_singlecell/preprocessing/_scrublet/__init__.py", line 259, in scrublet
scrubbed = [
File "/home/fuzh25/anaconda3/envs/omicverse_gpu/lib/python3.10/site-packages/rapids_singlecell/preprocessing/_scrublet/__init__.py", line 260, in
_run_scrublet(
File "/home/fuzh25/anaconda3/envs/omicverse_gpu/lib/python3.10/site-packages/rapids_singlecell/preprocessing/_scrublet/__init__.py", line 230, in _run_scrublet
ad_obs = _scrublet_call_doublets(
File "/home/fuzh25/anaconda3/envs/omicverse_gpu/lib/python3.10/site-packages/rapids_singlecell/preprocessing/_scrublet/__init__.py", line 433, in _scrublet_call_doublets
pipeline.pca(scrub, n_prin_comps=n_prin_comps, random_state=scrub._random_state)
File "/home/fuzh25/anaconda3/envs/omicverse_gpu/lib/python3.10/site-packages/rapids_singlecell/preprocessing/_scrublet/pipeline.py", line 83, in pca
pca = PCA(n_components=n_prin_comps, random_state=random_state).fit(X_obs)
File "/home/fuzh25/anaconda3/envs/omicverse_gpu/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 188, in wrapper
ret = func(*args, **kwargs)
File "/home/fuzh25/anaconda3/envs/omicverse_gpu/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 393, in dispatch
return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
File "/home/fuzh25/anaconda3/envs/omicverse_gpu/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 190, in wrapper
return func(*args, **kwargs)
File "base.pyx", line 687, in cuml.internals.base.UniversalBase.dispatch_func
File "pca.pyx", line 443, in cuml.decomposition.pca.PCA.fit
ValueError: Number of components should not be greater thanthe number of columns in the data
I see the error, but I'm not sure it is something that needs to be fixed. The subset of data you are trying to analyse has very few genes/features, fewer than 30, so PCA cannot return the 30 components requested with n_prin_comps=30.
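A rough way to spot the offending batch before calling scrublet is sketched below. PCA cannot return more components than the data has columns (genes), so any batch with fewer usable genes than n_prin_comps will fail. The helper names genes_per_batch and safe_n_prin_comps, and the "detected in at least min_cells cells" rule with min_cells=3, are hypothetical choices for illustration, not part of rapids_singlecell. Scrublet also filters genes internally (e.g. by variability) before PCA, so these counts are only an upper bound: a batch below n_prin_comps will certainly fail, but passing this check does not guarantee success. The sketch assumes a dense NumPy count matrix with one batch label per cell.

```python
import numpy as np


def genes_per_batch(X: np.ndarray, batches: np.ndarray, min_cells: int = 3) -> dict:
    """Count, per batch, the genes with a nonzero count in at least `min_cells` cells.

    Hypothetical helper: the `min_cells` rule is an assumption, and Scrublet's own
    internal gene filtering may keep even fewer genes than this.
    """
    counts = {}
    for b in np.unique(batches):
        sub = X[batches == b]                                  # cells of this batch only
        n_detected = np.asarray((sub > 0).sum(axis=0)).ravel()  # cells expressing each gene
        counts[b.item()] = int((n_detected >= min_cells).sum())
    return counts


def safe_n_prin_comps(requested: int, X: np.ndarray, batches: np.ndarray,
                      min_cells: int = 3) -> int:
    """Clamp the requested component count to the smallest per-batch gene count.

    Hypothetical helper; a result of 0 means some batch has no usable genes and
    should probably be excluded rather than passed to PCA.
    """
    per_batch = genes_per_batch(X, batches, min_cells)
    return min(requested, min(per_batch.values()))
```

Printing genes_per_batch(X, adata.obs["batch"].to_numpy()) on a dense copy of the counts would show which batch is too small; that batch can then be excluded or merged, or a smaller value can be passed as n_prin_comps. Dropping the batch is often the cleaner option, since lowering n_prin_comps for every batch changes the embedding of the well-behaved ones too.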