Manual preprocessing with pandas>=1.5
Closed this issue · 3 comments
bpr4242 commented
Description of the bug
Hi Zewen! Great package; I have really been enjoying it as I prepare a pipeline for some mouse scRNA-seq data with paired BCR/TCR libraries. I'm commenting here in reference to #180, where you enforce pandas<1.5 in your Singularity environment. The issue seems to be that pandas>=1.5 now raises a ValueError when a DataFrame index is defined by a set.
Minimal reproducible example
import dandelion as ddl
sample = 'path/to/fasta'
ddl.pp.assign_isotypes(sample, org="mouse", plot=True, save_plot=True)
The error message produced by the code above
ValueError Traceback (most recent call last)
----> 1 ddl.pp.assign_isotypes((samples[0]),filename_prefix=bcr_filename_prefixes,org= "mouse", plot=True, save_plot= True)
~/mambaforge/envs/scrnaseq_env1/lib/python3.9/site-packages/dandelion/preprocessing/_preprocessing.py in ?(fastas, fileformat, org, correct_c_call, correction_dict, plot, save_plot, show_plot, figsize, blastdb, allele, filename_prefix, verbose)
923
924 logg.info("Assign isotypes \n")
925
926 for i in range(0, len(fastas)):
--> 927 assign_isotype(
928 fastas[i],
929 fileformat=fileformat,
930 org=org,
~/mambaforge/envs/scrnaseq_env1/lib/python3.9/site-packages/dandelion/preprocessing/_preprocessing.py in ?(fasta, fileformat, org, evalue, correct_c_call, correction_dict, plot, save_plot, show_plot, figsize, blastdb, allele, filename_prefix, verbose)
862 # move and rename
863 move_to_tmp(fasta, filename_prefix)
864 make_all(fasta, filename_prefix, loci="ig")
865 rename_dandelion(fasta, filename_prefix, endswith=out_ex, subdir="tmp")
--> 866 update_j_multimap(fasta, filename_prefix)
~/mambaforge/envs/scrnaseq_env1/lib/python3.9/site-packages/dandelion/preprocessing/_preprocessing.py in ?(data, filename_prefix)
6483 "support_multimappers",
6484 ]
6485 check_multimapper(filePath0, filePath2)
6486 if filePath0 is not None:
-> 6487 jmulti = multimapper(filePath0)
6488 if filePath1 is not None:
6489 dbpass = load_data(filePath1)
6490 for col in jmm_transfer_cols:
~/mambaforge/envs/scrnaseq_env1/lib/python3.9/site-packages/dandelion/preprocessing/_preprocessing.py in ?(filename)
6383 df = pd.read_csv(filename, delimiter="\t")
6384 df_new = df.loc[
6385 df["j_support"] < 1e-3, :
6386 ] # maybe not needing to filter if j_support has already been filtered
-> 6387 mapped = pd.DataFrame(
6388 index=set(df_new["sequence_id"]),
6389 columns=[
6390 "multimappers",
~/mambaforge/envs/scrnaseq_env1/lib/python3.9/site-packages/pandas/core/frame.py in ?(self, data, index, columns, dtype, copy)
669 manager = get_option("mode.data_manager")
670
671 # GH47215
672 if index is not None and isinstance(index, set):
--> 673 raise ValueError("index cannot be a set")
674 if columns is not None and isinstance(columns, set):
675 raise ValueError("columns cannot be a set")
676
ValueError: index cannot be a set
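The traceback points at `pd.DataFrame(index=set(df_new["sequence_id"]), ...)` in `multimapper`. A minimal sketch of one possible workaround follows; it is not necessarily the fix adopted in dandelion. It assumes the set was only used to deduplicate the sequence IDs, so `Series.unique()` can take its place. The toy `df_new` and the helper name `make_mapped` are hypothetical, and only the `"multimappers"` column visible in the traceback is used.

```python
import pandas as pd

# Hypothetical stand-in for the filtered table built inside multimapper().
df_new = pd.DataFrame(
    {
        "sequence_id": ["contig_1", "contig_1", "contig_2"],
        "j_support": [1e-5, 1e-6, 1e-4],
    }
)


def make_mapped(df, columns):
    """Build an empty frame indexed by the unique sequence IDs (illustrative helper)."""
    # Series.unique() returns the distinct values in order of first appearance,
    # so the index is deterministic and is never a set.
    unique_ids = df["sequence_id"].unique()
    return pd.DataFrame(index=unique_ids, columns=columns)


# Only the first column name is visible in the traceback; the rest are elided there.
mapped = make_mapped(df_new, ["multimappers"])
print(mapped.index.tolist())  # ['contig_1', 'contig_2']
```

Wrapping the original expression as `list(set(...))` would also avoid the error, but the row order would then depend on set iteration order.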
OS information
MacOS
Version information
dandelion==0.3.1 pandas==2.0.1 numpy==1.24.3 matplotlib==3.7.1 networkx==3.1 scipy==1.10.1
Additional context
No response
bpr4242 commented
For the time being I am cloning my conda env and downgrading pandas/numpy to confirm the problem is fixed.
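To tell whether a given environment falls in the affected range, a small check like the one below may help. It is a sketch only: the 1.5 threshold is an assumption taken from the pandas<1.5 pin mentioned in #180, and the function name is hypothetical.

```python
def pandas_rejects_set_index(version: str) -> bool:
    """Return True if `version` is at or above the assumed 1.5 threshold."""
    # Compare only the major and minor components, e.g. "2.0.1" -> (2, 0).
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (1, 5)


print(pandas_rejects_set_index("2.0.1"))  # True  (version reported in this issue)
print(pandas_rejects_set_index("1.4.4"))  # False
```

In a live environment the string to pass in would be `pandas.__version__`.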
zktuong commented
bpr4242 commented
Perfect! Working well now, thank you!