zktuong/dandelion

Manual preprocessing with pandas > 1.4


Description of the bug

Hi Zewen! Great package, I have really been enjoying it as I prepare a pipeline for mouse scRNA-seq with paired BCR/TCR libraries. Regarding #180, where you enforce pandas<1.5 in your Singularity environment: the underlying issue seems to be that newer pandas versions (those excluded by that pin) raise a ValueError when a DataFrame index is passed as a set.

Minimal reproducible example

import dandelion as ddl
sample = "path/to/fasta"
ddl.pp.assign_isotypes(sample, org="mouse", plot=True, save_plot=True)

The error message produced by the code above

ValueError                                Traceback (most recent call last)
----> 1 ddl.pp.assign_isotypes((samples[0]),filename_prefix=bcr_filename_prefixes,org= "mouse", plot=True, save_plot= True)

~/mambaforge/envs/scrnaseq_env1/lib/python3.9/site-packages/dandelion/preprocessing/_preprocessing.py in ?(fastas, fileformat, org, correct_c_call, correction_dict, plot, save_plot, show_plot, figsize, blastdb, allele, filename_prefix, verbose)
    923 
    924     logg.info("Assign isotypes \n")
    925 
    926     for i in range(0, len(fastas)):
--> 927         assign_isotype(
    928             fastas[i],
    929             fileformat=fileformat,
    930             org=org,

~/mambaforge/envs/scrnaseq_env1/lib/python3.9/site-packages/dandelion/preprocessing/_preprocessing.py in ?(fasta, fileformat, org, evalue, correct_c_call, correction_dict, plot, save_plot, show_plot, figsize, blastdb, allele, filename_prefix, verbose)
    862     # move and rename
    863     move_to_tmp(fasta, filename_prefix)
    864     make_all(fasta, filename_prefix, loci="ig")
    865     rename_dandelion(fasta, filename_prefix, endswith=out_ex, subdir="tmp")
--> 866     update_j_multimap(fasta, filename_prefix)

~/mambaforge/envs/scrnaseq_env1/lib/python3.9/site-packages/dandelion/preprocessing/_preprocessing.py in ?(data, filename_prefix)
   6483             "support_multimappers",
   6484         ]
   6485         check_multimapper(filePath0, filePath2)
   6486         if filePath0 is not None:
-> 6487             jmulti = multimapper(filePath0)
   6488             if filePath1 is not None:
   6489                 dbpass = load_data(filePath1)
   6490                 for col in jmm_transfer_cols:

~/mambaforge/envs/scrnaseq_env1/lib/python3.9/site-packages/dandelion/preprocessing/_preprocessing.py in ?(filename)
   6383     df = pd.read_csv(filename, delimiter="\t")
   6384     df_new = df.loc[
   6385         df["j_support"] < 1e-3, :
   6386     ]  # maybe not needing to filter if j_support has already been filtered
-> 6387     mapped = pd.DataFrame(
   6388         index=set(df_new["sequence_id"]),
   6389         columns=[
   6390             "multimappers",

~/mambaforge/envs/scrnaseq_env1/lib/python3.9/site-packages/pandas/core/frame.py in ?(self, data, index, columns, dtype, copy)
    669         manager = get_option("mode.data_manager")
    670 
    671         # GH47215
    672         if index is not None and isinstance(index, set):
--> 673             raise ValueError("index cannot be a set")
    674         if columns is not None and isinstance(columns, set):
    675             raise ValueError("columns cannot be a set")
    676 

ValueError: index cannot be a set
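
The last two frames of the traceback can be reproduced without dandelion. The sketch below is only an illustration: the `sequence_id` values and the single-column list are made up, and converting the set to an ordered collection is one possible workaround, not necessarily the change that shipped in dandelion.

```python
import pandas as pd

# Hypothetical stand-in for the table that `multimapper` filters on j_support.
df_new = pd.DataFrame(
    {
        "sequence_id": ["contig_1", "contig_2", "contig_1"],
        "j_support": [1e-5, 1e-6, 1e-4],
    }
)
columns = ["multimappers"]

# Passing a set as the index: newer pandas versions reject this with
# "ValueError: index cannot be a set"; older versions accept it.
try:
    pd.DataFrame(index=set(df_new["sequence_id"]), columns=columns)
    print("this pandas version accepts a set as index")
except ValueError as err:
    print(f"ValueError: {err}")

# Possible workaround: deduplicate into an ordered array instead of a set.
# `Series.unique()` keeps the order of first appearance.
unique_ids = df_new["sequence_id"].unique()
mapped = pd.DataFrame(index=unique_ids, columns=columns)
print(mapped.index.tolist())
```

Unlike a set, the array returned by `unique()` has a stable order, so the resulting frame is the same from run to run.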

OS information

MacOS

Version information

dandelion==0.3.1 pandas==2.0.1 numpy==1.24.3 matplotlib==3.7.1 networkx==3.1 scipy==1.10.1

Additional context

No response

For the time being I am cloning my conda env and downgrading pandas/numpy to confirm the problem is fixed.

Hi @bpr4242, thanks for your interest in this package! I actually updated the PyPI version to 0.3.2 yesterday, and it includes an automatic fix from Dependabot.

So if you reinstall and use pandas>=2, it should still work.
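
After reinstalling, it can help to confirm which versions the environment actually picked up. The following is a minimal sketch using only the standard library. The helper names (`version_tuple`, `meets_minimum`, `report`) are invented for this example, and the version parsing is deliberately simplistic: it compares only the first three numeric components.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Turn a string like '2.0.1' into (2, 0, 1), padding with zeros."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def report(dist_name, minimum):
    """Describe whether an installed distribution meets a minimum version."""
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        return f"{dist_name}: not installed"
    status = "ok" if meets_minimum(installed, minimum) else f"older than {minimum}"
    return f"{dist_name} {installed}: {status}"


print(report("pandas", "2.0"))
```

The same `report` call can be pointed at dandelion with a minimum of `"0.3.2"`. The name to pass is the distribution name the package was installed under, which is not stated in this thread.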

Perfect! Working well now, thank you!