zktuong/dandelion

Preprocessing error when running the Singularity container

Closed · 1 comment

Description of the bug

Thank you for the great package and the container.
I was trying to run preprocessing with the container and got the error message below. Could you help me figure out what is causing it? Thanks, and let me know if you need anything else.

Minimal reproducible example

singularity run -B $PWD /ext_singularity/sc-dandelion_latest.sif dandelion-preprocess --meta="meta_BCR.csv" --sep="" --file_prefix="all" --filter_to_high_confidence --keep_trailing_hyphen_number

The error message produced by the code above

Error in findNovelAlleles(db, germline_db = igv, v_call = v_call, j_call = j_call,  : 
  Not enough sample sequences were assigned to any germline:
  (1) germline_min is too large or
  (2) sequences names don't match germlines.
Execution halted
tigger-genotype execution took: 0:00:10 secs (Wall clock time)

            Reconstructing heavy chain dmask germline sequences with v_call_genotyped.
Running command: CreateGermlines.py -d tigger/tigger_heavy_igblast_db-pass_genotyped.tsv -g dmask -r tigger/tigger_heavy_igblast_db-pass_genotype.fasta /share/database/germlines/imgt/human/vdj//imgt_human_IGHD.fasta /share/database/germlines/imgt/human/vdj//imgt_human_IGHJ.fasta --vf v_call_genotyped

ERROR> Database file tigger/tigger_heavy_igblast_db-pass_genotyped.tsv does not exist.

      Novel allele discovery execution halted.
      Attempting to run tigger-genotype without novel allele discovery.
      Reassigning alleles
Running command: tigger-genotype.R -d tigger/tigger_heavy_igblast_db-pass.tsv -r /share/database/germlines/imgt/human/vdj/imgt_human_IGHV.fasta -n tigger_heavy_igblast_db-pass -N NO -o tigger -f airr

During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C" 
2: Setting LC_COLLATE failed, using "C" 
3: Setting LC_TIME failed, using "C" 
4: Setting LC_MESSAGES failed, using "C" 
5: Setting LC_MONETARY failed, using "C" 
6: Setting LC_PAPER failed, using "C" 
7: Setting LC_MEASUREMENT failed, using "C" 
Error in inferGenotype(db, germline_db = igv, v_call = v_call, seq = sequence_alignment) : 
  The column v_call contains no data
Execution halted
tigger-genotype execution took: 0:00:09 secs (Wall clock time)

            Reconstructing heavy chain dmask germline sequences with v_call_genotyped.
Running command: CreateGermlines.py -d tigger/tigger_heavy_igblast_db-pass_genotyped.tsv -g dmask -r tigger/tigger_heavy_igblast_db-pass_genotype.fasta /share/database/germlines/imgt/human/vdj//imgt_human_IGHD.fasta /share/database/germlines/imgt/human/vdj//imgt_human_IGHJ.fasta --vf v_call_genotyped

ERROR> Database file tigger/tigger_heavy_igblast_db-pass_genotyped.tsv does not exist.

     Insufficient contigs for running tigger-genotype. Defaulting to original heavy chain v_calls.
            Reconstructing heavy chain dmask germline sequences with v_call.
Running command: CreateGermlines.py -d tigger/tigger_heavy_igblast_db-pass.tsv -g dmask -r /share/database/germlines/imgt/human/vdj//imgt_human_IGHV.fasta /share/database/germlines/imgt/human/vdj//imgt_human_IGHD.fasta /share/database/germlines/imgt/human/vdj//imgt_human_IGHJ.fasta --vf v_call

     START> CreateGermlines
      FILE> tigger_heavy_igblast_db-pass.tsv
GERM_TYPES> dmask
 SEQ_FIELD> sequence_alignment
   V_FIELD> v_call
   D_FIELD> d_call
   J_FIELD> j_call
    CLONED> False

ERROR> File tigger/tigger_heavy_igblast_db-pass.tsv is empty.
            Reconstructing light chain dmask germline sequences with v_call.
Running command: CreateGermlines.py -d tigger/tigger_light_igblast_db-pass.tsv -g dmask -r /share/database/germlines/imgt/human/vdj//imgt_human_IGKV.fasta /share/database/germlines/imgt/human/vdj//imgt_human_IGKJ.fasta /share/database/germlines/imgt/human/vdj//imgt_human_IGLV.fasta /share/database/germlines/i

     START> CreateGermlines
      FILE> tigger_light_igblast_db-pass.tsv
GERM_TYPES> dmask
 SEQ_FIELD> sequence_alignment
   V_FIELD> v_call
   D_FIELD> d_call
   J_FIELD> j_call
    CLONED> False


PROGRESS> 16:00:55 |                    |   0% ( 0) 0.0 min
PROGRESS> 16:00:55 |####################| 100% (14) 0.0 min

 OUTPUT> tigger_light_igblast_db-pass_germ-pass.tsv
RECORDS> 14
   PASS> 14
   FAIL> 0
    END> CreateGermlines

      For convenience, entries for heavy chain in `v_call` are copied to `v_call_genotyped`.
Traceback (most recent call last):
  File "/share/dandelion_preprocess.py", line 314, in <module>
    main()
  File "/share/dandelion_preprocess.py", line 266, in main
    ddl.pp.reassign_alleles(
  File "/opt/conda/envs/sc-dandelion-container/lib/python3.9/site-packages/dandelion/preprocessing/_preprocessing.py", line 1648, in reassign_alleles
    heavy = load_data(
  File "/opt/conda/envs/sc-dandelion-container/lib/python3.9/site-packages/dandelion/utilities/_utilities.py", line 590, in load_data
    raise FileNotFoundError(
FileNotFoundError: Either input is not of <class 'pandas.core.frame.DataFrame'> or file does not exist.
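
The log above reports that `tigger/tigger_heavy_igblast_db-pass.tsv` is empty while the light chain file holds 14 records, so it may be worth confirming this directly before re-running. Below is a minimal sketch of such a check using only the Python standard library. The helper name `summarize_airr_tsv` is hypothetical and not part of dandelion, and the relative paths assume the script is run from the same working directory as the commands in the log.

```python
import csv
from pathlib import Path


def summarize_airr_tsv(path, column="v_call"):
    """Return (exists, n_rows, n_nonempty) for one column of a tab-separated file.

    Hypothetical helper for illustration only; it is not part of dandelion.
    """
    p = Path(path)
    if not p.is_file():
        return (False, 0, 0)
    n_rows = 0
    n_nonempty = 0
    with p.open(newline="") as handle:
        # A zero-byte file has no header, so the reader simply yields no rows.
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            n_rows += 1
            if (row.get(column) or "").strip():
                n_nonempty += 1
    return (True, n_rows, n_nonempty)


if __name__ == "__main__":
    # Paths copied from the log; assumed to be relative to the run directory.
    for name in (
        "tigger/tigger_heavy_igblast_db-pass.tsv",
        "tigger/tigger_light_igblast_db-pass.tsv",
    ):
        print(name, summarize_airr_tsv(name))
```

If the heavy chain file shows zero rows, or zero non-empty `v_call` entries, that would be consistent with the `findNovelAlleles` and `inferGenotype` errors above, and the cause would then lie upstream of the genotyping step, in the input contigs or annotation.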

OS information

Linux

Version information

dandelion==0.3.2

Additional context

No response

Closing this issue for now as we may have resolved the issue. Feel free to reopen if you still encounter a problem.