Singularity Container Preprocessing Error from container
Closed · 1 comment
m21camby commented
Description of the bug
Thank you for the great package and the container.
I was trying to run preprocessing with the container and got the error message below. Could you help me figure out what causes it? Thanks, and let me know if you need anything else.
Minimal reproducible example
singularity run -B $PWD /ext_singularity/sc-dandelion_latest.sif dandelion-preprocess --meta="meta_BCR.csv" --sep="" --file_prefix="all" --filter_to_high_confidence --keep_trailing_hyphen_number
The error message produced by the code above
Error in findNovelAlleles(db, germline_db = igv, v_call = v_call, j_call = j_call, :
Not enough sample sequences were assigned to any germline:
(1) germline_min is too large or
(2) sequences names don't match germlines.
Execution halted
tigger-genotype execution took: 0:00:10 secs (Wall clock time)
Reconstructing heavy chain dmask germline sequences with v_call_genotyped.
Running command: CreateGermlines.py -d tigger/tigger_heavy_igblast_db-pass_genotyped.tsv -g dmask -r tigger/tigger_heavy_igblast_db-pass_genotype.fasta /share/database/germlines/imgt/human/vdj//imgt_human_IGHD.fasta /share/database/germlines/imgt/human/vdj//imgt_human_IGHJ.fasta --vf v_call_genotyped
ERROR> Database file tigger/tigger_heavy_igblast_db-pass_genotyped.tsv does not exist.
Novel allele discovery execution halted.
Attempting to run tigger-genotype without novel allele discovery.
Reassigning alleles
Running command: tigger-genotype.R -d tigger/tigger_heavy_igblast_db-pass.tsv -r /share/database/germlines/imgt/human/vdj/imgt_human_IGHV.fasta -n tigger_heavy_igblast_db-pass -N NO -o tigger -f airr
During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C"
2: Setting LC_COLLATE failed, using "C"
3: Setting LC_TIME failed, using "C"
4: Setting LC_MESSAGES failed, using "C"
5: Setting LC_MONETARY failed, using "C"
6: Setting LC_PAPER failed, using "C"
7: Setting LC_MEASUREMENT failed, using "C"
Error in inferGenotype(db, germline_db = igv, v_call = v_call, seq = sequence_alignment) :
The column v_call contains no data
Execution halted
tigger-genotype execution took: 0:00:09 secs (Wall clock time)
Reconstructing heavy chain dmask germline sequences with v_call_genotyped.
Running command: CreateGermlines.py -d tigger/tigger_heavy_igblast_db-pass_genotyped.tsv -g dmask -r tigger/tigger_heavy_igblast_db-pass_genotype.fasta /share/database/germlines/imgt/human/vdj//imgt_human_IGHD.fasta /share/database/germlines/imgt/human/vdj//imgt_human_IGHJ.fasta --vf v_call_genotyped
ERROR> Database file tigger/tigger_heavy_igblast_db-pass_genotyped.tsv does not exist.
Insufficient contigs for running tigger-genotype. Defaulting to original heavy chain v_calls.
Reconstructing heavy chain dmask germline sequences with v_call.
Running command: CreateGermlines.py -d tigger/tigger_heavy_igblast_db-pass.tsv -g dmask -r /share/database/germlines/imgt/human/vdj//imgt_human_IGHV.fasta /share/database/germlines/imgt/human/vdj//imgt_human_IGHD.fasta /share/database/germlines/imgt/human/vdj//imgt_human_IGHJ.fasta --vf v_call
START> CreateGermlines
FILE> tigger_heavy_igblast_db-pass.tsv
GERM_TYPES> dmask
SEQ_FIELD> sequence_alignment
V_FIELD> v_call
D_FIELD> d_call
J_FIELD> j_call
CLONED> False
ERROR> File tigger/tigger_heavy_igblast_db-pass.tsv is empty.
Reconstructing light chain dmask germline sequences with v_call.
Running command: CreateGermlines.py -d tigger/tigger_light_igblast_db-pass.tsv -g dmask -r /share/database/germlines/imgt/human/vdj//imgt_human_IGKV.fasta /share/database/germlines/imgt/human/vdj//imgt_human_IGKJ.fasta /share/database/germlines/imgt/human/vdj//imgt_human_IGLV.fasta /share/database/germlines/i
START> CreateGermlines
FILE> tigger_light_igblast_db-pass.tsv
GERM_TYPES> dmask
SEQ_FIELD> sequence_alignment
V_FIELD> v_call
D_FIELD> d_call
J_FIELD> j_call
CLONED> False
PROGRESS> 16:00:55 |####################| 100% (14) 0.0 min
OUTPUT> tigger_light_igblast_db-pass_germ-pass.tsv
RECORDS> 14
PASS> 14
FAIL> 0
END> CreateGermlines
For convenience, entries for heavy chain in `v_call` are copied to `v_call_genotyped`.
Traceback (most recent call last):
File "/share/dandelion_preprocess.py", line 314, in <module>
main()
File "/share/dandelion_preprocess.py", line 266, in main
ddl.pp.reassign_alleles(
File "/opt/conda/envs/sc-dandelion-container/lib/python3.9/site-packages/dandelion/preprocessing/_preprocessing.py", line 1648, in reassign_alleles
heavy = load_data(
File "/opt/conda/envs/sc-dandelion-container/lib/python3.9/site-packages/dandelion/utilities/_utilities.py", line 590, in load_data
raise FileNotFoundError(
FileNotFoundError: Either input is not of <class 'pandas.core.frame.DataFrame'> or file does not exist.
OS information
Linux
Version information
dandelion==0.3.2
Additional context
No response
zktuong commented
Closing this issue for now as we may have resolved the issue. Feel free to reopen if you still encounter a problem.
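For anyone who hits the same traceback: the log above reports that `tigger/tigger_heavy_igblast_db-pass.tsv` is empty while the light-chain file holds 14 records, so a quick first check is whether the input contains any heavy-chain contigs at all. The snippet below is only a minimal sketch of such a check, not part of dandelion. The helper name `count_contigs_by_locus` is hypothetical, and it assumes the files are AIRR-style tab-separated tables with a `locus` column.

```python
import csv
from collections import Counter
from pathlib import Path


def count_contigs_by_locus(tsv_path):
    """Count data rows per `locus` value in an AIRR-style TSV (hypothetical helper).

    A missing or zero-byte file returns an empty Counter, which mirrors the
    "File ... is empty" condition reported by CreateGermlines in the log.
    """
    path = Path(tsv_path)
    if not path.is_file() or path.stat().st_size == 0:
        return Counter()
    with path.open(newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Rows without a usable `locus` value are grouped under "unknown".
        return Counter(row.get("locus") or "unknown" for row in reader)


if __name__ == "__main__":
    # File names taken from the log above; adjust to the actual output folder.
    for name in ("tigger/tigger_heavy_igblast_db-pass.tsv",
                 "tigger/tigger_light_igblast_db-pass.tsv"):
        counts = count_contigs_by_locus(name)
        print(name, dict(counts) if counts else "no records")
```

If the heavy-chain file reports no records, the sample probably has no (or too few) IGH contigs for tigger-genotype, which would be consistent with the "Insufficient contigs for running tigger-genotype" message in the log.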