Training bias model: ValueError: cannot convert float NaN to integer

Question

YichaoOU opened this issue 9 months ago · 1 comment

Hello,

I'm getting a ValueError when training the bias model on the example data, using only rep1.bam. Preprocessing ran without problems:

chrombpnet prep splits -c $chromsize -tcr chr1 chr3 chr6 -vcr chr8 chr20 -op $jid/fold_0
chrombpnet prep nonpeaks -g $fasta -p $peak -c  $chromsize -fl $jid/fold_0.json -br $blacklist -o $jid/fold_0

This is my command for bias training:

chrombpnet bias pipeline \
        -ibam rep1.bam \
        -d "ATAC" \
        -g  hg38.main.fa \
        -c hg38.sizes \
        -p hg38-blacklist.v2.bed.gz \
        -n output_negatives.bed \
        -fl fold_0.json \
        -b 0.5 \
        -o bias_model/ \
        -fp test \

And this is the output:

Estimating enzyme shift in input file
/hpcf/authorized_apps/rhel8_apps/conda3/202303/install/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/helpers/preprocessing/auto_shift_detect.py:108: UserWarning: !!! WARNING: Input reads contain chromosomes not in the reference genome fasta provided. Please ensure you are using the correct reference genome. If you are confident you are using the correct reference genome, you can safely ignore this message.
  warnings.warn(colored(msg, 'red'))
Current estimated shift: +0/+0
awk -v OFS="\t" '{if ($6=="+"){print $1,$2+4,$3,$4,$5,$6} else if ($6=="-") {print $1,$2,$3-4,$4,$5,$6}}' | sort -k1,1 | bedtools genomecov -bg -5 -i stdin -g hg38.sizes | LC_COLLATE="C" sort -k1,1 -k2,2n 
Making BedGraph (Filter chromosomes not in reference fasta)
Making Bigwig
non zero bigwig entries in the given chromosome:  4764554
2023-12-07 10:29:31.078161: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /hpcf/lsf/lsf_prod/10.1/linux3.10-glibc2.17-x86_64/lib:/home/yli11/Programs/scPPIN/scPPIN/boost_1_69_0/stage/lib:/home/yli11/Programs/scPPIN/scPPIN/boost_1_69_0
2023-12-07 10:29:31.078192: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
evaluating hyperparameters on the following chromosomes ['chr2', 'chr4', 'chr5', 'chr7', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr21', 'chr22', 'chrX', 'chrY', 'chrM', 'chr8', 'chr20']
Number of non peaks input:  406632
Number of non peaks filtered because the input/output is on the edge:  2
Number of non peaks being used:  406630
Number of non peaks input:  66484
Number of non peaks filtered because the input/output is on the edge:  0
Number of non peaks being used:  66484
Number of peaks input:  511
Number of peaks filtered because the input/output is on the edge:  0
Number of peaks being used:  511
Number of peaks input:  125
Number of peaks filtered because the input/output is on the edge:  0
Number of peaks being used:  125
Traceback (most recent call last):
  File "/hpcf/authorized_apps/rhel8_apps/conda3/202303/install/envs/chrombpnet/bin/chrombpnet", line 8, in <module>
    sys.exit(main())
  File "/hpcf/authorized_apps/rhel8_apps/conda3/202303/install/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/CHROMBPNET.py", line 38, in main
    pipelines.train_bias_pipeline(args)
  File "/hpcf/authorized_apps/rhel8_apps/conda3/202303/install/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/pipelines.py", line 295, in train_bias_pipeline
    find_bias_hyperparams.main(args_copy)
  File "/hpcf/authorized_apps/rhel8_apps/conda3/202303/install/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/helpers/hyperparameters/find_bias_hyperparams.py", line 75, in main
    peak_cnts, _ = param_utils.get_seqs_cts(genome, bw, peaks, args.inputlen, args.outputlen)
  File "/hpcf/authorized_apps/rhel8_apps/conda3/202303/install/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/helpers/hyperparameters/param_utils.py", line 50, in get_seqs_cts
    sequence = str(genome[r['chr']][(r['start']+r['summit'] - input_width//2):(r['start'] + r['summit'] + input_width//2)])
  File "/hpcf/authorized_apps/rhel8_apps/conda3/202303/install/envs/chrombpnet/lib/python3.8/site-packages/pyfaidx/__init__.py", line 823, in __getitem__
    return self._fa.get_seq(self.name, start + 1, stop)[::step]
  File "/hpcf/authorized_apps/rhel8_apps/conda3/202303/install/envs/chrombpnet/lib/python3.8/site-packages/pyfaidx/__init__.py", line 1048, in get_seq
    seq = self.faidx.fetch(name, start, end)
  File "/hpcf/authorized_apps/rhel8_apps/conda3/202303/install/envs/chrombpnet/lib/python3.8/site-packages/pyfaidx/__init__.py", line 637, in fetch
    seq = self.from_file(name, start, end)
  File "/hpcf/authorized_apps/rhel8_apps/conda3/202303/install/envs/chrombpnet/lib/python3.8/site-packages/pyfaidx/__init__.py", line 649, in from_file
    assert start == int(start)
ValueError: cannot convert float NaN to integer

Do you have any clues?

Thanks,
Yichao

Answer 1 · 2023-12-07T17:54:46.000Z

For anyone who hits the same problem: I was confused by the -p argument in the original tutorial command, -p ~/chrombpnet_tutorial/data/peaks_no_blacklist.bed. I somehow assumed it was the blacklist file. It is not; -p should be a peak file.
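
A likely explanation for the NaN, going by the traceback: get_seqs_cts computes r['start'] + r['summit'], and a blacklist BED has no summit column (narrowPeak keeps it in column 10), so that value is presumably treated as missing and is NaN by the time pyfaidx calls int(start). Below is a minimal sketch of a pre-flight check for the file passed to -p. The names find_bad_peak_lines and write_temp are hypothetical helpers, not part of chrombpnet, and the sample rows are made up for illustration.

```python
import gzip
import tempfile

# ENCODE narrowPeak has 10 tab-separated columns; the summit offset is the 10th.
NARROWPEAK_COLUMNS = 10


def find_bad_peak_lines(path, max_report=5):
    """Return (line_number, reason) pairs for rows that do not look like narrowPeak."""
    opener = gzip.open if path.endswith(".gz") else open
    problems = []
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle, start=1):
            # Skip blank lines and common BED header lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < NARROWPEAK_COLUMNS:
                problems.append(
                    (line_number, f"{len(fields)} columns, expected {NARROWPEAK_COLUMNS}")
                )
            else:
                try:
                    int(fields[1])  # start
                    int(fields[9])  # summit offset relative to start
                except ValueError:
                    problems.append((line_number, "start or summit is not an integer"))
            if len(problems) >= max_report:
                break
    return problems


def write_temp(text, suffix=".bed"):
    """Write text to a temporary file and return its path (illustration only)."""
    with tempfile.NamedTemporaryFile("w", suffix=suffix, delete=False) as handle:
        handle.write(text)
        return handle.name


if __name__ == "__main__":
    # Made-up rows: a blacklist-style line and a narrowPeak-style line.
    blacklist_like = write_temp("chr1\t100\t200\tHigh Signal Region\n")
    peak_like = write_temp("chr1\t100\t200\tpeak1\t1000\t.\t5.0\t10.0\t8.0\t50\n")
    print("blacklist-like:", find_bad_peak_lines(blacklist_like))
    print("peak-like:", find_bad_peak_lines(peak_like))

    # The same message as in the traceback, reproduced in isolation.
    try:
        int(float("nan"))
    except ValueError as err:
        print("int(NaN) raises:", err)
```

Running a check like this on the -p file before chrombpnet bias pipeline would flag a blacklist passed by mistake, since every row is reported as having too few columns.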

Solved.
Yichao