Training bias model: ValueError: cannot convert float NaN to integer
YichaoOU opened this issue · 1 comments
YichaoOU commented
Hello,
I'm getting a ValueError when training the bias model on the example data, using only rep1.bam. The preprocessing steps ran without problems:
chrombpnet prep splits -c $chromsize -tcr chr1 chr3 chr6 -vcr chr8 chr20 -op $jid/fold_0
chrombpnet prep nonpeaks -g $fasta -p $peak -c $chromsize -fl $jid/fold_0.json -br $blacklist -o $jid/fold_0
This is my command for bias training:
chrombpnet bias pipeline \
-ibam rep1.bam \
-d "ATAC" \
-g hg38.main.fa \
-c hg38.sizes \
-p hg38-blacklist.v2.bed.gz \
-n output_negatives.bed \
-fl fold_0.json \
-b 0.5 \
-o bias_model/ \
-fp test \
And this is the output:
Estimating enzyme shift in input file
/hpcf/authorized_apps/rhel8_apps/conda3/202303/install/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/helpers/preprocessing/auto_shift_detect.py:108: UserWarning: !!! WARNING: Input reads contain chromosomes not in the reference genome fasta provided. Please ensure you are using the correct reference genome. If you are confident you are using the correct reference genome, you can safely ignore this message.
warnings.warn(colored(msg, 'red'))
Current estimated shift: +0/+0
awk -v OFS="\t" '{if ($6=="+"){print $1,$2+4,$3,$4,$5,$6} else if ($6=="-") {print $1,$2,$3-4,$4,$5,$6}}' | sort -k1,1 | bedtools genomecov -bg -5 -i stdin -g hg38.sizes | LC_COLLATE="C" sort -k1,1 -k2,2n
Making BedGraph (Filter chromosomes not in reference fasta)
Making Bigwig
non zero bigwig entries in the given chromosome: 4764554
2023-12-07 10:29:31.078161: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /hpcf/lsf/lsf_prod/10.1/linux3.10-glibc2.17-x86_64/lib:/home/yli11/Programs/scPPIN/scPPIN/boost_1_69_0/stage/lib:/home/yli11/Programs/scPPIN/scPPIN/boost_1_69_0
2023-12-07 10:29:31.078192: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
evaluating hyperparameters on the following chromosomes ['chr2', 'chr4', 'chr5', 'chr7', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr21', 'chr22', 'chrX', 'chrY', 'chrM', 'chr8', 'chr20']
Number of non peaks input: 406632
Number of non peaks filtered because the input/output is on the edge: 2
Number of non peaks being used: 406630
Number of non peaks input: 66484
Number of non peaks filtered because the input/output is on the edge: 0
Number of non peaks being used: 66484
Number of peaks input: 511
Number of peaks filtered because the input/output is on the edge: 0
Number of peaks being used: 511
Number of peaks input: 125
Number of peaks filtered because the input/output is on the edge: 0
Number of peaks being used: 125
Traceback (most recent call last):
File "/hpcf/authorized_apps/rhel8_apps/conda3/202303/install/envs/chrombpnet/bin/chrombpnet", line 8, in <module>
sys.exit(main())
File "/hpcf/authorized_apps/rhel8_apps/conda3/202303/install/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/CHROMBPNET.py", line 38, in main
pipelines.train_bias_pipeline(args)
File "/hpcf/authorized_apps/rhel8_apps/conda3/202303/install/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/pipelines.py", line 295, in train_bias_pipeline
find_bias_hyperparams.main(args_copy)
File "/hpcf/authorized_apps/rhel8_apps/conda3/202303/install/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/helpers/hyperparameters/find_bias_hyperparams.py", line 75, in main
peak_cnts, _ = param_utils.get_seqs_cts(genome, bw, peaks, args.inputlen, args.outputlen)
File "/hpcf/authorized_apps/rhel8_apps/conda3/202303/install/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/helpers/hyperparameters/param_utils.py", line 50, in get_seqs_cts
sequence = str(genome[r['chr']][(r['start']+r['summit'] - input_width//2):(r['start'] + r['summit'] + input_width//2)])
File "/hpcf/authorized_apps/rhel8_apps/conda3/202303/install/envs/chrombpnet/lib/python3.8/site-packages/pyfaidx/__init__.py", line 823, in __getitem__
return self._fa.get_seq(self.name, start + 1, stop)[::step]
File "/hpcf/authorized_apps/rhel8_apps/conda3/202303/install/envs/chrombpnet/lib/python3.8/site-packages/pyfaidx/__init__.py", line 1048, in get_seq
seq = self.faidx.fetch(name, start, end)
File "/hpcf/authorized_apps/rhel8_apps/conda3/202303/install/envs/chrombpnet/lib/python3.8/site-packages/pyfaidx/__init__.py", line 637, in fetch
seq = self.from_file(name, start, end)
File "/hpcf/authorized_apps/rhel8_apps/conda3/202303/install/envs/chrombpnet/lib/python3.8/site-packages/pyfaidx/__init__.py", line 649, in from_file
assert start == int(start)
ValueError: cannot convert float NaN to integer
Do you have any clues?
Thanks,
Yichao
YichaoOU commented
For anyone who hits the same problem: I was confused by the -p ~/chrombpnet_tutorial/data/peaks_no_blacklist.bed argument in the original command. I somehow thought it was a blacklist file, so I passed hg38-blacklist.v2.bed.gz to -p. It is not a blacklist; -p should be a peak file.
Solved.
Yichao
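A quick pre-flight check on the file passed to -p can catch this mix-up before training starts. The sketch below is not part of chrombpnet. It assumes the peak file follows the standard 10-column narrowPeak layout with the summit offset in the last column, which is consistent with the r['summit'] lookup in the traceback, and that a blacklist BED has fewer columns. The function names are hypothetical.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def find_bad_peak_lines(lines, min_cols=10, summit_col=9):
    """Return (line_number, reason) for rows that do not look like narrowPeak.

    Assumption: narrowPeak has 10 tab-separated columns and the summit
    offset is an integer in the 10th column (0-based index 9).
    """
    problems = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and common header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < min_cols:
            problems.append(
                (line_no, f"expected at least {min_cols} columns, found {len(fields)}")
            )
            continue
        try:
            int(fields[summit_col])
        except ValueError:
            problems.append(
                (line_no, f"summit column is not an integer: {fields[summit_col]!r}")
            )
    return problems


# Example usage (the file name is the one from the command in this issue):
# with open_text("hg38-blacklist.v2.bed.gz") as handle:
#     for line_no, reason in find_bad_peak_lines(handle)[:10]:
#         print(f"line {line_no}: {reason}")
```

If a row is flagged for having too few columns, the file is probably a plain BED (such as a blacklist) rather than a peak file, and the summit value would most likely be read as NaN.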