Input file shifts inconsistent
monikaheinzl opened this issue · 5 comments
Hi,
I have a similar issue to #153 and #169 while training the bias model for my Drosophila DNase-seq data. I mapped my raw FASTQ files with Bowtie and called peaks with MACS2, and I didn't apply any shifting to the data. There is also no inconsistency between the genome versions of the BAM and BED files (see the screenshot below). Still, I get the following error:
Traceback (most recent call last):
File "/.conda/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/CHROMBPNET.py", line 179, in <module>
main()
File "/.conda/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/CHROMBPNET.py", line 38, in main
pipelines.train_bias_pipeline(args)
File "/.conda/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/pipelines.py", line 278, in train_bias_pipeline
reads_to_bigwig.main(args)
File "/.conda/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/helpers/preprocessing/reads_to_bigwig.py", line 96, in main
plus_shift, minus_shift = auto_shift_detect.compute_shift(args.input_bam_file,
File "/.conda/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/helpers/preprocessing/auto_shift_detect.py", line 234, in compute_shift
plus_shift, minus_shift = compute_shift_DNASE(ref_plus_pwms, ref_minus_pwms, plus_pwm, minus_pwm)
File "/.conda/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/helpers/preprocessing/auto_shift_detect.py", line 211, in compute_shift_DNASE
raise ValueError("Input file shifts inconsistent. Please post an Issue")
ValueError: Input file shifts inconsistent. Please post an Issue
Here is also my command for training the bias model:
chrombpnet bias pipeline \
-ibam input.bam \
-d "DNASE" \
-g $genome \
-c $chrom.sizes \
-p $peak_file \
-n $negatives_file \
-fl fold_0.json \
-b 0.8 \
-o $outfolder/ \
-fp bias_model
As suggested in issue #153, I have also generated the PWM for my BAM file:
Here is also an example of my peak file:
chr2L 5183 6872 peak_1 516 . 5.13902 51.60207 48.89282 662
chr2L 18506 19124 peak_2 147 . 2.66881 14.75620 12.99561 447
chr2L 19642 20053 peak_3 64 . 1.98697 6.48809 5.02063 247
chr2L 21603 21939 peak_4 79 . 2.12766 7.90964 6.38455 191
chr2L 34039 34352 peak_5 123 . 2.59835 12.32056 10.64090 152
chr2L 35402 35815 peak_6 111 . 2.51643 11.10707 9.46780 222
chr2L 41896 42336 peak_7 51 . 1.86441 5.16313 3.75423 229
chr2L 45892 47619 peak_8 218 . 3.82924 21.89166 19.91631 1413
chr2L 52497 54348 peak_9 145 . 3.19574 14.50274 12.75090 543
Many thanks for your help,
Monika
Hello Monika,
Can you post the commands you used for generating the png?
I adapted them from issue #153. But here they are:
samtools view -b input.bam chr2L > out.bam
samtools view -b -F796 -@50 out.bam | bedtools bamtobed -i stdin | awk -v OFS="\t" '{if ($6=="-"){print $1,$2,$3,$4,$5,$6} else if ($6=="+") {print $1,$2,$3,$4,$5,$6}}' | bedtools genomecov -bg -5 -i stdin -g $chrom.sizes | bedtools sort -i stdin > tmp2
bedGraphToBigWig tmp2 $chrom.sizes unstranded.bw
python build_pwm_from_bigwig.py -i unstranded.bw -g $genome -o DHS_no_shift -cr "chr2L" -c $chrom.sizes
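For readers following along: the `-5` option of `bedtools genomecov` counts only the 5' end of each interval instead of the whole read, which is what makes the resulting track a cut-site profile. Below is a minimal Python sketch of that counting step, assuming BED6 input as produced by `bedtools bamtobed`. The function name `five_prime_counts` and the example reads are made up for illustration; this is not code from chrombpnet or bedtools.

```python
from collections import Counter


def five_prime_counts(bed_lines):
    """Count read 5' ends per (chrom, 0-based position) from BED6 lines.

    For a "+" read the 5' end is the BED start; for a "-" read it is the
    last covered base, i.e. BED end - 1 (BED ends are exclusive).
    """
    counts = Counter()
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip lines without strand information
        chrom, start, end, strand = fields[0], int(fields[1]), int(fields[2]), fields[5]
        if strand == "+":
            counts[(chrom, start)] += 1
        elif strand == "-":
            counts[(chrom, end - 1)] += 1
    return counts


# Hypothetical reads, for illustration only.
example = [
    "chr2L\t100\t150\tread1\t60\t+",
    "chr2L\t100\t150\tread2\t60\t-",
    "chr2L\t100\t136\tread3\t60\t+",
]
counts = five_prime_counts(example)
for (chrom, pos), n in sorted(counts.items()):
    print(chrom, pos, n)
```

No coordinate shift is applied here, matching the unshifted track built above; a shifted track would add a strand-specific offset to each position before counting.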
Hello @monikaheinzl,
The DNase I cleavage logo is known to be pretty variable:
I was expecting to see something closer to any of these representations, but the PWM you're showing is very different. Can you check the following: (1) Is there a problem with the genome build or the preprocessing that produced the BAM? (2) If you merged individual BAMs, do they each result in the same PWM? (3) Is this data really coming from a DNase experiment?
If it is none of these, let me know and I can suggest an alternate version of the repo that bypasses this error. But be very sure that none of the above is happening.
(Image source: https://static-content.springer.com/esm/art%3A10.1186%2Fs13059-019-1642-2/MediaObjects/13059_2019_1642_MOESM1_ESM.pdf)
Best,
Anu
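One way to do the per-BAM cross-check suggested above is to build a base-frequency matrix for each replicate and compare them position by position. The sketch below is a rough illustration, assuming the sequences around the cut sites have already been extracted to equal-length strings (for example, from the genome at each read's 5' end). The helper names and the toy sequences are hypothetical, and this is not what `build_pwm_from_bigwig.py` does internally.

```python
def base_frequencies(seqs):
    """Per-position A/C/G/T frequencies for equal-length sequences."""
    length = len(seqs[0])
    matrix = []
    for i in range(length):
        column = [s[i].upper() for s in seqs]
        total = sum(column.count(b) for b in "ACGT")  # ignores N and other symbols
        matrix.append([column.count(b) / total if total else 0.25 for b in "ACGT"])
    return matrix


def max_abs_difference(m1, m2):
    """Largest absolute frequency difference between two matrices."""
    return max(abs(a - b) for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))


# Toy sequences, for illustration only.
rep1 = ["ACGT", "ACGA", "TCGT", "ACGT"]
rep2 = ["ACGT", "ACGT", "ACGA", "TCGT"]
rep3 = ["GGCA", "GTCA", "GGCA", "CGCA"]

pwm1, pwm2, pwm3 = (base_frequencies(r) for r in (rep1, rep2, rep3))
print(max_abs_difference(pwm1, pwm2))  # small value: replicates agree
print(max_abs_difference(pwm1, pwm3))  # large value: replicates disagree
```

If one replicate's matrix differs strongly from the others, that BAM is the first place to look for a build or preprocessing problem.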
Hi,
Ok, thanks for your help already! I will follow your suggestions and then come back to you.
Best,
Monika
Closing this due to inactivity. Feel free to reopen it if you continue to see issues.