kundajelab/chrombpnet

Input file shifts inconsistent

monikaheinzl opened this issue · 5 comments

Hi,

I have an issue similar to #153 and #169 while training the bias model for my Drosophila DNase-seq data. I mapped my raw FASTQ files with Bowtie and called peaks with MACS2, and I did not apply any shift to the data. The BAM and BED files also use the same genome version (see the screenshot below). Still, I get the following error:

Traceback (most recent call last):
  File "/.conda/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/CHROMBPNET.py", line 179, in <module>
    main()
  File "/.conda/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/CHROMBPNET.py", line 38, in main
    pipelines.train_bias_pipeline(args)
  File "/.conda/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/pipelines.py", line 278, in train_bias_pipeline
    reads_to_bigwig.main(args)
  File "/.conda/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/helpers/preprocessing/reads_to_bigwig.py", line 96, in main
    plus_shift, minus_shift = auto_shift_detect.compute_shift(args.input_bam_file,
  File "/.conda/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/helpers/preprocessing/auto_shift_detect.py", line 234, in compute_shift
    plus_shift, minus_shift = compute_shift_DNASE(ref_plus_pwms, ref_minus_pwms, plus_pwm, minus_pwm)
  File "/.conda/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/helpers/preprocessing/auto_shift_detect.py", line 211, in compute_shift_DNASE
    raise ValueError("Input file shifts inconsistent. Please post an Issue")
ValueError: Input file shifts inconsistent. Please post an Issue

Here is also my command for training the bias model:

chrombpnet bias pipeline \
        -ibam input.bam \
        -d "DNASE" \
        -g $genome \
        -c $chrom.sizes \
        -p $peak_file \
        -n $negatives_file \
        -fl fold_0.json \
        -b 0.8 \
        -o $outfolder/ \
        -fp bias_model

As suggested in issue #153, I have also generated the PWM for my BAM file:
[image: PWM logo built from the BAM file]

Here is also an example of my peak file:

chr2L	5183	6872	peak_1	516	.	5.13902	51.60207	48.89282	662
chr2L	18506	19124	peak_2	147	.	2.66881	14.75620	12.99561	447
chr2L	19642	20053	peak_3	64	.	1.98697	6.48809	5.02063	247
chr2L	21603	21939	peak_4	79	.	2.12766	7.90964	6.38455	191
chr2L	34039	34352	peak_5	123	.	2.59835	12.32056	10.64090	152
chr2L	35402	35815	peak_6	111	.	2.51643	11.10707	9.46780	222
chr2L	41896	42336	peak_7	51	.	1.86441	5.16313	3.75423	229
chr2L	45892	47619	peak_8	218	.	3.82924	21.89166	19.91631	1413
chr2L	52497	54348	peak_9	145	.	3.19574	14.50274	12.75090	543
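
One quick way to rule out a mismatch between the peak file and the chromosome sizes is to check every narrowPeak row against the chrom.sizes table. The following is a minimal sketch, not part of chrombpnet: `check_narrowpeak` is a hypothetical helper, and the chr2L length in the example is an assumed value that should be read from the actual chrom.sizes file.

```python
# Hypothetical helper (not part of chrombpnet): sanity-check narrowPeak rows
# against a dict of chromosome sizes.
def check_narrowpeak(lines, chrom_sizes):
    """Return a list of (line_number, message) for rows that look inconsistent."""
    problems = []
    for n, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 10:
            problems.append((n, f"expected 10 columns, found {len(fields)}"))
            continue
        chrom = fields[0]
        start, end, summit = int(fields[1]), int(fields[2]), int(fields[9])
        if chrom not in chrom_sizes:
            problems.append((n, f"{chrom} is missing from the chrom sizes"))
            continue
        if not 0 <= start < end <= chrom_sizes[chrom]:
            problems.append((n, f"interval {start}-{end} is outside {chrom}"))
        # Column 10 is a 0-based summit offset from the start; -1 means "not called".
        if summit != -1 and not 0 <= summit < end - start:
            problems.append((n, f"summit offset {summit} is outside the peak"))
    return problems


# Two rows copied from the peak file above.
rows = [
    "chr2L\t5183\t6872\tpeak_1\t516\t.\t5.13902\t51.60207\t48.89282\t662",
    "chr2L\t18506\t19124\tpeak_2\t147\t.\t2.66881\t14.75620\t12.99561\t447",
]
sizes = {"chr2L": 23_513_712}  # assumed length; use the value from your chrom.sizes
print(check_narrowpeak(rows, sizes))
```

An empty list means the rows are internally consistent with the sizes table. It does not prove that the BAM was aligned to the same assembly.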

Many thanks for your help,
Monika

Hello Monika,

Can you post the commands you used for generating the png?

I adapted them from issue #153. Here they are:

samtools view -b input.bam chr2L > out.bam

samtools view -b  -F796  -@50 out.bam | bedtools bamtobed -i stdin | awk -v OFS="\t" '{if ($6=="-"){print $1,$2,$3,$4,$5,$6} else if ($6=="+") {print $1,$2,$3,$4,$5,$6}}' | bedtools genomecov -bg -5 -i stdin -g $chrom.sizes | bedtools sort -i stdin > tmp2

bedGraphToBigWig tmp2 $chrom.sizes unstranded.bw

python build_pwm_from_bigwig.py -i unstranded.bw -g $genome -o DHS_no_shift -cr "chr2L" -c $chrom.sizes
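
For orientation, the idea behind a cut-site PWM can be sketched in a few lines: collect the bases around each 5' cut position, weight them by the number of cuts, and normalize each column. This is only an illustration under stated assumptions. `cut_site_pfm` is a hypothetical function, the window layout (`flank` bases on each side of the cut) is an assumption, and it is not the logic of `build_pwm_from_bigwig.py`.

```python
# Illustrative sketch only: a count-weighted position frequency matrix
# around cut sites. Not the implementation used by chrombpnet.
def cut_site_pfm(sequence, cut_counts, flank=3):
    """Build a frequency matrix over [pos - flank, pos + flank) for each cut position.

    sequence   -- reference sequence of one chromosome (str)
    cut_counts -- dict mapping 0-based cut position -> number of 5' ends there
    """
    width = 2 * flank
    totals = [{base: 0.0 for base in "ACGT"} for _ in range(width)]
    for pos, count in cut_counts.items():
        lo, hi = pos - flank, pos + flank
        if lo < 0 or hi > len(sequence):
            continue  # window runs off the chromosome
        window = sequence[lo:hi].upper()
        if set(window) - set("ACGT"):
            continue  # skip windows containing N or other symbols
        for i, base in enumerate(window):
            totals[i][base] += count
    pfm = []
    for column in totals:
        column_sum = sum(column.values())
        pfm.append({b: (column[b] / column_sum if column_sum else 0.0) for b in "ACGT"})
    return pfm


# Toy example: two cut positions on a short made-up sequence.
toy_pfm = cut_site_pfm("ACGTACGTAC", {4: 3, 8: 1}, flank=2)
for column in toy_pfm:
    print(column)
```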

Hello @monikaheinzl,

The DNase I cleavage logo is known to be quite variable:
[image: examples of DNase I cleavage logos]

I was expecting to see something closer to one of these representations, but the PWM you're showing is very different. Can you check the following? (1) Is there a problem with the genome build or the preprocessing that produced the BAM? (2) Cross-check the individual BAMs that were merged: do they result in the same PWM? (3) Cross-check that the data really come from a DNase experiment.
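
To make the comparison between individual BAMs less subjective than eyeballing logos, the PWMs can be compared numerically. A minimal sketch follows, assuming each PWM is held as a list of per-position dicts of base frequencies. `pwm_correlation` is a hypothetical helper, and Pearson correlation is just one reasonable choice of similarity measure, not something chrombpnet prescribes.

```python
import math


# Hypothetical helper: Pearson correlation between two PWMs of equal width.
def pwm_correlation(pwm_a, pwm_b):
    """Correlate two PWMs given as lists of {"A","C","G","T"} frequency dicts."""
    if len(pwm_a) != len(pwm_b):
        raise ValueError("PWMs must have the same width")
    a = [col[base] for col in pwm_a for base in "ACGT"]
    b = [col[base] for col in pwm_b for base in "ACGT"]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant PWM")
    return cov / math.sqrt(var_a * var_b)


# Made-up PWMs standing in for two replicates.
pwm_x = [{"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}, {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7}]
pwm_y = [{"A": 0.6, "C": 0.2, "G": 0.1, "T": 0.1}, {"A": 0.1, "C": 0.1, "G": 0.2, "T": 0.6}]
print(pwm_correlation(pwm_x, pwm_y))
```

A replicate whose correlation with the others is clearly lower would be the first candidate to inspect.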

If none of these is the cause, let me know and I can suggest an alternate version of the repo that bypasses this error. But be very sure that none of the above is happening.

(Image source: https://static-content.springer.com/esm/art%3A10.1186%2Fs13059-019-1642-2/MediaObjects/13059_2019_1642_MOESM1_ESM.pdf)

Best,
Anu

Hi,

Ok, thanks for your help already! I will follow your suggestions and then come back to you.

Best,
Monika

Closing this due to inactivity. Feel free to reopen it if you continue to see issues.