kundajelab/chrombpnet

Error while shifting ATAC peaks

Closed this issue · 17 comments

Hello all,

Thank you so much for your work on ChromBPNet!

I was wondering if you could help me to understand an error I am getting while running the pipeline.

This error happens during the ATAC peaks shifting. By the way, I am not sure of understanding much why 2114 bp...

Traceback (most recent call last):
  File "/project/tsslab/arodrig/miniconda/envs/chrombpnet/bin/chrombpnet", line 8, in <module>
    sys.exit(main())
  File "/project/tsslab/arodrig/miniconda/envs/chrombpnet/lib/python3.10/site-packages/chrombpnet/CHROMBPNET.py", line 23, in main
    pipelines.chrombpnet_train_pipeline(args)
  File "/project/tsslab/arodrig/miniconda/envs/chrombpnet/lib/python3.10/site-packages/chrombpnet/pipelines.py", line 21, in chrombpnet_train_pipeline
    reads_to_bigwig.main(args)
  File "/project/tsslab/arodrig/miniconda/envs/chrombpnet/lib/python3.10/site-packages/chrombpnet/helpers/preprocessing/reads_to_bigwig.py", line 70, in main
    plus_shift, minus_shift = auto_shift_detect.compute_shift(args.input_bam_file,
  File "/project/tsslab/arodrig/miniconda/envs/chrombpnet/lib/python3.10/site-packages/chrombpnet/helpers/preprocessing/auto_shift_detect.py", line 238, in compute_shift
    plus_shift, minus_shift = compute_shift_ATAC(ref_plus_pwms, ref_minus_pwms, plus_pwm, minus_pwm)
  File "/project/tsslab/arodrig/miniconda/envs/chrombpnet/lib/python3.10/site-packages/chrombpnet/helpers/preprocessing/auto_shift_detect.py", line 194, in compute_shift_ATAC
    raise ValueError("Input file shifts inconsistent. Please post an issue")
ValueError: Input file shifts inconsistent. Please post an issue

I would very much appreciate it if you could help me understand this error a bit more.
Thank you!

Hi Andrea, you don’t need to do any additional filtering by read length. The 2114 bp is the size of the model input sequence. Can you describe how you are performing read alignment and other preprocessing of the reads? Specifically how the reads are being shifted on the + and - strands?

Hi Surag!

Thanks for the reply.

I realized I was doing something wrong. Now I am only following the instructions on the tutorials, which means:

bedtools slop -i $blacklist -g $chrom_sizes -b 1057 > temp.bed
bedtools intersect -v -a $peaks -b temp.bed  > peaks_no_blacklist.bed
chrombpnet prep nonpeaks -g $genome -p peaks_no_blacklist.bed -c  $chrom_sizes -fl $splits -br $blacklist -o output
chrombpnet bias pipeline \
        -ibam $bam_neg \
        -d "ATAC" \
        -g $genome \
        -c $chrom_sizes \
        -p peaks_no_blacklist.bed \
        -n output_negatives.bed \
        -fl fold_0.json \
        -b 0.5 \
        -o bias_model/ \
        -fp k562 \

But I am still getting the same error.

Hi Andrea,

How did you get your bam files from fastq's? Are you using raw bam files or did you do any additional preprocessing on them?

Hi Anusri,

I did not generate the data myself and I did not process it anyhow. I was given the BAM files just as they are. I could send you a subsample of it so maybe its easier to detect if it has any issues?

What type of pre-processing could cause this error?

Thanks.

Can you run the following commands on your bam and share the atac_no_shift.png?

samtools view -b $atac_in_bam chr20 > out.bam

samtools view -b -@50 out.bam | bedtools bamtobed -i stdin | awk -v OFS="\t" '{if ($6=="-"){print $1,$2,$3,$4,$5,$6} else if ($6=="+") {print $1,$2,$3,$4,$5,$6}}' | bedtools genomecov -bg -5 -i stdin -g $chrom_sizes | bedtools sort -i stdin > tmp2

bedGraphToBigWig tmp2 $chrom_sizes unstranded.bw

python build_pwm_from_bigwig.py -i unstranded.bw -g $genome -o atac_no_shift.png -c "chr20" -cz $chrom_sizes

this is the build_pwm_from_bigwig.py - https://github.com/kundajelab/chrombpnet/blob/master/chrombpnet/helpers/preprocessing/analysis/build_pwm_from_bigwig.py

Your bam files might have been pre-shifted to a non-standard shift

@panushri25 the error is from here:

if len(plus_shifts) != 1 or len(minus_shifts) != 1:
raise ValueError("Input file shifts inconsistent. Please post an issue")

The issue is not non-standard shift (it would have printed those values otherwise). Instead, different Tn5 motifs we use as reference are suggesting different shifts. This could suggest more severe QC issues or something worse, so building PWM would make sense.

Going ahead we should run build PWM automatically if the above fails, so that people can just post the png directly.

Can you run the following commands on your bam and share the atac_no_shift.png?

samtools view -b $atac_in_bam chr20 > out.bam

samtools view -b -@50 out.bam | bedtools bamtobed -i stdin | awk -v OFS="\t" '{if ($6=="-"){print $1,$2,$3,$4,$5,$6} else if ($6=="+") {print $1,$2,$3,$4,$5,$6}}' | bedtools genomecov -bg -5 -i stdin -g $chrom_sizes | bedtools sort -i stdin > tmp2

bedGraphToBigWig tmp2 $chrom_sizes unstranded.bw

python build_pwm_from_bigwig.py -i unstranded.bw -g $genome -o atac_no_shift.png -c "chr20" -cz $chrom_sizes

this is the build_pwm_from_bigwig.py - https://github.com/kundajelab/chrombpnet/blob/master/chrombpnet/helpers/preprocessing/analysis/build_pwm_from_bigwig.py

This is the output atac_no_shift.png for my positive label
atac_no_shift

And this is the output png for my negative label
atac_no_shift png

Sorry what do you mean by positive label and negative label?

So I have two BAM files, one is the positive label and the other is the negative. Both of them are different experiments but with the same background. So I am trying to build a bias model from the negative label (background) to correct the bias in the positive label.

Sorry can you try this new command set -

samtools view -b $atac_in_bam chr20 > out.bam

samtools view -b  -F796  -@50 out.bam | bedtools bamtobed -i stdin | awk -v OFS="\t" '{if ($6=="-"){print $1,$2,$3,$4,$5,$6} else if ($6=="+") {print $1,$2,$3,$4,$5,$6}}' | bedtools genomecov -bg -5 -i stdin -g $chrom_sizes | bedtools sort -i stdin > tmp2

bedGraphToBigWig tmp2 $chrom_sizes unstranded.bw

python build_pwm_from_bigwig.py -i unstranded.bw -g $genome -o atac_no_shift.png -c "chr20" -cz $chrom_sizes

I still dont fully understand what the "positive" and "negative" means, can you provide a bit more context

Sorry for the confusion. It is just that I have to train two models for each of my conditions.

Here are the atac_no_shift.png

Positive class
atac_no_shift png

Negative class
atac_no_shift

Is the positive class dataset a ATAC-seq dataset, it doesnt look like it

It is ATAC-seq

@panushri25 the error is from here:

if len(plus_shifts) != 1 or len(minus_shifts) != 1:
raise ValueError("Input file shifts inconsistent. Please post an issue")

The issue is not non-standard shift (it would have printed those values otherwise). Instead, different Tn5 motifs we use as reference are suggesting different shifts. This could suggest more severe QC issues or something worse, so building PWM would make sense.

Going ahead we should run build PWM automatically if the above fails, so that people can just post the png directly.

You mean the position weight matrix? How should I go ahead with this?

I think I found the issue...

First, one of the models was using a BAM file and a BED file where both were aligned to different genome versions. And second, my peaks didn't include the summits...

From here, what is the summit? Is it (end - start) / 2 or start + (end - start) / 2?

Thanks again for all the support.