kundajelab/chrombpnet

Question on pred_bw

ZixiaoWang17 opened this issue · 7 comments

Thanks for creating this amazing package!

I have a small question about the "-r" argument in pred_bw, the bed file input for prediction. Which file should I use if I want to predict for a certain region?

Say the region is chr1:pos1-pos2. Should I merge this region with the peak file after excluding blacklist, "_negative.bed", or can I make a bed file that includes only this region, e.g., V1=chr1, V2=pos1, V3=pos2, V4-V10=NA?

Thank you so much for your time and consideration!

Best,
Zixiao

Hello,

If you are interested in getting predictions for a region, say chr1:pos1-pos2, you can just make your own bed file. Since ChromBPNet makes predictions for 1000bp regions, you will need to break the pos1-pos2 region into 1000bp windows and then provide the window start/end coordinates as a bed file. You can use a stride of 1000bp or less when making these windows and the code will be able to handle it. I think bedtools already provides a command to do this.

Since -r expects narrowPeak format, this is what your bed file (tab-separated) will look like:

(1) chr
(2) start of the 1000 bp window
(3) end of the 1000 bp window
(4), (5), (6), (7), (8), (9) will be the placeholder character '.'
(10) will be (end-start)/2 or 
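The windowing and column layout described above can be sketched in Python. This is only an illustration, not code from ChromBPNet: the function names `make_windows` and `write_bed` are hypothetical, the output file name is a placeholder, and the sketch assumes that column 10 is the summit offset from the window start, i.e. (end-start)/2 = 500 for a 1000bp window. If the region length is not a multiple of the stride, the sketch adds one last window that ends exactly at the region end; that is a design choice made here, not something the package requires.

```python
def make_windows(chrom, start, end, width=1000, stride=1000):
    """Split [start, end) into fixed-width windows in 10-column narrowPeak layout."""
    if end - start < width:
        raise ValueError("region is shorter than one window")
    # Window starts spaced by `stride`, each window fully inside the region.
    starts = list(range(start, end - width + 1, stride))
    # Assumption of this sketch: cover a leftover tail with a final window
    # anchored at the region end.
    if starts[-1] + width < end:
        starts.append(end - width)
    rows = []
    for s in starts:
        e = s + width
        summit = (e - s) // 2  # offset from window start, as in column (10)
        # Columns 4-9 are the placeholder '.'
        rows.append([chrom, str(s), str(e)] + ["."] * 6 + [str(summit)])
    return rows


def write_bed(rows, path):
    """Write rows as tab-separated lines, with no header and no index column."""
    with open(path, "w") as fh:
        for row in rows:
            fh.write("\t".join(row) + "\n")


if __name__ == "__main__":
    # Placeholder coordinates and file name, for illustration only.
    rows = make_windows("chr7", 90117390, 90121390, width=1000, stride=500)
    write_bed(rows, "my_region.bed")
```

Writing the lines by hand as above avoids accidentally saving a header row or a row-index column, which some table-writing tools add by default.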

Hope this answers your question! Feel free to reply back if there are any more.

Gotcha, I do appreciate the help!!

When I tried to use my region to predict the bigwig, I got this error:
chr start end 1 2 3 4 5 6 summit
0 chr7 90117390 90121390 NaN NaN NaN NaN NaN NaN 2000
Traceback (most recent call last):
File "/dcl02/hongkai/data/mjiang/software/env/chrombpnet/bin/chrombpnet", line 33, in
sys.exit(load_entry_point('chrombpnet', 'console_scripts', 'chrombpnet')())
File "/dcl02/hongkai/data/mjiang/software/chrombpnet/chrombpnet/CHROMBPNET.py", line 56, in main
predict_to_bigwig.main(args)
File "/dcl02/hongkai/data/mjiang/software/chrombpnet/chrombpnet/evaluation/make_bigwigs/predict_to_bigwig.py", line 136, in main
seqs, regions_used = bigwig_helper.get_seq(regions_df, g, inputlen)
File "/dcl02/hongkai/data/mjiang/software/chrombpnet/chrombpnet/evaluation/make_bigwigs/bigwig_helper.py", line 26, in get_seq
return one_hot.dna_to_one_hot(vals), np.array(peaks_used)
File "/dcl02/hongkai/data/mjiang/software/chrombpnet/chrombpnet/training/utils/one_hot.py", line 18, in dna_to_one_hot
seq_len = len(seqs[0])
IndexError: list index out of range

Would you please take a look at it? Thanks!

I am currently trying to use chrombpnet and have some questions on the pred_bw function.

Suppose I have a model trained in cell type M. How should I distinguish (or interpret) the results when the region (from A start to A end) is specified as:

  1. the peak file in this region from cell type M
  2. the peak file in this region from cell type N
  3. just the start and end region

Or, put differently, will there be any difference if I am only interested in peak regions?

Thank you so much for building this pipeline and your time! I do appreciate it.

Sorry I missed your earlier message. It seems like your bed file has an additional tab, and that is causing the error.

It's showing -
0 chr7 90117390 90121390 NaN NaN NaN NaN NaN NaN 2000

but should have been -
chr7 90117390 90121390 NaN NaN NaN NaN NaN NaN 2000
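A quick way to check a bed file for an extra tab or an extra leading column before running pred_bw is to count the tab-separated fields on each line. The sketch below is a hypothetical helper (`check_bed_line` is not part of ChromBPNet) and it assumes the 10-column layout described earlier in this thread; it only checks the field count and the integer columns, not whether the region is usable by the model.

```python
def check_bed_line(line, n_fields=10):
    """Return a list of problems found in one tab-separated bed line."""
    problems = []
    fields = line.rstrip("\n").split("\t")
    if len(fields) != n_fields:
        # An extra tab or a leading index column shows up here.
        problems.append(f"expected {n_fields} fields, found {len(fields)}")
        return problems
    start, end, summit = fields[1], fields[2], fields[9]
    if not (start.isdigit() and end.isdigit() and summit.isdigit()):
        problems.append("start, end and summit must be non-negative integers")
    elif int(start) >= int(end):
        problems.append("start must be smaller than end")
    return problems


good = "chr7\t90117390\t90121390\t.\t.\t.\t.\t.\t.\t2000"
bad = "0\t" + good  # same line with an extra leading column

print(check_bed_line(good))
print(check_bed_line(bad))
```

Running the same check over every line of the file (for example in a loop over `open(path)`) would flag the first line that does not have exactly 10 fields.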

I am sorry I don't fully understand your question. Are you asking if there will be a difference in the output if you provided the three different bed files you mentioned?

Yes, that is what I mean. Is there a difference?