Ensembl/WiggleTools

Bins at ends of chromosomes


gevro commented

I found a bit of unexpected behavior for bins at ends of chromosomes.

Boundaries of chromosome 1, which ends at 248387497:

# head -n 1 blah.genome.bed 
chr1	0	248387497

Regions of chromosome 1, which also end at 248387497:

# wiggletools write_bg - telomere.wig
chr1	0	3000	1.000000
chr1	3000	248384200	0.000000
chr1	248384200	248387497	1.000000

Bin 100: the last bin ends at 248387500, past the end of chr1, and its value is 0.97, which suggests it is computing a sum?

# wiggletools scale 0.01 bin 100 chm13.draft_v1.0.telomere.bw 
chr1	0	3000	1.000000
chr1	248384200	248387400	1.000000
chr1	248387400	248387500	0.970000

After trimming to chromosome coordinates, the last bin in the chromosome still shows 0.97, even when I put trim first.

# wiggletools trim blah.genome.bed scale 0.01 bin 100 chm13.draft_v1.0.telomere.bw
chr1	0	3000	1.000000
chr1	248384200	248387400	1.000000
chr1	248387400	248387497	0.970000

# wiggletools scale 0.01 bin 100 trim blah.genome.bed chm13.draft_v1.0.telomere.bw
chr1	0	3000	1.000000
chr1	248384200	248387400	1.000000
chr1	248387400	248387500	0.970000

I would expect the last bin in the chromosome to have a value equal to the average of the signal within the bin after trimming. Therefore the last bin in the chromosome should be 1.0.

Am I thinking about this incorrectly?

Thanks.

Hello @gevro,

Regarding:

wiggletools scale 0.01 bin 100 chm13.draft_v1.0.telomere.bw

If you consider the last bin, *400 to *500, what you see is that it overlaps the region of coverage 1 only from *400 to *497. As you surmised, the sum (integral) of the signal over this overlap is 97 bp × 1 = 97, which you then scaled by 0.01 to get 0.97.
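The sum-based binning described above can be reproduced with a short Python sketch. This is not WiggleTools code: `bin_sums` is a hypothetical helper that assumes, as confirmed in this reply, that each bin holds the sum of value × overlapping bases, and that bins keep their nominal 100 bp width even past the chromosome end.

```python
def bin_sums(intervals, bin_size):
    """Sum value * overlapping bases for each fixed-width bin (hypothetical helper).

    intervals: (start, end, value) tuples, 0-based half-open, non-overlapping.
    Returns (bin_start, bin_end, total) for every bin touched by an interval.
    Bins are NOT clipped to the chromosome end, matching the output in the thread.
    """
    totals = {}
    for start, end, value in intervals:
        for b in range(start // bin_size, (end - 1) // bin_size + 1):
            bin_start, bin_end = b * bin_size, (b + 1) * bin_size
            overlap = min(end, bin_end) - max(start, bin_start)
            totals[b] = totals.get(b, 0.0) + value * overlap
    return [(b * bin_size, (b + 1) * bin_size, t) for b, t in sorted(totals.items())]


# Coverage-1 run at the end of chr1 (chr1 ends at 248387497).
bins = bin_sums([(248384200, 248387497, 1.0)], 100)
for start, end, total in bins[-2:]:
    # `scale 0.01` divides every sum by the nominal 100 bp bin width
    print("chr1", start, end, f"{0.01 * total:.6f}")
```

A full bin inside the run sums to 100, so scaling gives 1.000000; only the final bin, with 97 covered bases, sums to 97 and drops to 0.970000.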

Regarding:

wiggletools trim blah.genome.bed scale 0.01 bin 100 chm13.draft_v1.0.telomere.bw

Same thing, but you then trimmed the bin, which changes its boundaries but not its value. What you do obtain, however, is a region of length 97 bp with value 0.97, so its density is 1 per 100 bp (0.01 per bp), the same as the other bins. I guess there could be an ambiguity here as to whether the value should be treated as a sum or a density.
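If the mean over the bases inside the chromosome is what you want, one option is to post-process the raw (unscaled) bin sums yourself, dividing by the covered length rather than the nominal bin width. The sketch below is a hypothetical helper, `bin_mean`, not a WiggleTools operation; it assumes you already have each bin's raw sum, e.g. 100 for a full coverage-1 bin and 97 for the last one.

```python
def bin_mean(bin_start, bin_end, bin_sum, chrom_end):
    """Mean signal per base over the part of a bin inside the chromosome (hypothetical).

    Dividing by the full bin width (what `scale 0.01` does after `bin 100`) is
    only a true mean for bins that end at or before chrom_end.
    """
    covered = min(bin_end, chrom_end) - bin_start
    if covered <= 0:
        raise ValueError("bin starts at or beyond the chromosome end")
    return bin_sum / covered


CHR1_END = 248387497
print(bin_mean(248387300, 248387400, 100.0, CHR1_END))  # full bin: 1.0
print(bin_mean(248387400, 248387500, 97.0, CHR1_END))   # 97 covered bases: 1.0, not 0.97
```

Both bins come out at 1.0, which is the value the original question expected for the last bin.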

Regarding:

wiggletools scale 0.01 bin 100 trim blah.genome.bed chm13.draft_v1.0.telomere.bw

The trimming is now upstream of the binning and simply trims the signal from your file. Because your telomeric regions already lie within the chromosome boundaries, the trimming has no effect, so this command is de facto equivalent to the first one.
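The effect of trim order can be modelled with a hypothetical `clip` function that stands in for `trim`. It is a sketch of the behavior described in this reply only (coordinates are clipped, values are left untouched), not the WiggleTools implementation.

```python
def clip(rows, chrom_end, chrom_start=0):
    """Clip (start, end, value) rows to [chrom_start, chrom_end) (hypothetical trim stand-in).

    Only coordinates change; values are copied unchanged, and rows lying
    entirely outside the chromosome are dropped.
    """
    clipped = []
    for start, end, value in rows:
        new_start, new_end = max(start, chrom_start), min(end, chrom_end)
        if new_start < new_end:
            clipped.append((new_start, new_end, value))
    return clipped


CHR1_END = 248387497

# trim before bin: the telomere run already ends at CHR1_END, so nothing changes
signal = [(248384200, 248387497, 1.0)]
print(clip(signal, CHR1_END) == signal)  # True

# trim after bin + scale: the last bin's end moves, its value stays 0.97
last_bin = [(248387400, 248387500, 0.97)]
print(clip(last_bin, CHR1_END))  # [(248387400, 248387497, 0.97)]
```

Clipping the input is a no-op here, while clipping the output shortens the last bin without rescaling it, which matches the two trimmed outputs in the question.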

Hope this helps,

Daniel

gevro commented

Ok thank you!