Ensembl/WiggleTools

Bug in 'bin'

gevro opened this issue · 10 comments

gevro commented

Hi,
I found a bug in bin. I have a bigwig whose max values are 1.0:

$wiggletools write_bg - fillIn genome.bed blah.bw | head -n 1000000 | cut -f 4 | sort -nr | head
1.000000
1.000000
1.000000
1.000000

However, when I do bin 20 scale 0.05, I am getting values > 1 (either with or without write_bg), which should be impossible:

$wiggletools bin 20 scale 0.05 fillIn genome.bed blah.bw
chr1	109840	109860	0.982538
chr1	109860	110020	1.000000
chr1	110020	110040	1.321969
chr1	110040	110060	0.959469
chr1	110060	110080	0.961057
$wiggletools write_bg - bin 20 scale 0.05 fillIn genome.bed 
chr1	109980	110000	1.000000
chr1	110000	110020	1.000000
chr1	110020	110040	1.321969
chr1	110040	110060	0.959469
chr1	110060	110080	0.961057

It is happening due to bin, because without scale, it is still happening (max value should be 20):

$ wiggletools bin 20 fillIn genome.bed blah.bw
chr1	109820	109840	19.169865
chr1	109840	109860	19.650755
chr1	109860	110020	20.000000
chr1	110020	110040	26.439389
chr1	110040	110060	19.189385
chr1	110060	110080	19.221145

This is the original area of that issue without binning:

$wiggletools fillIn genome.bed blah.bw
chr1    109845  109847  0.954545
chr1    109847  109848  0.950000
chr1    109848  110027  1.000000
chr1    110027  109848  0.000000
chr1    109848  110027  1.000000
chr1    110027  110032  0.954545
chr1    110032  110041  0.958333

I'm not sure the source of the bug, but I suspect that binning is not weighting properly the proportion of the bin that is overlapped by the bigwig values.

Hello @gevro ,

Actually, there appears to be a bug in fillIn, judging by the repeated sequences (chr1:109848-110027), which would allow for double counting. I'll have a look.

Cheers,

Daniel

gevro commented

Ah yes, good catch!

Hello @gevro ,

I'm struggling to reproduce the bug, could you please share the genome.bed and blah.bw files with me? Or at least the relevant region on chr1.

Cheers,

Daniel

gevro commented

Actually, the bug might be upstream. See below:

$wiggletools blah.bw
fixedStep chrom=chr1 start=109810 step=1
0.925000
0.966666
0.964285
0.964285
0.964285
chr1    109814  109821  0.961538
chr1    109821  109845  0.958333
chr1    109845  109847  0.954545
fixedStep chrom=chr1 start=109848 step=1
0.950000
chr1    109848  110027  1.000000
chr1    109848  110027  1.000000
chr1    110027  110032  0.954545
chr1    110032  110041  0.958333
chr1    110041  110046  0.954545

This is showing up twice:
chr1 109848 110027 1.000000
chr1 109848 110027 1.000000

blah.bw was made from blah.wig using kent tools. Here is the area from blah.wig:

$cat blah.wig
fixedStep chrom=chr1 start=109810 step=1
0.925000
0.966666
0.964285
0.964285
0.964285
chr1    109814  109821  0.961538
chr1    109821  109845  0.958333
chr1    109845  109847  0.954545
fixedStep chrom=chr1 start=109848 step=1
0.950000
chr1    109848  110027  1.000000
chr1    110027  110032  0.954545
chr1    110032  110041  0.958333
chr1    110041  110046  0.954545

Origin of the above blah.wig file is from:
$wiggletools scale 0.5 sum blah1.bw blah2.bw > blah.wig

Without scale 0.5 it shows:

$wiggletools sum blah1.bw blah2.bw
fixedStep chrom=chr1 start=109810 step=1
1.850000
1.933333
1.928571
1.928571
1.928571
chr1    109814  109821  1.923077
chr1    109821  109845  1.916667
chr1    109845  109847  1.909091
fixedStep chrom=chr1 start=109848 step=1
1.900000
chr1    109848  110027  2.000000
chr1    110027  110032  1.909091
chr1    110032  110041  1.916667
chr1    110041  110046  1.909091

Here are the regions from the two original bw files before sum:

$blah1.bw
fixedStep chrom=chr1 start=109803 step=1
0.909091
chr1    109803  109810  0.916667
chr1    109810  110140  1.000000
chr1    110140  110146  0.875000
chr1    110146  110171  0.857143
chr1    110171  110175  0.833333
chr1    110175  110179  0.800000
chr1    110179  110197  0.833333
chr1    110197  110208  0.800000
chr1    110208  110226  0.750000
chr1    110226  110228  0.500000
$blah2.bw
fixedStep chrom=chr1 start=109794 step=1
0.866667
0.857143
0.857143
0.857143
0.857143
0.923077
0.923077
0.923077
0.916667
chr1    109802  109809  0.928571
chr1    109809  109811  0.933333
chr1    109811  109814  0.928571
chr1    109814  109821  0.923077
chr1    109821  109845  0.916667
chr1    109845  109847  0.909091
fixedStep chrom=chr1 start=109848 step=1
0.900000
chr1    109848  110027  1.000000
chr1    110027  110032  0.909091
chr1    110032  110041  0.916667
chr1    110041  110046  0.909091

blah1.bw and blah2.bw were made from bedgraph using kent tools:

$blah1.bedgraph
chr1    109845  109846  1
chr1    109846  109847  1
chr1    109847  109848  1
chr1    109848  109849  1
chr1    109849  109850  1
chr1    109850  109851  1
chr1    109851  109852  1
chr1    109852  109853  1
chr1    109853  109854  1
$blah2.bedgraph
chr1    109845  109846  0.909091
chr1    109846  109847  0.909091
chr1    109847  109848  0.9
chr1    109848  109849  1
chr1    109849  109850  1
chr1    109850  109851  1
chr1    109851  109852  1
chr1    109852  109853  1

Does this help find the bug? Is the issue with 'sum'?

gevro commented

Sure! Thanks, I will e-mail you now.

Hello @gevro ,

As mentioned by email, I think I found and corrected the bug, please pull in my latest commits on master.

Let me know how it goes.

Thanks for spotting this,

Daniel

gevro commented

Installation via brew is not working:

  1. First I uninstalled: brew uninstall wiggletools.

2)Then install from --HEAD

$ brew install wiggletools --HEAD
Updating Homebrew...
==> Auto-updated Homebrew!
Updated 2 taps (homebrew/core and homebrew/cask).
==> New Formulae
pkger
==> Updated Formulae
Updated 104 formulae.
==> New Casks
operator                                                                      silicon
==> Updated Casks
Updated 49 casks.

==> Installing wiggletools from brewsci/bio
==> Cloning https://github.com/Ensembl/WiggleTools.git
Updating /Users/evrong01/Library/Caches/Homebrew/wiggletools--git
From https://github.com/Ensembl/WiggleTools
   5b4f2a4..94e11c4  master     -> origin/master
==> Checking out branch master
Already on 'master'
Your branch is behind 'origin/master' by 9 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)
HEAD is now at 94e11c4 Implemented compression keyword
==> make
Last 15 lines from /Users/evrong01/Library/Logs/Homebrew/wiggletools/01.make:
                apply = newWiggleIterator(data, &LooseBufferedWiggleIteratorPop, &BufferedWiggleIteratorSeek, data->default_value);
                        ~~~~~~~~~~~~~~~~~                                                                                        ^
./wiggleIterator.h:37:1: note: 'newWiggleIterator' declared here
WiggleIterator * newWiggleIterator(void * data, void (*pop)(WiggleIterator *), void (*seek)(WiggleIterator *, const char *, int, int), double default_value, bool overlapping);
^
apply.c:169:77: error: too few arguments to function call, expected 5, have 4
        return newWiggleIterator(data, &FillInUnaryPop, NULL, source->default_value);
               ~~~~~~~~~~~~~~~~~                                                   ^
./wiggleIterator.h:37:1: note: 'newWiggleIterator' declared here
WiggleIterator * newWiggleIterator(void * data, void (*pop)(WiggleIterator *), void (*seek)(WiggleIterator *, const char *, int, int), double default_value, bool overlapping);
^
3 errors generated.
make[1]: *** [apply.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [Wiggletools] Error 2

If reporting this issue please do so at (not Homebrew/brew or Homebrew/core):
  https://github.com/brewsci/homebrew-bio/issues

Please create pull requests instead of asking for help on Homebrew's GitHub,
Twitter or any other official channels.

D'oh! I had forgotten to commit the changes in that file!

Now fixed, please try again.

gevro commented

Works! I am truly appreciative for all the quick fixes and improvements!