Ensembl/WiggleTools

seek output duplicate rows

Closed this issue · 4 comments

gevro commented

Hi,
I'm seeing a weird bug.

When I add seek to this:

wiggletools write_bg - compress gte 0.2 test.bw

it looks like this:

wiggletools seek chr2 1 1000000 write_bg - compress gte 0.2 test.bw

The ouptut for some reason then duplicates some rows:

> less -N chr2.seekfirst.bg
      1 chr2    160     220     1.000000
      2 chr2    3460    3940    1.000000
      3 chr2    4020    61280   1.000000
      4 chr2    61440   119720  1.000000
      5 chr2    119820  120120  1.000000
      6 chr2    120140  120160  1.000000
      7 chr2    120220  120380  1.000000
      8 chr2    120400  120780  1.000000
      9 chr2    120960  151340  1.000000
     10 chr2    151380  151600  1.000000
     11 chr2    151620  151780  1.000000
...
239 chr2    2822040 2822440 1.000000
    240 chr2    2822460 2822500 1.000000
    241 chr2    2822580 2876040 1.000000
    242 chr2    160     220     1.000000
    243 chr2    3460    3940    1.000000
    244 chr2    4020    61280   1.000000
    245 chr2    61440   119720  1.000000
    246 chr2    119820  120120  1.000000
    247 chr2    120140  120160  1.000000
    248 chr2    120220  120380  1.000000
    249 chr2    120400  120780  1.000000
    250 chr2    120960  151340  1.000000
    251 chr2    151380  151600  1.000000
    252 chr2    151620  151780  1.000000

You can see it goes back to the chr2 160 220 position, which is the first data set.

So in other words, I'm getting sequential back-to-back copies of the output.

gevro commented

Also, if I do trim first (before write_bg), I'm getting a different result than if I do trim last:

> head chr2.trimfirst.bg
chr1	680	940	1.000000
chr1	3060	3220	1.000000
chr1	3260	3280	1.000000
chr1	3420	9580	1.000000
chr1	9700	10280	1.000000
chr1	10340	14540	1.000000
chr1	14640	24200	1.000000
chr1	24260	24280	1.000000
chr1	26460	27500	1.000000

> head chr2.trimaftergte.bg
chr2	160	220	1.000000
chr2	3460	3940	1.000000
chr2	4020	61280	1.000000
chr2	61440	119720	1.000000
chr2	119820	120120	1.000000
chr2	120140	120160	1.000000
chr2	120220	120380	1.000000
chr2	120400	120780	1.000000

It actually looks like when I do trim first, it isn't doing anything because it is outputting all genomic locations.

This is because write_bg - prints out to standard output regardless of its position in the command, but the left most command also prints out to stdout. If needed, you can squash the latter by putting do at the very beginning of the command.

gevro commented

Ok, so just to confirm - the correct way to do this is to put trim and seek after 'write_bg -' ?
wiggletools write_bg - seek chr2 1 1000000 compress gte 0.2 test.bw
wiggletools write_bg - trim trim.bed compress gte 0.2 test.bw

Yes, that's correct.