seek output duplicate rows
Closed this issue · 4 comments
Hi,
I'm seeing a weird bug.
When I add seek to this:
wiggletools write_bg - compress gte 0.2 test.bw
it looks like this:
wiggletools seek chr2 1 1000000 write_bg - compress gte 0.2 test.bw
The ouptut for some reason then duplicates some rows:
> less -N chr2.seekfirst.bg
1 chr2 160 220 1.000000
2 chr2 3460 3940 1.000000
3 chr2 4020 61280 1.000000
4 chr2 61440 119720 1.000000
5 chr2 119820 120120 1.000000
6 chr2 120140 120160 1.000000
7 chr2 120220 120380 1.000000
8 chr2 120400 120780 1.000000
9 chr2 120960 151340 1.000000
10 chr2 151380 151600 1.000000
11 chr2 151620 151780 1.000000
...
239 chr2 2822040 2822440 1.000000
240 chr2 2822460 2822500 1.000000
241 chr2 2822580 2876040 1.000000
242 chr2 160 220 1.000000
243 chr2 3460 3940 1.000000
244 chr2 4020 61280 1.000000
245 chr2 61440 119720 1.000000
246 chr2 119820 120120 1.000000
247 chr2 120140 120160 1.000000
248 chr2 120220 120380 1.000000
249 chr2 120400 120780 1.000000
250 chr2 120960 151340 1.000000
251 chr2 151380 151600 1.000000
252 chr2 151620 151780 1.000000
You can see it goes back to the chr2 160 220 position, which is the first data set.
So in other words, I'm getting sequential back-to-back copies of the output.
Also, if I do trim first (before write_bg), I'm getting a different result than if I do trim last:
> head chr2.trimfirst.bg
chr1 680 940 1.000000
chr1 3060 3220 1.000000
chr1 3260 3280 1.000000
chr1 3420 9580 1.000000
chr1 9700 10280 1.000000
chr1 10340 14540 1.000000
chr1 14640 24200 1.000000
chr1 24260 24280 1.000000
chr1 26460 27500 1.000000
> head chr2.trimaftergte.bg
chr2 160 220 1.000000
chr2 3460 3940 1.000000
chr2 4020 61280 1.000000
chr2 61440 119720 1.000000
chr2 119820 120120 1.000000
chr2 120140 120160 1.000000
chr2 120220 120380 1.000000
chr2 120400 120780 1.000000
It actually looks like when I do trim first, it isn't doing anything because it is outputting all genomic locations.
This is because write_bg -
prints out to standard output regardless of its position in the command, but the left most command also prints out to stdout. If needed, you can squash the latter by putting do
at the very beginning of the command.
Ok, so just to confirm - the correct way to do this is to put trim and seek after 'write_bg -' ?
wiggletools write_bg - seek chr2 1 1000000 compress gte 0.2 test.bw
wiggletools write_bg - trim trim.bed compress gte 0.2 test.bw
Yes, that's correct.