Add tests to check generated data correctness
ayamada opened this issue · 6 comments
(Came from #60 (review) )
Add many tests by #60, but some tests doesn't check generated(converted) result data.
Should check these data (but may need correct result data).
-
Line 53 in ff91c66
- Done by #66
-
Line 96 in ff91c66
- Done by #65
- But a comment say
NB: "pileup" output format may change in future (maybe)
.
-
Line 142 in ff91c66
- Done by #70
-
Line 218 in ff91c66
- Done by #67
-
Line 17 in ff91c66
- Done by #64
-
Line 11 in ff91c66
- Done by #63
-
Line 110 in ff91c66
- Need lightweight bam-file with many blocks (require to over 1500000) for sorter test coverage.
- I commit ayamada@f2ed61b , it can work, but it cause very long time in
lein cloverage
, may not merge it.
I numbered items of this issue.
Seems (1)(2)(3)(4)(5) can be assigned to current cljam
result for controlled test result.
(But it is very simple tests only comparing files, not perfect.)
Shall I create PR to compare contorolled test result for test (1)(2)(3)(4)(5) ?
Supplied tests for (1)(2)(4)(5)(6).
I have not good idea for (3).
I commit ayamada@f2ed61b for (7), work well, but it cause very long time in lein cloverage
, may not merge it.
About (3)
For small BAM files, it might be easier to check levels manually with IGV, samtools tview
or something.
For more complex BAM files, we need some programatic approach.
One option is to fork samtools.
samtools doesn't have a feature to output levels, but tview
calculates them internally in lpileup
.
https://github.com/samtools/htslib/blob/95b1034/htslib/sam.h#L527
https://github.com/samtools/samtools/blob/74a3e0a/bam_lpileup.c#L141
bam_pileup1_t
has level
and a pointer to bam1_t
which has original BAM alignment information in bam1_core_t
.
It may be possible to output SAM records or pairs of QNAME and level from that struct.
@alumi Thanks for your explanation.
It seems to be difficult for me to verify, tentatively, only check result of small BAM in ayamada@a134664 .
If this is OK, I will make PR later.
I checked ayamada/cljam@a134664 and the levels seem to be correct. Thanks!
just FYI
I wrote rough patch, and tested the bam file with commands like samtools tview -p ref:1-1000 -d T test.sorted.bam 2>/dev/null | sort | uniq >> test.sorted.sam
.
I got following SAM and it's equal to levels in ayamada/cljam@a134664.
r001 163 ref 7 30 8M4I4M1D3M = 37 39 TTAGATAAAGAGGATACTG * XX:B:S,12561,2,20,112 LV:i:0
r002 0 ref 9 30 1S2I6M1P1I1P1I4M2I * 0 0 AAAAGATAAGGGATAAA * LV:i:1
r003 0 ref 9 30 5H6M * 0 0 AGCTAA * LV:i:2
r004 0 ref 16 30 6M14N1I5M * 0 0 ATAGCTCTCAGC * LV:i:2
r003 16 ref 29 30 6H5M * 0 0 TAGGC * LV:i:0
r001 83 ref 37 30 9M = 7 -39 CAGCGCCAT * LV:i:0
x1 0 ref2 1 30 20M * 0 0 AGGTTTTATAAAACAAATAA ???????????????????? LV:i:0
x2 0 ref2 2 30 21M * 0 0 GGTTTTATAAAACAAATAATT ????????????????????? LV:i:1
x3 0 ref2 6 30 9M4I13M * 0 0 TTATAAAACAAATAATTAAGTCTACA ?????????????????????????? LV:i:2
x4 0 ref2 10 30 25M * 0 0 CAAATAATTAAGTCTACAGAGCAAC ????????????????????????? LV:i:3
x5 0 ref2 12 30 24M * 0 0 AATAATTAAGTCTACAGAGCAACT ???????????????????????? LV:i:4
x6 0 ref2 14 30 23M * 0 0 TAATTAAGTCTACAGAGCAACTA ??????????????????????? LV:i:5
diff --git bam_tview.c bam_tview.c
index 206ac8b..7b16d93 100644
--- bam_tview.c
+++ bam_tview.c
@@ -62,7 +62,7 @@ int base_tv_init(tview_t* tv, const char *fn, const char *fn_fa,
{
assert(tv!=NULL);
assert(fn!=NULL);
- tv->mrow = 24; tv->mcol = 80;
+ tv->mrow = 8000; tv->mcol = 80;
tv->color_for = TV_COLOR_MAPQ;
tv->is_dot = 1;
@@ -113,6 +113,7 @@ void base_tv_destroy(tview_t* tv)
sam_close(tv->fp);
}
+samFile* gOUT;
int tv_pl_func(uint32_t tid, uint32_t pos, int n, const bam_pileup1_t *pl, void *data)
{
@@ -166,6 +167,10 @@ int tv_pl_func(uint32_t tid, uint32_t pos, int n, const bam_pileup1_t *pl, void
for (j = 0; j <= max_ins; ++j) {
for (i = 0; i < n; ++i) {
const bam_pileup1_t *p = pl + i;
+ bam1_t* d = bam_dup1(p->b);
+ uint32_t lv = p->level - 1;
+ bam_aux_append(d, "LV", 'i', 4, (uint8_t*)&lv);
+ sam_write1(gOUT, tv->header, d);
int row = TV_MIN_ALNROW + p->level - tv->row_shift;
if (j == 0) {
if (!p->is_del) {
@@ -409,7 +414,7 @@ int bam_tview_main(int argc, char *argv[])
error("cannot create view");
return EXIT_FAILURE;
}
-
+ gOUT = sam_open_format("-", "w", 0);
if ( position )
{
int tid, beg, end;
@@ -417,7 +422,7 @@ int bam_tview_main(int argc, char *argv[])
if (name_lim) *name_lim = '\0';
else beg = 0; // region parsing failed, but possibly a seq named "foo:a"
tid = bam_name2id(tv->header, position);
- if (tid >= 0) { tv->curr_tid = tid; tv->left_pos = beg; }
+ if (tid >= 0) { tv->curr_tid = tid; tv->left_pos = beg; tv->mcol = end;}
}
else if ( tv->fai )
{
@@ -437,6 +442,6 @@ int bam_tview_main(int argc, char *argv[])
tv->my_drawaln(tv, tv->curr_tid, tv->left_pos);
tv->my_loop(tv);
tv->my_destroy(tv);
-
+ sam_close(gOUT);
return EXIT_SUCCESS;
}
diff --git bam_tview_html.c bam_tview_html.c
index e3aecda..a7927b5 100644
--- bam_tview_html.c
+++ bam_tview_html.c
@@ -327,7 +327,7 @@ tview_t* html_tv_init(const char *fn, const char *fn_fa, const char *samples,
}
tv->row_count=0;
tv->screen=NULL;
- tv->out=stdout;
+ tv->out=stderr;
tv->attributes=0;
base_tv_init(base,fn,fn_fa,samples,fmt);
/* initialize callbacks */