chrovis/cljam

Add tests to check generated data correctness

ayamada opened this issue · 6 comments

(Came from #60 (review) )

Add many tests by #60, but some tests doesn't check generated(converted) result data.
Should check these data (but may need correct result data).

  1. ;; TODO: examine result

  2. (are [?args] (not-throw? (with-out-file temp-out (cli/pileup ?args)))

    • Done by #65
    • But a comment say NB: "pileup" output format may change in future (maybe).
  3. ;; TODO: examine contents of temp-bam

  4. ;; TODO: check out-file

  5. (deftest about-create-dict

  6. ;; TODO: need bam-file include PCR duplications

  7. ;; TODO: Add test sorter/sort-by-pos with bam file includes many blocks

    • Need lightweight bam-file with many blocks (require to over 1500000) for sorter test coverage.
    • I commit ayamada@f2ed61b , it can work, but it cause very long time in lein cloverage, may not merge it.

I numbered items of this issue.

Seems (1)(2)(3)(4)(5) can be assigned to current cljam result for controlled test result.
(But it is very simple tests only comparing files, not perfect.)

Shall I create PR to compare contorolled test result for test (1)(2)(3)(4)(5) ?

Supplied tests for (1)(2)(4)(5)(6).

I have not good idea for (3).

I commit ayamada@f2ed61b for (7), work well, but it cause very long time in lein cloverage, may not merge it.

alumi commented

About (3)

For small BAM files, it might be easier to check levels manually with IGV, samtools tview or something.

For more complex BAM files, we need some programatic approach.

One option is to fork samtools.
samtools doesn't have a feature to output levels, but tview calculates them internally in lpileup.

https://github.com/samtools/htslib/blob/95b1034/htslib/sam.h#L527
https://github.com/samtools/samtools/blob/74a3e0a/bam_lpileup.c#L141

bam_pileup1_t has level and a pointer to bam1_t which has original BAM alignment information in bam1_core_t.

It may be possible to output SAM records or pairs of QNAME and level from that struct.

@alumi Thanks for your explanation.
It seems to be difficult for me to verify, tentatively, only check result of small BAM in ayamada@a134664 .
If this is OK, I will make PR later.

alumi commented

I checked ayamada/cljam@a134664 and the levels seem to be correct. Thanks!


just FYI

I wrote rough patch, and tested the bam file with commands like samtools tview -p ref:1-1000 -d T test.sorted.bam 2>/dev/null | sort | uniq >> test.sorted.sam.

I got following SAM and it's equal to levels in ayamada/cljam@a134664.

r001	163	ref	7	30	8M4I4M1D3M	=	37	39	TTAGATAAAGAGGATACTG	*	XX:B:S,12561,2,20,112	LV:i:0
r002	0	ref	9	30	1S2I6M1P1I1P1I4M2I	*	0	0	AAAAGATAAGGGATAAA	*	LV:i:1
r003	0	ref	9	30	5H6M	*	0	0	AGCTAA	*	LV:i:2
r004	0	ref	16	30	6M14N1I5M	*	0	0	ATAGCTCTCAGC	*	LV:i:2
r003	16	ref	29	30	6H5M	*	0	0	TAGGC	*	LV:i:0
r001	83	ref	37	30	9M	=	7	-39	CAGCGCCAT	*	LV:i:0
x1	0	ref2	1	30	20M	*	0	0	AGGTTTTATAAAACAAATAA	????????????????????	LV:i:0
x2	0	ref2	2	30	21M	*	0	0	GGTTTTATAAAACAAATAATT	?????????????????????	LV:i:1
x3	0	ref2	6	30	9M4I13M	*	0	0	TTATAAAACAAATAATTAAGTCTACA	??????????????????????????	LV:i:2
x4	0	ref2	10	30	25M	*	0	0	CAAATAATTAAGTCTACAGAGCAAC	?????????????????????????	LV:i:3
x5	0	ref2	12	30	24M	*	0	0	AATAATTAAGTCTACAGAGCAACT	????????????????????????	LV:i:4
x6	0	ref2	14	30	23M	*	0	0	TAATTAAGTCTACAGAGCAACTA	???????????????????????	LV:i:5
diff --git bam_tview.c bam_tview.c
index 206ac8b..7b16d93 100644
--- bam_tview.c
+++ bam_tview.c
@@ -62,7 +62,7 @@ int base_tv_init(tview_t* tv, const char *fn, const char *fn_fa,
 {
     assert(tv!=NULL);
     assert(fn!=NULL);
-    tv->mrow = 24; tv->mcol = 80;
+    tv->mrow = 8000; tv->mcol = 80;
     tv->color_for = TV_COLOR_MAPQ;
     tv->is_dot = 1;

@@ -113,6 +113,7 @@ void base_tv_destroy(tview_t* tv)
     sam_close(tv->fp);
 }

+samFile* gOUT;

 int tv_pl_func(uint32_t tid, uint32_t pos, int n, const bam_pileup1_t *pl, void *data)
 {
@@ -166,6 +167,10 @@ int tv_pl_func(uint32_t tid, uint32_t pos, int n, const bam_pileup1_t *pl, void
     for (j = 0; j <= max_ins; ++j) {
         for (i = 0; i < n; ++i) {
             const bam_pileup1_t *p = pl + i;
+            bam1_t* d = bam_dup1(p->b);
+            uint32_t lv = p->level - 1;
+            bam_aux_append(d, "LV", 'i', 4, (uint8_t*)&lv);
+            sam_write1(gOUT, tv->header, d);
             int row = TV_MIN_ALNROW + p->level - tv->row_shift;
             if (j == 0) {
                 if (!p->is_del) {
@@ -409,7 +414,7 @@ int bam_tview_main(int argc, char *argv[])
         error("cannot create view");
         return EXIT_FAILURE;
     }
-
+    gOUT = sam_open_format("-", "w", 0);
     if ( position )
     {
         int tid, beg, end;
@@ -417,7 +422,7 @@ int bam_tview_main(int argc, char *argv[])
         if (name_lim) *name_lim = '\0';
         else beg = 0; // region parsing failed, but possibly a seq named "foo:a"
         tid = bam_name2id(tv->header, position);
-        if (tid >= 0) { tv->curr_tid = tid; tv->left_pos = beg; }
+        if (tid >= 0) { tv->curr_tid = tid; tv->left_pos = beg; tv->mcol = end;}
     }
     else if ( tv->fai )
     {
@@ -437,6 +442,6 @@ int bam_tview_main(int argc, char *argv[])
     tv->my_drawaln(tv, tv->curr_tid, tv->left_pos);
     tv->my_loop(tv);
     tv->my_destroy(tv);
-
+    sam_close(gOUT);
     return EXIT_SUCCESS;
 }
diff --git bam_tview_html.c bam_tview_html.c
index e3aecda..a7927b5 100644
--- bam_tview_html.c
+++ bam_tview_html.c
@@ -327,7 +327,7 @@ tview_t* html_tv_init(const char *fn, const char *fn_fa, const char *samples,
         }
     tv->row_count=0;
     tv->screen=NULL;
-    tv->out=stdout;
+    tv->out=stderr;
     tv->attributes=0;
     base_tv_init(base,fn,fn_fa,samples,fmt);
     /* initialize callbacks */

@alumi I see, thank you!
And, I created #70 .