mikessh/migec

Better explanation of Assemble log file

mikessh opened this issue

Explain the READS* terms in the assemble.log.txt:

  • I'm not sure exactly where reads are being dropped. The forward and reverse reads of each MIG are assembled separately, and reads with too many mismatches are dropped. Are the remaining reads what READS_GOOD_FASTQ1 and READS_GOOD_FASTQ2 count?
  • How are READS_TOTAL and READS_DROPPED_WITHIN_MIG calculated?
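To make the question concrete, here is a minimal Python sketch of the per-MIG filtering described above, as I understand it. It is not MIGEC code. It assumes equal-length reads, a position-wise majority-vote consensus, and a hypothetical `max_mismatches` cutoff. `consensus`, `assemble_mig` and `tally` are illustrative names, and the real Assemble step may define its read counters differently.

```python
from collections import Counter


def consensus(reads):
    """Position-wise majority vote over equal-length reads.

    Ties go to the base seen first (Counter.most_common keeps insertion order).
    ASSUMPTION: MIGEC's actual consensus rule may differ.
    """
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def assemble_mig(reads, max_mismatches):
    """Hypothetical per-MIG step: build a consensus, then drop reads that
    differ from it at more than max_mismatches positions."""
    cons = consensus(reads)
    good = [r for r in reads if sum(a != b for a, b in zip(r, cons)) <= max_mismatches]
    return cons, len(good), len(reads) - len(good)


def tally(migs, max_mismatches=2):
    """Sum total/good/dropped read counts over all MIGs of ONE FASTQ file.

    migs maps a UMI to the list of reads in its MIG. Under this model,
    total == good + dropped for each file.
    """
    reads_total = reads_good = reads_dropped = 0
    for reads in migs.values():
        _, good, dropped = assemble_mig(reads, max_mismatches)
        reads_total += len(reads)
        reads_good += good
        reads_dropped += dropped
    return reads_total, reads_good, reads_dropped


migs_fastq1 = {"UMI_A": ["ACGT", "ACGT", "ACGA", "TTTT"], "UMI_B": ["GGGG", "GGGC"]}
print(tally(migs_fastq1))  # (6, 5, 1): only "TTTT" is dropped
```

If this model were right, each file's good and dropped reads would sum to its total. So a discrepancy in the log would point to filtering that happens outside the per-MIG mismatch step.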

I also noticed that READS_TOTAL is lower than the number of reads with the master sequence reported in checkout.log.txt. What filtering occurs between checkout and assembly?

The MIG* statistics make more sense to me. Is the following interpretation correct? After MIG assembly, a FASTQ1 MIG whose size is greater than MIG_COUNT_THRESHOLD is counted in MIGS_GOOD_FASTQ1, and likewise for FASTQ2. MIGS_GOOD_TOTAL is then smaller than both MIGS_GOOD_FASTQ1 and MIGS_GOOD_FASTQ2 because a MIG is kept only if it reaches the required size in both FASTQ1 and FASTQ2.
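The interpretation above can be written as a short Python sketch. It is illustrative only and not taken from MIGEC. It assumes MIG sizes are available per UMI for each FASTQ file and uses a strict "greater than" comparison, as worded in the question. `count_good_migs` and its arguments are hypothetical names.

```python
def count_good_migs(sizes_fastq1, sizes_fastq2, mig_count_threshold):
    """Count good MIGs under the interpretation in the question.

    sizes_fastq1 / sizes_fastq2 map a UMI to the number of reads in its MIG.
    ASSUMPTION: "good" means size strictly greater than the threshold;
    MIGEC's actual comparison may differ.
    """
    good1 = {umi for umi, n in sizes_fastq1.items() if n > mig_count_threshold}
    good2 = {umi for umi, n in sizes_fastq2.items() if n > mig_count_threshold}
    good_total = good1 & good2  # a MIG must pass in both FASTQ1 and FASTQ2
    return len(good1), len(good2), len(good_total)


print(count_good_migs({"A": 10, "B": 3, "C": 8}, {"A": 9, "B": 7, "C": 2}, 5))  # (2, 2, 1)
```

Because the total is an intersection, it can never exceed either per-file count. It is strictly smaller only when some MIG passes the threshold in one file but not the other, so "less than or equal to" is the guaranteed relationship.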