barcodes not in the barcode multiplicity file?
Hi,
I'm running into the following warning from ARKS:
WARNING:: Your chromium read file has 13071618 read pairs that have barcodes not in the barcode multiplicity file.Cumulative memory usage: 4621292
My understanding was that the barcode multiplicity file is generated from the read file itself, so I'm probably misunderstanding something about ARKS. The warning is a bit cryptic to me.
I'm also seeing that a large number of reads are skipped (discarded?) by ARKS, apparently because they don't have a good contig:
Skipped reads pairs without a good contig: 162242712
Is this expected behaviour in ARKS? Would it make sense to tune the parameters to include more reads in the analysis?
I'm running ARKS with default parameters, except for a minimum contig length of 1 kb (z=1000) and a k-mer size of 30 (k=30). The full command is:
arks-make arks time=1 draft=$draft reads=$reads threads=8 z=1000 k=30
Thanks,
Pedro
Hi Pedro,
Don't worry about this warning -- I suspect it is just due to your input read set having a number of reads that do not have an associated barcode. For reference, I saw this line in a recent run of ARKS:
WARNING:: Your chromium read file has 27759471 read pairs that have barcodes not in the barcode multiplicity file.Cumulative memory usage: 1452348
There are exactly that many read pairs in that input with no associated barcode:
[lcoombe@hpce705 Tigmint-ARKS]$ gunzip -c chromium.fq.gz |grep "HISEQ" |grep -v "BX:Z:" |wc -l
55518942
[lcoombe@hpce705 Tigmint-ARKS]$ echo $(( 55518942/2 ))
27759471
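A slightly more general version of this check, as a small shell function: instead of grepping for the instrument name (HISEQ), it looks at every FASTQ header line, so it should also work for reads from other sequencers. This is a sketch that assumes an interleaved, gzipped FASTQ with 4-line records, both mates of a pair stored in consecutive records, and the barcode (when present) stored as a BX:Z: tag in the header line. The function name count_unbarcoded_pairs is hypothetical.

```shell
# Hypothetical helper: count read pairs whose FASTQ headers carry no BX:Z:
# barcode tag. Assumes an interleaved, gzipped FASTQ with 4-line records
# and both mates of each pair stored in consecutive records.
count_unbarcoded_pairs() {
    local fastq_gz=$1
    gunzip -c "$fastq_gz" |
        awk '
            NR % 4 == 1 && !/BX:Z:/ { n++ }   # header line without a barcode tag
            END { print n / 2 }               # two header lines per read pair
        '
}
```

For example, running count_unbarcoded_pairs chromium.fq.gz on a file laid out as described should print the same number as the grep/echo pipeline, and that number should match the count in the ARKS warning if the warning is caused only by reads lacking a barcode.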
I do agree that the warning itself is a little cryptic, and we could make it clearer whether a read pair's barcode is missing from the provided multiplicity file or whether the read pair has no barcode at all.
And yes, it is also expected that a substantial number of read pairs will be counted as not having a 'good contig'. This can happen for several reasons, including the two reads of a pair not mapping to the same contig, or the Jaccard index of a read pair not reaching the threshold for any contig.
As for your parameters, they look fine to me, except that you could also try a slightly higher k; I haven't run ARKS with a k-mer size below 40. I find k is a good parameter to sweep, since the optimal k differs depending on the input assembly.
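Since k is worth sweeping, here is a minimal sketch of a loop that reruns the command from the original post once per k-mer size, keeping every other parameter the same. It assumes $draft and $reads are already set as in that command, and that arks-make keeps the outputs for different k values apart; if your version does not encode k in its output file names, run each k in a separate directory instead. The function name sweep_k is hypothetical.

```shell
# Hypothetical helper: rerun the arks-make command from the original post
# once per k-mer size. Every parameter except k mirrors that command.
# Assumes $draft and $reads are already set, and that outputs for
# different k values do not overwrite each other.
sweep_k() {
    local k
    for k in "$@"; do
        echo "=== arks-make with k=$k ===" >&2
        arks-make arks time=1 draft="$draft" reads="$reads" \
            threads=8 z=1000 k="$k" || return 1   # stop at the first failed run
    done
}
```

For example, sweep_k 40 50 60 70 would run four assemblies; those k values are only illustrative, and the resulting scaffolds can then be compared to pick the best k for a given draft assembly.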
Hope that helps!
Lauren
Hi Lauren,
Many thanks, that was really helpful. I'm now testing other values of k to see whether they improve the results.
I'll close this issue now as my questions have been addressed.