mozack/abra2

Increase MAX_SAMPLES to 12 or 16

ckandoth opened this issue · 2 comments

Hello, thanks for the amazing work with Abra! We have 16 whole-exome BAMs from the same patient, and running them through Abra 2.16 fails with a SIGSEGV (0xb) in libc.so.6, as detailed in the log below. Our fix was to increase MAX_SAMPLES to 16 in this line.

hs_err_pid54727.log.txt

Would you consider increasing this constant for your next release? What are the downsides? Thanks!

Hi,
There is no direct problem with increasing this and I will do so. However, I should mention that we have not done thorough testing with anything beyond trios to date.

Also, the local assembly graph is not allowed to exceed a configurable maximum number of nodes for any given region. If this number is exceeded, the local assembly is aborted and processing continues without assembled contigs for that region. If the input data are noisy and there are many singleton kmers arising as a result of base errors across multiple samples, you may run into this limit more frequently than with a smaller number of samples.
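
To illustrate why more samples can hit the node limit sooner, here is a toy sketch of a k-mer graph builder that gives up once a node cap is exceeded. This is not ABRA's actual code: the function name `build_kmer_graph`, its parameters, and the set-based graph are assumptions made purely for illustration.

```python
def build_kmer_graph(reads, k, max_nodes):
    """Toy k-mer graph builder (hypothetical, not ABRA's implementation).

    Returns (nodes, edges), or None if the number of distinct k-mers
    exceeds max_nodes, mimicking an aborted local assembly.
    """
    nodes = set()
    edges = set()
    for read in reads:
        prev = None
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            nodes.add(kmer)
            if len(nodes) > max_nodes:
                # Graph too complex: abort and report no graph for this region.
                return None
            if prev is not None:
                edges.add((prev, kmer))
            prev = kmer
    return nodes, edges


# A single clean read yields two 3-mers and fits under a cap of 3.
clean = build_kmer_graph(["ACGT"], k=3, max_nodes=3)

# A second read with one base error adds two new 3-mers and trips the cap.
noisy = build_kmer_graph(["ACGT", "ACTT"], k=3, max_nodes=3)
```

In this toy setup each base error contributes new singleton k-mers, so pooling reads from more samples grows the node count even when the underlying sequence is the same.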

To see if this limit is reached frequently, run with log level set to TRACE or DEBUG and grep for the following message:
Graph too complex for region:
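
If a plain grep count is not enough, a small script can tally the message per region. This is a minimal sketch: it assumes the region description is whatever follows the message on the same log line, which may not match the real log layout, and `count_complex_regions` is a hypothetical helper name.

```python
from collections import Counter

MARKER = "Graph too complex for region:"


def count_complex_regions(lines):
    """Count occurrences of MARKER, keyed by the text that follows it.

    Assumption: the region is the remainder of the log line after MARKER.
    """
    counts = Counter()
    for line in lines:
        idx = line.find(MARKER)
        if idx != -1:
            region = line[idx + len(MARKER):].strip()
            counts[region] += 1
    return counts


# Example with made-up log lines (the prefixes and regions are illustrative only).
sample_log = [
    "DEBUG Graph too complex for region: chr1:100-500",
    "DEBUG some other message",
    "TRACE Graph too complex for region: chr1:100-500",
    "DEBUG Graph too complex for region: chr2:900-1300",
]
counts = count_complex_regions(sample_log)
total = sum(counts.values())
```

To run it on a real log, pass an open file handle instead of the list, since the function only needs an iterable of lines.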

To increase the maximum number of vertices per assembly graph, use the parameter:
--maxn
which defaults to 9000.

Additionally, contigs are weighted by read support, and when a maximum number of contigs is exceeded, those with less read support are not considered. These limits may be more likely to be reached as the number of samples processed increases. This applies to assembled contigs (128), contigs arising from high-quality soft clipping (16), and contigs arising from indels observed in the initial mappings (8). Not all of these limits are parameterized at the moment.
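
The effect of such a cap can be sketched as a simple "keep the best supported" filter. This is only an illustration under stated assumptions: contigs are modelled as hypothetical `(sequence, read_support)` tuples, and `keep_best_supported` is not a function from ABRA.

```python
import heapq


def keep_best_supported(contigs, max_contigs):
    """Keep at most max_contigs contigs, preferring higher read support.

    contigs: list of (sequence, read_support) tuples (hypothetical model).
    """
    if len(contigs) <= max_contigs:
        return list(contigs)
    # nlargest returns the top entries ordered from highest to lowest support.
    return heapq.nlargest(max_contigs, contigs, key=lambda c: c[1])


# With a cap of 2, the contig with the least read support is dropped.
candidates = [("ACGTAC", 1), ("ACGTTC", 5), ("ACGAAC", 3)]
kept = keep_best_supported(candidates, max_contigs=2)
```

With more samples there are more candidate contigs competing for the same fixed number of slots, so a weakly supported but real variant is more likely to be dropped.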

I plan to set MAX_SAMPLES to 16 in the next release, which should happen no later than next week.

Please let me know if you run into additional problems. We'd be interested in hearing feedback as to whether or not the software is behaving reasonably with a larger number of samples.

Thanks very much, Lisle. This is very helpful.