dentearl/mafTools

mafSorter Seg fault for large files


Hi,
I used mafSorter on 120 maf files (outputs of mafStrander), and it is failing for the 29 largest ones with this error:
Segmentation fault (core dumped)

This makes me think there is a memory allocation issue. I watched one of the jobs, though, and I did not see the memory usage go up much (and I am on a machine with 2 TB of memory).
Do you have any suggestions to solve this?
Thank you!

The smallest file that fails is 2423694954 bytes.
The largest file that does not fail is 2388039004 bytes.

So 2.42G vs 2.39G, hmm. Are the files so large because they have lots of blocks or because there are a few blocks that are enormous?

One way to really get at this would be to compile with debugging on and run it through gdb. That would give us the location of the failure at least.

Thank you for the quick reply! Following your question about block numbers, I counted the blocks in all files: the largest files are in fact larger because they have more blocks (file size and block count correlate, R² = 0.995). The counts range from 1081237 to 3982663 blocks for the files that failed, and go up to 1038319 blocks for the files that did not fail.
This does not rule out very large blocks in the files that failed, though. I do not know how to compile with debugging on; if you send me a command line, I will check.
I could also filter out small blocks and see what happens? Is this an option in one of the mafTools?
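
For reference, the two measurements discussed above (how many blocks a file has, and whether any single block is enormous) could be gathered in one pass with something like the sketch below. It is not part of mafTools; the function name maf_block_stats is hypothetical, and it assumes the standard MAF layout in which each block opens with an 'a' line and each 's' line carries the alignment text as its seventh whitespace-separated field.

```python
def maf_block_stats(lines):
    """Return (block_count, max_rows, max_columns) for an iterable of MAF lines.

    Hypothetical helper, not a mafTools function. Reads line by line, so it
    never holds more than one line of the file in memory.
    """
    block_count = 0
    max_rows = 0      # most 's' lines seen in any single block
    max_columns = 0   # longest alignment text seen on any 's' line
    rows = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line between blocks
        if fields[0] == "a":
            # an 'a' line opens a new alignment block
            block_count += 1
            rows = 0
        elif fields[0] == "s" and len(fields) >= 7:
            # s <src> <start> <size> <strand> <srcSize> <text>
            rows += 1
            max_rows = max(max_rows, rows)
            max_columns = max(max_columns, len(fields[6]))
    return block_count, max_rows, max_columns
```

Passing an open file handle (for example the result of open("superscaffold8.strand.maf")) as lines returns the three numbers for that file.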

@4ureliek I don't think there's a specific mafTool to filter blocks by size. If there were, it'd live in mafFilter. One thing: does the maf validate? When you run it through mafValidator, does it come out clean? I bet it does, but it'd be an easier fix for me if it doesn't validate ;)
Compiling with debugging is as easy as passing -g -O0 to gcc, iirc.
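
Since no existing tool filters by block size, a stand-alone filter is one way to run that experiment. The sketch below is only an illustration under stated assumptions: the names iter_maf_blocks, block_columns and filter_small_blocks are made up here, "size" is taken to mean the number of alignment columns in the block's first 's' line, and header lines before the first block are not copied to the output.

```python
def iter_maf_blocks(lines):
    """Yield each alignment block as a list of its non-blank lines."""
    block = []
    for line in lines:
        if line.split(None, 1)[:1] == ["a"]:
            # an 'a' line opens a new block; hand back the previous one
            if block:
                yield block
            block = [line]
        elif block and line.strip():
            block.append(line)
    if block:
        yield block


def block_columns(block):
    """Alignment length of a block: text length of its first 's' line, else 0."""
    for line in block:
        fields = line.split()
        if fields[0] == "s" and len(fields) >= 7:
            return len(fields[6])
    return 0


def filter_small_blocks(lines, min_columns):
    """Yield the lines of every block with at least min_columns columns.

    Assumption: lines before the first 'a' line (the MAF header) are dropped.
    """
    for block in iter_maf_blocks(lines):
        if block_columns(block) >= min_columns:
            yield from block
            yield "\n"  # a blank line terminates each MAF block
```

Writing the yielded lines to a new file after a "##maf version=1" header line would give a smaller maf to try with mafSorter.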

@diekhans yeah, I guess we could leave the -g and -O0 flags on all the time but eh. :)

I ran this:
python ~/mafTools/bin/mafValidator.py --maf=superscaffold8.strand.maf

It printed 'done', but I did not see anything else in stderr or stdout. Does that mean it passed, or am I missing something?

Thanks!

@4ureliek Yeah, iirc 'done' is the best you can hope for. Hm. Can you put the failing maf somewhere publicly available where I can download it and experiment? I can't promise much because of competing priorities but I'm curious enough to spend some more time on this.

@diekhans huh, #til.

Hi, I completely forgot to share with you one of the files... I need an email to share the folder where it's at, could you email me at 4urelie.k (at) gmail ? Thanks!

I ended up using the code from here: https://raw.githubusercontent.com/UCSantaCruzComputationalGenomicsLab/last/master/scripts/maf-sort.sh
and it seems to be doing the trick on these big files.