biod/sambamba

sambamba markdup too many open files issue

Closed this issue · 7 comments

sambamba-markdup: Cannot open or create file '/foo/bar/sambamba-pid13785-hdac/sorted.86.bam' : Too many open files

Any ideas how to overcome this issue?

sambamba --help
sambamba v0.5.5

Usage: sambamba [command] [args...]

    Available commands: 'view', 'index', 'merge', 'sort',
                        'flagstat', 'slice', 'markdup', 'depth', 'mpileup'
    To get help on a particular command, just call it without args.

Leave bug reports and feature requests at
https://github.com/lomereiter/sambamba/issues

Try --overflow-list-size 600000.

Agreed, enlarging --overflow-list-size also worked for me. Alternatively, you can look into your system's open-file limits, for example in /etc/security/limits.conf.
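For anyone launching sambamba from a wrapper script, here is a minimal sketch of checking and raising the per-process open-file limit with Python's standard `resource` module (Unix only). The helper name `raise_nofile_soft_limit` is made up for illustration. It can only raise the soft limit up to the hard limit; the hard limit itself is what an administrator configures, e.g. in /etc/security/limits.conf.

```python
import resource


def raise_nofile_soft_limit(target=None):
    """Raise the soft open-file limit toward `target` (or up to the hard limit).

    Hypothetical helper for illustration; returns the (soft, hard) pair
    that is in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to do
    if hard == resource.RLIM_INFINITY:
        wanted = soft if target is None else target
    else:
        wanted = hard if target is None else min(target, hard)
    if wanted > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
        except (ValueError, OSError):
            # Some platforms cap the value below the reported hard limit.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print(raise_nofile_soft_limit())
```

Resource limits are inherited by child processes, so a sambamba process started from the same script afterwards (for example via `subprocess`) should run with the raised soft limit.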

A simple idea would be to merge intermediate files once there are too many of them. That's on my to-do list.
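To illustrate the idea only (this is not sambamba's code, and every name below is hypothetical), here is a small Python sketch in which sorted runs of integers stand in for the intermediate BAM files. Whenever the number of runs on disk exceeds `max_runs`, they are merged into a single run, so a later merge never needs more than `max_runs + 1` input files open at once.

```python
import heapq
import os
import tempfile


def write_run(values, tmpdir):
    """Write an already-sorted iterable of ints to a new temp file."""
    fd, path = tempfile.mkstemp(dir=tmpdir, suffix=".run", text=True)
    with os.fdopen(fd, "w") as out:
        for value in values:
            out.write(f"{value}\n")
    return path


def read_run(path):
    """Lazily yield the ints stored in one run file."""
    with open(path) as run:
        for line in run:
            yield int(line)


def compact(runs, tmpdir):
    """Merge all runs into a single sorted run and delete the inputs."""
    merged = write_run(heapq.merge(*(read_run(p) for p in runs)), tmpdir)
    for path in runs:
        os.remove(path)
    return [merged]


def add_run(runs, values, tmpdir, max_runs=4):
    """Spill one batch to disk; compact once there are too many runs."""
    runs = runs + [write_run(sorted(values), tmpdir)]
    if len(runs) > max_runs:
        runs = compact(runs, tmpdir)
    return runs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        runs = []
        for start in range(10):
            runs = add_run(runs, range(start, 100, 10), tmp, max_runs=3)
        print(len(runs), "run file(s) left on disk")
```

The trade-off is extra I/O: records that land in early runs may be rewritten several times, which is the price of keeping the number of open files bounded.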

I still got the "too many open files" error after using "--overflow-list-size 600000" when running a large batch of samples.

Can the value 600000 be increased? What is the upper threshold, and are there any limits tied to this value, e.g. memory size?

@wuyilei: as stated in the linked Nextflow issue, setting ulimit to unlimited should have solved the problem, so I find this strange as well.

There is no upper threshold. Another suggestion is to increase --hash-table-size as well. Note that this increases memory consumption linearly. (If you run into memory issues, please check out the markdup-extsort branch and test whether it works for you.)
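As a rough sketch of how the two options mentioned in this thread could be passed together from a script: the helper `build_markdup_cmd`, the file names, and the hash table value in the usage comment are all illustrative assumptions, not recommended settings.

```python
def build_markdup_cmd(input_bam, output_bam,
                      overflow_list_size=600000,
                      hash_table_size=None,
                      executable="sambamba"):
    """Build an argument list for `sambamba markdup` (hypothetical helper).

    Only the two options discussed in this thread are set; larger values
    mean more reads are held in memory.
    """
    cmd = [executable, "markdup",
           "--overflow-list-size", str(overflow_list_size)]
    if hash_table_size is not None:
        cmd += ["--hash-table-size", str(hash_table_size)]
    cmd += [input_bam, output_bam]
    return cmd


if __name__ == "__main__":
    # File names and the hash table size are placeholders.
    print(" ".join(build_markdup_cmd("sorted.bam", "dedup.bam",
                                     hash_table_size=1048576)))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`. Since memory use grows with both values, it is probably safest to raise them step by step while watching peak memory.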

Isn't popFrontOverflowList leaking files when a paired read is found in the overflow list? In that case closeTmpWriter is not called. I'm guessing the std.stdio.File for the index will be automatically cleaned up, but I don't think the BamWriter will be? Does GC get that?

Potentially also readsFromTempFiles - not clear to me what can be counted on from the runtime.
