ucscGenomeBrowser/kent

bigWigSummary produces needLargeMem errors on many bigwigs

nvictus opened this issue · 6 comments

On certain files, the error occurs specifically when querying chr8 with an end coordinate close to the chromosome size and a large number of bins (> 100,000).

$ bigWigSummary ENCFF856LYZ.bigWig chr8 0 145138636 120000 > /dev/null
needLargeMem: trying to allocate 18446744069429595380 bytes (limit: 17179869184)

The attempted allocation is clearly not reasonable.

I'm running into this same issue with the following files from encode:

https://www.encodeproject.org/files/ENCFF856LYZ/@@download/ENCFF856LYZ.bigWig
https://www.encodeproject.org/files/ENCFF992JJW/@@download/ENCFF992JJW.bigWig
https://www.encodeproject.org/files/ENCFF928WEU/@@download/ENCFF928WEU.bigWig
https://www.encodeproject.org/files/ENCFF723XZS/@@download/ENCFF723XZS.bigWig
https://www.encodeproject.org/files/ENCFF828IPR/@@download/ENCFF828IPR.bigWig
https://www.encodeproject.org/files/ENCFF613CYH/@@download/ENCFF613CYH.bigWig
https://www.encodeproject.org/files/ENCFF917YSR/@@download/ENCFF917YSR.bigWig
https://www.encodeproject.org/files/ENCFF676GTP/@@download/ENCFF676GTP.bigWig
https://www.encodeproject.org/files/ENCFF353YGE/@@download/ENCFF353YGE.bigWig
https://www.encodeproject.org/files/ENCFF724KWV/@@download/ENCFF724KWV.bigWig
https://www.encodeproject.org/files/ENCFF153FDP/@@download/ENCFF153FDP.bigWig
https://www.encodeproject.org/files/ENCFF359QVU/@@download/ENCFF359QVU.bigWig
https://www.encodeproject.org/files/ENCFF367WTF/@@download/ENCFF367WTF.bigWig
https://www.encodeproject.org/files/ENCFF562RHH/@@download/ENCFF562RHH.bigWig
https://www.encodeproject.org/files/ENCFF569CSW/@@download/ENCFF569CSW.bigWig
https://www.encodeproject.org/files/ENCFF876DXW/@@download/ENCFF876DXW.bigWig
https://www.encodeproject.org/files/ENCFF434YEG/@@download/ENCFF434YEG.bigWig
https://www.encodeproject.org/files/ENCFF232VFZ/@@download/ENCFF232VFZ.bigWig
https://www.encodeproject.org/files/ENCFF676UXN/@@download/ENCFF676UXN.bigWig
https://www.encodeproject.org/files/ENCFF700YOH/@@download/ENCFF700YOH.bigWig
https://www.encodeproject.org/files/ENCFF629RRF/@@download/ENCFF629RRF.bigWig
https://www.encodeproject.org/files/ENCFF791ZIC/@@download/ENCFF791ZIC.bigWig
https://www.encodeproject.org/files/ENCFF038IYA/@@download/ENCFF038IYA.bigWig

Ooh, interesting bug! We'll look into this further (unless someone already knows the answer and I get to learn something new), but I suspect that the problem may have occurred during the generation of the bigWig files (so something wrong with the bigWig writer, not the reader). If I'm reading the bytes right, the first bigWig file internally claims to have a size of 0xffffffff8855ee2f bytes, or about 1.8e+19. As you suggest, that seems unlikely. Using only the low-order 32 bits of that value gives 2287332911, which seems much more reasonable.
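To make that arithmetic easy to check, here is a minimal Python sketch that splits a 64-bit value into its high and low 32-bit halves. The helper names `split_u64` and `as_signed_64` are made up for illustration and are not part of kent lib. It is applied to the size field quoted above and to the allocation size from the error message in the original report.

```python
def split_u64(value):
    """Split an unsigned 64-bit integer into (high 32 bits, low 32 bits)."""
    return value >> 32, value & 0xFFFFFFFF


def as_signed_64(value):
    """Reinterpret an unsigned 64-bit integer as two's-complement signed."""
    return value - (1 << 64) if value >= (1 << 63) else value


# Size field the first bigWig file claims for itself (quoted above).
claimed_size = 0xFFFFFFFF8855EE2F
high, low = split_u64(claimed_size)
print(hex(high), low)

# Allocation size from the needLargeMem message in the original report.
reported_alloc = 18446744069429595380
print([hex(part) for part in split_u64(reported_alloc)])

# The same bits read as a signed 64-bit integer are a negative number.
print(as_signed_64(claimed_size))
```

In both values the upper 32 bits are all ones and only the lower 32 bits look like a plausible size. That bit pattern is what a negative number produces when it is read as unsigned, but on its own it does not tell us where in the writer the value went wrong.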

So, I wasn't actually using bigWigSummary. I maintain my own Python bindings to kent lib, and I was trying to read out numpy arrays of 1kb-binned tracks using the summary functionality. I was then able to reproduce the error using bigWigSummary.

I have an easy workaround: don't use the summary functionality and just do the binning and averaging in Python. It's more accurate anyway, since the other way is really interpolating from the nearest zoom level. pyBigWig, which uses its own bigwig lib, also seems to be able to execute the same queries.
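A minimal sketch of that workaround, assuming the per-base values for the region have already been read into a flat sequence with NaN wherever the file has no data (how they are read is up to whichever reader you use). The function name `bin_means` and the toy input are illustrative only; with numpy the same thing can be done with a reshape and `nanmean`.

```python
import math


def bin_means(values, bin_size):
    """Average per-base values over consecutive bins of bin_size bases.

    NaN entries (bases with no data) are skipped. A bin with no data at
    all yields NaN. The final bin may be shorter than bin_size.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    means = []
    for start in range(0, len(values), bin_size):
        covered = [v for v in values[start:start + bin_size]
                   if not math.isnan(v)]
        means.append(sum(covered) / len(covered) if covered else math.nan)
    return means


# Toy example: 7 bases, bins of 3 bases each.
toy = [1.0, 3.0, math.nan, 5.0, math.nan, math.nan, 2.0]
print(bin_means(toy, 3))
```

Note that this averages over covered bases only. If uncovered bases should count as zero instead, replace the NaNs with 0.0 before binning.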

So this isn't an impediment for me, but I thought I'd report it.

Thanks for reporting this, @nvictus. It does look like this is a problem with how these files were created on the ENCODE portal. I've sent them mail in an attempt to track down how this problem was introduced.

I talked a bit with ENCODE, and we've been unable to track down the source of this problem. I've encouraged them to use our most recent code to build new big files, since there have been several bug fixes and one of them may have resolved this problem.

Do let us know if you run into this again.

I've recently run into this exact issue on additional files from Encode beyond those in @nvictus's list.

./bigWigSummary ENCFF089CVK.bigWig chr8 146250000 146300000 100
needLargeMem: trying to allocate 18446744069414602975 bytes (limit: 17179869184)

It can really throw a wrench into a processing pipeline, so if you have any additional updates on a work-around or solution either on your end or on Encode's, I'd appreciate it.
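One possible stopgap on the pipeline side, sketched under assumptions: capture the tool's error output as text, recognize the needLargeMem message, and flag the file so the pipeline can skip it or fall back to another reader instead of dying. The helper names are hypothetical, the regular expression is based only on the two messages quoted in this thread, and treating "upper 32 bits all ones" as a sign of a bad file is a heuristic drawn from those same two messages.

```python
import re

# Pattern modeled on the two error messages quoted in this thread.
_NEED_LARGE_MEM = re.compile(
    r"needLargeMem: trying to allocate (\d+) bytes \(limit: (\d+)\)"
)


def parse_need_large_mem(error_text):
    """Return (requested_bytes, limit_bytes), or None if no such message."""
    match = _NEED_LARGE_MEM.search(error_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def looks_like_bad_size(requested_bytes):
    """Heuristic: upper 32 bits of the 64-bit request are all ones."""
    return (requested_bytes >> 32) == 0xFFFFFFFF


message = ("needLargeMem: trying to allocate 18446744069414602975 bytes "
           "(limit: 17179869184)")
parsed = parse_need_large_mem(message)
if parsed is not None and looks_like_bad_size(parsed[0]):
    print("suspicious allocation request; flag this file for a fallback")
```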