trichelab/biscuiteer

Issue merging many large bsseq objects with biscuiteer::unionize()

Opened this issue · 5 comments

Working on my first analysis with BISCUIT/biscuiteer, but I've encountered some issues handling the data. I have 20 gzip/tabix'd VCFs (15-20 GB each) with accompanying bed.gz files. Biscuiteer works fine with small/toy datasets, but I've been having trouble merging all of these samples into a single bsseq object. I think part of the problem is simply the large number of samples and the amount of data per sample. I've tried two approaches so far, both without success:

  1. Run biscuiteer::readBiscuit() on each sample individually, then use biscuiteer::unionize() to merge the results into a single object.
  2. Merge the vcf.gz and bed.gz files on the command line, then import the result with a single biscuiteer::readBiscuit() call.
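For reference, approach 1 looks roughly like this. This is only a sketch: the file paths are placeholders for my data, and the `hdf5 =` argument (to keep each large sample disk-backed rather than in RAM) is from my reading of the docs, so the exact parameter names may need adjusting:

```r
## Sketch of approach 1: read each sample separately, then unionize.
## Paths below are placeholders; adjust to your own layout.
library(biscuiteer)

vcfs <- list.files("calls", pattern = "\\.vcf\\.gz$", full.names = TRUE)
beds <- sub("\\.vcf\\.gz$", ".bed.gz", vcfs)

bsseqs <- mapply(function(bed, vcf) {
  ## hdf5 = TRUE (if supported in your version) keeps each
  ## 15-20 GB sample out of memory
  readBiscuit(BEDfile = bed, VCFfile = vcf, merged = TRUE, hdf5 = TRUE)
}, beds, vcfs, SIMPLIFY = FALSE)

## pairwise union across all samples
combined <- Reduce(unionize, bsseqs)
```

This is where things bog down for me once the real-sized samples are involved.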

Do you have any advice for a better/ideal approach in this situation?

thanks in advance!

Tim,

I suppose this isn't so much an issue as a question, hence my lack of sessionInfo() and an error message. I think the package is working as intended; I was just hoping to understand best practice for improving performance/speed.

I'll move forward with your suggestion of jointly calling variants with BISCUIT into a single VCF. Feel free to close this issue unless you'd like further info from my experience.
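If it helps anyone who lands here, my understanding of the joint-calling suggestion is roughly the following. This is a hedged sketch: the file names are placeholders, and the exact flags should be checked against the BISCUIT docs for your version:

```shell
# Sketch: joint pileup across all samples into one multi-sample VCF,
# then one BED, so a single readBiscuit() call imports everything.
# hg38.fa and sample*.bam are placeholders for your own files.
REF=hg38.fa

# biscuit pileup accepts multiple BAMs and emits one VCF
biscuit pileup -o combined.vcf "$REF" sample01.bam sample02.bam   # ...all 20 BAMs
bgzip combined.vcf && tabix -p vcf combined.vcf.gz

# extract CpG methylation to BED for biscuiteer
biscuit vcf2bed -t cg combined.vcf.gz > combined.bed
bgzip combined.bed && tabix -p bed combined.bed.gz
```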

thanks much!
Dean

Just to clarify, I don't have any errors. I'm not used to handling objects of this magnitude in R, so I was just looking for direction toward an optimal approach :)