ASVCF Memory Usage

Question

SaideepGona opened this issue 4 years ago · 6 comments

Are there any guidelines on the memory consumption of createASVCF.sh? I've run out of memory even when allocating 150GB. I can't tell whether this is normal or whether I'm doing something wrong. I think there should be at least some usage guidelines on this.

Answer 1 · 2020-04-21T14:59:11.000Z

In addition, ASEReadCounter doesn't output bam files but rather count tables. Is this format accepted automatically? How do we link these together?

Answer 2 · 2020-04-21T15:08:25.000Z

Hi,

How many samples do you have?

The ASEReadCounter output is not read automatically; you have to combine it with the VCF file manually.

Best regards,
Natsuhiko
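
For readers wondering what "manually combined" could look like, here is a minimal sketch, not the author's method. It assumes each sample's ASEReadCounter table is tab-delimited with `contig`, `position`, `refCount` and `altCount` columns, and that the goal is an `AS` subfield holding comma-separated reference and alternative counts in each genotype column. The function names `load_counts` and `add_as_field` are hypothetical, and writing `0,0` for sites missing from a table is an assumption.

```python
import csv
import io
from typing import Dict, List, Tuple

Counts = Dict[Tuple[str, int], Tuple[int, int]]


def load_counts(table_text: str) -> Counts:
    """Map (contig, position) -> (refCount, altCount) for one sample's table.

    Assumes a tab-delimited table with a header row containing the
    columns contig, position, refCount and altCount.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    return {
        (row["contig"], int(row["position"])): (
            int(row["refCount"]),
            int(row["altCount"]),
        )
        for row in reader
    }


def add_as_field(vcf_line: str, sample_counts: List[Counts]) -> str:
    """Append an AS subfield to every sample column of one VCF record.

    sample_counts must be in the same order as the sample columns.
    Sites absent from a sample's table get 0,0 (an assumption).
    """
    fields = vcf_line.rstrip("\n").split("\t")
    key = (fields[0], int(fields[1]))  # CHROM, POS
    fields[8] += ":AS"                 # FORMAT column
    for i, counts in enumerate(sample_counts):
        ref, alt = counts.get(key, (0, 0))
        fields[9 + i] += f":{ref},{alt}"
    return "\t".join(fields)
```

Header lines (those starting with `#`) would need to be passed through unchanged; this sketch only handles data records.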

Answer 3 · 2020-04-21T15:26:59.000Z

I have 35 samples in this run.

I see. It's somewhat simpler on my end to just run a single memory-heavy job to do the work, but filtering and manual assignment would be the more distributed option.

By the way, I made a fork at https://github.com/SaideepGona/rasqual and have been working on a SLURM-compatible luigi pipeline to help automate the entire process (currently for RNA-seq). As the primary author, you might find this interesting, and I would appreciate your feedback, as there are many moving parts.

Answer 4 · 2020-04-21T17:24:56.000Z

I found this: https://github.com/walaj/VariantBam

It filters a BAM file based on a VCF to create a smaller BAM file, which can be used instead. I don't know how much of an improvement it will make in practice, but it should help.

Answer 5 · 2020-04-30T17:40:05.000Z

So I think the original issue here is solved. I just wanted to follow up and ask about the assay_type parameter. Is it fair to use "atac" mode for other peak-based data? If not, what should differ? Thanks!

Answer 6 · 2020-04-30T19:04:36.000Z

Sorry for the late reply. I was going to say that you have to split the master VCF into chunks (e.g., 10Mb each) to reduce memory usage.
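
As a rough illustration of the chunking idea, the sketch below generates 10 Mb region strings in the `chrom:start-end` form that region-aware tools such as tabix accept, so each chunk of the VCF can be extracted and processed separately. The helper `chunk_regions` and the chromosome names and lengths are hypothetical, chosen only for the example.

```python
from typing import Dict, Iterator

CHUNK_SIZE = 10_000_000  # 10 Mb, matching the chunk size suggested above


def chunk_regions(
    chrom_lengths: Dict[str, int], chunk_size: int = CHUNK_SIZE
) -> Iterator[str]:
    """Yield 1-based, inclusive region strings covering each chromosome."""
    for chrom, length in chrom_lengths.items():
        for start in range(1, length + 1, chunk_size):
            # The last chunk is clipped to the chromosome end.
            end = min(start + chunk_size - 1, length)
            yield f"{chrom}:{start}-{end}"


if __name__ == "__main__":
    # Made-up lengths for illustration; use the real contig lengths
    # from your reference or VCF header.
    example_lengths = {"chrA": 25_000_000, "chrB": 8_000_000}
    for region in chunk_regions(example_lengths):
        print(region)
```

Each region string can then be used to pull the matching records out of the indexed master VCF before running the memory-heavy step on that chunk alone.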

You can use the 'atac' option for other peak-based data (such as ChIP-seq, DNase-seq, etc.). The difference between the RNA-seq and ATAC-seq modes is the insert size threshold (RNA-seq paired-end reads easily span 10Kb or more if they are spliced).