gymrek-lab/TRTools

mergeSTR handling large catalogs

Closed this issue · 0 comments

  • I'm using mergeSTR to merge ExpansionHunter VCFs generated with large variant catalogs (>2 million loci) on Hail Batch/GCP, and the job appears to stall for these large catalogs. mergeSTR works fine with smaller catalogs (~200k loci). Is this expected?

  • Does mergeSTR have multi-threading support?

  • Can I use the output of mergeSTR as input to another mergeSTR job? For example, I merge 50 VCFs into output1.vcf with mergeSTR, and then, in a second job, merge output1.vcf with another 30 VCFs using mergeSTR again.
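
To make the two-stage merge in the last question concrete, here is a minimal Python sketch that only builds the command lines and does not run anything. The helper names (`merge_command`, `two_stage_plan`), the sample file names, and the `output2` prefix are hypothetical. It assumes the `--vcfs`, `--out`, and `--vcftype` options of mergeSTR, and that inputs need to be bgzipped and tabix-indexed, so the stage-1 output is compressed before reuse. Whether mergeSTR actually accepts its own output as input is the open question here, not something this sketch confirms.

```python
def merge_command(vcfs, out_prefix, vcftype="eh"):
    """Build a mergeSTR argument list for the given input VCFs.

    Assumes the --vcfs (comma-separated), --out (prefix) and --vcftype options.
    """
    if len(vcfs) < 2:
        raise ValueError("need at least two VCFs to merge")
    return ["mergeSTR", "--vcfs", ",".join(vcfs),
            "--out", out_prefix, "--vcftype", vcftype]


def two_stage_plan(first_batch, second_batch,
                   stage1_prefix="output1", stage2_prefix="output2"):
    """Return the ordered commands for a two-stage merge (hypothetical workflow)."""
    stage1 = merge_command(first_batch, stage1_prefix)
    # Assumption: stage 1 writes <prefix>.vcf, which is compressed and
    # indexed before being fed into the second merge.
    stage1_vcf = stage1_prefix + ".vcf"
    stage1_gz = stage1_vcf + ".gz"
    compress = ["bgzip", "-f", stage1_vcf]
    index = ["tabix", "-p", "vcf", stage1_gz]
    stage2 = merge_command([stage1_gz] + list(second_batch), stage2_prefix)
    return [stage1, compress, index, stage2]


if __name__ == "__main__":
    # Hypothetical sample file names: 50 VCFs first, then 30 more.
    first = [f"sample{i:02d}.vcf.gz" for i in range(50)]
    second = [f"sample{i:02d}.vcf.gz" for i in range(50, 80)]
    for cmd in two_stage_plan(first, second):
        print(" ".join(cmd))
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` in order, or turned into separate Hail Batch jobs, once it is confirmed that the second stage is supported.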