natsuhiko/rasqual

Estimated computational time is too long ...

lw157 opened this issue · 4 comments

lw157 commented

Hi Natsuhiko,
I got the warning "Estimated computational time is too long for STPG1 (Sample Size=48, NofrSNPs=2795, NoffSNPs=13)...aborted". I looked through all the open and closed issues but found no solution. Is this caused by testing too many SNPs, or is it a problem specific to my setup?

My full command is
tabix my_as_fin.vcf.gz chr1:24183489-25243424 | rasqual -y Y_expmat.bin -k K_sizefactor.bin -n 48 -j 8 -f STPG1 -l 2796 -m 61 -s 24683489,24683490,24683495,24683495,24683527,24687341,24687341,24695194,24695533,24696164,24696164,24700192,24700192,24705763,24706143,24706143,24710357,24710392,24710392,24717693,24717748,24718051,24718051,24718052,24718414,24718414,24727569,24727809,24727809,24727809,24727809,24727809,24737193,24737193,24738131,24740164,24740164,24740164,24741401,24741401,24742493,24742967 -e 24685109,24685109,24685109,24685109,24685109,24687531,24687531,24696329,24695897,24696329,24696329,24700300,24700300,24706313,24706313,24706313,24710493,24710493,24710493,24718169,24718169,24718169,24718169,24718169,24718561,24718557,24727946,24727949,24727946,24727946,24727946,24727821,24737307,24737269,24738517,24743424,24740230,24740215,24741588,24741587,24742643,24743085 -x x_covar.bin --n-threads 5

Thanks a lot.

All best,
Liuyang

Hi Liuyang,

The feature definition (-s and -e) looks strange. For example, 24683495 appears twice, and the first region (24683489-24685109) overlaps the second (24683490-24685109), and so on. You have to fix this first: the features have to be mutually exclusive.

In addition, the computational time message only warns that the cis-regulatory region you specified (chr1:24183489-25243424) is too large. You can override it with the '--force' option.

Best regards,
Natsuhiko
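
A quick way to spot such overlaps before running rasqual is sketched below. This is only an illustration: check_features is a hypothetical helper, not part of rasqual, and it assumes the -s and -e lists have the same length and use inclusive coordinates.

```shell
# Hypothetical helper (not part of rasqual): list every feature that overlaps
# an earlier one. $1 is the comma-separated -s list, $2 the -e list.
# Assumes both lists have the same length and coordinates are inclusive.
check_features() {
  awk -v s="$1" -v e="$2" 'BEGIN {
    n = split(s, S, ","); split(e, E, ",")
    # One "start end" pair per line so that sort can order them.
    for (i = 1; i <= n; i++) print S[i], E[i]
  }' | sort -k1,1n -k2,2n | awk '
    # After sorting, a feature overlaps an earlier one exactly when its
    # start is at or before the largest end seen so far.
    NR > 1 && ($1 + 0) <= (max_end + 0) {
      printf "overlap: %s-%s begins at or before %s\n", $1, $2, max_end
      bad = 1
    }
    ($2 + 0) > (max_end + 0) { max_end = $2 }
    END { exit bad }'
}
# Example call with the first two features from the command above:
#   check_features "24683489,24683490" "24685109,24685109"
```

The function prints one line per offending feature and returns a non-zero status if any overlap is found, so it can be used as a guard in a pipeline script.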

lw157 commented

Hi Natsuhiko,
That is a good catch. You have sharper eyes than me. Those must be nested exons from different transcripts. I used the following code to prepare the gene metadata.

library(GenomicFeatures)
library(EnsDb.Hsapiens.v75) ##  hg19
library(dplyr)              ##  provides %>% and the verbs used below

edb = EnsDb.Hsapiens.v75
##TxByGns <- transcriptsBy(edb, by = "gene")
## All exons of each gene, pooled across transcripts (so they can overlap)
ExByGns <- exonsBy(edb, by = "gene")

## One row per gene with comma-separated exon starts and ends
GeneByExons = ExByGns %>%
  as.data.frame() %>%
  dplyr::as_tibble() %>%
  #dplyr::select(seqnames, group_name,strand, start, end) %>%
  dplyr::filter(seqnames %in% c(1:22, "X","Y")) %>%
  dplyr::group_by(group_name) %>%
  dplyr::arrange(start) %>%
  dplyr::mutate(exon_start = paste0(start, collapse=","), exon_end = paste0(end, collapse=",")) %>%
  dplyr::select(-start, -end, -width,-exon_id,-group) %>%
  unique()

Can you suggest any tools that avoid this situation? Thanks a lot.

Best regards,
Liuyang

I usually use bedtools merge to combine overlapping coding regions.

Best regards,
Natsuhiko
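
As a rough sketch of what that merging step does, the function below collapses overlapping features given as -s/-e lists, using only sort and awk. It is a stand-in for illustration, not a replacement for bedtools: merge_features is a hypothetical name, the lists are assumed to have equal length with inclusive coordinates, and, unlike the bedtools default, features that are merely adjacent are left separate. The bedtools invocation in the comment assumes a BED file called exons.bed, which is also hypothetical.

```shell
# With bedtools itself, the input must be sorted first, e.g.
#   sort -k1,1 -k2,2n exons.bed | bedtools merge -i stdin
# (exons.bed is a hypothetical file name).
#
# Hypothetical awk-only sketch of the same idea for -s/-e lists:
# $1 is the comma-separated start list, $2 the end list.
merge_features() {
  awk -v s="$1" -v e="$2" 'BEGIN {
    n = split(s, S, ","); split(e, E, ",")
    for (i = 1; i <= n; i++) print S[i], E[i]
  }' | sort -k1,1n -k2,2n | awk '
    # cs/ce hold the start and end of the block currently being built.
    NR == 1 { cs = $1; ce = $2; next }
    # Feature overlaps the current block: extend the block if needed.
    ($1 + 0) <= (ce + 0) { if (($2 + 0) > (ce + 0)) ce = $2; next }
    # Otherwise the current block is finished: store it and start a new one.
    { starts = starts sep cs; ends = ends sep ce; sep = ","; cs = $1; ce = $2 }
    END {
      if (NR > 0) { starts = starts sep cs; ends = ends sep ce }
      print "-s " starts " -e " ends
    }'
}
# Example call:
#   merge_features "1,5,20" "10,8,30"
```

The output is a pair of mutually exclusive lists in the form the rasqual command line expects, which can then be checked by eye before use.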

lw157 commented

Thanks a lot, Natsuhiko. I will have a look at Bedtools.

Best regards,
Liuyang