zhixingfeng/iGDA

Detect minor SNVs not clear enough

gchevignon opened this issue · 2 comments

Hello,

Can you please provide more detail on how to choose a contextmodel?

Also what type of file should we use as contextmodel ?

  • boosting.conf?
  • boosting.model?
  • a folder? If so, which folder:
      • train_A
      • train_C
      • train_G
      • train_T
      • or just ont or pacbio?

Thanks

Hi gchevignon,
The context model input is simply the whole folder containing "train_A, train_C, ...". The example in README.md shows what the script looks like. I paste the link here for your convenience: https://www.dropbox.com/home/public/iGDA_examples/pacbio_ecoli/script?preview=igda_detect.sh

The context models for PacBio and ONT are different; they are in different folders at https://github.com/zhixingfeng/igda_contextmodel.

For PacBio, x in qv_x_NCTC_P6_C4 is the QV threshold: bases with QV below x are masked as N, and x=0 means no masking. For ONT, if your data was preprocessed by discarding reads with average QV < x and masking bases with QV < y as "N", use the model named "ont_context_effect_read_qv_x_base_qv_y". "qv_0" means no masking. The "sam_maskqv" command released with iGDA can do the low-QV base masking.
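
To make the naming rule concrete, here is a minimal Python sketch that builds the folder name from the QV thresholds and checks that a downloaded folder contains the train_* subfolders mentioned in this thread. The helper names are hypothetical and not part of iGDA, and which x/y values actually exist in igda_contextmodel is not assumed here, so check the repository listing.

```python
from pathlib import Path

# Subfolder names as listed in this thread. That every model folder
# holds exactly these four is an assumption, not something iGDA documents here.
EXPECTED_SUBFOLDERS = ("train_A", "train_C", "train_G", "train_T")


def pacbio_model_name(base_qv: int) -> str:
    """PacBio folder name: bases with QV below base_qv were masked as N (0 = no masking)."""
    return f"qv_{base_qv}_NCTC_P6_C4"


def ont_model_name(read_qv: int, base_qv: int) -> str:
    """ONT folder name: reads with average QV < read_qv discarded, bases with QV < base_qv masked."""
    return f"ont_context_effect_read_qv_{read_qv}_base_qv_{base_qv}"


def missing_subfolders(model_dir) -> list:
    """Return the expected train_* subfolders that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in EXPECTED_SUBFOLDERS if not (model_dir / name).is_dir()]
```

For example, pacbio_model_name(0) gives "qv_0_NCTC_P6_C4", the unmasked PacBio model. The whole folder with that name, not a file inside it, is what goes in as the context model.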

Hope this helps.

Best

Zhixing

Yep, this helps.
Thanks!