YuLab-SMU/seqcombo

sequences are mandatory to be capital

Closed this issue · 8 comments

Hi, @GuangchuangYu

I found that the package requires the input sequences to be uppercase. With lowercase sequences, the plot comes out wrong.

R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    
system code page: 936

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] igraph_1.2.6    forcats_0.5.1   stringr_1.4.0   dplyr_1.0.7    
 [5] purrr_0.3.4     readr_1.4.0     tidyr_1.1.3     tidyverse_1.3.1
 [9] ggplot2_3.3.5   tibble_3.1.2    seqcombo_1.14.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7             lubridate_1.7.10       Biostrings_2.60.1     
 [4] assertthat_0.2.1       digest_0.6.27          utf8_1.2.1            
 [7] IRdisplay_1.0          cellranger_1.1.0       R6_2.5.0              
[10] GenomeInfoDb_1.28.1    repr_1.1.3             backports_1.2.1       
[13] reprex_2.0.0           stats4_4.1.0           evaluate_0.14         
[16] httr_1.4.2             pillar_1.6.1           zlibbioc_1.38.0       
[19] rlang_0.4.11           readxl_1.3.1           uuid_0.1-4            
[22] rstudioapi_0.13        S4Vectors_0.30.0       labeling_0.4.2        
[25] RCurl_1.98-1.3         munsell_0.5.0          broom_0.7.8           
[28] modelr_0.1.8           compiler_4.1.0         pkgconfig_2.0.3       
[31] BiocGenerics_0.38.0    base64enc_0.1-3        htmltools_0.5.1.1     
[34] tidyselect_1.1.1       GenomeInfoDbData_1.2.6 IRanges_2.26.0        
[37] fansi_0.5.0            crayon_1.4.1           dbplyr_2.1.1          
[40] withr_2.4.2            bitops_1.0-7           grid_4.1.0            
[43] jsonlite_1.7.2         gtable_0.3.0           lifecycle_1.0.0       
[46] DBI_1.1.1              magrittr_2.0.1         scales_1.1.1          
[49] cli_3.0.0              stringi_1.6.2          farver_2.1.0          
[52] XVector_0.32.0         fs_1.5.0               xml2_1.3.2            
[55] ellipsis_0.3.2         rvcheck_0.1.8          generics_0.1.0        
[58] vctrs_0.3.8            cowplot_1.1.1          IRkernel_1.2          
[61] tools_4.1.0            Cairo_1.5-12.2         glue_1.4.2            
[64] hms_1.1.0              parallel_4.1.0         colorspace_2.0-2      
[67] BiocManager_1.30.16    rvest_1.0.0            pbdZMQ_0.3-5          
[70] haven_2.4.1           

Yang

Could you provide a reproducible example?

Thanks for your reply, Dr @GuangchuangYu

For uppercase sequences:

>KT162029
ATGAGTGATGGAGCAGTTCACCCAAACGGGGGTCACCCTGCTGTCAAAAATGAAAAAGCTACAGGATCTGGGAACGGGTCTGGAGGCGGGGGGGGGGGGGGTTCGGGGGGGGGGGGGATTTCTACGGGTACTTTCAATAATCAAACGGAATTTAAATTTTTGGAAAACGGATGGGTGGAAATCACAGCAAACTCAAGCAAACTTGTACATTTAAATATGCCAAAAAGTGAAAATTATAAAAAAGGGGTTGTAAATAATTTGGATAAAACTGCATTTAACGGAAACATGGCTTTAAATGATACCCATGCACAAATTGTAACACCTGGGTCATTGGTTGATGCAAATGCTTGGGGAGTTTGGTTTAATCCAGGAGATTGGCAACTAATTGTTAATACTATGAGTGAGTTGCATTTAGTTAGTTTTGAACAAGAAATTTTTAATGTTGTTTTAAAGACTGTTTCAGAATCTGCTACTCAGCCACCAACTAAAGTTTATAATAATGATTTAACTGCATCATTGATGGTTGCATTAGATAGTAATAATACTATGCCATTTACTCCAGCAGCTATGAGATCTGAGACATTGGGTTTTTATCCATGGAAACCAACCATACCAACTCCATGGAGATATTATTTTCAATGGGATAGAACATTAATACCATCTCATACTGGAACTAGTGGCACACCAACAAATATATACCATGGTACAGATCCAGATGATGTTCAATTTTATACTATTGAAAATTCTGTGCCAGTACACTTACTAAGAACAGGTGATGAATTTGCTACAGGAACATTTTTTTTTGATTGTAAACCATGCAGACTAACACATACATGGCAAACAAATAGAGCATTGGGCTTACCACCATTTCTAAATTCTTTGCCTCAAGCTGAAGGAGGTACTAACTTTGGTTATATAGGAGTTCAACAAGATAAAAGACGTGGTGTAACTCAAATGGGAAATACAAACATTATTACTGAAGCTACTATTATGAGACCAGCTGAGGTTGGTTATAGTGCACCATATTATTCTTTTGAGGCGTCTACACAAGGGCCATTTAAAACACCTATTGCAGCAGGACGGGGGGGAGCGCAAACAGATGAAAATCAAGCAGCAGATGGTGATCCAAGATATGCATTTGGTAGACAACATGGTCAGAAAACTACCACAACAGGAGAAACACCTGAGAGATTTACATATATAGCACATCAAGATACAGGAAGATATCCAGAAGGAGATTGGATTCAAAATATTAACTTTAACCTTCCTGTAACAAATGATAATGTATTGCTACCAACAGATCCAATTGGAGGTAAAACAGGAATTAACTATACTAATATATTTAATACTTATGGTCCTTTAACTGCATTAAATAATGTACCACCAGTTTATCCAAATGGTCAAATTTGGGATAAAGAATTTGATACTGACTTAAAACCAAGACTTCATGTAAATGCACCATTTGTTTGTCAAAATAATTGTCCCGGTCAATTATTTGTAAAAGTTGCGCCTAATTTAACAAATGAATATGATCCTGATGCATCTGCTAATATGTCAAGAATTGTAACTTACTCAGATTTTTGGTGGAAAGGTAAATTAGTATTTAAAGCTAAACTAAGAGCCTCTCATACTTGGAATCCAATTCAACAAATGAGTATTAATGTAGATAACCAATTTAACTATGTACCAAGTAACATTGGAGGTATGAAAATTGTATATGAGAAATCTCAACTAGCACCTAGAAAATTATAT
>KT162030
ATGAGTGATGGACCATTTCACCCAAACGGGGGTCACCCTGCTGTCAAAAATGAAAAACCTACAGGATCTGGGAACGGGTCTGGAGGCGGGGGGGGGGGGGGTTCGGGGGGTGGGGGGATTTCTACGGGTACTTTCAATAATCAAACGGAATTTAAATTTTTGGAAAACGGATGGGGGGAAATCACAGCAAACTCAACCAAATTTGTACTTTTAAATATGCCAAAACGTGAAAATTATAAAAAAGTGGTTGTAAATAATTTGGATAAAATTGCATTTAACGGAAACATGGCTTTAAATGATCCCCATGCACAAATTGTAACACCTTGGTCATTGGTTGATGCAAATGCTTGGGGAGTTTGGTTTAATCCAGGAGATTGGCAACTAATTGTTAATACTATGAGTGAGTTGCATTTAGTTAGTTTTGAACAAGAAATTTTTAATGTTGTTTTAAAGACTGTTTCAGAATCTGCTACTCAGCCACCAACTAAAGTTTATAATAATGATTTAACTGCATCATTGATGGTTGCATTAGATAGTAATAATACTATGCCATTTACTCCAGCAGCTATGAGATCTGAGACATTGGGTTTTTATCCATGGAAACCAACCATACCAACTCCATGGAGATATTATTTTCAATGGGATAGAACATTAATACCATCTCATACTGGAACTAGTGGCACACCAACAAATATATACCATGGTACAGATCCAGATGATGTTCAATTTTACACTATTGAAAATTCTGTGCCAGTACACTTACTAAGAACAGGTGATGAATTTGCTACAGGAACATTTTATTTTGATTGTAAACCATGTAGACTAACACACACATGGCAAACAAATAGAGCATTGGGCTTACCACCATTTCTAAATTCTTTGCCTCAAGCTGAAGGAGGTACTAACTTTGGTTATATAGGAGTTCAACAAGATAAAAGACGTGGTGTAACTCAAATGGGAAATACAAACATTATTACTGAAGCTACTATTATGAGACCAGCTGAGGTTGGTTATAGTGCACCATATTATTCTTTTGAGGCGTCTACACAAGGGCCATTTAAAACACCTATTGCAGCAGGACGGGGGGGAGCGCAAACAGATGAAAATCAAGCAGCAGATGGTGATCCAAGATATGCATTTGGTAGACAACATGGTCAAAAAACTACCACAACAGGAGAAACACCTGAGAGATTTACATATATAGCACATCAAGATACAGGAAGATATCCAGAAGGAGATTGGATTCAAAATATTAACTTTAACCTTCCTGTAACAAATGATAATGTATTGCTACCAACAGATCCAATTGGAGGTAAAGCAGGAATTAACTATACTAATATATTTAATACTTATGGTCCTTTAACTGCATTAAATAATGTACCACCAGTTTATCCAAATGGTCAAATTTGGGATAAAGAATTTGATACTGACTTAAAACCAAGACTTCATGTAAATGCACCATTTGTTTGTCAAAATAATTGTCCTGGTCAATTATTTGTAAAAGTTGCGCCTAATTTAACAAATGAATATGATCCTGATGCATCTGCTAATATGTCAAGAATTGTAACTTACTCAGATTTTTGGTGGAAAGGTAAATTAGTATTTAAAGCTAAACTAAGAGCCTCTCATACTTGGAATCCAATTCAACAAATGAGTATTAATGTAGATAACCAATTTAACTATGTACCAAGTAATATTGGAGGTATGAAAATTGTATATGAAAAATCTCAACTAGCACCTAGAAAATTATAC
rm(list=ls())
library(seqcombo)

y <- seqdiff("KT162030.fas", reference=1)
py <- plot(y)
py

What I have
image

For lowercase sequences:

>kt162029
atgagtgatggagcagttcacccaaacgggggtcaccctgctgtcaaaaatgaaaaagctacaggatctgggaacgggtctggaggcggggggggggggggttcgggggggggggggatttctacgggtactttcaataatcaaacggaatttaaatttttggaaaacggatgggtggaaatcacagcaaactcaagcaaacttgtacatttaaatatgccaaaaagtgaaaattataaaaaaggggttgtaaataatttggataaaactgcatttaacggaaacatggctttaaatgatacccatgcacaaattgtaacacctgggtcattggttgatgcaaatgcttggggagtttggtttaatccaggagattggcaactaattgttaatactatgagtgagttgcatttagttagttttgaacaagaaatttttaatgttgttttaaagactgtttcagaatctgctactcagccaccaactaaagtttataataatgatttaactgcatcattgatggttgcattagatagtaataatactatgccatttactccagcagctatgagatctgagacattgggtttttatccatggaaaccaaccataccaactccatggagatattattttcaatgggatagaacattaataccatctcatactggaactagtggcacaccaacaaatatataccatggtacagatccagatgatgttcaattttatactattgaaaattctgtgccagtacacttactaagaacaggtgatgaatttgctacaggaacatttttttttgattgtaaaccatgcagactaacacatacatggcaaacaaatagagcattgggcttaccaccatttctaaattctttgcctcaagctgaaggaggtactaactttggttatataggagttcaacaagataaaagacgtggtgtaactcaaatgggaaatacaaacattattactgaagctactattatgagaccagctgaggttggttatagtgcaccatattattcttttgaggcgtctacacaagggccatttaaaacacctattgcagcaggacgggggggagcgcaaacagatgaaaatcaagcagcagatggtgatccaagatatgcatttggtagacaacatggtcagaaaactaccacaacaggagaaacacctgagagatttacatatatagcacatcaagatacaggaagatatccagaaggagattggattcaaaatattaactttaaccttcctgtaacaaatgataatgtattgctaccaacagatccaattggaggtaaaacaggaattaactatactaatatatttaatacttatggtcctttaactgcattaaataatgtaccaccagtttatccaaatggtcaaatttgggataaagaatttgatactgacttaaaaccaagacttcatgtaaatgcaccatttgtttgtcaaaataattgtcccggtcaattatttgtaaaagttgcgcctaatttaacaaatgaatatgatcctgatgcatctgctaatatgtcaagaattgtaacttactcagatttttggtggaaaggtaaattagtatttaaagctaaactaagagcctctcatacttggaatccaattcaacaaatgagtattaatgtagataaccaatttaactatgtaccaagtaacattggaggtatgaaaattgtatatgagaaatctcaactagcacctagaaaattatat
>kt162030
atgagtgatggaccatttcacccaaacgggggtcaccctgctgtcaaaaatgaaaaacctacaggatctgggaacgggtctggaggcggggggggggggggttcggggggtggggggatttctacgggtactttcaataatcaaacggaatttaaatttttggaaaacggatggggggaaatcacagcaaactcaaccaaatttgtacttttaaatatgccaaaacgtgaaaattataaaaaagtggttgtaaataatttggataaaattgcatttaacggaaacatggctttaaatgatccccatgcacaaattgtaacaccttggtcattggttgatgcaaatgcttggggagtttggtttaatccaggagattggcaactaattgttaatactatgagtgagttgcatttagttagttttgaacaagaaatttttaatgttgttttaaagactgtttcagaatctgctactcagccaccaactaaagtttataataatgatttaactgcatcattgatggttgcattagatagtaataatactatgccatttactccagcagctatgagatctgagacattgggtttttatccatggaaaccaaccataccaactccatggagatattattttcaatgggatagaacattaataccatctcatactggaactagtggcacaccaacaaatatataccatggtacagatccagatgatgttcaattttacactattgaaaattctgtgccagtacacttactaagaacaggtgatgaatttgctacaggaacattttattttgattgtaaaccatgtagactaacacacacatggcaaacaaatagagcattgggcttaccaccatttctaaattctttgcctcaagctgaaggaggtactaactttggttatataggagttcaacaagataaaagacgtggtgtaactcaaatgggaaatacaaacattattactgaagctactattatgagaccagctgaggttggttatagtgcaccatattattcttttgaggcgtctacacaagggccatttaaaacacctattgcagcaggacgggggggagcgcaaacagatgaaaatcaagcagcagatggtgatccaagatatgcatttggtagacaacatggtcaaaaaactaccacaacaggagaaacacctgagagatttacatatatagcacatcaagatacaggaagatatccagaaggagattggattcaaaatattaactttaaccttcctgtaacaaatgataatgtattgctaccaacagatccaattggaggtaaagcaggaattaactatactaatatatttaatacttatggtcctttaactgcattaaataatgtaccaccagtttatccaaatggtcaaatttgggataaagaatttgatactgacttaaaaccaagacttcatgtaaatgcaccatttgtttgtcaaaataattgtcctggtcaattatttgtaaaagttgcgcctaatttaacaaatgaatatgatcctgatgcatctgctaatatgtcaagaattgtaacttactcagatttttggtggaaaggtaaattagtatttaaagctaaactaagagcctctcatacttggaatccaattcaacaaatgagtattaatgtagataaccaatttaactatgtaccaagtaatattggaggtatgaaaattgtatatgaaaaatctcaactagcacctagaaaattatac
z <- seqdiff("KT162030_lower.fas", reference=1)
pz <- plot(z)
pz

What I got
image

Something is missing in the nucleotide position part.

These functions will ultimately move to the ggmsa package, and @nyzhoulang can help solve this issue.

Thanks. Great to know this.

The plot_difference() function in method-plot.R cannot recognize lowercase letters.
image

I will fix it after these functions are migrated to the ggmsa package.
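
In the meantime, one possible workaround is to uppercase the sequence lines of the FASTA file before passing it to seqdiff(). The following is a minimal base-R sketch; `fasta_to_upper` is a hypothetical helper, not a function from seqcombo or ggmsa, and the short sequences in the demo are made up for illustration.

```r
# Hypothetical helper (not part of seqcombo or ggmsa): copy a FASTA file,
# uppercasing sequence lines while leaving ">" header lines untouched.
fasta_to_upper <- function(infile, outfile) {
  lines <- readLines(infile)
  is_header <- startsWith(lines, ">")
  lines[!is_header] <- toupper(lines[!is_header])
  writeLines(lines, outfile)
  invisible(outfile)
}

# Small self-contained demo on temporary files (toy sequences)
infile <- tempfile(fileext = ".fas")
outfile <- tempfile(fileext = ".fas")
writeLines(c(">kt162029", "atgagtgatgga", ">kt162030", "atgagtgatggc"), infile)
fasta_to_upper(infile, outfile)
upper_lines <- readLines(outfile)

# With seqcombo installed, the converted file can then be used as usual:
# z <- seqdiff(outfile, reference = 1)
# plot(z)
```

Only the sequence lines are changed, so the sequence names in the plot keep their original case.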

Hi, @nyzhoulang

Thanks for your reply. Good to know that.
Would it also be possible to support ambiguous bases?

Yang

Hi, Yang

We fixed the bug and migrated these functions to the ggmsa package.
Both uppercase and lowercase sequences are now supported.

You need to install the development version of ggmsa with the following code:

if (!requireNamespace("devtools", quietly=TRUE))
    install.packages("devtools")
devtools::install_github("YuLab-SMU/ggmsa")

The features and function names in ggmsa are the same as in the seqcombo package:

library(ggmsa)
y <- seqdiff("KT162030.fas", reference=1)
plot(y)
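
As a quick sanity check that does not depend on either package, uppercase and lowercase copies of the same aligned sequences should give identical difference positions. A minimal base-R sketch, assuming two aligned sequences of equal length; `diff_positions` is a hypothetical helper, not a ggmsa or seqcombo function:

```r
# Hypothetical helper: positions at which two aligned sequences differ,
# optionally ignoring letter case.
diff_positions <- function(ref, other, ignore_case = TRUE) {
  if (ignore_case) {
    ref <- toupper(ref)
    other <- toupper(other)
  }
  r <- strsplit(ref, "")[[1]]
  o <- strsplit(other, "")[[1]]
  stopifnot(length(r) == length(o))  # sequences must be aligned
  which(r != o)
}

# Toy sequences: the same pair in mixed case and in uppercase
mixed_case <- diff_positions("ATGAGT", "atgact")
upper_case <- diff_positions("ATGAGT", "ATGACT")
```

Both calls should report the same positions; a case-sensitive comparison (`ignore_case = FALSE`) would instead flag every position in the mixed-case pair.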

Thanks,
Lang

@nyzhoulang Excellent! Thanks a lot.