Validating Existing gRNA Libraries - Error in Off-Target Characterization (addSpacerAlignment)

Question

stefanusbernard opened this issue 2 years ago · 7 comments

Hi, I really appreciate the tools provided by the crisprVerse team. I tried to score different sgRNA libraries following the "Validating Existing gRNA Libraries" tutorial. First, I used the Avana library (70018 rows) and successfully generated the on- and off-target scores. However, when I used the Cellecta library (150076 rows), an error occurred in the addSpacerAlignment function (off-target characterization).

[runCrisprBowtie] Using BSgenome.Hsapiens.UCSC.hg38 
[runCrisprBowtie] Searching for SpCas9 protospacers 

reads processed: 149545
reads with at least one alignment: 149545 (100.00%)
reads that failed to align: 0 (0.00%)
Reported 6177820 alignments

Error in METHOD(x, i) : 
  Subsetting operation on CompressedGRangesList object 'x'
  produces a result that is too big to be represented as a
  CompressedList object. Please try to coerce 'x' to a SimpleList
  object first (with 'as(x, "SimpleList")').

The ensuing alignment generates a large table (614520 rows); after the subsequent data filtering and construction of the GuideSet as described in the tutorial, the resulting GuideSet has 231660 rows. I also noticed that this error resembles errors reported for other packages in #312 and #328. Could you kindly assist with this issue? Any suggestions and advice would be appreciated.

This is my session info

R version 4.2.3 (2023-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_IE.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_IE.UTF-8        LC_COLLATE=en_IE.UTF-8    
 [5] LC_MONETARY=en_IE.UTF-8    LC_MESSAGES=en_IE.UTF-8   
 [7] LC_PAPER=en_IE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_IE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] reshape_0.8.9                     ggfortify_0.4.16                 
 [3] BSgenome.Hsapiens.UCSC.hg38_1.4.5 BSgenome_1.66.3                  
 [5] Biostrings_2.66.0                 XVector_0.38.0                   
 [7] crisprDesignData_0.99.28          crisprViz_1.0.0                  
 [9] crisprDesign_1.0.0                crisprScore_1.2.0                
[11] crisprScoreData_1.2.0             ExperimentHub_2.6.0              
[13] AnnotationHub_3.6.0               BiocFileCache_2.6.1              
[15] dbplyr_2.3.2                      crisprBowtie_1.2.0               
[17] crisprBase_1.2.0                  crisprVerse_1.0.0                
[19] splitstackshape_1.4.8             rtracklayer_1.58.0               
[21] GenomicRanges_1.50.2              GenomeInfoDb_1.34.9              
[23] IRanges_2.32.0                    S4Vectors_0.36.2                 
[25] BiocGenerics_0.44.0               geno2proteo_0.0.6                
[27] patchwork_1.1.2                   hgnc_0.1.2                       
[29] data.table_1.14.8                 lubridate_1.9.2                  
[31] forcats_1.0.0                     stringr_1.5.0                    
[33] dplyr_1.1.1                       purrr_1.0.1                      
[35] readr_2.1.4                       tidyr_1.3.0                      
[37] tibble_3.2.1                      ggplot2_3.4.2                    
[39] tidyverse_2.0.0                   UniprotR_2.2.2                   

loaded via a namespace (and not attached):
  [1] utf8_1.2.3                    reticulate_1.28              
  [3] R.utils_2.12.2                RUnit_0.4.32                 
  [5] tidyselect_1.2.0              RSQLite_2.3.1                
  [7] AnnotationDbi_1.60.2          htmlwidgets_1.6.2            
  [9] grid_4.2.3                    BiocParallel_1.32.6          
 [11] airr_1.4.1                    munsell_0.5.0                
 [13] codetools_0.2-19              interp_1.1-4                 
 [15] withr_2.5.0                   colorspace_2.1-0             
 [17] Biobase_2.58.0                filelock_1.0.2               
 [19] knitr_1.42                    rstudioapi_0.14              
 [21] ggsignif_0.6.4                MatrixGenerics_1.10.0        
 [23] GenomeInfoDbData_1.2.9        bit64_4.0.5                  
 [25] basilisk_1.10.2               vctrs_0.6.1                  
 [27] generics_0.1.3                xfun_0.38                    
 [29] biovizBase_1.46.0             timechange_0.2.0             
 [31] randomForest_4.7-1.1          R6_2.5.1                     
 [33] AnnotationFilter_1.22.0       bitops_1.0-7                 
 [35] cachem_1.0.7                  DelayedArray_0.24.0          
 [37] vroom_1.6.1                   promises_1.2.0.1             
 [39] BiocIO_1.8.0                  networkD3_0.4                
 [41] scales_1.2.1                  nnet_7.3-18                  
 [43] gtable_0.3.3                  ensembldb_2.22.0             
 [45] rlang_1.1.0                   rstatix_0.7.2                
 [47] lazyeval_0.2.2                dichromat_2.0-0.1            
 [49] checkmate_2.1.0               broom_1.0.4                  
 [51] BiocManager_1.30.20           yaml_2.3.7                   
 [53] abind_1.4-5                   GenomicFeatures_1.50.4       
 [55] backports_1.4.1               httpuv_1.6.9                 
 [57] Hmisc_5.0-1                   tools_4.2.3                  
 [59] ellipsis_0.3.2                RColorBrewer_1.1-3           
 [61] Rcpp_1.0.10                   plyr_1.8.8                   
 [63] base64enc_0.1-3               progress_1.2.2               
 [65] zlibbioc_1.44.0               RCurl_1.98-1.12              
 [67] basilisk.utils_1.10.0         prettyunits_1.1.1            
 [69] deldir_1.0-6                  rpart_4.1.19                 
 [71] ggpubr_0.6.0                  cluster_2.1.4                
 [73] SummarizedExperiment_1.28.0   magrittr_2.0.3               
 [75] magick_2.7.4                  alakazam_1.2.1               
 [77] ProtGenerics_1.30.0           matrixStats_0.63.0           
 [79] evaluate_0.20                 hms_1.1.3                    
 [81] mime_0.12                     xtable_1.8-4                 
 [83] XML_3.99-0.14                 jpeg_0.1-10                  
 [85] gridExtra_2.3                 compiler_4.2.3               
 [87] biomaRt_2.54.1                crayon_1.5.2                 
 [89] R.oo_1.25.0                   htmltools_0.5.5              
 [91] later_1.3.0                   tzdb_0.3.0                   
 [93] Formula_1.2-5                 qdapRegex_0.7.5              
 [95] Rbowtie_1.38.0                DBI_1.1.3                    
 [97] gprofiler2_0.2.1              MASS_7.3-58.2                
 [99] rappdirs_0.3.3                data.tree_1.0.0              
[101] Matrix_1.5-3                  ade4_1.7-22                  
[103] car_3.1-2                     cli_3.6.1                    
[105] R.methodsS3_1.8.2             parallel_4.2.3               
[107] Gviz_1.42.1                   igraph_1.4.2                 
[109] pkgconfig_2.0.3               GenomicAlignments_1.34.1     
[111] dir.expiry_1.6.0              foreign_0.8-84               
[113] plotly_4.10.1                 xml2_1.3.3                   
[115] VariantAnnotation_1.44.1      digest_0.6.31                
[117] rmarkdown_2.21                htmlTable_2.4.1              
[119] restfulr_0.0.15               curl_5.0.0                   
[121] shiny_1.7.4                   Rsamtools_2.14.0             
[123] rjson_0.2.21                  lifecycle_1.0.3              
[125] nlme_3.1-162                  jsonlite_1.8.4               
[127] carData_3.0-5                 seqinr_4.2-30                
[129] viridisLite_0.4.1             fansi_1.0.4                  
[131] pillar_1.9.0                  ggsci_3.0.0                  
[133] lattice_0.20-45               KEGGREST_1.38.0              
[135] fastmap_1.1.1                 httr_1.4.5                   
[137] interactiveDisplayBase_1.36.0 glue_1.6.2                   
[139] png_0.1-8                     BiocVersion_3.16.0           
[141] bit_4.0.5                     stringi_1.7.12               
[143] blob_1.2.4                    latticeExtra_0.6-30          
[145] memoise_2.0.1                 ape_5.7-1

Answer 1 · 2023-04-13T15:04:38.000Z

Thanks @stefanusbernard for reporting this! Would you be able to share your GuideSet object for the Cellecta library to give us a jump start? @ltHobbes Would you be able to help with this?

Answer 2 · 2023-04-24T15:34:57.000Z

Hi, is there any update on this issue? Kindly let me know if there is one.

Answer 3 · 2023-04-26T15:27:36.000Z

@stefanusbernard We are working on it

Answer 4 · 2023-04-26T22:57:54.000Z

@stefanusbernard The problem comes from the fact that many of the spacer sequences are repeated in the GuideSet (e.g. CACCTGTAATCCCAGCTACT), and those sequences have thousands of alignments. This results in a final alignment table with more than 3 billion rows, which causes the error. I suggest using addSpacerAlignmentsIterative (this worked for me), as it stops early when a given gRNA has hundreds of off-targets.
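To see whether a library contains repeated spacers before aligning it, a quick check along the following lines may help. This is only a minimal base-R sketch on toy data: the `spacers` vector is a hypothetical stand-in for the spacer column of the library, and the comparison against `.Machine$integer.max` reflects the assumption that the reported table size (over 3 billion rows) exceeds what R can index with a 32-bit integer.

```r
# Toy spacer vector standing in for the library's spacer column
# (hypothetical data, not taken from the Cellecta library).
spacers <- c("CACCTGTAATCCCAGCTACT", "GATTACAGATTACAGATTAC",
             "CACCTGTAATCCCAGCTACT", "TTGACCGGTAACCTTGGAAC",
             "CACCTGTAATCCCAGCTACT")

# Count how often each spacer sequence occurs in the library.
spacer_counts <- sort(table(spacers), decreasing = TRUE)

# Spacers that appear more than once.
repeated <- names(spacer_counts)[as.vector(spacer_counts) > 1]
print(repeated)

# One entry per distinct spacer, in order of first appearance.
unique_spacers <- unique(spacers)

# A table with ~3 billion rows is larger than the biggest 32-bit integer,
# which is assumed here to be the limit behind the reported error.
exceeds_limit <- 3e9 > .Machine$integer.max
print(exceeds_limit)
```

Deduplicating the spacers does not reduce the number of alignments per sequence, so it complements the early stop rather than replacing it.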

Answer 5 · 2023-04-28T17:38:50.000Z

Hi @Jfortin1, thanks for your help. addSpacerAlignmentsIterative works well. However, when I continue on to add the on-target (addOnTargetScores) and off-target (addOffTargetScores) scores, I get the same error as before. I understand the point about the repeated spacer sequences in the GuideSet, and I'd like to hear any suggestions you have, as I am trying to score the whole library. I really appreciate the assistance from the crisprVerse team, and thanks again.

Answer 6 · 2023-05-08T18:20:02.000Z

Hi @stefanusbernard, a simple solution here is to remove those promiscuous sgRNAs from the GuideSet upfront; there is little value in further annotating those sgRNAs knowing that they map to thousands of loci.
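The filtering idea can be sketched roughly as follows. This is an illustration on a plain data frame, not the crisprDesign API: the column names `n0` and `n1` (alignment counts with 0 and 1 mismatches) and the `max_hits` cutoff are assumptions chosen for the example, so check the actual column names of your own GuideSet and pick a threshold that suits your library.

```r
# Hypothetical per-guide alignment summary; in practice these counts would
# come from the alignment step. Column names and values are illustrative.
guides <- data.frame(
  spacer = c("CACCTGTAATCCCAGCTACT",
             "GATTACAGATTACAGATTAC",
             "TTGACCGGTAACCTTGGAAC"),
  n0 = c(2500L, 1L, 1L),   # assumed: alignments with 0 mismatches
  n1 = c(9000L, 0L, 3L),   # assumed: alignments with 1 mismatch
  stringsAsFactors = FALSE
)

# Arbitrary cutoff for calling a guide promiscuous (an assumption).
max_hits <- 100L

# Total alignments per guide across the two mismatch levels.
total_hits <- guides$n0 + guides$n1

# Keep guides at or below the cutoff; record which ones were dropped.
keep <- total_hits <= max_hits
filtered <- guides[keep, , drop = FALSE]
removed <- guides$spacer[!keep]

print(filtered)
print(removed)
```

Keeping the list of removed spacers makes it possible to report which library entries were excluded from scoring.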

Answer 7 · 2023-05-09T09:07:16.000Z

Hi @Jfortin1, thanks for your assistance. I managed to solve this issue, so I will close this thread.