Validating Existing gRNA Libraries - Error in Off-Target Characterization (addSpacerAlignment)
stefanusbernard opened this issue · 7 comments
Hi, I really appreciate the tools provided by the crisprVerse team. I tried to score different sgRNA libraries by following the Validating Existing gRNA Libraries tutorial. First, I used the Avana library (70018 rows) and successfully generated the on- and off-target scores. However, when I used the Cellecta library (150076 rows), an error occurred in the addSpacerAlignments function (off-target characterization).
[runCrisprBowtie] Using BSgenome.Hsapiens.UCSC.hg38
[runCrisprBowtie] Searching for SpCas9 protospacers
reads processed: 149545
reads with at least one alignment: 149545 (100.00%)
reads that failed to align: 0 (0.00%)
Reported 6177820 alignments
Error in METHOD(x, i) :
Subsetting operation on CompressedGRangesList object 'x'
produces a result that is too big to be represented as a
CompressedList object. Please try to coerce 'x' to a SimpleList
object first (with 'as(x, "SimpleList")').
The ensuing alignment generates a large table (614520 rows); after the subsequent filtering and GuideSet construction described in the tutorial, the resulting GuideSet has 231660 rows. I also noticed that this error is similar to ones reported for other packages in #312 and #328. Kindly assist with this issue; any suggestions or advice would be appreciated.
This is my session info
R version 4.2.3 (2023-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_IE.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_IE.UTF-8 LC_COLLATE=en_IE.UTF-8
[5] LC_MONETARY=en_IE.UTF-8 LC_MESSAGES=en_IE.UTF-8
[7] LC_PAPER=en_IE.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_IE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] reshape_0.8.9 ggfortify_0.4.16
[3] BSgenome.Hsapiens.UCSC.hg38_1.4.5 BSgenome_1.66.3
[5] Biostrings_2.66.0 XVector_0.38.0
[7] crisprDesignData_0.99.28 crisprViz_1.0.0
[9] crisprDesign_1.0.0 crisprScore_1.2.0
[11] crisprScoreData_1.2.0 ExperimentHub_2.6.0
[13] AnnotationHub_3.6.0 BiocFileCache_2.6.1
[15] dbplyr_2.3.2 crisprBowtie_1.2.0
[17] crisprBase_1.2.0 crisprVerse_1.0.0
[19] splitstackshape_1.4.8 rtracklayer_1.58.0
[21] GenomicRanges_1.50.2 GenomeInfoDb_1.34.9
[23] IRanges_2.32.0 S4Vectors_0.36.2
[25] BiocGenerics_0.44.0 geno2proteo_0.0.6
[27] patchwork_1.1.2 hgnc_0.1.2
[29] data.table_1.14.8 lubridate_1.9.2
[31] forcats_1.0.0 stringr_1.5.0
[33] dplyr_1.1.1 purrr_1.0.1
[35] readr_2.1.4 tidyr_1.3.0
[37] tibble_3.2.1 ggplot2_3.4.2
[39] tidyverse_2.0.0 UniprotR_2.2.2
loaded via a namespace (and not attached):
[1] utf8_1.2.3 reticulate_1.28
[3] R.utils_2.12.2 RUnit_0.4.32
[5] tidyselect_1.2.0 RSQLite_2.3.1
[7] AnnotationDbi_1.60.2 htmlwidgets_1.6.2
[9] grid_4.2.3 BiocParallel_1.32.6
[11] airr_1.4.1 munsell_0.5.0
[13] codetools_0.2-19 interp_1.1-4
[15] withr_2.5.0 colorspace_2.1-0
[17] Biobase_2.58.0 filelock_1.0.2
[19] knitr_1.42 rstudioapi_0.14
[21] ggsignif_0.6.4 MatrixGenerics_1.10.0
[23] GenomeInfoDbData_1.2.9 bit64_4.0.5
[25] basilisk_1.10.2 vctrs_0.6.1
[27] generics_0.1.3 xfun_0.38
[29] biovizBase_1.46.0 timechange_0.2.0
[31] randomForest_4.7-1.1 R6_2.5.1
[33] AnnotationFilter_1.22.0 bitops_1.0-7
[35] cachem_1.0.7 DelayedArray_0.24.0
[37] vroom_1.6.1 promises_1.2.0.1
[39] BiocIO_1.8.0 networkD3_0.4
[41] scales_1.2.1 nnet_7.3-18
[43] gtable_0.3.3 ensembldb_2.22.0
[45] rlang_1.1.0 rstatix_0.7.2
[47] lazyeval_0.2.2 dichromat_2.0-0.1
[49] checkmate_2.1.0 broom_1.0.4
[51] BiocManager_1.30.20 yaml_2.3.7
[53] abind_1.4-5 GenomicFeatures_1.50.4
[55] backports_1.4.1 httpuv_1.6.9
[57] Hmisc_5.0-1 tools_4.2.3
[59] ellipsis_0.3.2 RColorBrewer_1.1-3
[61] Rcpp_1.0.10 plyr_1.8.8
[63] base64enc_0.1-3 progress_1.2.2
[65] zlibbioc_1.44.0 RCurl_1.98-1.12
[67] basilisk.utils_1.10.0 prettyunits_1.1.1
[69] deldir_1.0-6 rpart_4.1.19
[71] ggpubr_0.6.0 cluster_2.1.4
[73] SummarizedExperiment_1.28.0 magrittr_2.0.3
[75] magick_2.7.4 alakazam_1.2.1
[77] ProtGenerics_1.30.0 matrixStats_0.63.0
[79] evaluate_0.20 hms_1.1.3
[81] mime_0.12 xtable_1.8-4
[83] XML_3.99-0.14 jpeg_0.1-10
[85] gridExtra_2.3 compiler_4.2.3
[87] biomaRt_2.54.1 crayon_1.5.2
[89] R.oo_1.25.0 htmltools_0.5.5
[91] later_1.3.0 tzdb_0.3.0
[93] Formula_1.2-5 qdapRegex_0.7.5
[95] Rbowtie_1.38.0 DBI_1.1.3
[97] gprofiler2_0.2.1 MASS_7.3-58.2
[99] rappdirs_0.3.3 data.tree_1.0.0
[101] Matrix_1.5-3 ade4_1.7-22
[103] car_3.1-2 cli_3.6.1
[105] R.methodsS3_1.8.2 parallel_4.2.3
[107] Gviz_1.42.1 igraph_1.4.2
[109] pkgconfig_2.0.3 GenomicAlignments_1.34.1
[111] dir.expiry_1.6.0 foreign_0.8-84
[113] plotly_4.10.1 xml2_1.3.3
[115] VariantAnnotation_1.44.1 digest_0.6.31
[117] rmarkdown_2.21 htmlTable_2.4.1
[119] restfulr_0.0.15 curl_5.0.0
[121] shiny_1.7.4 Rsamtools_2.14.0
[123] rjson_0.2.21 lifecycle_1.0.3
[125] nlme_3.1-162 jsonlite_1.8.4
[127] carData_3.0-5 seqinr_4.2-30
[129] viridisLite_0.4.1 fansi_1.0.4
[131] pillar_1.9.0 ggsci_3.0.0
[133] lattice_0.20-45 KEGGREST_1.38.0
[135] fastmap_1.1.1 httr_1.4.5
[137] interactiveDisplayBase_1.36.0 glue_1.6.2
[139] png_0.1-8 BiocVersion_3.16.0
[141] bit_4.0.5 stringi_1.7.12
[143] blob_1.2.4 latticeExtra_0.6-30
[145] memoise_2.0.1 ape_5.7-1
Thanks @stefanusbernard for reporting this! Would you be able to share your GuideSet object for the Cellecta library to give us a jump start? @ltHobbes Would you be able to help on this?
Hi, is there any update on this issue? Kindly let me know if there is one.
@stefanusbernard We are working on it
@stefanusbernard The problem comes from the fact that many of the spacer sequences are repeated in the GuideSet (e.g. CACCTGTAATCCCAGCTACT), and those sequences have thousands of alignments. This results in a final alignment table that has more than 3 billion rows, which causes the error. I suggest using addSpacerAlignmentsIterative (this worked for me), as it stops early when a given gRNA has hundreds of off-targets.
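To check whether a library is affected by repeated spacers before running the alignment, something like the following base R sketch may help. It is only an illustration: `library_df` is a hypothetical toy table, and every sequence in it except CACCTGTAATCCCAGCTACT is made up. For a real GuideSet, the spacer strings could presumably be obtained with `as.character(spacers(guideSet))`.

```r
# Hypothetical toy library: one spacer sequence per row (not real library data)
library_df <- data.frame(
  id = paste0("sg", 1:6),
  spacer = c("CACCTGTAATCCCAGCTACT", "GGTGAGCGAATACGTTCCAG",
             "CACCTGTAATCCCAGCTACT", "TTGACCGATCAAGGCTTACA",
             "CACCTGTAATCCCAGCTACT", "GGTGAGCGAATACGTTCCAG"),
  stringsAsFactors = FALSE
)

# Count how many rows share each spacer sequence
spacer_counts <- table(library_df$spacer)

# Keep only sequences that occur more than once, most frequent first
repeated <- sort(spacer_counts[spacer_counts > 1], decreasing = TRUE)
print(repeated)

# Unique spacer sequences, in case aligning each sequence once is enough
unique_spacers <- unique(library_df$spacer)
```

Sequences at the top of `repeated` are the first candidates to inspect, since each copy contributes its full set of alignments to the final table.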
Hi @Jfortin1, thanks for your help; addSpacerAlignmentsIterative works well. However, when I continue to add the on-target (addOnTargetScores) and off-target (addOffTargetScores) scores, I get the same error as before. I understand the point about the repeated spacer sequences in the GuideSet that you mentioned, and I would like to hear any suggestions from you, as I am trying to score the whole library. I really appreciate the assistance from the crisprVerse team, and thanks again.
Hi @stefanusbernard, a simple solution here is to remove those promiscuous sgRNAs from the GuideSet upfront; there is little value in further annotating those sgRNAs knowing that they map to thousands of loci.
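A minimal sketch of that filtering step, in base R, is shown here. It rests on several assumptions: `guides` is a hypothetical table standing in for the per-guide annotations, the columns `n0`, `n1` and `n2` are assumed to hold the number of alignments with 0, 1 and 2 mismatches, the cutoff `max_alignments` is arbitrary, and a missing count is assumed to mean the search was cut short for that guide.

```r
# Hypothetical per-guide alignment counts (illustrative values only)
guides <- data.frame(
  id = c("sg1", "sg2", "sg3", "sg4", "sg5"),
  n0 = c(1, 1, 250, 1, 40),
  n1 = c(0, 2, 900, 0, NA),
  n2 = c(3, 15, 4000, 1, NA),
  stringsAsFactors = FALSE
)

# Arbitrary cutoff chosen for illustration, not a recommended value
max_alignments <- 100

# Total alignments per guide; NA propagates when any count is missing
total_alignments <- rowSums(guides[, c("n0", "n1", "n2")])

# Drop guides above the cutoff, and guides with missing counts
keep <- !is.na(total_alignments) & total_alignments <= max_alignments
filtered <- guides[keep, ]
print(filtered$id)
```

Since a GuideSet can be subset with a logical vector in the same way as a GRanges object, the same `keep` index could presumably be applied as `guideSet[keep]` before calling the scoring functions.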
Hi @Jfortin1, thanks for your assistance. I managed to solve this issue, so I will close this thread.