OSCA-source/OSCA.advanced

Cell cycle chapter broken in BioC 3.18

Closed this issue · 2 comments

@lgeistlinger I think this has been broken for a while: https://www.bioconductor.org/checkResults/3.18/books-LATEST/OSCA.advanced/nebbiolo2-buildsrc.html.
I can reproduce the error on my Ubuntu laptop running BioC 3.18 (session info below).
Stepping through inst/book/cell-cycle.Rmd, I find a few differences in the output to what is currently published to https://bioconductor.org/books/3.18/OSCA.advanced/cell-cycle-assignment.html.
Not all differences results in errors.

Hope you can decide on a fix and that my suggestions are helpful.
I'm away at a conference from today and until the week after next.


Difference 1

This does not result in an error in the build.

Part A

Code chunk

```{r}
# Switching row names back to Ensembl to match the reference.
test.data <- logcounts(sce.416b)
rownames(test.data) <- rowData(sce.416b)$ENSEMBL
library(SingleR)
assignments <- SingleR(test.data, ref=sce.ref, label=sce.ref$phase,
de.method="wilcox", restrict=cycle.anno)
tab <- table(assignments$labels, colLabels(sce.416b))
tab
```

Current published output

##      
##        1  2  3  4
##   G1  66  6 15  1
##   G2M  2 61  2 13
##   S   10  2  7  0

Output on my machine

       1  2  3  4
  G1  66  5 13  0
  G2M  1 64  3 14
  S   11  0  8  0

Hypothesis

Cells per cluster are identical, so I suspect changes to SingleR are responsible for this different output.

Summary

Small changes in the SingleR output are not affecting the summary in the text, so it's probably fine just to use the new results.
You might want to add a sanity check that the code results continue to match the text.

Part B

Code chunk

```{r}
# Test for differences in phase distributions between clusters 1 and 2.
chisq.test(tab[,1:2])
```

Current published output

## 
##  Pearson's Chi-squared test
## 
## data:  tab[, 1:2]
## X-squared = 110, df = 2, p-value <2e-16

Output on my machine


	Pearson's Chi-squared test

data:  tab[, 1:2]
X-squared = 124, df = 2, p-value <2e-16

Hypothesis

This is a consequence of Part A.


Difference 2

This is the error that causes the build to fail.

Code chunk

```{r, echo=FALSE, results="hide"}
tab <- table(assignments$phases, colLabels(sce.416b))
stopifnot(tab["G1",1] > 0.5 * sum(tab[,1]))
stopifnot(tab["G2M",2] > 0.5 * sum(tab[,2]))
stopifnot(tab["G1",3] > 0.5 * sum(tab[,3]))
library(bluster)
rand <- pairwiseRand(singler.assignments$labels, assignments$phases, mode="index")
stopifnot(rand > 0.6)
```

Current published output

The results of this code chunk are hidden, but the key point is that it used to be true that rand > 0.6.

Output on my machine

> stopifnot(rand > 0.6)
Error: rand > 0.6 is not TRUE
> rand
[1] 0.5792

Hypothesis

cyclone-assigned cell cycle phases appear identical, based on:

table(assignments$phases, colLabels(sce.416b))

giving identical results on my machine to the published version, so I again suspect changes to SingleR are responsible for this different output.

Summary

Small changes in the SingleR output are giving slightly different Rand indices, but this doesn't really affect the summary in the main text, so it's probably fine just to use the new results.
You will need to modify the existing sanity check to allow for the new value of rand.


Difference 3

This does not result in an error in the build.

Code chunk

```{r cell-cycle-contrastive, fig.width=10, fig.height=10, fig.cap="$t$-SNE plots for the 416B dataset before and after contrastive PCA. Each point is a cell and is colored according to its inferred cell cycle phase (left) or oncogene induction status (right)."}
top.hvgs <- getTopHVGs(dec.416b, p=0.1)
wild <- sce.416b$phenotype=="wild type phenotype"
set.seed(100)
library(scPCA)
con.out <- scPCA(
target=t(logcounts(sce.416b)[top.hvgs,]),
background=t(logcounts(sce.416b)[top.hvgs,wild]),
penalties=0, n_eigen=10, contrasts=100)
# Visualizing the results in a t-SNE.
sce.con <- sce.416b
reducedDim(sce.con, "cPCA") <- con.out$x
sce.con <- runTSNE(sce.con, dimred="cPCA")
# Making the labels easier to read.
relabel <- c("onco", "WT")[factor(sce.416b$phenotype)]
scaled <- scale_color_manual(values=c(onco="red", WT="black"))
gridExtra::grid.arrange(
plotTSNE(sce.416b, colour_by=I(assignments$phases)) + ggtitle("Before (416b)"),
plotTSNE(sce.416b, colour_by=I(relabel)) + scaled,
plotTSNE(sce.con, colour_by=I(assignments$phases)) + ggtitle("After (416b)"),
plotTSNE(sce.con, colour_by=I(relabel)) + scaled,
ncol=2
)
```

Current published output

image

Output on my machine

image

Hypothesis

Although a seed for random number generation is set in this code chunk and prior chunks that it depends on, the vagaries of (potentially stochastic) and numerical precision of PCA/cPCA/t-SNE and/or small changes to code underlying these functions may have lead to the small differences in the t-SNE plots (beyond the simple mirroring

Summary

Small changes in code underlying the PCA/cPCA/t-SNE have resulted in changes to the t-SNE, but this doesn't really affect the summary in the main text, so it's probably fine just to use the new results.


Session info

R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
 [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       

time zone: Australia/Melbourne
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] scPCA_1.15.1                BiocSingular_1.17.1        
 [3] ResidualMatrix_1.11.0       batchelor_1.17.2           
 [5] bluster_1.11.4              SingleR_2.3.7              
 [7] org.Mm.eg.db_3.18.0         ensembldb_2.25.1           
 [9] AnnotationFilter_1.25.0     GenomicFeatures_1.53.3     
[11] AnnotationDbi_1.63.2        Biobase_2.61.0             
[13] GenomeInfoDb_1.37.7         BiocGenerics_0.47.1        
[15] scRNAseq_2.15.0             scran_1.29.3               
[17] scater_1.29.4               ggplot2_3.4.4              
[19] scuttle_1.11.3              SummarizedExperiment_1.31.1
[21] SingleCellExperiment_1.23.0 IRanges_2.35.3             
[23] GenomicRanges_1.53.3        S4Vectors_0.39.3           
[25] BiocStyle_2.29.2            rebook_1.11.1              

loaded via a namespace (and not attached):
  [1] later_1.3.1                   BiocIO_1.11.0                
  [3] bitops_1.0-7                  filelock_1.0.2               
  [5] tibble_3.2.1                  CodeDepends_0.6.5            
  [7] graph_1.79.4                  XML_3.99-0.14                
  [9] lifecycle_1.0.3               Rdpack_2.5                   
 [11] edgeR_3.99.6                  globals_0.16.2               
 [13] lattice_0.22-4                magrittr_2.0.3               
 [15] limma_3.57.11                 sass_0.4.7                   
 [17] rmarkdown_2.25                jquerylib_0.1.4              
 [19] yaml_2.3.7                    metapod_1.9.0                
 [21] httpuv_1.6.12                 cowplot_1.1.1                
 [23] DBI_1.1.3                     RColorBrewer_1.1-3           
 [25] abind_1.4-5                   zlibbioc_1.47.0              
 [27] Rtsne_0.16                    purrr_1.0.2                  
 [29] RCurl_1.98-1.12               rappdirs_0.3.3               
 [31] GenomeInfoDbData_1.2.11       ggrepel_0.9.4                
 [33] irlba_2.3.5.1                 listenv_0.9.0                
 [35] pheatmap_1.0.12               RSpectra_0.16-1              
 [37] parallelly_1.36.0             dqrng_0.3.1                  
 [39] DelayedMatrixStats_1.23.9     codetools_0.2-19             
 [41] DelayedArray_0.27.10          xml2_1.3.5                   
 [43] tidyselect_1.2.0              farver_2.1.1                 
 [45] ScaledMatrix_1.9.1            viridis_0.6.4                
 [47] matrixStats_1.0.0             BiocFileCache_2.9.1          
 [49] GenomicAlignments_1.37.0      jsonlite_1.8.7               
 [51] BiocNeighbors_1.19.0          ellipsis_0.3.2               
 [53] tools_4.3.1                   progress_1.2.2               
 [55] Rcpp_1.0.11                   glue_1.6.2                   
 [57] gridExtra_2.3                 SparseArray_1.1.12           
 [59] xfun_0.40                     MatrixGenerics_1.13.2        
 [61] dplyr_1.1.3                   withr_2.5.1                  
 [63] BiocManager_1.30.22           fastmap_1.1.1                
 [65] fansi_1.0.5                   digest_0.6.33                
 [67] rsvd_1.0.5                    R6_2.5.1                     
 [69] mime_0.12                     colorspace_2.1-0             
 [71] biomaRt_2.57.1                RSQLite_2.3.1                
 [73] utf8_1.2.4                    generics_0.1.3               
 [75] data.table_1.14.8             rtracklayer_1.61.2           
 [77] prettyunits_1.2.0             httr_1.4.7                   
 [79] S4Arrays_1.1.6                pkgconfig_2.0.3              
 [81] gtable_0.3.4                  blob_1.2.4                   
 [83] XVector_0.41.2                htmltools_0.5.6.1            
 [85] bookdown_0.36                 ProtGenerics_1.33.1          
 [87] scales_1.2.1                  png_0.1-8                    
 [89] knitr_1.44                    rstudioapi_0.15.0            
 [91] rjson_0.2.21                  curl_5.1.0                   
 [93] cachem_1.0.8                  stringr_1.5.0                
 [95] BiocVersion_3.18.0            parallel_4.3.1               
 [97] vipor_0.4.5                   restfulr_0.0.15              
 [99] pillar_1.9.0                  grid_4.3.1                   
[101] vctrs_0.6.4                   promises_1.2.1               
[103] origami_1.0.7                 dbplyr_2.3.4                 
[105] beachmat_2.17.17              xtable_1.8-4                 
[107] cluster_2.1.4                 beeswarm_0.4.0               
[109] evaluate_0.22                 cli_3.6.1                    
[111] locfit_1.5-9.8                compiler_4.3.1               
[113] Rsamtools_2.17.0              rlang_1.1.1                  
[115] crayon_1.5.2                  future.apply_1.11.0          
[117] labeling_0.4.3                ggbeeswarm_0.7.2             
[119] stringi_1.7.12                viridisLite_0.4.2            
[121] BiocParallel_1.35.4           assertthat_0.2.1             
[123] munsell_0.5.0                 Biostrings_2.69.2            
[125] coop_0.6-3                    lazyeval_0.2.2               
[127] Matrix_1.6-1.1                dir.expiry_1.9.0             
[129] ExperimentHub_2.9.1           hms_1.1.3                    
[131] future_1.33.0                 sparseMatrixStats_1.13.4     
[133] bit64_4.0.5                   KEGGREST_1.41.4              
[135] statmod_1.5.0                 shiny_1.7.5.1                
[137] interactiveDisplayBase_1.39.0 AnnotationHub_3.9.2          
[139] kernlab_0.9-32                rbibutils_2.2.15             
[141] igraph_1.5.1                  memoise_2.0.1                
[143] bslib_0.5.1                   bit_4.0.5     

Thanks for the comprehensive report @PeteHaitch. I look into that.

fixed in 64f389c.