An R package for the RHL30 prognostic predictor, a gene expression-based model for predicting post-autologous stem-cell transplantation (ASCT) outcomes. It is designed to be used on RHL30 NanoString expression count data from relapsed Hodgkin lymphoma (RHL) samples.
The predictor was published in:
Chan FC*, Mottok A*, et al. Prognostic Model to Predict Post-Autologous Stem-Cell Transplantation Outcomes in Classical Hodgkin Lymphoma. J Clin Oncol JCO2017727925 (2017) doi:10.1200/JCO.2017.72.7925. *Contributed equally to this work.
To install this package, first make sure the devtools package is installed, then run:
devtools::install_github("tinyheero/RHL30")
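The two installation steps can be combined so that devtools is only installed when it is missing. The following is a minimal sketch; `install_rhl30` is a hypothetical helper name introduced here for illustration and is not part of the package.

```r
# Sketch: install devtools only when it is missing, then install RHL30.
# `install_rhl30` is a hypothetical helper, not a function from the package.
install_rhl30 <- function(repo = "tinyheero/RHL30") {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  devtools::install_github(repo)
}

# Call it once from an interactive session (requires network access):
# install_rhl30()
```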
We will use the BCCA RHL30 training cohort from the paper as an example of how to generate RHL30 predictor scores. The following steps reproduce the RHL30 scores from the paper.
First, let’s load the RHL30 package and the RHL30 model:
library("RHL30")
library("dplyr")
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
rhl30_model_df <- get_rhl30_model_coef_df()
rhl30_model_df
#> # A tibble: 30 x 4
#> gene_name refseq_mrna_id gene_type coefficient
#> <chr> <chr> <chr> <dbl>
#> 1 ACTB NM_001101.2 housekeeper NA
#> 2 ALAS1 NM_000688.4 housekeeper NA
#> 3 CLTC NM_004859.2 housekeeper NA
#> 4 GAPDH NM_002046.3 housekeeper NA
#> 5 GUSB NM_000181.1 housekeeper NA
#> 6 PGK1 NM_000291.2 housekeeper NA
#> 7 POLR2A NM_000937.2 housekeeper NA
#> 8 RPL19 NM_000981.3 housekeeper NA
#> 9 RPLP0 NM_001002.3 housekeeper NA
#> 10 SDHA NM_004168.1 housekeeper NA
#> # … with 20 more rows
The model contains a total of 30 genes:
- 18 genes that make up the model
- 12 housekeeper genes that are used to normalize the data
The next step is to load the expression data you want to generate RHL30 scores for. The expression data should be a tab-separated values file. The first line should be a header line with gene_name as the first column, followed by the sample identifiers. Each subsequent row should contain the name of a gene followed by its raw expression value for each sample.
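To make the expected layout concrete, here is a small sketch that writes a toy file in this format and reads it back with base R. The sample identifiers and counts are made up for illustration, and `read.delim` is used here only to show the file structure; for real data, use the package's own loader.

```r
# Toy input: two hypothetical samples and three genes from the model table.
# The counts are invented for illustration only.
toy_lines <- c(
  "gene_name\tsample_A\tsample_B",
  "ACTB\t15230\t9876",
  "GAPDH\t8421\t6110",
  "SDHA\t312\t205"
)
toy_file <- tempfile(fileext = ".tsv")
writeLines(toy_lines, toy_file)

# Read with base R: the first column (gene_name) becomes the row names,
# and check.names = FALSE keeps the sample identifiers unchanged.
toy_df <- read.delim(toy_file, row.names = 1, check.names = FALSE)
toy_mat <- as.matrix(toy_df)

dim(toy_mat)       # genes (rows) by samples (columns)
rownames(toy_mat)  # gene names
colnames(toy_mat)  # sample identifiers
```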
The expression data of the BCCA RHL30 training cohort is provided as an example. Let’s load that data:
exprs_file <-
  system.file("extdata", "bcca_rhl_rhl30_gene_exprs_mat.tsv", package = "RHL30")
exprs_mat <- load_exprs_mat(exprs_file)
dim(exprs_mat)
#> [1] 30 68
The expression data contains 30 genes (rows) and 68 samples (columns). Next, we calculate the normalizer value (the geometric mean of the 12 housekeepers) for each sample:
hk_genes <-
  filter(rhl30_model_df, gene_type == "housekeeper") %>%
  pull("gene_name")
sample_normalizer_values <- get_sample_normalizer_value(exprs_mat, hk_genes)
#> [get_normalizer]: Generating the geometric mean of housekeeper genes
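For readers unfamiliar with the geometric mean, the calculation can be sketched in base R on toy data. This is an illustration of the arithmetic only, not the package's implementation; the gene subset, sample names, counts, and the `geometric_mean` helper are all assumptions made for the example.

```r
# Toy housekeeper counts (invented). matrix() fills column by column, so
# sample_A has ACTB = 2, GAPDH = 8 and sample_B has ACTB = 10, GAPDH = 1000.
toy_hk_mat <- matrix(
  c(2, 8, 10, 1000),
  nrow = 2,
  dimnames = list(c("ACTB", "GAPDH"), c("sample_A", "sample_B"))
)

# Geometric mean: exponentiate the mean of the logs.
# `geometric_mean` is a hypothetical helper defined for this sketch.
geometric_mean <- function(x) exp(mean(log(x)))

# One normalizer value per sample (per column).
toy_normalizers <- apply(toy_hk_mat, 2, geometric_mean)
toy_normalizers
```

Here sample_A gives sqrt(2 × 8) = 4 and sample_B gives sqrt(10 × 1000) = 100. Because the calculation works on logs, a single zero count drives a sample's geometric mean to zero.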
In the paper, a normalizer threshold of 35 was set to exclude poor-quality samples. This was done because very low normalizer values often lead to very high normalized expression values. We can apply this threshold to eliminate poor-quality samples:
high_quality_samples <-
  names(sample_normalizer_values[sample_normalizer_values > 35])
filtered_exprs_mat <- exprs_mat[, high_quality_samples]
dim(filtered_exprs_mat)
#> [1] 30 66
This eliminates 2 poor-quality samples, leaving us with 66 samples. Note that sample HL1120 did not receive ASCT and thus was not reported in Figure 4 of the paper. As such, the final number of samples in Figure 4 is 65.
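When applying the threshold to your own data, it can help to list which samples were dropped. A minimal sketch on made-up normalizer values follows; the sample names and numbers are hypothetical.

```r
# Hypothetical normalizer values for four samples (invented numbers).
toy_values <- c(sample_A = 412.7, sample_B = 12.3, sample_C = 35, sample_D = 88.1)
threshold <- 35

# Same strict comparison as in the walkthrough: a value exactly equal to
# the threshold is excluded.
kept <- names(toy_values[toy_values > threshold])
dropped <- setdiff(names(toy_values), kept)

kept     # samples that pass the quality filter
dropped  # samples removed by the filter
```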
Let’s normalize our expression matrix and generate the RHL30 scores for each sample:
filtered_exprs_mat_norm <-
  normalize_exprs_mat(filtered_exprs_mat, sample_normalizer_values)
#> [normalize_exprs_mat]: Normalizing the expression exprs_matrix
#> [normalize_exprs_mat]: Log2 transforming
rhl30_df <- get_rhl30_scores_df(filtered_exprs_mat_norm, rhl30_model_df)
head(rhl30_df)
#> # A tibble: 6 x 2
#> sample_id score
#> <chr> <dbl>
#> 1 HL1013 10.3
#> 2 HL1014 10.5
#> 3 HL1015 9.77
#> 4 HL1017 9.80
#> 5 HL1018 11.3
#> 6 HL1019 9.70
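As a rough intuition for these two steps, the sketch below shows a generic housekeeper normalization followed by a coefficient-weighted sum, using toy numbers. It is an assumption about the general shape of such a calculation, not the package's implementation: the gene names, counts, coefficients, and normalizer values are invented, and the package functions may apply additional scaling, so the toy results are not on the scale of the real scores.

```r
# Toy raw counts for two hypothetical model genes (filled column by column).
toy_counts <- matrix(
  c(400, 100, 200, 1600),
  nrow = 2,
  dimnames = list(c("gene_1", "gene_2"), c("sample_A", "sample_B"))
)

# Hypothetical per-sample normalizer values and per-gene coefficients.
toy_normalizers <- c(sample_A = 100, sample_B = 200)
toy_coef <- c(gene_1 = 0.5, gene_2 = -1)

# Assumed normalization: divide each sample (column) by its normalizer,
# then log2 transform.
toy_norm <- log2(sweep(toy_counts, 2, toy_normalizers, "/"))

# Assumed score: coefficient-weighted sum over genes for each sample.
# The coefficient vector is recycled down the rows, so row order must
# match the order of toy_coef.
toy_scores <- colSums(toy_norm[names(toy_coef), ] * toy_coef)
toy_scores
```

In this toy case sample_A scores 0.5 × 2 + (−1) × 0 = 1 and sample_B scores 0.5 × 0 + (−1) × 3 = −3.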
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 3.6.2 (2019-12-12)
#> os macOS Sierra 10.12.6
#> system x86_64, darwin16.7.0
#> ui unknown
#> language (EN)
#> collate en_GB.UTF-8
#> ctype en_GB.UTF-8
#> tz Europe/London
#> date 2020-02-29
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.2)
#> backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.2)
#> callr 3.4.2 2020-02-12 [1] CRAN (R 3.6.2)
#> cli 2.0.1 2020-01-08 [1] CRAN (R 3.6.2)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.2)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.2)
#> devtools 2.2.2 2020-02-17 [1] CRAN (R 3.6.2)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 3.6.2)
#> dplyr * 0.8.4 2020-01-31 [1] CRAN (R 3.6.2)
#> ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.2)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.2)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.2)
#> fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.2)
#> glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.2)
#> hms 0.5.3 2020-01-08 [1] CRAN (R 3.6.2)
#> htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.2)
#> knitr * 1.28 2020-02-06 [1] CRAN (R 3.6.2)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.2)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.2)
#> pillar 1.4.3 2019-12-20 [1] CRAN (R 3.6.2)
#> pkgbuild 1.0.6 2019-10-09 [1] CRAN (R 3.6.2)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.2)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.2)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 3.6.2)
#> processx 3.4.2 2020-02-09 [1] CRAN (R 3.6.2)
#> ps 1.3.2 2020-02-13 [1] CRAN (R 3.6.2)
#> purrr 0.3.3 2019-10-18 [1] CRAN (R 3.6.2)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.2)
#> Rcpp 1.0.3 2019-11-08 [1] CRAN (R 3.6.2)
#> readr 1.3.1 2018-12-21 [1] CRAN (R 3.6.2)
#> remotes 2.1.1 2020-02-15 [1] CRAN (R 3.6.2)
#> RHL30 * 0.0.0.9000 2020-02-29 [1] Github (tinyheero/RHL30@2d9e3bc)
#> rlang 0.4.4 2020-01-28 [1] CRAN (R 3.6.2)
#> rmarkdown 2.1 2020-01-20 [1] CRAN (R 3.6.2)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.2)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.2)
#> stringi 1.4.6 2020-02-17 [1] CRAN (R 3.6.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.2)
#> testthat 2.3.1 2019-12-01 [1] CRAN (R 3.6.2)
#> tibble 2.1.3 2019-06-06 [1] CRAN (R 3.6.2)
#> tidyselect 1.0.0 2020-01-27 [1] CRAN (R 3.6.2)
#> usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.2)
#> utf8 1.1.4 2018-05-24 [1] CRAN (R 3.6.2)
#> vctrs 0.2.3 2020-02-20 [1] CRAN (R 3.6.2)
#> withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.2)
#> xfun 0.12 2020-01-13 [1] CRAN (R 3.6.2)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 3.6.2)
#>
#> [1] /usr/local/Cellar/r/3.6.2/lib/R/library