/tcga_rebuttal

Re-analysis of data provided by Gihawi et al. 2023 bioRxiv

Primary LanguageR

Background information

This Github repository is a succint response to the second preprint posted by Gihawi et al. on July 31, 2023 regarding our 2020 Nature paper, and it only addresses whether their data tables show cancer type-specific signals.

Approximately 5 months ago, we previously posted our extensive rebuttal of the Gihawi and colleagues' first preprint, which contains many of the same topics as their second preprint. We further note that many of their concerns were concurrently addressed in our peer-reviewed 2022 Cell paper, which employed a wide variety of updated methods (e.g., newer microbial reference database, alignment methods, host depletion, machine learning, de novo metagenome assembly, etc.) compared to the 2020 paper while making the same conclusion: microbiomes in tumors and blood are cancer type-specific.

Outline of re-analyses

Below, we perform the following analyses on data provided by Gihawi et al. 2023 bioRxiv using consecutively stricter filtering to show that cancer type-specific signals persist:

  1. Using all raw data
  2. Using only genera intersecting with the Weizmann Institute of Science (WIS) samples (n=149 genera)
  3. Using nine (9) "well-known" genera that were also found in the WIS samples

(1) Re-analysis of all raw data show cancer type-specific microbiomes

We asked whether their raw data on 3 TCGA cancer types (bladder [BLCA], breast [BRCA], head and neck [HNSC]) support cancer type-specific microbiomes in tumors and blood, which was the central conclusion of our 2020 paper. For reference, their data were downloaded from the Github link (https://github.com/yge15/Cancer_Microbiome_Reanalyzed) listed in their Methods section.

Briefly, the following was done:

  • Their data were loaded alongside metadata from the original Poore et al. 2020 Nature paper
  • The 3 cancer type tables containing raw counts were merged. Conservatively, we retained only overlapping genera between the 3 tables. However, we removed all counts associated with the Homo genus (i.e., human data) to focus just on the microbial differences. As noted by Gihawi et al., their database for KrakenUniq only contained complete genomes of microbes.
  • We separated the count and metadata into subsets with a single sequencing center (e.g., Harvard Medical School [HMS]), sequencing platform (Illumina HiSeq), and experimental strategy (WGS) in order to avoid needing to apply batch correction. All subsequent analyses were based on these raw data subsets without batch correction.
  • We applied beta and alpha diversity analyses to these subsets using Qiime 2 (see code within the Qiime folder). To avoid rarefaction for the beta diversity analyses, we employed a robust Aitchison approach called RPCA (Martino et al. 2019). We also applied more traditional beta-diversity measures (e.g., Bray-Curtis, Jaccard) and calculated alpha diversity using data rarefied to the 1st quartile of sample read counts for each subset.
  • After finding that the cancer type separation was significant, we performed multi-class and two-class machine learning within HMS primary tumor (PT) samples and HMS blood derived normal (BDN) samples.

The figure summarizing the findings and its associated caption is below, demonstrating the raw data still provide cancer type specificity (Figure 1). All analyses necessary to generate these plots have code contained within this repository or can be done without code using https://view.qiime2.org/.

Alt text

Figure 1: Raw data provided by Gihawi et al. 2023 bioRxiv reveal cancer type-specific microbiomes.
(A) The breakdown of samples derived from TCGA’s Harvard Medical School (HMS), which contributed the most samples of any sequencing center among their samples (284 of 728, 39.0%). To avoid the need for batch correction, samples were analyzed using raw data from Gihawi et al. 2023 in subsets from a single sequencing center (here, HMS), sequencing platform (Illumina HiSeq), and experimental strategy (WGS). The data were not transformed in any way other than removing the Homo genus counts since it was included in their KrakenUniq mappings.
(B) Aitchison-based Principal Coordinates Analysis (RPCA) on the raw data, excluding Homo counts, across all primary tumors (PTs) from HMS. RPCA does not require rarefaction and none was used. PERMANOVA with 999 iterations was run to estimate effect sizes of cancer type separation, which was significant (inset values, p=0.001). Colors reflect cancer types and are uniform across all subpanels of this figure.
(C) After RPCA, Shannon entropies were calculated on all PT samples from HMS using raw data (excluding Homo counts) rarefied to the first quartile of sample counts (600 reads/sample). A Kruskal-Wallis test showed significant differences in Shannon diversity among cancer types (p=1.4e-9).
(D) RPCA was then applied to all blood derived normal (BDN) samples from HMS using raw data (excluding Homo counts). RPCA does not require rarefaction and none was used. PERMANOVA with 999 iterations was run to estimate effect sizes of cancer type separation, which was significant (inset values, p=0.001).
(E) After RPCA, Shannon entropies were calculated on all BDN samples from HMS using raw data (excluding Homo counts) rarefied to the first quartile of sample counts (400 reads/sample). A Kruskal-Wallis test showed significant differences in Shannon diversity among cancer types (p=3.7e-7).
(F) Having found cancer type-specific differences, we evaluated if multiclass machine learning could discriminate between cancer types using the raw data (excluding Homo counts) from all HMS PT samples. Gradient boosting machines were applied with 10-fold cross-validation, and their predictions were used to generate a confusion matrix. The mean balanced accuracy was 97.59% in comparison to the no information rate (NIR) of 54.84% (p<2.2e-16). (G) 10-fold cross-validation using gradient boosting machines on HMS BDN samples. The balanced accuracy was 92.8% in comparison to the NIR of 80% (p=4.4e-5).

(2) Subsetting the raw data to known tumor-resident bacteria and fungi from the Weizmann Institute still show cancer type-specific microbiomes

Above, we considered the raw data as-is from Gihawi and colleagues. We next evaluated whether the predictive modeling results would be replicated if subsetting the microbial genera to only those identified in a highly-decontaminated set of independent tumors processed by the Weizmann Institute of Science (WIS) (see Nejman et al. 2020 Science, Narunsky-Haziza et al. 2022 Cell). We note these two studies together ran >1100 experimental contamination controls in parallel to the tumor samples, thereby providing strong support for the taxa that persisted. For further background, we also refer to the answer in this Github comment. After importing the 'hit list' of bacterial and fungal genera and intersecting these genera with the raw data from Gihawi and colleagues (n=149 remaining genera, 17.6% of total), we then repeated the same machine learning analyses as done above, indeed finding similarly strong cancer type discrimination (Figure 2). Thus, using highly-decontaminated microbial genera from an independent set of tumors, we find that the raw data from Gihawi and colleagues still shows cancer type-specificity in TCGA.

The figure summarizing the findings and its associated caption is below. The code necessary to generate these plots is contained within tcga_gihawhi_rebuttal_WIS_subset_3Aug23.R.

image
Figure 2: Known tumor-resident genera from the Weizmann Institute of Science (WIS) in data from Gihawi et al. 2023 bioRxiv reveal cancer type-specific microbiomes in TCGA tumors and blood.
(A) After subsetting to WIS-overlapping genera (n=149 genera), we evaluated if multiclass machine learning could discriminate between cancer types using the raw data from all HMS PT samples. Gradient boosting machines were applied with 10-fold cross-validation, and their predictions were used to generate a confusion matrix. The mean balanced accuracy was 93.62% in comparison to the no information rate (NIR) of 54.84% (p<2.2e-16).
(B) After subsetting to WIS-overlapping genera (n=149 genera), 10-fold cross-validation using gradient boosting machines was applied on HMS BDN samples. The balanced accuracy was 88.82% in comparison to the NIR of 80% (p=4.4e-5).

(3) A minimal set of nine “well-known” genera show cancer type specificity in tumors and blood within the raw data

Above, we showed how cancer type specificity is shown using (i) the raw data as-is from Gihawi and colleagues or (ii) when their genera were intersected with those from the Weizmann Institute of Science (WIS)’s tumors. To provide further evidence that cancer type-specific microbiomes are not due to contamination or overfitting, we selected a minimal set of 9 bacterial genera in the Gihawi et al. data that are commonly associated with human health and disease: Pseudomonas, Staphylococcus, Clostridium, Acinetobacter, Streptococcus, Klebsiella, Prevotella, Fusobacterium, and Veillonella. All of these genera were found in WIS tumors and in our 2020 paper. Notably, we find that this minimal set of 9 genera is sufficient to reproduce the basic conclusions of our 2020 paper using the Gihawi et al. data: tumor- and blood-derived microbiomes are cancer type specific.

The prevalence of these 9 genera among primary tumor (PT) samples from Harvard Medical School (HMS)—the largest sequencing center in the data—ranged from 58.06% to 100%, and in blood (BDN) samples from 30.53% to 99.35% (Figure 3A). Since HMS sequenced samples from all three cancer types, we correlated the PT and BDN prevalences, finding significant similarities (Pearson R=0.97, p=1.4e-5, Figure 3B) despite known sample processing differences in TCGA between blood and tumors. This prevalence similarity is consistent with the idea that tumor and blood microbiomes are related, as proposed by us and others (e.g., Narunsky-Haziza et al. 2022 Cell; Dohlman et al. 2022 Cell).

We next asked whether the raw abundances of these 9 genera were significantly different among these three cancer types. To be statistically valid, we applied a compositionally-coherent method called log-ratio testing (Rivera-Pinto et al. 2018 mSystems; Morton et al. 2019 Nature Comm), wherein a subset of the 9 genera’s abundances are summed in the numerator and the remainder summed in the denominator, followed by a division and applying a log2 transformation. Mathematically, the log-ratio (LR) was calculated per sample as follows:

$LR=\log_2\left(\frac{Prevotella + Fusobacterium + Streptococcus + Veillonella + Klebsiella + Clostridium + Pseudomonas}{Staphylococcus + Acinetobacter}\right)$

Notably, the LRs alone were sufficient to significantly separate primary tumor-based cancer types among every sequencing center included in the Gihawi et al. data that had >1 cancer type for comparison (Figure 3C-E). In the case of HMS, which had all 3 cancer types, significant stepwise increases were observed from BLCA to BRCA to HNSC (Figure 3C). Thus, straightforward statistical analysis of LRs shows cancer type separation using the minimal set of 9 genera.

We next calculated Aitchison-based beta diversity using RPCA (Martino et al. 2019 mSystems), followed by PERMANOVA testing, again finding that every sequencing center subset had cancer type-specific differences (see Qiime_RT/Deicode_outputs folder). Primary tumor-based comparisons had F-statistics of 23.4 at HMS (p=0.001), 31.9 at MD Anderson (MDA, p=0.001), and 8.1 at the Broad Institute (“Broad”, p=0.001). Blood-based cancer type beta diversity clustering was weaker but still significant (HMS, F=3.1, p=0.045).

Next, we evaluated machine learning (ML) classifiers derived using the 9 genera’s raw abundances in each sequencing center subset of primary tumors (Figure 3F), finding that they all exceeded areas under the ROC (AUROCs) curve of 92.9%. Since HMS included all 3 cancer types, we performed multiclass ML, finding a mean balanced accuracy of 91.11% (Figure 3G), and then calculated ROC curves in a 1-versus-all-others manner (Figure 3F). MDA and the Broad Institute each contained 2 cancer types, with balanced accuracies discriminating them of 91.24% and 85.69%, respectively (Figure 3H). All of these accuracies were significantly better than the no information rate (all p≤2.84e-3) using a standard 50% probability cutoff.

Having found primary tumor-based cancer type specificity, we next performed ML analyses and log-ratio testing on the blood samples from HMS, which is the only sequencing center containing blood from >1 cancer type. The 9 genera’s raw abundances provided an AUROC of 90.0% (Figure 3I), with a balanced accuracy of 85.53% that was significantly better than the no information rate (all p=0.01) using a standard 50% probability cutoff (Figure 3I, inset confusion matrix). We then calculated the HMS blood sample log-ratios using the same LR equation above, finding again significant separation between breast (BRCA) and head and neck (HNSC) cancer types (p=1.13e-6; Figure 3J). The relative orientation between BRCA and HNSC was also maintained between PT and BDN samples, with comparatively higher log-ratios in HNSC samples (Figure 3C,3J).

Collectively, these data provide evidence that a minimal set of 9 “well-known,” human-associated genera, using data provided by Gihawi et al., replicate the central conclusions from our 2020 paper using statistical methods, metagenomic analyses, and machine learning. All of the code replicating these analyses and figures are saved in tcga_gihawhi_rebuttal_9taxa_subset_3Aug23.R.

assembled-figure-9genera

Figure 3: Cancer type specificity in tumors and blood is achievable using nine (9) “well-known” genera in data from Gihawi et al. 2023 bioRxiv that are also found in the Weizmann Institute of Science (WIS) cohort of tumors.
(A) Nine WIS-overlapping bacterial genera were selected in the Gihawi et al. raw data that are commonly associated with human health and disease: Pseudomonas, Staphylococcus, Clostridium, Acinetobacter, Streptococcus, Klebsiella, Prevotella, Fusobacterium, and Veillonella. Prevalences for these 9 genera were calculated in the Harvard Medical School (HMS) subset of primary tumor (PT) and blood derived normal (BDN) samples since (i) HMS is the largest sequencing center in their data and (ii) since HMS is the only sequencing center with >1 cancer type among both PT and BDN samples. Across all 9 genera, the average PT prevalence was 81.29%, and the average BDN prevalence was 72.63%.
(B) The PT and BDN prevalences in panel (A) were compared, finding significant correlation (Pearson R=0.97, p=1.4e-5). The absolute prevalence is higher in tumors than blood, which is expected based on the relative reduction in microbial biomass.
(C-E) Cancer type differences were then evaluated using log-ratios of the 9 genera raw abundances, which is a compositionally-coherent comparison (Rivera-Pinto et al. 2018 mSystems; Morton et al. 2019 Nature Comm). The following 7 genera abundances were summed in the numerator per sample: Prevotella, Fusobacterium, Streptococcus, Veillonella, Klebsiella, Clostridium, Pseudomonas. The following 2 genera abundances were summed in the denominator per sample: Staphylococcus, Acinetobacter. After summating their abundances, the numerators and denominators were divided, followed by taking the log2 to obtain the per sample log-ratio. This process was done within each sequencing center subset to avoid needing batch correction, including (C) HMS PT samples, (D) MD Anderson PT samples, and (E) the Broad Institute PT samples. No other sequencing center subsets in the Gihawi et al. data had >1 cancer type available to compare. All statistical comparisons used Wilcoxon tests with Holm multiple testing correction in panel (C).
(F) Using the 9 genera’s raw abundances in primary tumors, machine learning (ML) was run within each sequencing center subset to predict cancer type. Since HMS had 3 cancer types, multiclass ML was run, followed by calculating the performance using each class versus all others to obtain 2-class ROC curves (e.g., ”HMS BLCA” denotes the performance of bladder cancer versus breast and head and neck cancers at HMS). MD Anderson (MDA) primary tumors only comprised BLCA and HNSC cancers, and the same was true for Broad Institute (”Broad”) primary tumors. The AUROC was calculated for each 2-class model (inset text) and the ROC curves were overlaid.
(G) Confusion matrix of multiclass ML among the HMS PT samples. The mean balanced accuracy was 91.11%, which was significantly better than the no information rate (NIR) (p<2.2e-16).
(H) Left: Confusion matrix of 2-class ML among the MDA PT samples, wherein the balanced accuracy was 91.24%, which was significantly better than the no information rate (NIR) (p=4.82e-6). Right: Confusion matrix of 2-class ML among the Broad Institute PT samples, wherein the balanced accuracy was 85.69%, which was significantly better than the no information rate (NIR) (p=2.84e-3).
(I) ROC curve of the 2-class ML results among HMS BDN samples, with AUC and the confusion matrix inset. Balanced accuracy was 85.53% and significantly better than the no information rate (NIR) (p=0.01)
(J) Log-ratios using the 9 genera’s raw abundances were calculated among HMS blood samples using the same numerators and denominators as in panels (C-E), with significant separation between cancer types (p=1.13e-6). Wilcoxon testing was used to compare the log-ratios.