Evaluating genomic offset predictions in a forest tree with high population genetic structure

Juliette Archambeau1,2, Marta Benito-Garzón1, Marina de-Miguel1,3, Alexandre Changenet1, Francesca Bagnoli4, Frédéric Barraquand5, Maurizio Marchi4, Giovanni G. Vendramin4, Stephen Cavers2, Annika Perry2 and Santiago C. González-Martínez1

1 INRAE, Univ. Bordeaux, BIOGECO, F-33610 Cestas, France

2 UK Centre for Ecology & Hydrology, Bush Estate, Penicuik, United Kingdom

3 EGFV, Univ. Bordeaux, Bordeaux Sciences Agro, INRAE, ISVV, F-33882, Villenave d'Ornon, France

4 Institute of Biosciences and BioResources, National Research Council, 50019 Sesto Fiorentino, Italy

5 CNRS, Institute of Mathematics of Bordeaux, F-33400 Talence, France


This repository contains the scripts needed to reproduce the analyses in Archambeau et al. (2024).

Paper abstract

Predicting how tree populations will respond to climate change is an urgent societal concern. An increasingly popular way to make such predictions is the genomic offset (GO) approach, which uses current gene-environment associations to identify populations that may experience climate maladaptation in the near future. However, GO has strong limitations and, despite promising validation of its predictions using height data from common gardens, it still lacks broad empirical testing. Using maritime pine, a tree species from southwestern Europe and North Africa with a marked population genetic structure, we evaluated GO predictions from four methods, namely Gradient Forest (GF), Redundancy Analysis (RDA), latent factor mixed models (LFMM) and Generalised Dissimilarity Modeling (GDM). GO was predicted using 9,817 SNPs genotyped on 454 trees from 34 populations and was then validated with mortality data from National Forest Inventories and mortality and height data from five common gardens. We found high variability in GO predictions and validation. GO predictions with GDM and GF (and to a lesser extent RDA) based on candidate SNPs potentially involved in climate adaptation showed the strongest and most consistent associations with mortality rates in common gardens and NFI plots. We found almost no association between GO predictions and tree height in common gardens, most likely due to the overwhelming effect of population genetic structure on tree height. Our study demonstrates the imperative to validate GO predictions with a range of independent data sources before they can be used as informative and reliable metrics in conservation or management strategies.


REPORTS

The code (.qmd and .Rmd files) used to generate the following HTML reports is in the folder /reports.

  • Checking population information (coordinates and elevation data) from different sources (e.g. collected from different studies).
  • Extracting climatic data from Climate Downscaling Tool (ClimateDT) at the location of the populations.
  • Calculating the average of the climatic variables across the time periods of interest.
  • Formatting genomic data: converting letters (e.g. A/A, A/G) to numbers (0, 1 or 2), and --- to NA.
  • Filtering genomic data for monomorphic SNPs, minor allele counts (MAC), proportion of missing data per clone and per SNP, minor allele frequencies (MAF).
  • Estimating statistical correlations among SNPs and linkage disequilibrium (LD).
  • Determining SNP positions on the genome.
  • Exploring genomic data, e.g., number of SNPs/clones genotyped in each assay, average and maximum number of missing values per clone.
  • Imputation of missing data.
  • Extracting climatic data from ClimateDT at the location of the common gardens.
  • Calculating the mean climate in each common garden between the planting date and the measurement date.
  • Comparing ClimateDT climatic data from point estimates (generated using scale-free downscaling) and extracted values from rasters.
  • Comparing the values of the climatic variables at the location of the populations under two different reference periods, i.e., 1901-1950 and 1961-1990.
  • Comparing the values of the climatic variables at the location of the populations under current and future climates (from five GCMs).
  • Selection of the climatic variables based on their biological relevance for maritime pine, their contribution to the genetic variance using an RDA-based stepwise selection (Capblancq and Forester 2021) and the magnitude of their exposure to climate change.
  • Partitioning genomic variation among climate, neutral population genetic structure (accounted for with the main axes of a PCA) and geography (accounted for with population coordinates or distance-based Moran's eigenvector maps).
  • Identification of the outlier SNPs using Redundancy analysis (RDA); approach developed in Capblancq et al. (2018) and Capblancq and Forester (2021).
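
The formatting and filtering steps listed above can be illustrated with a minimal sketch. The repository's scripts are written in R; the Python below is only an illustration, not the authors' code. It assumes that genotypes are recoded as counts of the minor allele and that None stands in for NA. The function names are hypothetical, and no filtering thresholds are shown because none are stated here.

```python
from collections import Counter

MISSING = "---"  # missing-genotype code mentioned in the list above


def recode_snp(genotypes):
    """Recode one SNP from letter pairs (e.g. "A/G") to 0, 1, 2 or None."""
    allele_counts = Counter()
    for g in genotypes:
        if g != MISSING:
            allele_counts.update(g.split("/"))
    if not allele_counts:
        return [None] * len(genotypes)
    # Assumption: count copies of the less frequent allele;
    # ties are broken alphabetically.
    minor = min(sorted(allele_counts), key=lambda a: allele_counts[a])
    return [None if g == MISSING else g.split("/").count(minor)
            for g in genotypes]


def minor_allele_frequency(coded):
    """Minor allele frequency over non-missing genotypes (0.0 if monomorphic)."""
    called = [c for c in coded if c is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)


def missing_rate(coded):
    """Proportion of missing genotypes for one SNP (or one clone)."""
    return sum(c is None for c in coded) / len(coded)


# Toy SNP genotyped on five clones (made-up data).
snp = ["A/A", "A/G", "G/G", "---", "A/A"]
coded = recode_snp(snp)                 # [0, 1, 2, None, 0]
maf = minor_allele_frequency(coded)     # 3 G alleles out of 8 -> 0.375
missing = missing_rate(coded)           # 1 of 5 -> 0.2
```

A SNP would then be kept or dropped by comparing maf and missing with thresholds chosen by the analyst.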

  • Identification of outlier SNPs with the Gradient Forest (GF) algorithm, using either raw allele frequencies (GF-raw) or allele frequencies after correction for population relatedness (GF-X), as described in Fitzpatrick et al. (2021) and Capblancq et al. (2023). Note that only outlier SNPs identified with GF-raw were used to select the potential candidate SNPs for adaptation to climate, which were then used to calculate the genomic offset.
  • Identifying the common outlier SNPs across the different gene-environment association (GEA) methods.
  • Checking the genome position of the outlier SNPs; when several of them were located on the same scaffold/contig, only the SNP with the lowest $p$-value in the RDA was kept in the final set of candidate SNPs.
  • Generating a set of control SNPs (with the same number of SNPs as in the set of candidate SNPs).
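
The candidate and control selection described above can be sketched as follows. This is an illustrative Python sketch (the repository's scripts are in R), not the authors' implementation. It assumes that a SNP is retained when flagged by at least min_methods GEA methods, that one SNP per scaffold is kept based on the lowest RDA p-value, and that control SNPs are drawn at random from the non-candidate SNPs. The random draw, the function names and the toy data are all assumptions.

```python
import random


def select_candidates(outliers_by_method, scaffold, rda_pvalue, min_methods=2):
    """Keep SNPs flagged by >= min_methods methods, one per scaffold."""
    counts = {}
    for snps in outliers_by_method.values():
        for snp in set(snps):
            counts[snp] = counts.get(snp, 0) + 1
    shared = [s for s, n in counts.items() if n >= min_methods]

    # On each scaffold, keep only the SNP with the lowest RDA p-value.
    best = {}
    for snp in shared:
        key = scaffold[snp]
        if key not in best or rda_pvalue[snp] < rda_pvalue[best[key]]:
            best[key] = snp
    return sorted(best.values())


def sample_controls(all_snps, candidates, seed=0):
    """Draw as many control SNPs as candidates, excluding the candidates.

    Assumption: a simple random draw; the sampling scheme is hypothetical.
    """
    pool = sorted(set(all_snps) - set(candidates))
    return random.Random(seed).sample(pool, len(candidates))


# Toy example with made-up SNP names, scaffolds and p-values.
outliers = {
    "RDA": ["s1", "s2", "s3"],
    "GF": ["s2", "s3", "s4"],
    "LFMM": ["s3", "s4"],
}
scaffold = {"s1": "c0", "s2": "c1", "s3": "c1", "s4": "c2"}
rda_pvalue = {"s1": 0.04, "s2": 0.001, "s3": 0.01, "s4": 0.02}

candidates = select_candidates(outliers, scaffold, rda_pvalue)
# s2 and s3 share scaffold c1; s2 has the lower p-value, so s3 is dropped.
controls = sample_controls([f"s{i}" for i in range(1, 9)], candidates)
```

Fixing the seed makes the control set reproducible across runs.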

Some figures generated in this report:

  • GFplots_cand.pdf GF plots for the set of candidate SNPs identified by at least two GEA methods among RDA, pRDA, GF, LFMM and BayPass.
  • GFplots_cand_corrected.pdf GF plots for the set of candidate SNPs identified by at least two GEA methods correcting for population structure (i.e., pRDA, LFMM, BayPass).
  • GFplots_control.pdf GF plots for the set of control SNPs.
  • Comparing genomic offset predictions across the different methods (GF, GDM, LFMM and RDA), SNPs sets (control and candidate SNPs; and also all SNPs for LFMM) and GCMs.
  • Filtering and exploring mortality data from the National Forest Inventory (NFI) plots of France and Spain; see Changenet et al. (2021).
  • Extracting climatic data at the location of the NFI plots.
  • Calculating the average of the climatic variables for the reference period (1901-1950) and the inventory period (specific to each NFI plot).
  • Estimating the association between the genomic offset predictions and mortality rates in the NFI plots.
  • Building and evaluating the accuracy of the Bayesian models used to estimate the association between genomic offset predictions and mortality rates in the NFI plots.
  • Calculating the climate transfer distances (CTDs), i.e., absolute climatic difference between the location of the populations and the common gardens.
  • Estimating the association between genomic offset predictions / CTDs and mortality and height data from the five clonal common gardens (CLONAPIN network).
  • Comparing the predictive ability of genomic offset predictions vs CTDs.
  • Correlations among GO predictions and climate transfer distances in each common garden are shown here: Asturias (Spain), Bordeaux (France), Cáceres (Spain), Madrid (Spain), Fundão (Portugal).
  • Generating figures for the Supplementary Information based on the PCA of the control and candidate SNPs: screeplots and PCA plots with the ALT and ARM populations highlighted.
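
As an illustration of the climate transfer distances (CTDs) mentioned in the list above, the following minimal Python sketch computes, for each climatic variable, the absolute difference between the climate at a population's location and the climate in a common garden. The repository's scripts are in R; the function name, variable names and values below are hypothetical.

```python
def climate_transfer_distances(population_climate, garden_climate):
    """Absolute climatic difference per variable between a population and a garden."""
    return {
        var: abs(garden_climate[var] - population_climate[var])
        for var in population_climate
    }


# Made-up climate values for one population and one common garden.
population = {"temperature": 12.5, "precipitation": 900.0}
garden = {"temperature": 14.0, "precipitation": 650.0}

ctd = climate_transfer_distances(population, garden)
# ctd == {"temperature": 1.5, "precipitation": 250.0}
```

Because the difference is taken in absolute value, a transfer to a warmer site and a transfer to a colder site of the same magnitude give the same CTD.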

License

The code of this repository is under the MIT license.

Disclaimer

The functions below come from other sources and may therefore only be reused under the licenses indicated by their authors:

Software versions

Analyses were undertaken with R version 4.3.3. R package versions are shown at the end of each report and in RPackageCitations.html.

References

Archambeau J, Benito Garzón M, Barraquand F, de Miguel M, Plomion C and González-Martínez SC (2022). Combining climatic and genomic data improves range-wide tree height growth prediction in a forest tree. The American Naturalist 200(4):E141–E159.

Capblancq T, Luu K, Blum MGB and Bazin E (2018). Evaluation of redundancy analysis to identify signatures of local adaptation. Molecular Ecology Resources 18(6):1223–1233.

Capblancq T and Forester BR (2021). Redundancy analysis: A Swiss army knife for landscape genomics. Methods in Ecology and Evolution 12(12):2298–2309.

Capblancq T, Lachmuth S, Fitzpatrick MC and Keller SR (2023). From common gardens to candidate genes: exploring local adaptation to climate in red spruce. New Phytologist 237(5):1590–1605.

Caye K, Jumentier B, Lepeule J and François O (2019). LFMM 2: Fast and accurate inference of gene-environment associations in genome-wide studies. Molecular Biology and Evolution 36(4):852–860.

Changenet A, Ruiz-Benito P, Ratcliffe S, Fréjaville T, Archambeau J, Porte AJ et al. (2021). Occurrence but not intensity of mortality rises towards the climatic trailing edge of tree species ranges in European forests. Global Ecology and Biogeography 30(7):1356–1374.

Fitzpatrick MC and Keller SR (2015). Ecological genomics meets community-level modelling of biodiversity: mapping the genomic landscape of current and future environmental adaptation. Ecology Letters 18(1):1–16.

Fitzpatrick MC, Chhatre VE, Soolanayakanahally RY and Keller SR (2021). Experimental support for genomic prediction of climate maladaptation using the machine learning approach Gradient Forests. Molecular Ecology Resources 21(8):2749–2765.

Forester BR, Lasky JR, Wagner HH and Urban DL (2018). Comparing methods for detecting multilocus adaptation with multivariate genotype-environment associations. Molecular Ecology 27(9):2215–2233.

Frichot E, Schoville SD, Bouchard G and François O (2013). Testing for associations between loci and environmental gradients using latent factor mixed models. Molecular Biology and Evolution 30(7):1687–1699.

Gain C and François O (2021). LEA 3: Factor models in population genetics and ecological genomics with R. Molecular Ecology Resources 21(8):2738–2748.

Gain C, Rhoné B, Cubry P, Salazar I, Forbes F, Vigouroux Y et al. (2023). A quantitative theory for genomic offset statistics. Molecular Biology and Evolution 40(6):msad140.

Gautier M (2015). Genome-wide scan for adaptive divergence and association with population-specific covariates. Genetics 201(4):1555–1579.

Gougherty AV, Keller SR and Fitzpatrick MC (2021). Maladaptation, migration and extirpation fuel climate change risk in a forest tree species. Nature Climate Change 11:166–171.

Jaramillo-Correa JP, Rodríguez-Quilón I, Grivet D, Lepoittevin C, Sebastiani F, Heuertz M et al. (2015). Molecular proxies for climate maladaptation in a long-lived tree (Pinus pinaster Aiton, Pinaceae). Genetics 199(3):793–807.

Mokany K, Ware C, Woolley SN, Ferrier S and Fitzpatrick MC (2022). A working guide to harnessing generalized dissimilarity modelling for biodiversity analysis and conservation assessment. Global Ecology and Biogeography 31(4):802–821.