This repo contains the data and scripts associated with Rivera et al. 2023, Palau’s warmest reefs harbor thermally tolerant corals that thrive across different habitats; a paper on thermal tolerance and population genetics of Porites cf. lobata corals across the Palauan archipelago.
The repository is separated into a population genetic folder, which contains SNPs, microsatellite, and growth data; and a temperature folder, which contains temperature data and scripts.
Within both folders there is an .RData file that can be loaded directly to avoid some of the initial data processing and begin with data analyses.
All the input data that would be needed to run these scripts is included in the data folder. Intermediate files that can be generated by running the scripts with the included input data are not included (though most of these are in the .RData file for convenience). Please note that the vcf files were uploaded using Git Large File Storage (LFS), if downloading the repository you will need to install git LFS prior to downloading in order to actually retrive the files. See more information about Git LFS here.
Final figures that are in the manuscript and supplementary data are included in the respective figures folders for easy reference (most of these are under the Coral_Data folder).
Between 2011-18, we collected tissue from 543 Porites cf. lobata colonies using a hammer and chisel while on SCUBA. Tissue was preserved in RNAlater™ (Invitrogen, Waltham, MA), incubated overnight at 4°C, and frozen at -20°C (N=329), or frozen directly at -80°C (N=20), or preserved in 95% ethanol and frozen at -20°C (N=194) until DNA extraction. Colonies were sampled haphazardly within each site, across 13 sites (see Sampling Map below), based on morphological characteristics of Porites lobata detailed in Veron (2000).
The site “Ngermid” has been referred to as “Nikko Bay” in previous publications. We use “Ngermid” here as that is the name preferred by Palauan natives. The “Outer Taoch” site contained samples from three outer reef locations: Airai (GPS coordinates: 7.33210, 134.56020, N= 5), Rael Dil (7.24990, 134.45073, N= 3), and a fringing reef (7.27193, 134.38115, N=30) immediately outside of Taoch Bay. The coordinates for this last site were used for mapping because most of the samples in this group are from this location. Due to the presence of various lineages within our dataset, population genetic metrics (e.g., FST) were not calculated by collection site, so this choice should have no bearing on results.
Sampling Map Distribution of Porites cf. lobata lineages across sampling sites. Land is denoted in gray and the reef platform in light pink. Outer reef sites are denoted by blue circles, while Rock Island sites are denoted by red triangles. Donut charts show the distribution of lineages among samples from each site where coral lineages are color coded Dark Blue (DB), Light Blue (LB) Pink (PI), and Red (RD) for visualization. Charts include both RAD-seq and microsatellite data. (For samples with both types of data, RAD-seq derived lineages were used as these predominantly agreed with microsatellite results, see Fig. S2). Total N for each site is shown in the center of each chart. A) Palauan mainland. B) Inset showing Rock Island sites. C) Palau and Helen atoll (Palau’s southernmost territory).
A thawed ~1 mm2 piece of coral was homogenized into a fine powder using a new standard safety razor blade sterilized with ethyl alcohol and flamed. The homogenate was processed using the Qiagen® DNeasy Blood and Tissue DNA extraction kit according to the manufacturer instructions, with a modified Proteinase K incubation of at least 24 hours. Negative controls (N=5) without any coral tissue added were included every 70 samples and subjected to all the same downstream processing and analyses.
We amplified 14 microsatellite markers with fluorescently labeled primers developed for P. cf. lobata by Baums et al. (2013). PCR settings were: (1) initial denaturation at 94°C for 5 min, (2) 35 cycles of 94°C for 20 seconds, annealing at 52, 54, or 56°C (plex-dependent) for 20 seconds, 72°C for 30 seconds, and (3) final extension for 30 minutes at 72°C. The Pennsylvania State University Nucleic Acid Facility measured fragments on an ABI 3730 (GeneScan) with a LIZ-500 internal size standard.
A subset of samples with sufficient quality extracted DNA were then processed for RAD sequencing. Genomic DNA concentrations were standardized using a Qubit™ 2.0 fluorometer (Invitrogen) to 20 ng/ul. A total of 50 ul per sample was sent to Floragenex (Portland, Oregon) for single enzyme RAD library preparation with PstI enzyme digestion. Each sample was identified by a unique 10 nucleotide barcode. Samples (N=185) were sequenced as 100 base pair single end reads across 6 lanes of an Illumina Hiseq 4000™ using v4 chemistry at the University of Oregon Genomics Core facility.
Raw reads were processed using the ‘process_radtags’ module of Stacks v.1.46 (Catchen et al. 2013), allowing for up to three mismatches in the sample barcode (this was the maximum number of mismatches at which the barcodes remained unique). Reads with low quality scores (PHRED<10) across a sliding window of 15% of the read length were discarded. We retained 78% of the original reads. Average sequencing depth was 8.5 million reads per sample. Three samples replicated within the plate showed 1.5-2-fold variability in sequencing depth. One sample which had an unusually high number of reads (>35 million) was discarded.
For read mapping and SNP calling, we used the dDocent v.2.3.7 (Puritz et al. 2014) pipeline with a Porites lutea or P. lutea symbiont draft genome obtained from the REFUGE 2020 database (http://refuge2020.reefgenomics.org/) as reference (for coral and symbionts SNPs, respectively). dDocent clustered reads based on >95% similarity using CD-HIT v.4.6.8, mapped reads to the reference using the MEM algorithm of BWA v.0.7.1768 with a match score of 1, mismatch score of 3, and gap-open penalty of 4, and called SNPs using FreeBayes v.1.1.0 with default values (E=3, m=PHRED 10, q=PHRED10, -V, and using the sampling sites as the populations designations). The resulting ‘TotalRawSNPs.vcf’ file was filtered using vcftools v.0.1.1570 and vcffilter (https://github.com/jameshicks/vcffilter) following the suggestions in the dDocent manual, with a final thinning (-thin option in vcftools to keep only SNPs more than 150 bp apart, e.g. only one SNP per rad tag) to obtain a final set of 12,761 bi-allelic SNPs in 146 retained samples for the coral host and 218 bi-allelic SNPs in 137 individuals for the symbiont. These final vcf files are avaialble in Coral_data/Popgen_R/input_files. RAD-sequencing data are available on NCBI’s SRA under accession number PRJNA801929.
Coral host SNPs and microsatellite data were run through STRUCTURE v.2.3.4 using an admixture model with correlated allele frequencies and default parameters following previously used settings for corals. Including sampling (geographic) information in the prior did not affect results and is not reported. MCMC chain settings were: 1 x 105 burn-in, 1 x 106 iterations from K=1 to K=12, with 10 replicate chains per K. We used the CLUMPAK feature71 on the STRUCTURE Selector webserver to combine and visualize STRUCTURE output through the ‘main pipeline’ option with CLUMPP parameters: LargeKGreedy search, 10,000 random input orders, dynamic MCL, and default minimal cluster size. The Structure Selector webserver was used to run ‘Best K’ metrics which included methods to evaluate the optimal K. Coral samples were assigned to their majority group (>50% ancestry for K=4) as their “final” lineage for further analyses, following convention in other studies. In the three cases where the majority lineage differed between microsatellite STRUCTURE and RADseq assignments, the RADseq assignment was used (Fig. S2). It should be noted that only one of these samples also had core data. The final STRUCTURE assignment outputs are avaialble in Coral_data/Popgen_R/input_files.
The R package adegenet was used to explore genetic structure in both microsatellite and RAD-seq data for both coral and symbionts. Principal component analyses were conducted using the dudi.pca() function on scaled and centered genind/genlight objects. We used the find.clusters() functions to select the optimal number of groups based on Bayesian Information Criterion (BIC). Discriminant analyses of principal component (DAPC) was used to visualize clusters, with the number of principal components retained determined through cross-validation xdapval() to avoid overfitting. For symbiont RAD-seq data, PCAs and DAPC, were colored based on coral host lineage assignments (though these also perfectly matched the identified clusters through DAPC). Nei’s FST between lineages was calculated using the stamppFst() function from the R package STaMPP for RAD-seq data, using 100 bootstraps for estimating significance. For microsatellite data, we used the function genet.dist() function from the package hierfstat. All of the analyses are described in the script: Coral_data/Popgen_R/Popgen_stats_analyses.R.
Coral cores (N=121) were taken using an underwater pneumatic drill equipped with a diamond-tipped drill bit powered by compressed air from a SCUBA tank. Cores ranged from 10 to 204 cm long, covering years from 1970 to 2014. Cores were dried in an oven and imaged using a Volume Zoom Helical Computerized Tomography (CT) Scanner at Woods Hole Oceanographic Institution. Scans were analyzed using an automated computer program developed and described in DeCarlo et al. 2014 and modified by Mollica et al. 2019. Cores which displayed high levels of bioerosion, inconsistent banding patterns, partial mortality or other deformities that did not allow for accurate identification of annual growth bands and as a result correct identification of years corresponding to bleaching events were excluded from analysis (N=42). Of the remaining eighty cores, eleven had unsuccessful DNA extractions or failed other downstream genetic data filters, leaving sixty-nine corals for final analyses.
While differences in growth patterns and stress band prevalence between RI and OR habitats had been previously shown, we wished to test if genetic population groups provided additional explanation of these responses. A colony’s genetic group was assigned as its predominant (>50%) STRUCTURE assigned group for K=4, which was the best K across several methods. The growth patterns and presence/absence of stress bands during the 1998 and 2010 bleaching events (N=44, 5 sites), are data previously described and published in Barkley and Cohen (2016) and DeCarlo et al. (2014). An additional 14 cores were analyzed for this study.
Growth metrics (extension and calcification rates and density) were calculated for each core by averaging the yearly growth during the years 1999-2009, in order to avoid bleaching years that might influence growth patterns. Differences among lineages in growth parameters were tested using ANOVA with post-hoc Tukey tests.
Stress bands were defined as a region of the core at least 1 mm thick in which density and the change in density gradient exceeded two standard deviations above each individual whole core average density and gradient, following the definition and methods in Barkley and Cohen (2016) and DeCarlo et al. (2014). Differences in the proportion of cores showing stress bands between lineages was tested using a Chi-Squared test.
The Coral Reef Research Foundation (http://wtc.coralreefpalau.org/) provided 30-minute interval in situ temperature data for Mecherchar, Helen, Drop Off, Ngerdiluches, Ngerchelong, Ngermid, and Kayangel. These data were recorded using U22 loggers (Onset Technologies, MA). Temperature data covered periods from 2010-2017 and from 2-15 meters depth. The foundation indicated accuracy was determined to be within 0.1°C through pre- and post-deployment calibration against a NIST traceable mercury thermometer and that individual thermographs were also cross calibrated with each other. Temperatures for Risong and Taoch were obtained from U22 loggers (Onset Technologies, MA) deployed between 2-5 meters depth by the Cohen Lab at Woods Hole Oceanographic Institution from 2011-2013 recording at 15-minute intervals.
Statistical metrics of temperature time series for each site were calculated using the ‘zoo’ and ‘xts’ packages in R. To examine daily temperature patterns, each time series was filtered in MATLAB 2015a using a bandpass Butterworth filter to retain signals between 5 and 30 hours in frequency and remove seasonal fluctuations. Daily range (maximum-minimum temperatures) were calculated for each site in R.
- Barkley, H. C. & Cohen, A. L. Skeletal records of community-level bleaching in Porites corals from Palau. Coral Reefs 35, 1407–1417 (2016).
- Baums, I. B., Boulay, J. N., Polato, N. R. & Hellberg, M. E. No gene flow across the Eastern Pacific Barrier in the reef-building coral Porites lobata. Molecular Ecology 21, 5418–5433 (2012).
- Catchen, J., Hohenlohe, P. A., Bassham, S., Amores, A. & Cresko, W. A. Stacks: An analysis tool set for population genomics. Molecular Ecology 22, 3124–3140 (2013).
- DeCarlo, T. M. et al. Coral macrobioerosion is accelerated by ocean acidification and nutrients. Geology 43, 7–10 (2014).
- Mollica, N. R. N. et al. Skeletal records of bleaching reveal different thermal thresholds of Pacific coral reef assemblages. Coral Reefs 38, 743–757 (2019).
- Puritz, J. B., Hollenbeck, C. M. & Gold, J. R. dDocent: a RADseq, variant-calling pipeline designed for population genomics of non-model organisms. PeerJ 2, e431 (2014).
- Veron, J. E. N. Corals of the World. (Australian Institute of Marine Science, 2000).