Dispersal limitations and long-term persistence drive differentiation from haplotypes to communities within a tropical sky-island: evidence from community metabarcoding
The repository contains the pipeline to perform bioinformatics tools from metabarcoding data of the project Dispersal limitations and long-term persistence drive differentiation from haplotypes to communities within a tropical sky-island: evidence from community metabarcoding. We focus on a single tropical sky-island, Nevado de Toluca of the Transmexican Volcanic Belt, where we sampled whole-communities of arthropods for eight orders with a comparable design at a spatial scale ranging from 50 m to 20 km, using 840 pitfall traps and whole community metabarcoding. These samples were then used to build metabarcoding libraries of arthropods using COI marker.
Primers and overhang adapter:
COI expected size 418 pb
B_F 5' CCIGAYATRGCITTYCCICG 3' (Shokralla et al., 2015)
Fol-degen-R 5’ TANACYTCNGGRTGNCCRAARAAYCA 3' (Yu et al., 2012)
The overhang adapter sequence must be added to the locus‐specific primer for the region to be targeted. The Illumina overhang adapter sequences to be added to locus‐specific sequences are:
Forward overhang: 5’ TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG‐[locus‐ specific sequence] Reverse overhang: 5’ GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG‐[locus‐ specific sequence]
Contains data and scripts for the sections Bioinformatics processing to identify OTUs at different thresholds of genetic similarity, Community diversity and composition and Similarity distance-decay and landscape connectivity of the manuscript.
The bin directory cointains the scripts used in this pipeline. All scripts run using bin
as working directory. The data directory is not included in this repository, but data is available at this dryad.
Data comes from Cornell Institute of Biotechnology, Cornell University, USA, for sequencing on a lane of Illumina MiSeq 2x300 bp.
The scripts in /bin
should be run in the order they are numbered. R functions used by some of these scripts are not numbered and have the extension .R
. Html notebooks are provided for some of the analyses in R.
Scripts content:
0.0install_software.txt
well, not actually a script, but this fille contains packages for the0.1Processing_BioSoup.sh
and0.2Processing_Soup_metaMATE.sh
scripts into Arribas et al., 2020. The packages are fastqc, fastx-toolkit, trimmomatic-0.36, pairfq-0.17, usearch-9.2, and usearch-10.0.1Processing_BioSoup.sh
Paired-end reads of samples were quality filtered following procedures described by Arribas et al. (2020). Briefly, processing included quality checking, primer removal, pair merging, quality filtering, denoising, and clustering each library independently. Size COI = 418 bp.0.2Processing_Soup_metaMATE.sh
Paired-end reads of samples were quality filtered following procedures described by Arribas et al. (2020). Briefly, processing included quality checking, primer removal, pair merging, quality filtering, denoising, and clustering each library independently. Size COI = 416-420 bp.0.3Steps_afterProcessing.txt
well, not actually a script. Contains the steps for each library: Get unix, blast to MEGAN and visualised tree in figtree.0.4Searching_Stop_Codons.txt
not actually a script. Here, each ASV dataset was aligned in Geneious for searching codon stops.0.5to_get_ASV_tables.sh
script for geting a community table then generated with read-counts (haplotype abundance) of each retained ASV for the eight orders by matching ASVs against the complete collection of reads.0.6to_get_UPGMAtree_GMYC_MH_lineages_All.r
This script gets all scripts for the first step (source) that obtained lineages at different clustering levels for each of the orders. STEP 1 and 2: We get analysis of the UPGMAtree and GMYC each lineages. Step 3: We get analysis to apply NODE.MIN get trees multiple each lineages (e.g. We went at "Arachnida/SpeciesDelimitation/2to apply NODE.MIN_Get_trees_multiple_each_lineage_ArachnidaStep2" and we ejecuted each script with differente order).1.PrepareRastersNT.r
Script to reclassify altitude, slope, and vegetation types output rasters to desired values and to to create a flat lanscape.2.PlotRaster.R
Plots each resistance among with sampling points.3.to_get_conservative_threhold_on_original_all.r
This script gets conservative thershold on origibale table ASVs by artropods order. It is important to place the correct number of columns and rows, because this can change in each group. In this step, we can join all the groups or put them separately as in this case.4.to_get_Diversity_All.r
This script gets all scripts (source) of diversity using 8 arthropods order at multi-hierarchical levels.5.Plot_Diversity_All.r
This script plot "Community diversity and composition", "plot global Richness by sites", "Beta diversity", "Non-Metric Multidimensional scaling (NMDS) ordinations of community similarity", and "Accumulation Curves".6.to_get_BetaDiversity_DistanceDecay_IBD_IBR.r
This script gets all scripts (source) to estimate "Distance decay at the multihierchical leveles" and Isolation by resistence.7.Plot_DistanceDecay_IBD_IBR
This script plot "Distance decay" and Isolation by resistance at the figure 5, figure 6 and Figure S6.8.Circuitscape_Tutorial.md
well, not actually a script. Contains the settings used to run Circuitscape for each of the rasters.
These scripts use the data in genetic
, spatial
and meta
.
Contains genetic data in and data out for each order
Genetic data in corresponds to 0.1Processing_BioSoup.sh
output using the subset of each order.
Genetic data out corresponds to 4.to_get_Diversity_All.r
using the subset of 8 orders and 6.to_get_BetaDiversity_DistanceDecay_IBD_IBR.r
output using the subset of Diptera and Collembola order.
Contais spatial data as follows:
-
/Elevation
contains.asc
rasters for the Nevado de Toluca.NevTol_Alt.tif
is the original dataset, the rest are the result of reclassifying (output of1.PrepareRastersNT
script) it for each of the altitudinal resistance surfaces. -
/Slope
contains.tif
rasters for the Nevado de Toluca slope.NevTol_Pen.tif
is the original dataset, the rest are the result of reclassifying (output of1.PrepareRastersNT
script) it for each of the slope resistance surfaces. -
/VegetationType
contains.tif
rasters for the Nevado de Toluca using Vegetation type.nevado_f.tif
is the original dataset, the rest are the result of reclassifying (output of1.PrepareRastersNT
script) it for each of the vegetation type resistance surfaces. -
surveyed_mountainNevadoToluca
contains sampling points (.csv
) used in2.PlotRasters.R
-
IBDistanceMatrix
contains geomatrix focal point (*.txt
) used in to run6.to_get_BetaDiversity_DistanceDecay_IBD_IBR.r
. -
IBResistanceFlatMatrix
flat output (*.txt
) from Circuit scape used in to run6.to_get_BetaDiversity_DistanceDecay_IBD_IBR.r
. -
IBResistanceSlopeMatrix
slope output (*.txt
) from Circuit scape used in to run6.to_get_BetaDiversity_DistanceDecay_IBD_IBR.r
. -
IBResistanceVegTypeMatrix
Vegetation type output (*.txt
) from Circuit scape used in to run6.to_get_BetaDiversity_DistanceDecay_IBD_IBR.r
. -
IBResitanceElevationMatrix
elevation output (*.txt
) from Circuit scape used in to run6.to_get_BetaDiversity_DistanceDecay_IBD_IBR.r
. -
Circuitscape
contains the focal points (*_FocalPoint.txt
) used to run Circuit scape and the output (/out
).
ConservationForestNevadoToluca.csv
contains metadata for each of the samples sequenced in a lane Miseq. Each column names refer to:
id
: sample number IDcode
: ID of the sequencing runlabel_metabarcoding
: sample name of each library: e.g.CON_NTO_TLC_31TCONS1
:CON
Conservation (Treatment),NTO
Nevado de Toluca (Mountain),TLC
Tlacotepec (locality),31TCONS1
ID of the sequencing sample run.locality
: Locality of the samplingkey_locality
: Abbreviation of the sampling locationmunicipality
: Municipality of the samplingstate
: State of the samplingnatural_protected_area
: Natural Protected Area of the samplingkey_natural_protected_area
: Abbreviation of the Natural Protected Arealatitude
: Latitude of the samplinglongitude
: Longitude of the samplingsamplig_altitude
: Meter above level seal of each the sampling point.treatment
: Name of the treatmentkey_treatment
: Abbreviation of the treatmentseason
: Name of season in the samplingforest_type
: Name of the tree specie in the forest.
END