These scripts and resources are needed to reproduce the results of a study of Hypoplectrus maya demography and speciation.
Software dependencies are listed in Suppl. Table 5 of the study's manuscript, and should be added to the $PATH variable for proper functioning of all scripts. In addition, the location of the GATK and Picard .jar
files must be represented by specific environmental variables, $GATK
and $PICARD
, respectively.
Due to constraints on file sizes and copyrights, most raw data and larger resources (e.g. genome assemblies, geographic maps) cannot be included in this repo. As such, they'll need to be downloaded from their respective sources:
Data | Source | Directory |
---|---|---|
H. maya & H. gemma Sequences | ENA project number PRJEB29705 | $WORK/0_data/1_fastq/ |
H. puella, H. nigricans, & H. unicolor sequences | ENA project number PRJEB27858 | $WORK/1_output/1.4_dedup/ |
H. puella reference genome | ENA project number PRJEB27858 | $WORK/0_data/0_resources/HP_genome_unmasked_01.fa |
Belize Map shapefile | GADM: https://gadm.org/download_country_v3.html | $WORK/6_graphs/0_data/ |
Mexico Map shapefile | GADM: https://gadm.org/download_country_v3.html | $WORK/6_graphs/0_data/ |
Florida (USA) Map shapefile | GADM: https://gadm.org/download_country_v3.html | $WORK/6_graphs/0_data/ |
Coral Map shapefile | UNEP-WCMC: http://data.unep-wcmc.org/datasets/1 | $WORK/6_graphs/0_data/ |
Note that the Belizean H. puella, H. nigricans, & H. unicolor sequencing data will require pre-processing before placement in the directory above. The raw .fastq
files must be converted to deduplicated .bam
files, following https://git.geomar.de/puebla-lab/hamlets_ILD_vision_pigmentation.
Once it is properly placed in the $WORK/0_data/0_resources/
folder, the genome will need to be indexed with command bwa index HP_genome_unmasked_01.fa
(a script suitable for cluster submission of this commanded is included in $WORK/1_genotyping-scripts/
).
Running all scripts in numerical order (i.e. folder 1_genotyping-scripts
before 2_popgen-scripts
, 1.1.loop_fq2ubam.sh
before 1.2.loop_markAdapters.sh
, and 1.9.1.subset_LGs.sh
before 1.9.2.subset_allBP.sh
) will create all figures presented in the manuscript. Specifically, data for each figure of the may be found in the following locations:
Figure | Source |
---|---|
Table 1: | $WORK/5_KH_analyses/out/fst/fst_globals.txt and $WORK/5_KH_analyses/out/tables/dxy.csv |
Figure 1: | $WORK/6_output/range_map.pdf |
Figure 2: | $WORK/6_output/divdif.pdf |
Figure 3: | $WORK/6_output/gemplusbel_msmc2_trimmed.pdf |
Figure 4: | $WORK/6_output/gemplusbel_crosscoal_joined_cowplot.pdf |
Figure 5: | $WORK/6_output/maybel_LDNe.pdf |
Suppl. Fig. 1: | $WORK/6_output/Bel_map.pdf |
Suppl. Fig. 2: | $WORK/6_output/FL_map.pdf |
Suppl. Fig. 3: | $WORK/6_output/FL_dens.pdf |
Suppl. Fig. 4: | $WORK/6_output/NMDS_hams.pdf |
Suppl. Fig. 5: | $WORK/5_KH_analyses/out/plots/fst_maya_only.png |
Suppl. Fig. 6: | $WORK/6_output/pi_plot.pdf |
Suppl. Fig. 7: | $WORK/6_output/roh_plot.pdf |
Suppl. Fig. 8: | $WORK/6_output/relatedness_mle_ajk.pdf |
Suppl. Fig. 9: | $WORK/6_output/gemplusbel_msmc2_full.pdf |
Suppl. Fig. 10: | $WORK/6_output/gemplusbel_msmc2_unmasked.pdf |
Suppl. Fig. 11: | $WORK/6_output/gemplusbel_smcpp.pdf |
Suppl. Tab. 3: | $WORK/5_KH_analyses/tables/outlier_table.tex |
Suppl. Tab. 4: | $WORK/3_output/3.3_phased_indiv_depths/phased.snps.idepth |
Specific numerical values quoted in the manuscript text are drawn from the same datasets which generated these figures and tables. All those tables not included required no intermediate analysis steps, and have sources listed in the manuscript.
Note that scripts are not explicitly included for the recreation of LDNe analyses that ended in consistently infinite estimates (H. nigricans, H. puella, and H. unicolor analyses). To generate these results, edit 4.1.select_maybel_SNPs.sh
such that the selected individuals (-sn <ID>
) represent the desired group, and run all subsequent steps in LDNe analysis.