Contains scripts for paralogous loci filtering, output data from the populations program of Stacks, as well as R scripts used for analyses and plotting.
The directory 3Berberis_phylogeo/bin contains the scripts (numbered) and R functions (not numbered, called from within the scripts) used for data analysis and plotting.
1.PopSamples_PostCleaning.r: filters the data to keep only samples with more than 50% of the mean number of loci per sample, and only loci present in at least 80% of the barcoded samples.
2.PopSamples_Whitelists-StacksPopulations.script: produces whitelists and population maps to run the Stacks populations program including all loci (no paralog filtering).
3.PopSamples_excluding_paralogs.r: uses the summary statistics output of the Stacks populations program to identify potentially paralogous loci. The outputs are a list of all potentially paralogous loci (./docs/lociP05) and a list of potential paralogs within Berberis alpina (./docs/potentialparalogs).
4.StacksPopulations_AllLoci.script: creates a whitelist file of loci and population maps for the subset of samples to analyze. It then runs the populations program of Stacks using the lists of putatively paralogous loci and of any loci where p=0.5 as blacklists. Output is in `data.out/PopSamples_m3`.
4.StacksPopulations_EQsampsize.script: creates a population map for a subset of samples with equal sample sizes for B. alpina, Zamorano and B. moranensis, and runs the populations program of Stacks. Output is in data.out/PopSamples_m3/IncludingParalogs/AllLoci/BerEQsz.
4.bsub.StacksPopulations.job: used to run the two previous scripts on the UEA cluster (Westmere dual 6-core Intel X5650 2.66GHz processor systems of 12 cores with 48GB of RAM).
5.Berberisphylogeo_examaning_popsoutput.r:
A knitr HTML file is provided for 3.PopSamples_excluding_paralogs and 5.Berberisphylogeo_examaning_popsoutput.
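The `4.*` scripts listed above prepare population maps and locus lists for the Stacks populations program. The sketch below illustrates that kind of file preparation in Python; it is not the repository's code. The function names, the sample names and the use of a seeded random draw to obtain equal sample sizes are assumptions made for illustration. The only format facts relied on are that a Stacks population map has one tab-separated `sample` and `population` pair per line, and that a whitelist or blacklist can hold one locus ID per line.

```python
import os
import random
import tempfile


def equal_size_subset(samples_by_pop, seed=0):
    """Draw the same number of samples from every group (the size of the smallest group)."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable (an assumption)
    n = min(len(names) for names in samples_by_pop.values())
    return {pop: sorted(rng.sample(sorted(names), n)) for pop, names in samples_by_pop.items()}


def write_population_map(path, samples_by_pop):
    """Write one 'sample<TAB>population' line per sample."""
    with open(path, "w") as handle:
        for pop in sorted(samples_by_pop):
            for sample in sorted(samples_by_pop[pop]):
                handle.write(f"{sample}\t{pop}\n")


def write_locus_list(path, locus_ids):
    """Write one locus ID per line (usable as a whitelist or as a blacklist)."""
    with open(path, "w") as handle:
        for locus in sorted(set(locus_ids)):
            handle.write(f"{locus}\n")


# Hypothetical sample names and locus IDs; only the population codes come from this README.
samples_by_pop = {
    "Aj": ["Aj_01", "Aj_02", "Aj_03"],
    "Za": ["Za_01", "Za_02"],
    "An": ["An_01", "An_02"],
}
subset = equal_size_subset(samples_by_pop)

out_dir = tempfile.mkdtemp()
popmap_path = os.path.join(out_dir, "popmap.tsv")
blacklist_path = os.path.join(out_dir, "blacklist.txt")
write_population_map(popmap_path, subset)
write_locus_list(blacklist_path, [42, 7, 42])
```

Files like these are typically passed to `populations` through its `-M` (population map), `-W` (whitelist) and `-B` (blacklist) options; the exact invocation depends on the Stacks version, so check the manual for the version in use.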
The directory data.in/PopSamples_m3 contains the coverage and SNP matrices (output from Stacks export_sql.pl) from which loci present in a sufficient number of samples, and samples with a sufficient number of loci, were filtered by bin/1.PopSamples_PostCleaning.r.
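The filtering rule applied by bin/1.PopSamples_PostCleaning.r (samples with more than 50% of the mean number of loci per sample, loci present in at least 80% of the samples) can be illustrated with a small sketch. It is written in Python rather than R and is not the repository's code. The function name, the dictionary layout of the coverage matrix and the order of the two steps (samples first, then loci among the retained samples) are assumptions.

```python
def filter_samples_and_loci(coverage, min_sample_frac=0.5, min_locus_frac=0.8):
    """coverage maps locus ID -> {sample name: read depth}; depth 0 or a missing key means absent."""
    samples = sorted({s for depths in coverage.values() for s in depths})

    # Number of loci recovered in each sample.
    loci_per_sample = {
        s: sum(1 for depths in coverage.values() if depths.get(s, 0) > 0)
        for s in samples
    }
    mean_loci = sum(loci_per_sample.values()) / len(samples)

    # Keep samples with more than 50% of the mean number of loci per sample.
    kept_samples = [s for s in samples if loci_per_sample[s] > min_sample_frac * mean_loci]

    # Keep loci present in at least 80% of the retained samples (assumed order of steps).
    min_samples = min_locus_frac * len(kept_samples)
    kept_loci = [
        locus
        for locus, depths in sorted(coverage.items())
        if sum(1 for s in kept_samples if depths.get(s, 0) > 0) >= min_samples
    ]
    return kept_samples, kept_loci


# Toy coverage matrix with hypothetical locus IDs and sample names.
coverage = {
    1: {"A": 10, "B": 8, "C": 5, "D": 0, "E": 7},
    2: {"A": 12, "B": 9, "C": 6, "D": 3, "E": 4},
    3: {"A": 7, "B": 11, "C": 9, "D": 0, "E": 5},
    4: {"A": 6, "B": 5, "C": 0, "D": 0, "E": 0},
    5: {"A": 9, "B": 4, "C": 8, "D": 0, "E": 6},
}
kept_samples, kept_loci = filter_samples_and_loci(coverage)
print(kept_samples)  # ['A', 'B', 'C', 'E']
print(kept_loci)     # [1, 2, 3, 5]
```

In this toy matrix, sample D has a single locus against a mean of 3.8 and is dropped, and locus 4 is present in only two of the four retained samples and is dropped as well.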
The directories within data.out/PopSamples_m3 contain the output from the populations program of Stacks for the following subsets of loci. They were generated by the script `bin/4.StacksPopulations_AllLoci.script`:
Excluding_P05: excluding all loci with at least one SNP where p=0.5 (corresponding to Putative orthologs in the manuscript)
ExcludingParalogs: keeping only loci presumed to be orthologous for B. alpina, i.e. excluding potential paralogs shared among B. alpina populations and other spp. (corresponding to Putative orthologs within B. alpina in the manuscript).
IncludingParalogs: all loci, including all potential paralogs
Within each directory the data is divided according to the following subsets of samples:
- BerAll: all populations from Berberis alpina (including Za), Berberis moranensis (An population), Berberis trifolia (outgroup).
- BerwoOut: all populations from Berberis alpina (including Za), Berberis moranensis (An population) but EXCLUDING outgroup (B. trifolia)
- woZaOut: excluding samples from El Zamorano population (Za) and Berberis trifolia (outgroup)
- BerSS: Berberis alpina sensu stricto (B. alpina ingroup in the ms) populations (Aj, Iz, Ma, Pe, Tl, To), i.e. BerAll excluding Za, Out and An.
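The four sample subsets above can be read as set operations on population codes. The following Python sketch restates those definitions. The population codes are the ones used in this README, with "Out" standing for the B. trifolia outgroup; the sample names and the helper function are hypothetical.

```python
# Population codes as used in the subset descriptions above.
B_ALPINA_SS = {"Aj", "Iz", "Ma", "Pe", "Tl", "To"}
ALL_POPS = B_ALPINA_SS | {"Za", "An", "Out"}

SUBSETS = {
    "BerAll": ALL_POPS,                    # everything, outgroup included
    "BerwoOut": ALL_POPS - {"Out"},        # without the outgroup
    "woZaOut": ALL_POPS - {"Za", "Out"},   # without El Zamorano and the outgroup
    "BerSS": B_ALPINA_SS,                  # B. alpina sensu stricto only
}


def samples_in_subset(sample_to_pop, subset_name):
    """Return the sorted sample names whose population belongs to the named subset."""
    keep = SUBSETS[subset_name]
    return sorted(s for s, pop in sample_to_pop.items() if pop in keep)


# Hypothetical sample names, one per population group.
sample_to_pop = {"Aj_01": "Aj", "Za_01": "Za", "An_01": "An", "Out_01": "Out"}
print(samples_in_subset(sample_to_pop, "woZaOut"))  # ['Aj_01', 'An_01']
```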
The directory ./docs contains the list of all potentially paralogous loci (./docs/lociP05) and the list of potential paralogs within Berberis alpina (./docs/potentialparalogs).
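As an illustration of how a list such as ./docs/lociP05 could be derived, the Python sketch below collects every locus that has at least one SNP with p=0.5. It is not the repository's R code. The input layout (tuples of locus ID, population and p), the locus IDs and the choice to flag a locus when p=0.5 occurs in any population are assumptions; in practice the values would come from the summary statistics output of the populations program.

```python
def loci_with_p05(snp_rows, tol=1e-9):
    """Return the sorted locus IDs having at least one SNP with p equal to 0.5.

    snp_rows is an iterable of (locus_id, population, p) tuples, where p is the
    frequency of the most frequent allele at that SNP in that population.
    """
    return sorted({locus for locus, _pop, p in snp_rows if abs(p - 0.5) <= tol})


# Hypothetical rows, one per SNP and population.
snp_rows = [
    (101, "Aj", 0.5),
    (101, "Za", 0.9),
    (102, "Aj", 1.0),
    (103, "Za", 0.75),
    (103, "An", 0.5),
    (104, "An", 0.6),
]
p05_loci = loci_with_p05(snp_rows)
print(p05_loci)  # [101, 103]
```

The comparison uses a small tolerance because p is read as a floating-point number.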
The file `docs/Ber_06oct13.info` contains sample popID, barcode and sequencing library data.
The file 3Berberis_phylogeo/bin/Figures_Berberis_paralogs.Rmd is an R Markdown document detailing how the figures in the main text and the supplementary materials were produced.