Bioinformatics pipeline for population genomic analysis and visualization. The tool calculates population summary statistics and tests for genome-wide association and differentiation based on single nucleotide polymorphisms (SNPs).
- Query individuals within VCF and calculate population summary statistics (process_radtags)
- Write groups file (R)
- Visualize population summary statistics (R)
- Write individual-corrected PED and VCF files (plink1.9, bcftools, vcf2baypass.pl)
- Standardize and write environments (R)
- Test genome-wide association and differentiation (BayPass, LEA)
- Present candidates based on multivariate outlier detection (R)
- Combine across environment tests and evaluate for overlap (R)
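
The "Standardize and write environments" step runs in R inside the pipeline, and the exact transformation is not described here. One common choice is z-score scaling, where each environmental variable is centred on its mean and divided by its sample standard deviation. The Python sketch below illustrates that idea only. The use of z-scores, the function name `standardize`, and the sample values are all assumptions, not the pipeline's confirmed behaviour.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: (x - mean) / sample standard deviation.

    Assumption: z-score scaling; the pipeline's R step may differ.
    """
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(x - mu) / sd for x in values]


# Hypothetical values for one environmental variable.
temps = [10.0, 12.0, 14.0]
scaled = standardize(temps)
print(scaled)
```

After scaling, each variable has mean 0 and unit variance, so variables measured in different units become comparable.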
nextflow run main.nf -c nextflow.config --meta samplesheet.csv --entap final_annotations_nocontam_lvl1.txt --vcf snpeff_annotated.vcf --headers_key ncbi_chromosome_ids.txt --first_env_column 3 --ggplot_indv_factors <indv_list.txt> -entry NF_GWAS -with-singularity gwas-nf.sif
Column | Description |
---|---|
id | Sample name. Must match the sample name in the VCF |
population | A string describing the individual grouping |
env | A numerical value describing an environmental condition |
You will need to create a meta file that contains, in order, columns with the individual IDs found within the VCF, the population, and one or more environmental variables. You can also specify the column number at which the environmental variables start (`--first_env_column`); the tool treats that column and every column after it as an environmental variable.
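
As a rough illustration of that layout, the sketch below splits a meta file into ID, population, and environment columns. It assumes a comma-separated file with a header row and that `--first_env_column` is 1-based, so `3` points at the third column. The function name `split_meta` and the sample values are hypothetical and not part of the pipeline.

```python
import csv
import io


def split_meta(handle, first_env_column=3):
    """Split meta rows into (id, population, {env_name: value}) tuples.

    Assumes a header row and a 1-based first_env_column; both are
    assumptions for illustration, not confirmed pipeline behaviour.
    """
    reader = csv.reader(handle)
    header = next(reader)
    start = first_env_column - 1  # convert to a 0-based index
    env_names = header[start:]
    records = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        envs = {name: float(value)
                for name, value in zip(env_names, row[start:])}
        records.append((row[0], row[1], envs))
    return records


# Hypothetical sample sheet with two environmental variables.
example = io.StringIO(
    "id,population,temp,precip\n"
    "ind01,north,12.5,800\n"
    "ind02,south,18.1,450\n"
)
records = split_meta(example, first_env_column=3)
print(records[0])
```

Running a check like this before launching the pipeline can catch a non-numeric environmental value or a misplaced column early.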
Depends on the final annotation output from EnTAP (`--entap`), the snpEff-annotated VCF (`--vcf`), and a chromosome key (`--headers_key`) with chromosome IDs in the first column and chromosome numbers in the second column.
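
To make the chromosome key layout concrete, here is a minimal sketch that reads such a file into a mapping from chromosome ID to chromosome number. It assumes whitespace-delimited columns, no header row, and integer chromosome numbers. The function name `read_chromosome_key` and the IDs shown are hypothetical.

```python
import io


def read_chromosome_key(handle):
    """Map chromosome ID (column 1) to chromosome number (column 2).

    Assumes whitespace-delimited lines without a header row.
    """
    key = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        key[fields[0]] = int(fields[1])
    return key


# Hypothetical key contents; real IDs come from your reference assembly.
example = io.StringIO("chrA_example\t1\nchrB_example\t2\n")
chrom_key = read_chromosome_key(example)
print(chrom_key)
```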
Optionally provide a list of individuals (`--ggplot_indv_factors`) to set the order in which individuals appear in the visualizations. This affects the population summary statistics visualizations.