Data importation modules for Baxter lab's GWAS database
Make a .env file in your root directory. You may use the .env.example as a basis for it.
# .env.example
database=baxdb
user=baxdb_owner
password=password
host=localhost
port=5432
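As an illustration of how these settings might be consumed, here is a minimal sketch that parses the KEY=VALUE lines into a dictionary and builds a libpq-style connection string. The function names `parse_env` and `to_dsn` are hypothetical, and the use of a PostgreSQL-style connection string is an assumption based on the default port 5432; the import modules may read the file differently.

```python
def parse_env(text):
    """Parse KEY=VALUE lines from .env-style text into a dict (hypothetical helper)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments such as "# .env.example".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def to_dsn(settings):
    """Build a libpq-style connection string (assumes a PostgreSQL server)."""
    keys = ("database", "user", "password", "host", "port")
    # libpq names the database keyword "dbname" rather than "database".
    renamed = {"database": "dbname"}
    return " ".join(f"{renamed.get(k, k)}={settings[k]}" for k in keys)


if __name__ == "__main__":
    from pathlib import Path

    print(to_dsn(parse_env(Path(".env").read_text())))
```

All values are kept as strings, including the port, because a .env file carries no type information.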
Importing data into the GWAS database is split into four phases: initialization, gather, collection, and results. An example invocation:
python import.py --verbose -f data/maize.json data/maize282
The input configuration file (.json) is used to locate the data files. Below is an example data configuration file.
{
"species_shortname": "maize",
"species_binomial_name": "Zea mays",
"species_subspecies": "",
"species_variety": "",
"population_name": "Maize282",
"number_of_chromosomes": 10,
"genotype_version_assembly_name": "B73 RefGen_v4",
"genotype_version_annotation_name": "AGPv4",
"reference_genome_line_name": "282set_B73",
"phenotype_filename": "5.mergedWeightNorm.LM.rankAvg.longFormat.csv",
"gwas_algorithm_name": "MLMM",
"imputation_method_name": "impute to major allele",
"kinship_algortihm_name": "van raden",
"kinship_filename": "4.AstleBalding.synbreed.kinship.csv",
"population_structure_algorithm_name": "Eigenstrat",
"population_structure_filename": "4.Eigenstrat.population.structure.10PCs.csv",
"gwas_run_filename": "9.mlmmResults.csv",
"gwas_results_filename": "9.mlmmResults.csv",
"missing_SNP_cutoff_value": 0.2,
"missing_line_cutoff_value": 0.2,
"minor_allele_frequency_cutoff_value": 0.1
}
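To show how such a configuration could be used to locate data files, the sketch below loads the JSON and maps every key ending in `_filename` to a path. It assumes that the file names are relative to the data directory given on the command line (data/maize282 in the example above); the helper names `load_config` and `resolve_data_files` are hypothetical and not taken from import.py.

```python
import json
from pathlib import Path


def load_config(config_path):
    """Read the JSON data configuration file into a dict."""
    with open(config_path) as handle:
        return json.load(handle)


def resolve_data_files(config, data_dir):
    """Map each non-empty '*_filename' entry to a path under data_dir.

    Assumption: file names in the configuration are relative to data_dir.
    """
    base = Path(data_dir)
    return {
        key: base / value
        for key, value in config.items()
        if key.endswith("_filename") and value
    }


if __name__ == "__main__":
    config = load_config("data/maize.json")
    for key, path in resolve_data_files(config, "data/maize282").items():
        status = "found" if path.is_file() else "MISSING"
        print(f"{key}: {path} ({status})")
```

Checking for missing files before the import starts gives an early, readable error instead of a failure partway through a phase.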
- Phenotype file (.csv)
- Kinship (.csv)
- Population structure (.csv)
- GWAS results/run (.csv)
- Genotype (.012, .012.indv, and .012.pos; generated from VCF, for example with the VCFtools --012 option)
The phenotype file contains all measures and measurements for each pedigree. It is the source for the tables: phenotype
The kinship file is a simple 2D matrix of all the lines/pedigrees and their kinship measurements.
The population structure file contains N principal components that define the population structure.
The GWAS results file contains the results of the GWAS analysis. It includes the SNP, p-value, cofactor, null p-value, model, trait, number of SNPs, number of lines, and principal components.
The genotype files are sometimes collapsed into three single files, but they must be separated by chromosome, using the naming convention chr<NUMBER>_species.<EXTENSION>
For example: chr4_maize.012, chr4_maize.012.pos, chr4_maize.012.indv.
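The naming convention can be checked with a short script. The sketch below lists the genotype file names expected for a species and reports any that are absent from a directory. It assumes chromosomes are numbered 1 through number_of_chromosomes and that the species part of the name is the species_shortname from the configuration file; both function names are hypothetical.

```python
from pathlib import Path

# The three per-chromosome genotype file extensions.
GENOTYPE_EXTENSIONS = ("012", "012.indv", "012.pos")


def genotype_filenames(species_shortname, number_of_chromosomes):
    """List expected names, e.g. chr4_maize.012 (assumes numbering 1..N)."""
    return [
        f"chr{number}_{species_shortname}.{extension}"
        for number in range(1, number_of_chromosomes + 1)
        for extension in GENOTYPE_EXTENSIONS
    ]


def missing_genotype_files(data_dir, species_shortname, number_of_chromosomes):
    """Return the expected genotype file names not present in data_dir."""
    base = Path(data_dir)
    expected = genotype_filenames(species_shortname, number_of_chromosomes)
    return [name for name in expected if not (base / name).is_file()]


if __name__ == "__main__":
    for name in missing_genotype_files("data/maize282", "maize", 10):
        print("missing:", name)
```

For maize with 10 chromosomes this expects 30 files, three per chromosome.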