Comprehensive analysis and GWAS of biomass, chlorophyll, seed and salinity tolerance related traits in rice 🌾 🐾
- Notebooks/Phenotypes_compilation.ipynb ➡️ Prepares working genotype means from plant data
- Notebooks/Phenotype_stats.ipynb ➡️ Basic descriptive analysis of phenotypes, e.g. histograms, density plots, Shapiro-Wilk tests
- Notebooks/Broad-sense_heritability.ipynb ➡️ Calculates broad-sense heritability from total and genotype variance
- R scripts/Random_effects_modelling_for_heritability.R ➡️ Estimates trait heritability by modelling genotype, condition and their interactions as random effects
- R scripts/Marker-based_heritability.R ➡️ Estimates heritability from genomic kinship
- Shell scripts/1.download_176vcf_data.sh ➡️ Uses curl to download individual VCF files from the 3000-rice genome project
- Shell scripts/2.gzip_to_bgzip.sh ➡️ Converts gzipped VCF files to bgzip compression
- Shell scripts/3.combine_vcf.sh ➡️ Combines individual VCF files into one
- Shell scripts/4.beagle_imputation.sh ➡️ Imputes missing marker genotypes using Beagle 5.1
- Notebooks/Imputation_accuracy.ipynb ➡️ Assessment of imputation accuracy
- Python scripts/Make_working_files.py ➡️ Prepares a number of working files
- Python scripts/Make_hmp.py ➡️ Prepares a hapmap genotype file
- Shell scripts/5.plink_conversion_and_pruning.sh ➡️ Prepares plink files and estimates effective number of markers
- R scripts/Genomic_predictions.R ➡️ Uses ridge regression in mixed.solve() to predict phenotypes
- Python scripts/Transformations_p1.py ➡️ Prepares a shell script for WarpedLMM transformation
- Shell scripts/7.transform_phenotypes.sh ➡️ Executes WarpedLMM
- Python scripts/Transformations_p2.py ➡️ Compiles WarpedLMM results
- R scripts/Population_structure_estimation.R ➡️ Population structure estimation using genomic scatter plots, PCA and k-means clustering
- Shell scripts/6.fastStructure1-15.sh ➡️ Employs fastStructure for population structure estimation and finds appropriate number of subpopulations
- Python scripts/Split_populations.py ➡️ Splits working files into subpopulations according to population structure
- R scripts/GAPIT_for_GWAS.R ➡️ Tests markers for phenotype association using the BLINK algorithm and CMLM
- R scripts/9.LD_decay.sh ➡️ Determines extent of linkage disequilibrium
- R scripts/Plotting_GWAS_results.R ➡️ Prepares manhattan and quantile-quantile plots
- Shell scripts/8.blast.sh ➡️ BLAST for finding physical locations and ranges of known genes
- Notebooks/Significant_Intergenic_markers.ipynb ➡️ Compiles significant and suggestive marker associations from GWAS that are within known gene regions
- R scripts/Dendogram_and_second_gene_expression_heatmap.R ➡️ Clusters genes using dendrograms and heatmaps
- Notebooks/Slice_VCF.ipynb ➡️ Extracts intergenic markers
- R scripts/Beautiful_Exon_Extractor.R ➡️ Extracts exons from pairs of genes and CDSes
- R scripts/Beautiful_Intron_Masker.R ➡️ Masks introns from gene-CDS pairs
- Notebooks/SNP_effects_and_haplotype_testing.ipynb ➡️ Deciphers protein-level consequences of polymorphisms and tests alleles by ANOVA and Student's t-test
- Notebooks/Multiple_testing_correction_and_LD_statistics.ipynb ➡️ Calculates FDR-adjusted p-values using the Benjamini-Hochberg method and evaluates LD for markers and QTLs
- Notebooks/Plots.ipynb ➡️ Miscellaneous visualizations
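
For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned for Notebooks/Multiple_testing_correction_and_LD_statistics.ipynb, here is a minimal sketch of the standard procedure in plain Python. It is not taken from the notebook; the function name `benjamini_hochberg` and the example p-values are assumptions made purely for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg FDR-adjusted p-values, in the input order.

    Hypothetical helper for illustration; not code from this repository.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest, scaling each
    # by m / rank and enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up p-values, only to show the call.
    example = [0.01, 0.04, 0.03, 0.005]
    print(benjamini_hochberg(example))
```

A marker would then be called significant at a chosen FDR level (for example 0.05) when its adjusted p-value falls at or below that level.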