/Cat-Does-Plant

Cat doing plant science.

Primary LanguageJupyter Notebook

Comprehensive analysis and GWAS of biomass, chlorophyll, seed and salinity tolerance related traits in rice 🌾 🐾

Section 1: phenotype processing

  • Notebooks/Phenotypes_compilation.ipynb ➡️ Prepares working genotype means from plant data
  • Notebooks/Phenotype_stats.ipynb ➡️ Basic analytics on phenotypes i.e. histograms, density plots, Shapiro-tests
  • Notebooks/Broad-sense_heritability.ipynb ➡️ Calculates broad-sense heritability from total and genotype variance
  • R scripts/Random_effects_modelling_for_heritability.R ➡️ Estimates trait heritability by modelling genotype, condition and their interactions as random effects
  • R scripts/Marker-based_heritability.R ➡️ Estimates heritability from genomic kinship

Section 2: genotype preprocessing

  • Shell scripts/1.download_176vcf_data.sh ➡️ Uses curl to download individual VCF files from the 3000-rice genome project
  • Shell scripts/2.gzip_to_bgzip.sh ➡️ Converts gziped VCF filed to bgzip compression
  • Shell scripts/3.combine_vcf.sh ➡️ Combines individual VCF files into one
  • Shell scripts/4.beagle_imputation.sh ➡️ Imputes missing marker genotypes using Beagle 5.1
  • Notebooks/Imputation_accuracy.ipynb ➡️ Assessment of imputation accuracy
  • Python scripts/Make_working_files.py ➡️ Prepares a number of working files
  • Python scripts/Make_hmp.py ➡️ Prepares a hapmap genotype file
  • Shell scripts/5.plink_conversion_and_pruning.sh ➡️ Prepares plink files and estimates effective number of markers

Section 3: genomic predictions, phenotype transformations

  • R scripts/Genomic_predictions.R ➡️ Uses ridge regression in mixed.solve() to predict phenotypes
  • Python scripts/Transformations_p1.py ➡️ Prepares a shell script for WarpedLMM transformation
  • Shell scripts/7.transform_phenotypes.sh ➡️ Executes WarpedLMM
  • Python scripts/Transformations_p2.py ➡️ Compiles WarpedLMM results

Section 4: population structure and GWAS

  • R scripts/Population_structure_estimation.R ➡️ Population structure estimation using genomic scatter plots, PCA and k-means clustering
  • Shell scripts/6.fastStructure1-15.sh ➡️ Employs fastStructure for population structure estimation and finds appropriate number of subpopulations
  • Python scripts/Split_populations.py ➡️ Splits working files into subpopulations according to population structure
  • R scripts/GAPIT_for_GWAS.R ➡️ Tests markers for phenotype association using the BLINK algorithm and CMLM
  • R scripts/9.LD_decay.sh ➡️ Determines extent of linkage disequilibrium

Section 5: downstream/post-GWAS analytics

  • R scripts/Plotting_GWAS_results.R ➡️ Prepares manhattan and quantile-quantile plots
  • Shell scripts/8.blast.sh ➡️ BLAST for finding physical locations and ranges of known genes
  • Notebooks/Significant_Intergenic_markers.ipynb ➡️ Compiles significant and suggestive marker associations from GWAS that are within known gene regions
  • R scripts/Dendogram_and_second_gene_expression_heatmap.R ➡️ Clusters genes by dendograms and heatmaps
  • Notebooks/Slice_VCF.ipynb ➡️ Extracts intergenic markers
  • R scripts/Beautiful_Exon_Extractor.R ➡️ Extracts exons from pairs of genes and CDSes
  • R scripts/Beautiful_Intron_Masker.R ➡️ Masks introns from gene-CDS pairs
  • Notebooks/SNP_effects_and_haplotype_testing.ipynb ➡️ Deciphers protein level consequences of polymorphisms and tests alleles by ANOVA and Student's t test
  • Notebooks/Multiple_testing_correction_and_LD_statistics.ipynb ➡️ Calculates FDR-adjusted p values using the Benjamini-Hochberg method. Evaluates LD for markers and QTLs
  • Notebooks/Plots.ipynb ➡️ Miscellaneous visualizations