Cat-Does-Plant: A Jupyter Notebook repository from DeadlineWasYesterday

Comprehensive analysis and GWAS of biomass, chlorophyll, seed, and salinity-tolerance traits in rice 🌾 🐾

Notebooks/Phenotypes_compilation.ipynb ➡️ Prepares working genotype means from plant data
Notebooks/Phenotype_stats.ipynb ➡️ Basic descriptive statistics on phenotypes, e.g. histograms, density plots, and Shapiro-Wilk tests
Notebooks/Broad-sense_heritability.ipynb ➡️ Calculates broad-sense heritability from total and genotype variance
R scripts/Random_effects_modelling_for_heritability.R ➡️ Estimates trait heritability by modelling genotype, condition and their interactions as random effects
R scripts/Marker-based_heritability.R ➡️ Estimates heritability from genomic kinship
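
As a rough illustration of the broad-sense heritability step, the sketch below estimates H² = σ²_G / (σ²_G + σ²_E) from replicated measurements per genotype. It assumes a balanced one-way layout and uses ANOVA mean squares; the function name and input format are hypothetical, and the notebook itself may compute the variances differently.

```python
import numpy as np

def broad_sense_heritability(values_by_genotype):
    """Estimate H^2 = var_G / (var_G + var_E) for a balanced one-way layout.

    values_by_genotype: dict mapping genotype name -> list of replicate
    measurements (assumed: same number of replicates for every genotype).
    """
    data = np.array(list(values_by_genotype.values()), dtype=float)  # (g, r)
    g, r = data.shape
    grand_mean = data.mean()
    genotype_means = data.mean(axis=1)

    # Between-genotype and within-genotype (error) mean squares.
    ms_genotype = r * np.sum((genotype_means - grand_mean) ** 2) / (g - 1)
    ms_error = np.sum((data - genotype_means[:, None]) ** 2) / (g * (r - 1))

    # Method-of-moments variance components; negative estimates are set to 0.
    var_g = max((ms_genotype - ms_error) / r, 0.0)
    var_e = ms_error
    return var_g / (var_g + var_e)
```

For example, `broad_sense_heritability({"A": [1, 1], "B": [3, 3], "C": [5, 5]})` gives 1.0, because all of the variation lies between genotypes.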

Shell scripts/1.download_176vcf_data.sh ➡️ Uses curl to download individual VCF files from the 3000-rice genome project
Shell scripts/2.gzip_to_bgzip.sh ➡️ Converts gzipped VCF files to bgzip compression
Shell scripts/3.combine_vcf.sh ➡️ Combines individual VCF files into one
Shell scripts/4.beagle_imputation.sh ➡️ Imputes missing marker genotypes using Beagle 5.1
Notebooks/Imputation_accuracy.ipynb ➡️ Assessment of imputation accuracy
Python scripts/Make_working_files.py ➡️ Prepares a number of working files
Python scripts/Make_hmp.py ➡️ Prepares a hapmap genotype file
Shell scripts/5.plink_conversion_and_pruning.sh ➡️ Prepares plink files and estimates the effective number of markers

R scripts/Genomic_predictions.R ➡️ Uses ridge regression in mixed.solve() to predict phenotypes
Python scripts/Transformations_p1.py ➡️ Prepares a shell script for WarpedLMM transformation
Shell scripts/7.transform_phenotypes.sh ➡️ Executes WarpedLMM
Python scripts/Transformations_p2.py ➡️ Compiles WarpedLMM results
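
The ridge regression idea behind the genomic prediction step can be sketched in a few lines. This is not the R script's code: mixed.solve() estimates the shrinkage from variance components by REML, whereas the hypothetical function below takes the penalty `lam` as a fixed input. Marker coding (-1/0/1) and all names are assumptions for illustration.

```python
import numpy as np

def ridge_marker_effects(Z, y, lam):
    """Closed-form ridge estimate of marker effects.

    Z   : (n_lines, n_markers) marker matrix, e.g. coded -1/0/1 (assumed)
    y   : (n_lines,) phenotype vector
    lam : ridge penalty, supplied by the caller in this sketch
    Returns the intercept, the marker effects and the marker column means.
    """
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    marker_means = Z.mean(axis=0)
    Zc = Z - marker_means              # centre markers so the intercept is mean(y)
    mu = y.mean()
    n_markers = Z.shape[1]
    # Solve (Zc'Zc + lam * I) u = Zc'(y - mu)
    u = np.linalg.solve(Zc.T @ Zc + lam * np.eye(n_markers), Zc.T @ (y - mu))
    return mu, u, marker_means

def predict_phenotypes(Z_new, mu, u, marker_means):
    """Predict phenotypes for new lines from the fitted effects."""
    return mu + (np.asarray(Z_new, dtype=float) - marker_means) @ u
```

A larger `lam` shrinks all marker effects toward zero, which is what keeps the fit stable when markers far outnumber lines.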

R scripts/Population_structure_estimation.R ➡️ Population structure estimation using genomic scatter plots, PCA and k-means clustering
Shell scripts/6.fastStructure1-15.sh ➡️ Employs fastStructure for population structure estimation and finds the appropriate number of subpopulations
Python scripts/Split_populations.py ➡️ Splits working files into subpopulations according to population structure
R scripts/GAPIT_for_GWAS.R ➡️ Tests markers for phenotype association using the BLINK algorithm and CMLM
R scripts/9.LD_decay.sh ➡️ Determines extent of linkage disequilibrium

R scripts/Plotting_GWAS_results.R ➡️ Prepares Manhattan and quantile-quantile plots
Shell scripts/8.blast.sh ➡️ BLAST for finding physical locations and ranges of known genes
Notebooks/Significant_Intergenic_markers.ipynb ➡️ Compiles significant and suggestive marker associations from GWAS that are within known gene regions
R scripts/Dendogram_and_second_gene_expression_heatmap.R ➡️ Clusters genes using dendrograms and heatmaps
Notebooks/Slice_VCF.ipynb ➡️ Extracts intergenic markers
R scripts/Beautiful_Exon_Extractor.R ➡️ Extracts exons from pairs of genes and CDSes
R scripts/Beautiful_Intron_Masker.R ➡️ Masks introns from gene-CDS pairs
Notebooks/SNP_effects_and_haplotype_testing.ipynb ➡️ Determines protein-level consequences of polymorphisms and tests alleles by ANOVA and Student's t-test
Notebooks/Multiple_testing_correction_and_LD_statistics.ipynb ➡️ Calculates FDR-adjusted p-values using the Benjamini-Hochberg method and evaluates LD for markers and QTLs
Notebooks/Plots.ipynb ➡️ Miscellaneous visualizations
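
For readers unfamiliar with the Benjamini-Hochberg adjustment used in the multiple-testing notebook, here is a minimal sketch of the procedure. The function name is hypothetical and the notebook may rely on a library routine instead; this only shows the arithmetic.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg FDR-adjusted p-values in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                           # indices of ascending p-values
    scaled = p[order] * m / np.arange(1, m + 1)     # p_(i) * m / i
    # Enforce monotonicity: each adjusted value is the minimum of itself
    # and every adjusted value at a higher rank.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)       # cap at 1, restore input order
    return adjusted
```

Markers whose adjusted p-value falls below the chosen FDR level (0.05 is a common choice, assumed here rather than taken from the notebook) would be reported as significant.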