This repository is intended to follow up on the summer REEU internship. The main objective of the internship is to learn how to carry out genome-wide association studies (GWAS), using as a model the phenotypic and genotypic data of poplar (Populus trichocarpa) tree populations for the pathogen Sphaerulina musiva.
To begin these analyses, the student needs to become familiar with Unix/Linux commands, navigating the CQLS computer cluster, and version control and software development using Git.
- Start using the terminal
1.1. Install a terminal prompt for Windows
1.2. Create, move and delete files using the terminal
1.3. How to edit files using `nano` and `vi`, and other text-editing software
1.4. Create a GitHub account
1.4.1. Create a repository
1.4.2. Git add, git commit, git push and git pull!
- Accessing the CQLS
2.1 Working in the CQLS cluster
2.2 Data structure
2.3 Accessing data
28 Jun:
- Access the CQLS
- Create a directory to organize your GitHub repositories
- Create a repository for the poplar/septoria GWAS on GitHub
3.1 Clone your repository to the CQLS cluster
3.2 Create the folder structure following the book "Guide to reproducible code"
3.3 Create the folders with `mkdir` and populate the directories with "mock" files and scripts (comment the files with your name)
- Clone your GWAS repository to your local Git (on your Ubuntu machine)
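
The folder set-up from steps 3.2 and 3.3 could look like the sketch below. The folder names (`data/raw`, `data/processed`, `scripts`, `output`, `figs`, `doc`), the mock script name and the `.gitkeep` placeholder files are assumptions for illustration, not a layout prescribed by this repository; adapt them to the structure you choose from the guide.

```bash
#!/usr/bin/env bash
# Sketch: create a project skeleton and fill it with "mock" files.
# Folder and file names are assumptions; adjust them to your own layout.
set -eu

make_skeleton() {
    local project="$1"   # project directory, e.g. the cloned repository
    local author="$2"    # name written as a comment in the mock script
    local dir

    # -p creates parent directories and does not fail if they already exist
    mkdir -p "$project/data/raw" "$project/data/processed" \
             "$project/scripts" "$project/output" "$project/figs" "$project/doc"

    # Mock script that carries an author comment
    printf '#!/usr/bin/env bash\n# Author: %s\n# Mock script: replace with real commands\n' \
        "$author" > "$project/scripts/01_mock.sh"

    # Empty placeholder files (".gitkeep" is only a naming convention)
    # so that Git can track folders that have no other content yet
    for dir in data/raw data/processed output figs doc; do
        touch "$project/$dir/.gitkeep"
    done
}

# Example: build the skeleton in a temporary directory and list it
demo_dir="$(mktemp -d)"
make_skeleton "$demo_dir/gwas" "Your Name"
find "$demo_dir/gwas" | sort
```

Inside your cloned repository you would call `make_skeleton . "Your Name"` instead of using a temporary directory.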
29 Jun
- Install `cutadapt`
1.1 Read the manual
- Find the adapter
- Cut your primers and trim bases with a Phred quality score below 30
- Put together a `bash` script to process ALL your samples with one script
Hint: You can find the adapter in one of your outputs from FastQC
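
One way to process all samples with a single script is to loop over the FASTQ files and build one `cutadapt` command per file. The sketch below only prints the commands so they can be reviewed first. It assumes single-end reads named `*.fastq.gz` and a 3' adapter (`-a`); the function name and the output naming are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print one cutadapt command per FASTQ file in a directory.
# Assumes single-end reads named *.fastq.gz; names are illustrative.
set -eu

cutadapt_commands() {
    local in_dir="$1" out_dir="$2" adapter="$3"
    local fq sample
    for fq in "$in_dir"/*.fastq.gz; do
        [ -e "$fq" ] || continue             # the pattern matched no files
        sample="$(basename "$fq" .fastq.gz)"
        # -a: 3' adapter to remove; -q 30: trim low-quality 3' ends (below Q30)
        printf 'cutadapt -a %s -q 30 -o %s/%s.trimmed.fastq.gz %s\n' \
            "$adapter" "$out_dir" "$sample" "$fq"
    done
}
```

For example, `cutadapt_commands raw_reads trimmed "$ADAPTER"` prints the commands, where `raw_reads`, `trimmed` and `ADAPTER` (the sequence reported by FastQC) are placeholders. After creating the output folder with `mkdir -p trimmed`, piping the same call to `bash` would run them.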
30 Jun
- Work with bash commands: `sed`, `cut`, `awk`
- Practice loops in `bash`
- Retrieve from NCBI or any other database the genomes of:
  - Populus trichocarpa
  - Septoria musiva
(Find the reference genome information in the manuscripts)
- Understand the formats and concepts behind a genome assembly
- Summarize the genome data for P. trichocarpa and S. musiva
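
As a warm-up for `sed`, `cut`, `awk` and `for` loops, the sketch below runs each of them on a small toy table. The table and all its values are made up for illustration and are not data from this project.

```bash
#!/usr/bin/env bash
# Sketch: cut, sed, awk and a for loop on a toy table (all values are made up).
set -eu

tmp="$(mktemp -d)"
table="$tmp/samples.tsv"
printf 'sample\tgenotype\treads\n' >  "$table"
printf 'S1\tgenoA\t1200\n'         >> "$table"
printf 'S2\tgenoB\t800\n'          >> "$table"

# cut: keep only the first tab-separated column
cut -f1 "$table"

# sed: delete the header line, then replace the leading "S" with "sample_"
sed -e '1d' -e 's/^S/sample_/' "$table"

# awk: skip the header (NR > 1) and add up the third column
total_reads="$(awk -F'\t' 'NR > 1 { sum += $3 } END { print sum }' "$table")"
echo "Total reads: $total_reads"

# for loop: run a command once per sample name
for sample in $(sed '1d' "$table" | cut -f1); do
    echo "Processing $sample"
done
```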
1 Jul
Summary of the week
5 Jul
- Cluster environment and submissions
- Running `cutadapt` in the CQLS cluster
- Download genomes
6 Jul
- Analyze results
- Check BWA and GATK
2.1 BWA command examples
7 Jul
- Finish mapping the reads against the reference genome
- Learn the GATK command examples
- Index your reference sequence
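
The indexing and mapping steps could be sketched as below. The script prints the commands rather than running them. It assumes single-end reads, that `samtools` is available (it is not mentioned elsewhere in this README) and that `picard.jar` is in the working directory; all file names are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: print the commands to index a reference and map one sample with BWA.
# File names are placeholders; review the output before running it.
set -eu

# Commands that prepare the reference (needed once per reference genome)
index_commands() {
    local ref="$1"
    printf 'bwa index %s\n' "$ref"         # index used by bwa mem
    printf 'samtools faidx %s\n' "$ref"    # .fai index expected by GATK
    # Sequence dictionary expected by GATK, named after the
    # reference file without its last extension
    printf 'java -jar picard.jar CreateSequenceDictionary R=%s O=%s.dict\n' \
        "$ref" "${ref%.*}"
}

# Command that maps one single-end FASTQ file against the indexed reference
map_command() {
    local ref="$1" reads="$2" sample="$3"
    printf 'bwa mem %s %s > %s.sam\n' "$ref" "$reads" "$sample"
}

index_commands REF.fna
map_command REF.fna sample1.trimmed.fastq.gz sample1
```

For paired-end data, `bwa mem` takes the two FASTQ files one after the other in place of the single reads file.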
11 Jul
- Wrap up last week
1.2. Edit the README.md in your poplar-septoria-GWA repository with all the steps we have done so far, from the fastqc and cutadapt commands to the bwa and gatk commands.
- Install `picard.jar` to use after `bwa mem`
2.1 Search how to assign read groups with `picard.jar AddOrReplaceReadGroups`
2.2 Search how to sort with `picard.jar SortSam`
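
As a starting point for steps 2.1 and 2.2, the sketch below prints the two Picard commands for one sample, using Picard's `KEY=value` argument style. The read-group values (`lib1`, `unit1`, `ILLUMINA`) and the output file names are placeholders chosen for illustration; check the Picard documentation for the values that fit your sequencing run.

```bash
#!/usr/bin/env bash
# Sketch: print Picard commands to add read groups and sort one alignment.
# Read-group values and file names below are placeholders.
set -eu

picard_commands() {
    local sam="$1" sample="$2"
    # Read groups: ID, library, platform, platform unit and sample name
    printf 'java -jar picard.jar AddOrReplaceReadGroups I=%s O=%s.rg.bam RGID=%s RGLB=lib1 RGPL=ILLUMINA RGPU=unit1 RGSM=%s\n' \
        "$sam" "$sample" "$sample" "$sample"
    # Sort by genomic coordinate
    printf 'java -jar picard.jar SortSam I=%s.rg.bam O=%s.sorted.bam SORT_ORDER=coordinate\n' \
        "$sample" "$sample"
}

picard_commands sample1.sam sample1
```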
12 Jul
- Debug `gatk` VCF calling.
- Execute the `bash` scripts for all septoria genome files
13 Jul
OFF
14 Jul
- Summarize the reference genomes for Septoria and Poplar
Hints: Genome size:
No. of contigs:
N50:
No. of genes:
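
Most of these numbers can be computed directly from the files. The sketch below is one possible approach with `awk`: it counts the FASTA records, sums their lengths and computes the N50. Counting genes assumes that a GFF3 annotation file is available for the genome, which is not stated in this README. Note that the FASTA records may be contigs, scaffolds or chromosomes, depending on the assembly.

```bash
#!/usr/bin/env bash
# Sketch: summarize a FASTA assembly (number of sequences, total size, N50)
# and count "gene" features in a GFF3 annotation.
set -eu

# Print the length of every sequence, one per line
seq_lengths() {
    awk '/^>/ { if (seen) print len; len = 0; seen = 1; next }
         { len += length($0) }
         END { if (seen) print len }' "$1"
}

# N50: walking from the longest sequence down, the length at which the
# running total first reaches half of the assembly size
n50() {
    seq_lengths "$1" | sort -rn | awk '
        { len[NR] = $1; total += $1 }
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                running += len[i]
                if (running >= half) { print len[i]; exit }
            }
        }'
}

# Count rows whose third column (feature type) is "gene"
count_genes() {
    awk -F'\t' '!/^#/ && $3 == "gene" { n++ } END { print n + 0 }' "$1"
}

summarize_genome() {
    local fasta="$1"
    echo "No. of sequences: $(grep -c '^>' "$fasta")"
    echo "Genome size (bp): $(seq_lengths "$fasta" | awk '{ s += $1 } END { print s + 0 }')"
    echo "N50 (bp): $(n50 "$fasta")"
}
```

Calling `summarize_genome REF.fna` prints the three assembly values, and `count_genes annotation.gff3` (a placeholder file name) prints the gene count.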
15 Jul
- Install GEMMA
- Copy the VCF file and the phenotype data
$ /nfs1/BPP/LeBoldus_Lab/user_folders/Shared_projects/data_REEU/PopGWAS2016.vcf.tar.gz
Jul 18
- Test the `GEMMA` software
1.1 Use the example data and understand the file structure
- Generate the VCF
java -jar gatk.jar -T GenotypeGVCFs -R REF.fna -o file_jointcalls.vcf -V 1.vcf -V 2.vcf -V n.vcf
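
With many samples, typing every `-V` argument by hand is error-prone. The sketch below builds the same command from all `.vcf` files in one folder and prints it for review. The function name is hypothetical, and it assumes that the folder holds only the per-sample files and that paths contain no spaces.

```bash
#!/usr/bin/env bash
# Sketch: build the joint-calling command from every .vcf file in a folder.
# Assumes the folder holds only per-sample VCFs and paths have no spaces.
set -eu

joint_call_command() {
    local ref="$1" vcf_dir="$2" out="$3"
    local vcf variants=""
    for vcf in "$vcf_dir"/*.vcf; do
        [ -e "$vcf" ] || continue          # the pattern matched no files
        variants="$variants -V $vcf"       # one -V argument per sample
    done
    printf 'java -jar gatk.jar -T GenotypeGVCFs -R %s -o %s%s\n' \
        "$ref" "$out" "$variants"
}
```

For example, `joint_call_command REF.fna vcfs file_jointcalls.vcf` prints the full command for the files in a folder named `vcfs` (a placeholder). Keep the output file outside that folder so that a later run does not pick it up as an input.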
Jul 19
- Submit an array in the CQLS cluster to generate VCFs for the second part of septoria reads
- Parse the phenotype data by septoria isolate and poplar genotype
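
For the array submission, each task usually needs to pick its own input file. A common pattern, sketched below, is to keep a text file with one input path per line and let each task read the line that matches its task number. How the task number is provided depends on the scheduler; the variable name `SGE_TASK_ID` mentioned below is an assumption about the cluster set-up, so check the CQLS documentation for the submission command and variable to use.

```bash
#!/usr/bin/env bash
# Sketch: select the input file for one array task from a list of paths.
# The list file has one input path per line; task numbers start at 1.
set -eu

file_for_task() {
    local list="$1" task_id="$2"
    # sed -n 'Np' prints only line N of the list
    sed -n "${task_id}p" "$list"
}
```

Inside the job script this could be used as `input="$(file_for_task bam_list.txt "$SGE_TASK_ID")"`, where `bam_list.txt` is a placeholder created beforehand, for example with `ls *.sorted.bam > bam_list.txt`.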
Jul 20
- Finalize the parsing of the phenotype data
1.1 Generate a `.txt` with the input in GEMMA structure.
- Parse the phenotypes based on the `All_septoria.vcf`
2.1 Get the sample names of the septoria.vcf
2.2 Install `vcfR`
library(vcfR)
poplar <- read.vcfR("poplar.vcf")
poplar.IDs <- colnames(poplar@gt)[-1]  # drop the first column (FORMAT); the rest are sample IDs
write.csv(poplar.IDs, "poplar_names_vcf.csv")
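
The sample names can also be read from the terminal, without loading the whole VCF into R. The sketch below relies on the VCF layout, where the `#CHROM` header line lists the sample names from the tenth tab-separated column onwards. It assumes an uncompressed VCF, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print the sample names of an uncompressed VCF, one per line.
set -eu

vcf_sample_names() {
    # The "#CHROM" header line has nine fixed columns; samples start at column 10
    awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1"
}
```

For example, `vcf_sample_names All_septoria.vcf > septoria_names.txt` writes the names to a text file (the output name is a placeholder).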
Jul 21
- Generate a VCF with all individual `.vcf` files
- Finalize the inputs for GEMMA
Jul 22
- Produce the `.bim`, `.bed` and `.fam` files
- Try to run the GEMMA software
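
One possible route to the `.bed`, `.bim` and `.fam` files is PLINK, followed by a kinship matrix and a linear mixed model in GEMMA. The sketch below prints the three commands for review. Using PLINK is an assumption (this README does not say which tool produced the files), and the file prefixes are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: print commands to convert a VCF to PLINK binary files and run GEMMA.
# PLINK as the converter and all file prefixes are assumptions.
set -eu

gemma_commands() {
    local vcf="$1" prefix="$2"
    # VCF -> .bed/.bim/.fam; --allow-extra-chr accepts non-human chromosome names
    printf 'plink --vcf %s --allow-extra-chr --make-bed --out %s\n' "$vcf" "$prefix"
    # Centered relatedness (kinship) matrix; GEMMA writes into ./output
    printf 'gemma -bfile %s -gk 1 -o %s_kinship\n' "$prefix" "$prefix"
    # Univariate linear mixed model (Wald test) using that kinship matrix
    printf 'gemma -bfile %s -k output/%s_kinship.cXX.txt -lmm 1 -o %s_lmm\n' \
        "$prefix" "$prefix" "$prefix"
}

gemma_commands PopGWAS2016.vcf poplar
```

GEMMA reads the phenotype from the sixth column of the `.fam` file, so that column needs to hold the parsed phenotype values before the GEMMA steps are run.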
**This week**
- Debugging and polishing
Tutorials:
Linux: https://www.guru99.com/unix-linux-tutorial.html
How to install linux: https://www.guru99.com/install-linux.html
Bash cheat sheet: https://devhints.io/bash
Reproducible code: https://www.britishecologicalsociety.org/wp-content/uploads/2017/12/guide-to-reproducible-code.pdf
Manuscripts | Link |
---|---|
Poplar GWAS: | https://www.pnas.org/doi/10.1073/pnas.1804428115 |
Poplar VCF info | https://doi.ccs.ornl.gov/ui/doi/55 |
GWAS precedent | https://www.nature.com/articles/ng.3075#Sec10 |
GWAS methodology | https://www.nature.com/articles/ng.548 |
GEMMA Software | https://github.com/genetics-statistics/GEMMA |
GWAS Robustness | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3025716/ |
GWAS Doc man | https://vcru.wisc.edu/simonlab/bioinformatics/programs/fcgene/fcgene-1.0.7.pdf |
BIMBAM | http://www.haplotype.org/software.html |
-- | -- |
Septoria PopGen: | https://apsjournals.apsnet.org/doi/10.1094/MPMI-05-19-0131-R |