- All scripts use tidyverse.
- When the folder is not specified, the file was saved to the corresponding species folder.
- For the plots, when one figure is saved per species, the name is shown as SPECIES...
- Some files are saved with the date of analysis in the file name; in these cases the name is shown as ...DATE.OF.ANALYSIS...
- Figures exported as both pdf and tiff are shown with the extension (pdf|tiff)
- Two scripts (FUNCTIONS_GOplots.R and getSemData.R) are also available as gists:
- getSemData.R: collects the GO term information for Arabidopsis, used to calculate the similarity between GO terms.
- FUNCTIONS_GOplots.R: creates functions that are used in scripts 13 and 14 below.
- Script uses the package CustomSelection1,2 to:
- Get the expression levels in TPM for each gene and sample (output saved to ExpressionLevels_TPM.tsv)
- Calculate the minimum TPM value to accept a gene as expressed in each sample (shown as $log_2(TPM)$; output written to `MinimumEpresion_log2TPM.tsv`).
- Get the genes with $mean(TPM) > 2^{mean(cutoff)}$ and select the top 0.5% of genes, ranked from lowest to highest coefficient of variation. Output written to `resultsCustomSelection.tsv`.
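The selection rule above can be sketched in base R. This is only an illustration under stated assumptions, not the CustomSelection implementation: the function name `select_reference_genes`, the toy TPM matrix and the cutoff values are hypothetical, and the sketch assumes that the 0.5% is taken among the genes that pass the expression filter.

```r
# Toy TPM matrix: genes in rows, samples in columns (values are made up).
set.seed(1)
tpm <- matrix(rlnorm(200 * 4, meanlog = 3, sdlog = 1), nrow = 200,
              dimnames = list(paste0("gene", 1:200), paste0("sample", 1:4)))

# Hypothetical per-sample minimum expression cutoffs on the log2(TPM) scale.
cutoff_log2 <- c(sample1 = 1.0, sample2 = 1.2, sample3 = 0.9, sample4 = 1.1)

# Hypothetical helper: keep genes with mean(TPM) > 2^mean(cutoff), then
# return the fraction with the lowest coefficient of variation (CV).
select_reference_genes <- function(tpm, cutoff_log2, top_fraction = 0.005) {
  gene_mean <- rowMeans(tpm)
  gene_cv <- apply(tpm, 1, sd) / gene_mean
  expressed <- gene_mean > 2^mean(cutoff_log2)
  cv_expressed <- sort(gene_cv[expressed])  # lowest CV first
  if (length(cv_expressed) == 0) return(character(0))
  # Assumption: the fraction is applied to the expressed genes only.
  n_keep <- max(1, ceiling(length(cv_expressed) * top_fraction))
  names(cv_expressed)[seq_len(n_keep)]
}

reference_genes <- select_reference_genes(tpm, cutoff_log2)
```

Sorting by CV before truncating keeps the most stably expressed genes first, so changing `top_fraction` only lengthens or shortens the same ranked list.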
- 10-DESeq_plantLine.R
- Script filters the counts data frame to:
- Remove genes not passing the minimum TPM level threshold from the counts table. Saves output to `counts_filtered.tsv`
- Performs differential expression analysis with DESeq2, using the selected reference genes (these are used to estimate the amount of input material for each sample).
- Alpha: 0.01; Null hypothesis: $log_2(Fold.Change) = 0$; Fold change threshold: $abs(log_2(Fold.Change)) > 2$; Adjusted p-value threshold: $padj < 0.01$
- Result saved in long data frame format in: `DESeq_DATE.OF.ANALYSIS.tsv`
- Exports PCA result calculated using the regular log transformation on the normalized read counts.
- Three files saved: plot saved to `plots/SPECIES_PCA_DATE.OF.ANALYSIS.(pdf|tiff)`
- 11-loadAndPrepareDEGs_Files.R
- Imports the results from the previous script and adds the columns: plantLine, SampleName (the effector expressed) and Deregulation ("Up-regulated", "Down-Regulated" and "Non-regulated").
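As an illustration of how the Deregulation column could be derived from the thresholds listed for 10-DESeq_plantLine.R, here is a minimal base R sketch. The helper name `classify_deregulation` and the toy data are hypothetical. The column names `log2FoldChange` and `padj` follow the usual DESeq2 result columns, but whether the exported long table keeps those names is an assumption.

```r
# Hypothetical helper: label each gene using the fold change and
# adjusted p-value thresholds (abs(log2FC) > 2 and padj < 0.01).
classify_deregulation <- function(log2fc, padj, lfc_cutoff = 2, padj_cutoff = 0.01) {
  # Genes with missing statistics are treated as non-regulated (assumption).
  significant <- !is.na(padj) & !is.na(log2fc) &
    padj < padj_cutoff & abs(log2fc) > lfc_cutoff
  ifelse(significant & log2fc > 0, "Up-regulated",
         ifelse(significant & log2fc < 0, "Down-Regulated", "Non-regulated"))
}

# Toy results table (values are made up).
deg <- data.frame(
  gene = paste0("gene", 1:5),
  log2FoldChange = c(3.1, -2.5, 1.0, 4.0, NA),
  padj = c(0.001, 0.0001, 0.001, 0.2, NA)
)
deg$Deregulation <- classify_deregulation(deg$log2FoldChange, deg$padj)
```

Both conditions must hold for a gene to count as deregulated, so a large fold change with a weak adjusted p-value (gene4 in the toy table) stays "Non-regulated".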
- 12-heatmaps.R
- Creates a wide table to calculate a gene dendrogram based on the fold change values.
- Uses the dendrogram to order the genes in the heatmap and plots the heatmaps of Arabidopsis (on top) and Poplar (at the bottom) in the file `plots/Heatmaps_DATE.OF.ANALYSIS.pdf`
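The wide-table and dendrogram step can be sketched as follows. The README does not state which distance or linkage the script uses, so the defaults of `dist()` (Euclidean) and `hclust()` (complete linkage) are assumptions, as are the toy table and its column names.

```r
# Toy long table: one row per gene and sample (values are made up).
deg_long <- data.frame(
  gene = rep(c("g1", "g2", "g3", "g4"), each = 2),
  SampleName = rep(c("effectorA", "effectorB"), times = 4),
  log2FoldChange = c(3.0, 2.8, -2.5, -2.9, 2.9, 3.1, -2.4, -3.0)
)

# Long -> wide: genes in rows, samples in columns, fold changes as values.
fc_wide <- with(deg_long, tapply(log2FoldChange, list(gene, SampleName), mean))

# Gene dendrogram built from the fold change values.
gene_tree <- hclust(dist(fc_wide))

# Row order to use in the heatmap, taken from the dendrogram leaves.
gene_order <- rownames(fc_wide)[gene_tree$order]
```

Using `gene_order` as the factor levels of the gene column keeps genes with similar fold change profiles next to each other in the heatmap.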
- 13-annotateDeregulatedGenes.R
- Uses biomaRt1,2 to query the Ensembl Plants BioMarts to annotate the deregulated genes and to find the poplar homologs of deregulated Arabidopsis genes, and vice versa.
- Annotation includes the gene name and description as well as GO ID, GO name and GO ontology (biological process, molecular function and cellular component).
- Exports the files `DEGs_annotation.txt` (genes deregulated by either effector), `Annotation_commonDEGs.txt` (genes deregulated by both effectors) and `DEGs_homologs.txt`
- 14-Commonly_deregulated_genes.R
- Creates the tables `(Mlp72983|Mlp52166)_deregulatedGenes.csv` and the plots `plots/SPECIES_scatterPlot_DEGs_DATE.OF.ANALYSIS.pdf` (each gene is a dot, its fold change in the two transgenic lines of each effector is used as its XY coordinate in the plot) and `plots/SPECIES_numberDEGs_DATE.OF.ANALYSIS.pdf`.
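To make the scatter plot coordinates concrete, the sketch below pairs the fold changes of each gene in two transgenic lines of one effector. The line labels, column names and values are hypothetical, and base R `merge()` is used here only for illustration, while the scripts themselves use tidyverse.

```r
# Toy long table for one effector with two transgenic lines (values are made up).
deg_long <- data.frame(
  gene = rep(c("g1", "g2", "g3"), times = 2),
  plantLine = rep(c("line1", "line2"), each = 3),
  log2FoldChange = c(3.2, -2.7, 0.4, 2.9, -3.1, -0.2)
)

# One row per gene: fold change in line 1 (x) and in line 2 (y).
xy <- merge(
  deg_long[deg_long$plantLine == "line1", c("gene", "log2FoldChange")],
  deg_long[deg_long$plantLine == "line2", c("gene", "log2FoldChange")],
  by = "gene", suffixes = c("_line1", "_line2")
)

# Each gene is a dot; agreement between lines falls along the diagonal.
plot(xy$log2FoldChange_line1, xy$log2FoldChange_line2,
     xlab = "log2 fold change, line 1", ylab = "log2 fold change, line 2")
abline(0, 1, lty = 2)
```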
- 15-Heatmap_commonDEGs.R
- Finds genes deregulated in both species and uses the cluster 2.1.4 [1] functions `daisy(metric = "gower")` and `diana(diss = T)` to get the dendrogram and order the genes in the heatmap.
- Exports the figure `plots/Heatmap_homologs_deregulated.pdf`, which only includes genes deregulated in both species (meaning that a deregulated gene is included if one or more of its homologs in the other species is also deregulated).
- Creates the Venn diagram `plots/SimpleVenn_DEGs.pdf`, which uses the combination of Arabidopsis gene ID and poplar gene ID to find the number of genes deregulated in each condition and in the different combinations, regardless of the species.
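The `daisy()` and `diana()` calls named for this script can be combined as in the sketch below. The fold change table, its column names and the homolog-pair row names are made up for illustration. Only the two function calls and their arguments come from the description above.

```r
library(cluster)

# Toy wide table: one row per homolog pair, one column per
# species-effector combination (all values are made up).
fc <- data.frame(
  At_effectorA = c(3.0, -2.5, 2.9, -2.4),
  At_effectorB = c(2.8, -2.9, 3.1, -3.0),
  Pt_effectorA = c(2.6, -2.2, 2.7, -2.6),
  Pt_effectorB = c(3.3, -3.4, 3.0, -2.8),
  row.names = c("pair1", "pair2", "pair3", "pair4")
)

# Gower dissimilarity between rows, then divisive hierarchical clustering
# on the precomputed dissimilarities.
diss <- daisy(fc, metric = "gower")
tree <- diana(diss, diss = TRUE)

# Leaf order of the dendrogram, used to order the heatmap rows.
gene_order <- rownames(fc)[tree$order]

# Optional: convert to an hclust object for plotting functions that expect one.
tree_hclust <- as.hclust(tree)
```

Gower dissimilarity also accepts mixed or partly missing columns, which may be why it was chosen over a plain Euclidean distance, although the README does not say so.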
- 16-VennDiagram.R
- Finds the number of genes deregulated in each group (effector-species) and the number of homologs in the other species. In areas that combine the two species, shows the number of homologs also deregulated. Plot saved as `plots/Venn_diagram_deregulatedGenes_homologs.pdf`
- 17-prepareData_GOenrichment.R
- Gets the GO term annotation for all the genes expressed in each species and exports it as RData in `RData_files/GO_annotation.RData`
- Using the gist `getSemData.R`, obtains and saves (in the file `RData_files/semantic_GO_At.RData`) the GO information for Arabidopsis for calculation of similarity between terms.
- 18-get_geneSets_goEnrichment.R
Footnotes
1. Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M., Hornik, K. (2022). cluster: Cluster Analysis Basics and Extensions. R package version 2.1.4.