Summary of scripts

General information

  • All scripts use the tidyverse.
  • When no folder is specified, the file was saved to the corresponding species folder.
  • For plots saved once per species, the species appears in the file name as SPECIES.
  • Some files include the date of analysis in the file name; here that part is shown as DATE.OF.ANALYSIS.
  • Figures exported as both PDF and TIFF are shown with the extension (pdf|tiff).
  • Two scripts (FUNCTIONS_GOplots.R and getSemData.R) are also available as gists:
    1. getSemData.R: collects the GO term information for Arabidopsis used to calculate the similarity between GO terms.
    2. FUNCTIONS_GOplots.R: defines functions that are used in scripts 13 and 14 below.

Scripts

  1. 09-getReferenceGenes.R
    • Uses the package CustomSelection [1,2] to:
      1. Get the expression level in TPM of each gene in each sample (output saved to ExpressionLevels_TPM.tsv).
      2. Calculate, for each sample, the minimum TPM value for a gene to be considered expressed (shown as $\log_2(TPM)$; output written to MinimumEpresion_log2TPM.tsv).
      3. Keep the genes with $mean(TPM) > 2^{mean(cutoff)}$, rank them from lowest to highest coefficient of variation and select the top 0.5%. Output written to resultsCustomSelection.tsv.
  2. 10-DESeq_plantLine.R
    • The script:
      1. Removes the genes that do not pass the minimum TPM threshold from the counts table and saves the result to counts_filtered.tsv.
      2. Performs differential expression analysis with DESeq2, using the selected reference genes to estimate the amount of input material in each sample.
        • Alpha: 0.01; null hypothesis: $\log_2(\text{fold change}) = 0$; fold-change threshold: $|\log_2(\text{fold change})| > 2$; adjusted p-value threshold: $p_{adj} < 0.01$.
        • Results are saved in long format to DESeq_DATE.OF.ANALYSIS.tsv.
      3. Exports the PCA calculated using the regular log transformation of the normalized read counts.
        • Three files are saved; the plot is saved to plots/SPECIES_PCA_DATE.OF.ANALYSIS.(pdf|tiff).
  3. 11-loadAndPrepareDEGs_Files.R
    • Imports the results of the previous script and adds the columns plantLine, SampleName (the effector expressed) and Deregulation ("Up-regulated", "Down-Regulated" or "Non-regulated").
  4. 12-heatmaps.R
    • Creates a wide table of fold-change values and uses it to calculate a gene dendrogram.
    • Uses the dendrogram to order the genes and plots the heatmaps of Arabidopsis (top) and poplar (bottom) in plots/Heatmaps_DATE.OF.ANALYSIS.pdf.
  5. 13-annotateDeregulatedGenes.R
    • Uses biomaRt [1,2] to query the Ensembl Plants BioMart for the annotation of the deregulated genes and to search for the poplar homologs of the Arabidopsis deregulated genes, and vice versa.
    • The annotation includes the gene name and description as well as the GO ID, GO name and GO ontology (biological process, molecular function or cellular component).
    • Exports the files DEGs_annotation.txt (genes deregulated by either effector), Annotation_commonDEGs.txt (genes deregulated by both effectors) and DEGs_homologs.txt.
  6. 14-Commonly_deregulated_genes.R
    • Creates the tables (Mlp72983|Mlp52166)_deregulatedGenes.csv and the plots plots/SPECIES_scatterPlot_DEGs_DATE.OF.ANALYSIS.pdf and plots/SPECIES_numberDEGs_DATE.OF.ANALYSIS.pdf.
    • In the scatter plot, each gene is a dot whose X and Y coordinates are its fold changes in the two transgenic lines of the same effector.
  7. 15-Heatmap_commonDEGs.R
    • Finds the genes deregulated in both species and uses the functions daisy(metric = "gower") and diana(diss = T) from cluster 2.1.4 [1] to build the dendrogram that orders the genes in the heatmap.
    • Exports the figure plots/Heatmap_homologs_deregulated.pdf, which only includes genes deregulated in both species (a deregulated gene is included if at least one of its homologs in the other species is also deregulated).
    • Creates the Venn diagram plots/SimpleVenn_DEGs.pdf, which uses the combination of Arabidopsis gene ID and poplar gene ID to count the genes deregulated in each condition and in each combination of conditions, regardless of species.
  8. 16-VennDiagram.R
    • Finds the number of genes deregulated in each group (effector-species combination) and the number of their homologs in the other species. In areas that combine the two species, shows the number of homologs that are also deregulated. The plot is saved as plots/Venn_diagram_deregulatedGenes_homologs.pdf.
  9. 17-prepareData_GOenrichment.R
    • Gets the GO term annotation of all the genes expressed in each species and exports it as RData to RData_files/GO_annotation.RData.
    • Using the gist getSemData.R, obtains the GO information for Arabidopsis needed to calculate the similarity between terms and saves it to RData_files/semantic_GO_At.RData.
  10. 18-get_geneSets_goEnrichment.R
    • Uses clusterProfiler [1,2] to perform GO enrichment analysis.
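
The reference-gene selection in 09-getReferenceGenes.R can be illustrated with a short Python sketch. The actual script is written in R and relies on CustomSelection. This sketch assumes the usual TPM definition: each count is divided by the gene length, and the resulting rates are scaled to sum to one million per sample. The function names and the `fraction` parameter are hypothetical and do not reproduce CustomSelection's internals.

```python
import math
import statistics


def tpm(counts, lengths):
    """Transcripts per million for one sample (assumed standard definition)."""
    rates = [c / l for c, l in zip(counts, lengths)]  # length-normalised rate per gene
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def select_reference_genes(tpm_table, log2_cutoffs, fraction=0.005):
    """Hypothetical sketch of the selection rule described for script 09.

    tpm_table: {gene: [TPM in sample 1, TPM in sample 2, ...]}
    log2_cutoffs: per-sample minimum expression on the log2(TPM) scale.
    Keeps genes with mean(TPM) > 2**mean(cutoff), ranks them by coefficient
    of variation (sd / mean, lowest first) and returns the top `fraction`.
    """
    threshold = 2 ** statistics.mean(log2_cutoffs)
    cv = {}
    for gene, values in tpm_table.items():
        m = statistics.mean(values)
        if m > threshold:
            cv[gene] = statistics.stdev(values) / m
    ranked = sorted(cv, key=cv.get)  # most stable genes first
    n_keep = max(1, math.ceil(len(ranked) * fraction))  # keep at least one gene
    return ranked[:n_keep]
```

With real data, the 0.5% cut typically leaves a few dozen genes. In tiny examples, the `max(1, ...)` guard keeps at least the single most stable gene.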
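
The summary of 10-DESeq_plantLine.R does not say exactly when a gene "passes" the minimum TPM threshold. The Python sketch below assumes, hypothetically, that a gene is kept when its log2(TPM) reaches the sample-specific cutoff in at least `min_samples` samples. Both the rule and the names are illustrative.

```python
import math


def filter_expressed(counts, tpm_table, log2_cutoffs, min_samples=1):
    """Keep count rows of genes that reach the expression cutoff often enough.

    ASSUMPTION: 'passing' means log2(TPM) >= cutoff in at least `min_samples`
    samples; the real script may use a different rule.
    counts / tpm_table: {gene: [value per sample]}, samples in the same order
    as log2_cutoffs.
    """
    kept = {}
    for gene, row in counts.items():
        n_passing = sum(
            1
            for value, cutoff in zip(tpm_table[gene], log2_cutoffs)
            # log2(0) is undefined, so a TPM of zero never passes
            if value > 0 and math.log2(value) >= cutoff
        )
        if n_passing >= min_samples:
            kept[gene] = row
    return kept
```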
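
The thresholds listed for 10-DESeq_plantLine.R and the Deregulation column added by 11-loadAndPrepareDEGs_Files.R can be combined into a single labelling rule. The Python sketch below assumes the thresholds are applied as a post-hoc filter on the DESeq2 results table. It also treats a missing adjusted p-value (`None`, which DESeq2 reports as NA) as not significant. The function name is hypothetical.

```python
def deregulation(log2_fc, padj, lfc_threshold=2.0, alpha=0.01):
    """Label one result row as "Up-regulated", "Down-Regulated" or "Non-regulated".

    A gene counts as deregulated only if padj < alpha AND |log2FC| > lfc_threshold.
    """
    if padj is None or padj >= alpha or abs(log2_fc) <= lfc_threshold:
        return "Non-regulated"
    return "Up-regulated" if log2_fc > 0 else "Down-Regulated"
```

For example, a gene with a log2 fold change of 1.5 is labelled "Non-regulated" even with a very small adjusted p-value, because it fails the fold-change threshold.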
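
The summary of 12-heatmaps.R does not specify the distance or linkage behind the gene dendrogram. As a hypothetical illustration, the Python sketch below uses Euclidean distance and average linkage to order genes from a wide fold-change table. This is a naive O(n³) toy: merged clusters are simply concatenated left then right, so the leaf order will generally differ from what R's hclust produces.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def dendrogram_order(profiles):
    """Leaf order from average-linkage agglomerative clustering (sketch).

    profiles: {gene: [log2FC in condition 1, condition 2, ...]}, i.e. one row
    of the wide table per gene. ASSUMPTION: Euclidean distance, average linkage.
    """
    clusters = [[gene] for gene in profiles]  # each cluster is an ordered gene list

    def linkage(c1, c2):
        # mean pairwise distance between the members of two clusters
        return sum(dist(profiles[a], profiles[b]) for a in c1 for b in c2) / (
            len(c1) * len(c2)
        )

    while len(clusters) > 1:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0] if clusters else []
```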
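
The data behind the scatter plot of 14-Commonly_deregulated_genes.R can be sketched in Python as follows. The sketch assumes, hypothetically, that a gene counts as commonly deregulated when it carries the same Up/Down label in both transgenic lines of an effector; the script may define "common" differently. The function names are illustrative.

```python
def shared_degs(labels_line1, labels_line2):
    """Genes with the same non-neutral Deregulation label in both lines.

    labels_*: {gene_id: "Up-regulated" | "Down-Regulated" | "Non-regulated"}
    """
    return {
        gene
        for gene, label in labels_line1.items()
        if label != "Non-regulated" and labels_line2.get(gene) == label
    }


def scatter_coordinates(lfc_line1, lfc_line2):
    """One (x, y) point per gene measured in both lines: (log2FC line 1, log2FC line 2)."""
    return {gene: (lfc_line1[gene], lfc_line2[gene]) for gene in lfc_line1.keys() & lfc_line2.keys()}
```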
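
For numeric variables, Gower's dissimilarity, as used via daisy(metric = "gower") in 15-Heatmap_commonDEGs.R, scales each absolute difference by the range of its variable and averages the scaled differences over the variables. The Python sketch below covers only complete, purely numeric data, here assumed to be fold changes per condition. Missing values and factor columns, which daisy also supports, are not handled. Skipping constant columns is a choice of this sketch, not a statement about daisy.

```python
def gower_distance(rows):
    """Pairwise Gower dissimilarity matrix for numeric rows without missing values."""
    n_vars = len(rows[0])
    ranges = []
    for f in range(n_vars):
        column = [row[f] for row in rows]
        ranges.append(max(column) - min(column))
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            terms = [
                abs(rows[i][f] - rows[j][f]) / ranges[f]
                for f in range(n_vars)
                if ranges[f] > 0  # sketch choice: constant columns are skipped
            ]
            d[i][j] = d[j][i] = sum(terms) / len(terms) if terms else 0.0
    return d
```

The resulting matrix is what a divisive method such as diana would then split, starting from one cluster that holds all genes.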
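
The inclusion rule for plots/Heatmap_homologs_deregulated.pdf keeps a deregulated gene if at least one of its homologs in the other species is also deregulated. In Python, this can be sketched as a filter on homolog pairs. The input layout (one (Arabidopsis ID, poplar ID) tuple per homology relation) and the gene IDs in the test are hypothetical.

```python
def deregulated_homolog_pairs(homologs, degs_at, degs_pt):
    """Keep homolog pairs in which both the Arabidopsis and the poplar gene are deregulated.

    homologs: iterable of (arabidopsis_id, poplar_id) pairs.
    degs_at / degs_pt: sets of deregulated gene IDs in each species.
    """
    return sorted({(at, pt) for at, pt in homologs if at in degs_at and pt in degs_pt})
```

Because one gene can have several homologs, an Arabidopsis gene may appear in more than one retained pair.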
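
The counting behind plots/SimpleVenn_DEGs.pdf can be sketched by treating each (Arabidopsis gene ID, poplar gene ID) pair as one element. Each element is assigned to the region formed by exactly the conditions in which it is deregulated. The Python below is a hypothetical illustration, and the condition names in the test are placeholders.

```python
def venn_regions(sets):
    """Count the elements in each exclusive region of a Venn diagram.

    sets: {condition_name: set_of_elements}
    Returns {frozenset(conditions): count}; every element is counted once,
    in the region of exactly the conditions that contain it.
    """
    regions = {}
    for element in set().union(*sets.values()):
        members = frozenset(name for name, s in sets.items() if element in s)
        regions[members] = regions.get(members, 0) + 1
    return regions
```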
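
clusterProfiler's over-representation analysis (for example enrichGO) is based on a one-sided hypergeometric test, and its default p-value adjustment is Benjamini-Hochberg. The Python sketch below shows those two computations. The settings actually used in 18-get_geneSets_goEnrichment.R (ontology, gene universe, cutoffs, adjustment method) are not stated in this summary, so treat the sketch as illustrative only.

```python
from math import comb


def hypergeometric_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: annotated genes in the universe; K: of those, genes annotated to the GO term;
    n: genes in the query set (e.g. deregulated genes); k: query genes annotated to the term.
    """
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def benjamini_hochberg(pvalues):
    """BH-adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, if all 5 query genes fall in a term that annotates 5 of 10 universe genes, `hypergeometric_p(5, 5, 5, 10)` equals 1/252.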

Footnotes

  1. Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M., Hornik, K. (2022). cluster: Cluster Analysis Basics and Extensions. R package version 2.1.4.