Time Point 1 (5 min) - Identify the data: how was it produced?
Time Point 2 (10 min) - Walk through the data using visualization and analysis
Time Point 3 (5 min) - Interpret the results
Time Point 4 (10 min) - Additional approaches to analysis, and general Q&A
Checkpoint A: What is RNA-Seq, and why use it?
RNA-Seq is a next-generation sequencing (NGS) method that tells us about changes in the transcriptome, the mRNA of a given cell or organism. RNA-Seq is commonly used to identify changes in gene expression before versus after a treatment, either in vitro (in a test tube) or in vivo (directly in the organism or environment of interest).
WikiJournal of Medicine
Checkpoint B: What is the nature of the RNA-Seq data we are using today?
Let's scroll up to the Data_RNA-Seq_DE-Output folder and take a look. What do we see inside? There are two .csv files of normalized counts from RNA-Seq output data. The two files are biological replicates from a treatment of the yeast Candida albicans strain CAY540, with a specific interest in identifying up- and down-regulated genes related to biofilm formation. Today we will focus on the ntar genes.
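To make "up- and down-regulated" concrete, here is a minimal Python sketch (not the code in the workshop notebook) that labels genes by their log2 fold change between untreated and treated normalized counts. The column names gene, control and treated, the gene IDs, the pseudocount of 1, and the 2-fold cutoff (|log2 fold change| >= 1) are all hypothetical choices for illustration; the real .csv files may be laid out differently.

```python
import csv
import io
import math


def read_counts(handle):
    """Read normalized counts into {gene: (control, treated)}.

    Assumes hypothetical columns 'gene', 'control' and 'treated';
    the workshop's real .csv files may use different names.
    """
    counts = {}
    for row in csv.DictReader(handle):
        counts[row["gene"]] = (float(row["control"]), float(row["treated"]))
    return counts


def log2_fold_change(control, treated, pseudocount=1.0):
    # The pseudocount avoids dividing by zero or taking log(0)
    # for genes with no counts in one condition.
    return math.log2((treated + pseudocount) / (control + pseudocount))


def classify(counts, threshold=1.0):
    """Label each gene 'up', 'down' or 'unchanged' by its log2 fold change."""
    labels = {}
    for gene, (control, treated) in counts.items():
        lfc = log2_fold_change(control, treated)
        if lfc >= threshold:
            labels[gene] = "up"
        elif lfc <= -threshold:
            labels[gene] = "down"
        else:
            labels[gene] = "unchanged"
    return labels


if __name__ == "__main__":
    # Toy input with hypothetical gene IDs, standing in for one replicate file.
    example = io.StringIO("gene,control,treated\nGENE_A,10,50\nGENE_B,40,9\nGENE_C,20,22\n")
    print(classify(read_counts(example)))
    # → {'GENE_A': 'up', 'GENE_B': 'down', 'GENE_C': 'unchanged'}
```

A fold-change cutoff alone says nothing about whether a difference is statistically significant; that is a separate question.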
Gulati and Nobile 2016, Microbes and Infection
For this section, we're going to switch over to code. Scary! 😱
Checkpoint C: Why is it more advantageous to utilize code for our analysis over Excel?
Right now we're only working with two replicates, but imagine working with 40 or 100. How much time do you estimate it would take to walk through each of those in an Excel sheet versus running a chunk of code? Let's just say "a lot less time" with code! Using code is analogous to using a calculator: it saves us time and energy that we can then redirect elsewhere. In a longer session, we would walk through installing the platforms behind this code (GitHub, which you are looking at now, plus Jupyter and R). In the interest of time, we will be working from a previously made bioinformatics version of a 'lab notebook' found here.
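The time savings come from writing a step once and letting a loop repeat it. The minimal Python sketch below (an illustration, not the workshop notebook) applies the same summary to every .csv file in a folder, so it runs unchanged whether the folder holds 2 replicates or 100. It assumes, hypothetically, that each file has a header row, gene IDs in the first column, and numeric normalized counts in every other column.

```python
import csv
import glob
import os
import statistics


def summarize_file(path):
    """Return (number of genes, mean of each count column) for one CSV.

    Assumes a header row, gene IDs in the first column, and numeric
    normalized counts in every remaining column (a hypothetical layout).
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        rows = [row for row in reader if row]  # skip blank lines
    means = {
        name: statistics.mean(float(row[i]) for row in rows)
        for i, name in enumerate(header[1:], start=1)
    }
    return len(rows), means


def summarize_folder(folder):
    """Run the same summary over every .csv file in the folder."""
    results = {}
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        results[os.path.basename(path)] = summarize_file(path)
    return results


if __name__ == "__main__":
    for name, (n_genes, means) in summarize_folder("Data_RNA-Seq_DE-Output").items():
        print(name, n_genes, means)
```

Adding a replicate means dropping one more file into the folder; the code itself does not change.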
Checkpoint D: Identify and explain what aspects of the workflow are still in need of improvement.
a. How do we know our results are statistically significant?
b. Are there better ways to label or analyze our data?
c. With the code as an example, what are the differences between data visualization and statistical analysis?
d. Why may it be advantageous to include code in a paper?
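For question a, statistical significance is usually judged from one p-value per gene, and because thousands of genes are tested at once those p-values need a multiple-testing correction. One common choice is the Benjamini-Hochberg false discovery rate procedure, which DESeq2, for example, uses by default for its adjusted p-values. The minimal Python sketch below applies it to a list of per-gene p-values. It assumes the p-values already come from some per-gene test; the toy values, gene IDs and the 0.05 cutoff are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values never
    # decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_genes(genes, p_values, alpha=0.05):
    """Return genes whose adjusted p-value falls below alpha."""
    adjusted = benjamini_hochberg(p_values)
    return [gene for gene, q in zip(genes, adjusted) if q < alpha]


if __name__ == "__main__":
    # Toy p-values for four hypothetical genes.
    genes = ["G1", "G2", "G3", "G4"]
    p_values = [0.01, 0.04, 0.03, 0.20]
    print(benjamini_hochberg(p_values))
    print(significant_genes(genes, p_values))
```

In this toy case G2 and G3 have raw p-values below 0.05 but do not survive the correction, which is exactly the kind of false positive the adjustment guards against.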
Sabah Ul-Hasan (@sabazhero)
Thanks to PhD student Austin Perry and the Nobile Lab for permission to use this raw data sample.