TCGA-RNASeq-tutorial

The tutorial for a Yale training session on TCGA RNA-seq data: download and analysis, all on your laptop.

Go to the TCGA data hub

  • Navigate and select files to basket
  • Download metadata and manifest from basket
  • Download the files with gdc-client (the GDC Data Transfer Tool)
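
The download step can be sketched as two small shell helpers. This is a minimal sketch, not the session's exact commands: the function names, the manifest file name, and the output directory are placeholders, and it assumes gdc-client is installed and on your PATH. The manifest is assumed to be the tab-separated file with one header row that the basket produces.

```shell
# Count the files listed in a GDC manifest (tab-separated, one header row).
manifest_file_count() {
  awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$1"
}

# Download everything listed in the manifest into a target directory.
# gdc-client writes one subdirectory per file ID under that directory.
download_manifest() {
  manifest=$1
  outdir=$2
  mkdir -p "$outdir"
  gdc-client download -m "$manifest" -d "$outdir"
}

# Usage (file and directory names are placeholders):
#   manifest_file_count gdc_manifest.txt
#   download_manifest gdc_manifest.txt data
```

Counting the manifest entries first gives a number to compare against the count of downloaded subdirectories afterwards.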

Preprocess the metadata

Preprocess the FPKM matrix

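  • Convert the downloaded files to an FPKM matrix in a Unix shell/terminal
# Run from the download directory, which holds one subdirectory per file ID.
# gzip -dc is used because zcat does not read .gz files on some systems (e.g. macOS).
first=""
for f in */*.gz; do
  id=$(dirname "$f")
  [ -z "$first" ] && first=$f
  # One column per file: the file ID as header, then column 2 (FPKM values).
  echo "$id" > "$id.tmp"
  gzip -dc "$f" | cut -f2 >> "$id.tmp"
done
# Row labels: column 1 (feature IDs) of the first file; all files must share this row order.
echo 'featureId' > tmp.index
gzip -dc "$first" | cut -f1 >> tmp.index
paste tmp.index *.tmp > ../geneId_fileId_FPKM.txt
rm tmp.index *.tmp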
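
Because paste pads short inputs with empty fields instead of failing, a truncated download can silently misalign the matrix. A quick check like the following may help before moving on to R. It is a sketch only, and the function name check_matrix is a placeholder.

```shell
# Report the matrix size and flag lines whose field count differs from the
# header or that contain an empty cell. Exit status is non-zero if any are found.
check_matrix() {
  awk -F'\t' '
    NR == 1 { ncol = NF }
    {
      ok = (NF == ncol)
      for (i = 1; i <= NF && ok; i++) if ($i == "") ok = 0
      if (!ok) bad++
    }
    END {
      printf "%d lines x %d columns, %d bad lines\n", NR, ncol, bad
      exit (bad > 0)
    }
  ' "$1"
}

# Usage:
#   check_matrix ../geneId_fileId_FPKM.txt
```

The line count includes the header row, and the column count includes the featureId column.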

Introduction to the analyses in R

Use the script to:

  • Filter the genes and convert FPKM to log scale
  • Identify genes coexpressed with your gene of interest
  • Identify genes differentially expressed between paired normal and tumor samples
  • Draw a PCA plot
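
The R script itself is not reproduced here. As a rough illustration of what the first step does, the filter and log transform can be sketched in awk on the matrix built above. The cutoff (mean FPKM of at least 1), the pseudocount of 1, and the function name are assumptions for this sketch, not values taken from the tutorial's script.

```shell
# Keep genes whose mean FPKM is at least min_mean (default 1),
# then write log2(FPKM + 1) for the genes that remain.
filter_and_log() {
  awk -F'\t' -v OFS='\t' -v min_mean="${2:-1}" '
    NR == 1 { print; next }              # pass the header through unchanged
    {
      sum = 0
      for (i = 2; i <= NF; i++) sum += $i
      # Skip rows with no values or with a mean below the cutoff.
      if (NF < 2 || sum / (NF - 1) < min_mean) next
      # awk log() is the natural log, so divide by log(2) to get log2.
      for (i = 2; i <= NF; i++) $i = sprintf("%.4f", log($i + 1) / log(2))
      print
    }
  ' "$1"
}

# Usage (the output file name is a placeholder):
#   filter_and_log geneId_fileId_FPKM.txt > geneId_fileId_log2FPKM.txt
```

Adding 1 before taking the log keeps zero FPKM values defined, and dropping low-expression genes first removes rows that would mostly contribute noise to the coexpression and PCA steps.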

Introduction to the analyses with FireHose

  • Gene
  • Cohort summary
  • Cohort data and workflow
  • Cohort analysis