
This repository maintains the code for analyzing the Human Cell Atlas (HCA) dataset.

Primary language: MATLAB

RNA_seq Pipeline

This repository maintains the code for analyzing the RNA-seq dataset.

Pipeline:

1. Download the Dataset:

Download the Census of Immune Cells dataset from https://preview.data.humancellatlas.org.

This will create 4 folders with fastq.gz files. Decompress the files in each folder:

gunzip *.fastq.gz
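To handle all four folders in one pass, the step above can be sketched as a small shell function. This is a minimal sketch, assuming the downloaded files are plain gzip-compressed FASTQ files (not tar archives) and that they sit one level below the download directory. The function name `decompress_fastqs` is hypothetical and not part of this repository.

```shell
#!/usr/bin/env bash
# Decompress every *.fastq.gz file found directly inside each subfolder of a
# download directory. Assumes plain gzip files, not tar archives.
decompress_fastqs() {
    local root="${1:-.}"
    local f
    # -mindepth 2 / -maxdepth 2 restricts the search to root/<folder>/<file>
    find "$root" -mindepth 2 -maxdepth 2 -type f -name '*.fastq.gz' -print0 |
        while IFS= read -r -d '' f; do
            gunzip -- "$f"   # replaces <name>.fastq.gz with <name>.fastq
        done
}

# Example (hypothetical path): decompress_fastqs /path/to/download
```

Using `find -print0` with `read -d ''` keeps the loop safe for file names that contain spaces.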

2. Prepare the Dataset:

  1. Download the cellranger software from here.

  2. Download the GRCh38 reference.

  3. Run cellranger count following the instructions from here. For example, for MantonBM1:

cellranger count --id=MantonBM1 \
--fastqs=2a87dc5c-0c3c-4d91-a348-5d784ab48b92 \
--transcriptome=<path_to_reference_file> \
--sample=MantonBM1_HiSeq_1,MantonBM1_HiSeq_2,MantonBM1_HiSeq_3,MantonBM1_HiSeq_4,MantonBM1_HiSeq_5,MantonBM1_HiSeq_6,MantonBM1_HiSeq_7,MantonBM1_HiSeq_8

This will create a folder MantonBM1.
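The --sample value in the command above follows a regular pattern (a prefix plus a running number), so it can be generated rather than typed by hand. The following is a minimal sketch; the helper name `build_sample_list` is hypothetical, and it assumes every sample name has the form `<prefix>_<number>` with numbering starting at 1.

```shell
#!/usr/bin/env bash
# Build a comma-separated list "<prefix>_1,<prefix>_2,...,<prefix>_<n>".
build_sample_list() {
    local prefix="$1" n="$2" i list=""
    for ((i = 1; i <= n; i++)); do
        # Add a comma only when the list already has an entry
        list+="${list:+,}${prefix}_${i}"
    done
    printf '%s\n' "$list"
}

samples=$(build_sample_list MantonBM1_HiSeq 8)
echo "$samples"

# The result can then be passed on, for example:
# cellranger count --id=MantonBM1 \
#   --fastqs=2a87dc5c-0c3c-4d91-a348-5d784ab48b92 \
#   --transcriptome=<path_to_reference_file> \
#   --sample="$samples"
```

The same helper should work for the other donors by changing the prefix and count, provided their sample names follow the same pattern.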

Locate the filtered_gene_bc_matrices_h5.h5 file.
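One way to locate the file is to search the run folder by name. This is a minimal sketch; the function name `find_filtered_matrix` is hypothetical. cellranger usually writes its results into an `outs` subfolder, but the sketch searches the whole run folder rather than relying on that.

```shell
#!/usr/bin/env bash
# Print the path of every filtered matrix file below a cellranger run folder.
find_filtered_matrix() {
    local run_dir="$1"
    find "$run_dir" -type f -name 'filtered_gene_bc_matrices_h5.h5'
}

# Example: find_filtered_matrix MantonBM1
```

If nothing is printed, the run may not have finished, or the installed cellranger version may use a different output file name.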

3. Run Non-Parametric Clustering:

You can tune the following parameters for different settings:

  • cutoff_thresh : Remove genes whose total count is below this threshold
  • dim_red_method: Dimension reduction method { 'PCA' , 'SNE'}
  • red_dim : Number of dimensions after reduction
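To make the effect of cutoff_thresh concrete, here is a shell illustration of the filtering rule it describes. This is not the repository's MATLAB code. It is a minimal sketch that assumes a hypothetical tab-separated export of the count matrix (header line first, then one row per gene with the gene name in the first column), and it assumes that a gene whose total equals the threshold is kept. The function name `filter_genes` is also hypothetical.

```shell
#!/usr/bin/env bash
# Keep the header line and every gene row whose summed counts reach the cutoff.
# Input: tab-separated text, first column = gene name, other columns = counts.
filter_genes() {
    local cutoff="$1" matrix="$2"
    awk -F '\t' -v cutoff="$cutoff" '
        NR == 1 { print; next }                  # pass the header through
        {
            total = 0
            for (i = 2; i <= NF; i++) total += $i
            if (total >= cutoff) print           # drop rows with total < cutoff
        }
    ' "$matrix"
}

# Example (hypothetical file): filter_genes 3 counts.tsv > counts_filtered.tsv
```

Raising the cutoff removes more low-count genes before dimension reduction, which shrinks the matrix that PCA or SNE has to process.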

It then visualizes the samples in a color-coded plot.

You can provide gene names (separated by spaces) to see a heatmap of the summed expression of those genes in each cluster.