Single-Cell Analysis Workshop

In this tutorial we'll demonstrate the analysis of a single-cell dataset using the Seurat package. Seurat is a popular R toolkit for the analysis of single-cell data developed by the Satija lab at the New York Genome Center. You can find extensive documentation and example analyses on the Seurat website: https://satijalab.org/seurat/index.html

If you're new to R, some excellent resources for getting started with the language are available here: https://swirlstats.com, https://www.codecademy.com/learn/learn-r

In this workshop we'll also be running the analysis in Jupyter notebooks. These are intuitive to use and let you mix code with text. You can find out more about Jupyter notebooks and JupyterLab here: https://jupyter.org

Getting started

The easiest way to get started is to install Miniconda, if you haven't already: https://docs.conda.io/en/latest/miniconda.html

Next, clone this repository:

git clone https://github.com/timoast/UCLA-T32.git
cd UCLA-T32/

While we've provided the materials here as a Jupyter notebook, RStudio also provides an excellent environment for working in R (and is what I use most of the time for analysis and development in R).

Installing the Jupyter environment

Create and activate a new conda environment using the environment file in this repository:

conda env create -f environment.yaml
conda activate t32
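
Before moving on, it can help to confirm that the tools used in this workshop are visible on your PATH once the environment is active. The following is a minimal sketch: the helper name `require_cmds` is hypothetical, and it assumes (this README does not confirm it) that the environment file provides both `jupyter` and `R`.

```shell
# Hypothetical helper: report which of the given commands are on PATH.
# Returns 0 if all are found, 1 if any are missing.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found: $cmd"
        else
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

With the `t32` environment active, you could run `require_cmds jupyter R`. A "missing" line usually means the environment was not activated in the current shell.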

Installing the R packages

Next, launch R from within the activated environment, register the R kernel with Jupyter, and install the required packages:

# register the R kernel with Jupyter (requires the IRkernel package)
IRkernel::installspec()

# install Seurat
install.packages("Seurat")

# install presto for fast DE (optional)
install.packages("remotes")
remotes::install_github("immunogenomics/presto")

# install dplyr for data manipulation (optional)
install.packages("dplyr")

# install SeuratDisk for loading h5seurat files (optional)
# requires the remotes package: run install.packages("remotes") first if you skipped presto
remotes::install_github("mojaveazure/seurat-disk")
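
Once the installs finish, you may want to confirm from the shell that R can find each package. This is a rough sketch rather than part of the workshop materials: `check_r_pkgs` is a hypothetical helper that assumes `Rscript` is on your PATH and uses R's `requireNamespace()` to test each package.

```shell
# Hypothetical helper: check that R packages are installed.
# Returns 0 if all are installed, 1 if any are missing,
# and 2 if Rscript itself cannot be found.
check_r_pkgs() {
    if ! command -v Rscript >/dev/null 2>&1; then
        echo "Rscript not found on PATH" >&2
        return 2
    fi
    failed=0
    for pkg in "$@"; do
        # Exit status 0 from R means the package namespace could be loaded
        if Rscript -e "quit(status = if (requireNamespace('$pkg', quietly = TRUE)) 0 else 1)" >/dev/null 2>&1; then
            echo "installed: $pkg"
        else
            echo "not installed: $pkg" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

For example, `check_r_pkgs IRkernel Seurat presto dplyr SeuratDisk` lists any package that still needs installing. The optional packages can be left out of the call if you chose to skip them.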

Optionally, you can also install the RStudio keyboard shortcuts for JupyterLab. This lets you, for example, insert the assignment arrow <- by pressing Option + - and the pipe %>% by pressing Cmd + Shift + M:

jupyter labextension install @techrah/text-shortcuts

See the extension's GitHub page for more information.

Downloading the demo datasets

For this tutorial we'll be using a publicly available multimodal single-cell dataset generated by 10x Genomics, profiling human peripheral blood mononuclear cells (PBMCs). This dataset measured gene expression alongside the abundance of nine cell-surface proteins in 1,000 single cells and can be downloaded from the 10x Genomics website:

wget https://cf.10xgenomics.com/samples/cell-exp/6.0.0/1k_PBMCs_TotalSeq_B_3p_LT/1k_PBMCs_TotalSeq_B_3p_LT_filtered_feature_bc_matrix.tar.gz
tar -xvf 1k_PBMCs_TotalSeq_B_3p_LT_filtered_feature_bc_matrix.tar.gz
rm 1k_PBMCs_TotalSeq_B_3p_LT_filtered_feature_bc_matrix.tar.gz
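
After extraction it is worth checking that the matrix directory is complete before loading it into R. The sketch below uses a hypothetical helper, `check_10x_dir`, and assumes the standard Cell Ranger (v3 and later) layout of three gzipped files. The directory name used in the usage note is likewise an assumption about how this archive unpacks.

```shell
# Hypothetical helper: check that a directory looks like a 10x
# feature-barcode matrix (three non-empty gzipped files).
# Returns 0 if all files are present, 1 otherwise.
check_10x_dir() {
    dir="$1"
    failed=0
    for f in barcodes.tsv.gz features.tsv.gz matrix.mtx.gz; do
        if [ -s "$dir/$f" ]; then
            echo "ok: $f"
        else
            echo "missing or empty: $f" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

Assuming the archive unpacks to a directory named `filtered_feature_bc_matrix`, you would run `check_10x_dir filtered_feature_bc_matrix`. To see which feature types the dataset contains, `gunzip -c filtered_feature_bc_matrix/features.tsv.gz | cut -f 3 | sort | uniq -c` tallies the third column of the features file. For a multimodal dataset like this one, you would expect both gene expression and antibody rows.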

Optionally, we can also download a PBMC reference dataset that will be useful in our analysis (2.21 GB file):

wget https://atlas.fredhutch.org/data/nygc/multimodal/pbmc_multimodal.h5seurat

Many different free public datasets are available on the 10x Genomics website: https://support.10xgenomics.com/single-cell-gene-expression/datasets.

Launch JupyterLab

If you're running on a local machine, you can start JupyterLab simply by running:

jupyter lab

If you're using a remote server, you first need to start JupyterLab on the remote server:

jupyter lab --no-browser --port=8889

Then set up ssh port forwarding on your local machine:

ssh -f <remote_server_address> -L 8889:localhost:8889 -N

You can then open a browser on your local machine and go to the URL that Jupyter printed when it launched (it uses port 8889 and includes an access token).
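
The `-L` option to ssh takes the form `local_port:host:remote_port`, where `host` is resolved on the remote side. `-N` tells ssh not to run a remote command, and `-f` sends it to the background. If port 8889 is already in use on your local machine, you can forward a different local port instead. The helper below, `forward_cmd`, is a hypothetical convenience that only prints the command to run. It is a sketch, not part of the workshop materials.

```shell
# Hypothetical helper: print the ssh command that forwards a local port
# to a Jupyter server listening on the remote machine.
# Usage: forward_cmd <remote_server_address> [remote_port] [local_port]
forward_cmd() {
    host="$1"
    remote_port="${2:-8889}"           # port given to jupyter lab --port
    local_port="${3:-$remote_port}"    # defaults to the same port locally
    echo "ssh -f $host -L $local_port:localhost:$remote_port -N"
}
```

For example, `forward_cmd myserver 8889 9000` prints a command that maps local port 9000 to port 8889 on the server. In that case, replace 8889 with 9000 in the URL that Jupyter printed before opening it in your browser.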