In this tutorial we'll demonstrate the analysis of a single-cell dataset using the Seurat package. Seurat is a popular R toolkit for the analysis of single-cell data developed by the Satija lab at the New York Genome Center. You can find extensive documentation and example analyses on the Seurat website: https://satijalab.org/seurat/index.html
If you're new to R, some excellent resources for getting started with the language are available here: https://swirlstats.com, https://www.codecademy.com/learn/learn-r
In this workshop we'll also be running the analysis in Jupyter notebooks. These are intuitive to use and let you mix code with text. You can find out more about Jupyter notebooks and JupyterLab here: https://jupyter.org
The easiest way to get started is to install miniconda if you haven't already: https://docs.conda.io/en/latest/miniconda.html
Next, clone this repository:
git clone https://github.com/timoast/UCLA-T32.git
cd UCLA-T32/
While we've provided the materials here as a Jupyter notebook, RStudio also provides an excellent environment for working in R (and is what I use most of the time for analysis and development in R).
Create and activate a new conda environment using the environment file in this repository:
conda env create -f environment.yaml
conda activate t32
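Before launching R, it can help to confirm that the t32 environment is actually the active one. The sketch below is only an illustration: require_env is a hypothetical helper name, and it relies on the CONDA_DEFAULT_ENV variable that conda sets when an environment is activated.

```shell
# Return success only if the named conda environment is currently active.
# require_env is a hypothetical helper; CONDA_DEFAULT_ENV is set by `conda activate`.
require_env() {
    local wanted="$1"
    if [ "${CONDA_DEFAULT_ENV:-}" = "$wanted" ]; then
        echo "conda environment '$wanted' is active"
    else
        echo "expected '$wanted' but found '${CONDA_DEFAULT_ENV:-none}'" >&2
        return 1
    fi
}
```

Calling require_env t32 prints a confirmation when the environment is active and returns a non-zero status otherwise.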
Next, launch R and install the IRkernel and required packages:
# install the R kernel for Jupyter
IRkernel::installspec()
# install Seurat
install.packages("Seurat")
# install presto for fast DE (optional)
install.packages("remotes")
remotes::install_github('immunogenomics/presto')
# install dplyr for data manipulation (optional)
install.packages("dplyr")
# install SeuratDisk for loading h5seurat files (optional)
remotes::install_github("mojaveazure/seurat-disk")
Optionally, you can also install the RStudio keyboard shortcuts for Jupyter. This allows you to, for example, insert the arrow assignment <- by pressing Option + - and the pipe %>% by pressing Cmd + Shift + M:
jupyter labextension install @techrah/text-shortcuts
See the extension's GitHub page for more information.
For this tutorial we'll be using a publicly available multimodal single-cell dataset generated by 10x Genomics, profiling human peripheral blood mononuclear cells (PBMCs). This dataset measured gene expression alongside the abundance of nine cell-surface proteins in 1,000 single cells and can be downloaded from the 10x Genomics website:
wget https://cf.10xgenomics.com/samples/cell-exp/6.0.0/1k_PBMCs_TotalSeq_B_3p_LT/1k_PBMCs_TotalSeq_B_3p_LT_filtered_feature_bc_matrix.tar.gz
tar -xvf 1k_PBMCs_TotalSeq_B_3p_LT_filtered_feature_bc_matrix.tar.gz
rm 1k_PBMCs_TotalSeq_B_3p_LT_filtered_feature_bc_matrix.tar.gz
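After unpacking, a quick check that the matrix files are in place can save confusion later. This is a minimal sketch, assuming the archive extracts to a folder named filtered_feature_bc_matrix containing barcodes.tsv.gz, features.tsv.gz and matrix.mtx.gz (the usual 10x layout, but worth confirming against what tar printed). check_10x_dir is a hypothetical helper name.

```shell
# Check that a directory looks like a 10x feature-barcode matrix folder.
# Assumption: the three file names below follow the usual Cell Ranger layout.
check_10x_dir() {
    local dir="${1:-filtered_feature_bc_matrix}"
    local missing=0
    local f
    for f in barcodes.tsv.gz features.tsv.gz matrix.mtx.gz; do
        if [ -f "$dir/$f" ]; then
            echo "found:   $dir/$f"
        else
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running check_10x_dir filtered_feature_bc_matrix lists each file it finds and returns a non-zero status if any of the three is absent.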
Optionally, we can also download a PBMC reference dataset that will be useful in our analysis (2.21 GB file):
wget https://atlas.fredhutch.org/data/nygc/multimodal/pbmc_multimodal.h5seurat
Many different free public datasets are available on the 10x Genomics website: https://support.10xgenomics.com/single-cell-gene-expression/datasets.
If you're running on a local machine, you can start JupyterLab by running:
jupyter lab
If you're using a remote server, first start JupyterLab on the remote server:
jupyter lab --no-browser --port=8889
Then set up ssh port forwarding on your local machine:
ssh -f <remote_server_address> -L 8889:localhost:8889 -N
You can then open a browser on your local machine and go to the URL that Jupyter printed when it launched.
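To make the port-forwarding step easier to adapt, the sketch below builds the same ssh command for any host and port and prints it rather than running it. tunnel_cmd is a hypothetical helper, and user@example.org is a placeholder address, not a real server.

```shell
# Print (not run) the ssh tunnel command for a given host and port.
# -L <port>:localhost:<port> forwards a local port to the same port on the server,
# -N skips running a remote command, and -f sends ssh to the background.
tunnel_cmd() {
    local host="$1"
    local port="${2:-8889}"   # default matches the --port used when starting jupyter lab
    echo "ssh -f ${host} -L ${port}:localhost:${port} -N"
}

# Placeholder host for illustration only
tunnel_cmd user@example.org 8889
```

The port passed here should match the one given to jupyter lab with --port on the remote server, so that the URL Jupyter prints works unchanged in your local browser.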