Repository for Seurat analysis
options(repos="http://cran.ism.ac.jp")
install.packages('dplyr')
install.packages('Seurat')
install.packages('optparse')
install.packages('ggpubr')
install.packages('cowplot')
# 'grid' ships with base R, so it does not need to be installed from CRAN
install.packages('ggplot2')
install.packages('hdf5r')
Note: installing the "hdf5r" package may fail when working over SSH.
10x_seurat.R automates the analysis shown in the Seurat tutorial.
10x Chromium output files.
./input_dir
├--- barcodes.tsv.gz
├--- features.tsv.gz
└--- matrix.mtx.gz
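Before starting a run, it can help to confirm that the input directory really contains the three files listed above. The following is a minimal sketch; `check_10x_dir` is a hypothetical helper name and is not part of this repository.

```shell
# Hypothetical helper (not part of this repository): returns 0 only when
# the directory contains all three files listed in the layout above.
check_10x_dir() {
  dir="$1"
  missing=0
  for f in barcodes.tsv.gz features.tsv.gz matrix.mtx.gz; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Report whether ./input_dir is ready to be passed to -i.
if check_10x_dir ./input_dir; then
  echo "input_dir looks complete"
else
  echo "input_dir is incomplete; see messages above"
fi
```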
- .png format plots
- .txt files
- .tsv files (may be used as CIBERSORTx input)
- .rds files
$ Rscript seurat.R -i ./input_dir/ -o ./output_dir/ --project_name 'custom_project_name'
Visualise raw data and determine cutoff values to use for dropping low quality data/cells.
$ Rscript seurat.R -i ./input_dir/ -o ./output_dir/ --project_name 'custom_project_name' --visualise_rawdata
Based on the plots generated in Step 1, choose integer values for --nFeature_RNA_min, --nFeature_RNA_max and --percent.mt, and pass them in place of custom_value.
$ Rscript seurat.R -i ./input_dir/ -o ./output_dir/ --nFeature_RNA_min custom_value --nFeature_RNA_max custom_value --percent.mt custom_value --project_name 'custom_project_name'
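Because the later steps repeat the same filtering flags, one option is to keep the cutoffs in shell variables so that every run uses identical values. This is only a sketch: the variable names are made up for illustration, 200 and 5 are the defaults listed under Options, and 2500 is an arbitrary placeholder rather than a recommendation.

```shell
# Placeholder cutoffs: 200 and 5 are the documented defaults,
# 2500 is purely illustrative. Replace them with values read off your plots.
NFEATURE_MIN=200
NFEATURE_MAX=2500
PERCENT_MT=5
PROJECT=custom_project_name

# Collect the shared arguments once so every later step uses the same filters.
COMMON_ARGS="-i ./input_dir/ -o ./output_dir/ --nFeature_RNA_min $NFEATURE_MIN --nFeature_RNA_max $NFEATURE_MAX --percent.mt $PERCENT_MT --project_name $PROJECT"

# Print the command for review; remove the leading 'echo' to actually run it.
echo Rscript seurat.R $COMMON_ARGS
```

Further flags such as --jackstrawed or --marker_genes can then be appended after `$COMMON_ARGS` in the later steps.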
Based on the pca_*.png plots from Step 2, set --pca_dims to the number of dimensions suitable for your dataset.
In addition, you can visualise the expression of previously identified genes by listing them in a .txt file and passing that file with --marker_genes.
Note: If you have no a priori knowledge of marker genes, you may omit the --marker_genes option.
Example of marker gene file:
"MS4A1"
"GNLY"
$ Rscript seurat.R -i ./input_dir/ -o ./output_dir/ --nFeature_RNA_min custom_value --nFeature_RNA_max custom_value --percent.mt custom_value --project_name 'custom_project_name' --jackstrawed ./output_dir/custom_project_name_jackstrawed.rds --marker_genes ./path_to/marker_gene_file.txt --find_marker
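The marker gene file described above can be written from the shell. This sketch reproduces the two-gene example, one double-quoted symbol per line; the file name marker_gene_file.txt is only an illustration, and it assumes the quoted format shown in the example is what the script expects.

```shell
# Write one double-quoted gene symbol per line, matching the example file.
# printf reuses the format string for each argument.
printf '"%s"\n' MS4A1 GNLY > marker_gene_file.txt

# Show the result before passing it to --marker_genes.
cat marker_gene_file.txt
```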
Use deg_heatmap.png, deg_features.png and cluster_scatter.png to assign a phenotype to each cluster. To do so, create a .txt file in which each row contains the phenotype of one cluster, and specify this file using --cluster_id.
Below is an example of such a text file. In this example, cluster 1 represents platelets and cluster 2 represents CD14+ Mono cells.
Platelet
CD14+ Mono
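Since the row order carries the cluster assignment, it is worth printing the file with row numbers after writing it. A minimal sketch, assuming the file is named cluster_id.txt (the name is illustrative):

```shell
# One phenotype per row; the row order determines which cluster gets which label.
printf '%s\n' 'Platelet' 'CD14+ Mono' > cluster_id.txt

# Print each phenotype with its row number to double-check the order.
awk '{ print NR ": " $0 }' cluster_id.txt
```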
$ Rscript seurat.R -i ./input_dir/ -o ./output_dir/ --nFeature_RNA_min custom_value --nFeature_RNA_max custom_value --percent.mt custom_value --project_name 'custom_project_name' --jackstrawed ./output_dir/custom_project_name_jackstrawed.rds --marker_genes ./path_to/marker_gene_file.txt --cluster_id ./path_to/cluster_id.txt
Use deg_heatmap.png, deg_features.png, *_all_markers.tsv and *_top10_markers.tsv to identify novel biomarkers or to visualise the expression of known biomarkers. List the genes of interest in a .txt file and specify it under --marker_genes.
$ Rscript seurat.R -i ./input_dir/ -o ./output_dir/ --nFeature_RNA_min custom_value --nFeature_RNA_max custom_value --percent.mt custom_value --project_name 'custom_project_name' --jackstrawed ./output_dir/custom_project_name_jackstrawed.rds --marker_genes ./path_to/marker_gene_file.txt --cluster_id ./path_to/cluster_id.txt
Options:
-i CHARACTER, --input_dir=CHARACTER
[Required] Input data directory [default ./]
-o CHARACTER, --output_dir=CHARACTER
[Recommended] Output plot/text data directory [default ./]
-n CHARACTER, --project_name=CHARACTER
[Recommended] Name of project. Output files will be given this name [default sample]
--visualise_rawdata
[Recommended/Preprocessing] Plot raw data to determine preprocessing cutoff values. [default FALSE]
-s NUMBER, --nFeature_RNA_min=NUMBER
[Preprocessing; Filtering data] Minimum of nFeature_RNA [default 200]
-l NUMBER, --nFeature_RNA_max=NUMBER
[Preprocessing; Filtering data] Maximum of nFeature_RNA [default FALSE]
-m NUMBER, --percent.mt=NUMBER
[Preprocessing; Filtering data] Maximum percentage of mitochondria genome. Higher percent.mt indicates dead cell [default 5]
-d NUMBER, --pca_dims=NUMBER
[Clustering/Dimensionality reduction] Number of principal components to use. Cf. pca_jackstraw.png, pca_elbowPlot.png [default 10]
-j CHARACTER, --jackstrawed=CHARACTER
[Resume] Set RDS file name (and the path to that file) if you want to use previously calculated JackStraw results (e.g. *_jackstrawed.rds) [default FALSE]
--deg=CHARACTER
[Resume] Set RDS file name (and the path to that file) if you want to use previously calculated results and resume from finding DEGs (e.g. *_final.rds) [default FALSE]
-g CHARACTER, --marker_genes=CHARACTER
[Finding DEGs] path/name of textfile that contains custom marker genes
-c CHARACTER, --cluster_id=CHARACTER
[Assigning cell type identity] path/name of textfile that contains custom cluster IDs
--marker_threshold=NUMBER
[Finding biomarkers] avg_logFC threshold [default FALSE]
--marker_numbers=NUMBER
[Finding biomarkers] Top n genes to retrieve as marker candidates [default 10]
-f, --find_marker
[Finding biomarkers] Plot all marker candidates in separate scatter plots [default FALSE]
-h, --help
Shows this help message and exit
cell_cycle_seurat.R automates the analysis shown in the Seurat tutorial. However, it only assigns cell-cycle scores and does not regress them out.
10x Chromium output files.
./input_dir
├--- barcodes.tsv.gz
├--- features.tsv.gz
└--- matrix.mtx.gz
- .png format plots
- .txt files
- .tsv files
$ Rscript cell_cycle_seurat.R -i /path/to/input_dir/ -o /path/to/output_dir/
integration_seurat.R automates the analysis shown in the Seurat tutorial.
Two sets of 10x Chromium output files.
./input_dir1
├--- barcodes.tsv.gz
├--- features.tsv.gz
└--- matrix.mtx.gz
./input_dir2
├--- barcodes.tsv.gz
├--- features.tsv.gz
└--- matrix.mtx.gz
- .png format plots
- .tsv files
- .rds files
$ Rscript integration_seurat.R --input_dir1 /path/to/dir1/ --input_dir2 /path/to/dir2/ -o /path/to/output --name_dir1 name1 --name_dir2 name2 -n 'custom_project_name'
atac_integration_seurat.R automates the analysis shown in the Seurat tutorial.
- peak_matrix.h5 file
- .gtf file
- singlecell.csv file
- scRNA.rds file
- .png format plots
- .txt files
- .rds files
$ Rscript atac_integration_seurat.R -a path/to/peak_bc_matrix.h5 -b path/to/gtf -c path/to/singlecell.csv -r path/to/scRNA.rds -o 'custom_output_name' -n 'custom_project_name'