Primary language: R. License: MIT.

simple-seurat

simple-seurat simplifies Seurat data processing, clustering, and analysis. It assumes that your data has already been cleaned; if not, refer to the Seurat Guided Clustering Tutorial for advice on data cleaning. As in the Seurat tutorial, these functions use the Seurat, dplyr, and patchwork libraries.

The Cluster() function collects and runs the Seurat functions needed to process the data, identify a suitable number of dimensions, cluster the data, and plot the clustering. The number of dimensions is chosen by PCA: the percent variance explained by each principal component is normalized around zero, and every component with a positive normalized value is kept. In my experience, this identifies a strong 'elbow' in the PCA plot, but it may be underinclusive, depending on the data and your goal. The data object is the only required argument at this time. The dims argument lets the user set the number of dimensions manually. The strength argument tells the algorithm to select a greater or lesser share of dimensions automatically. The labels argument, which currently defaults to TRUE, turns labelling by top marker on and off. The autolabelling functions are the most computationally intensive part of Cluster(), so if processing speed is important, set labels to FALSE.
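The dimension-selection rule described above can be sketched in base R. This is a minimal illustration, not the package's implementation: select_dims() is a hypothetical helper, reading "normalizing around zero" as z-scoring is an assumption, and the strength argument is not modelled. With a Seurat object, the input standard deviations could be taken from Stdev(object, reduction = "pca").

```r
# Hypothetical helper (not part of simple-seurat): keep the principal
# components whose normalized percent variance explained is positive.
select_dims <- function(stdev) {
  pct_var <- stdev^2 / sum(stdev^2) * 100  # percent variance explained per PC
  z <- as.numeric(scale(pct_var))          # assumption: center on 0, scale to unit SD
  which(z > 0)                             # PCs explaining more than the mean share
}

# Toy standard deviations: three strong components, then a flat tail.
toy_stdev <- c(5, 4, 3, rep(1, 7))
dims <- select_dims(toy_stdev)
print(dims)  # [1] 1 2 3
```

Because only components above the average share are kept, a rule like this tends to stop near the elbow, which is consistent with the caveat that it may be underinclusive.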

The Subcluster() function is meant for Seurat objects that have already been clustered, either by the Cluster() function above or manually. It takes as arguments the data object, the numerical cluster identifier(s), and the number of dimensions. If Cluster() has already been run, Subcluster() defaults to the number of dimensions identified by Cluster()'s PCA. If Cluster() has not been run, the number of dimensions must be supplied manually. Subclustering can be run on a single cluster or on a vector of clusters. The dimensional analysis can be adjusted to keep the subclusters similar in shape to the original superclusters. This function can be used either for deeper analysis within clusters or as a clustering quality analysis tool.
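For orientation, a subclustering step of this kind could be assembled from standard Seurat calls roughly as follows. This is a sketch under assumptions, not the source of Subcluster(): subcluster_sketch() and its argument names are hypothetical, and recomputing variable features and PCA on the subset is one common choice that the package may or may not make.

```r
# Hypothetical sketch (not the Subcluster() source): re-cluster selected
# clusters of an already-clustered Seurat object with standard Seurat calls.
subcluster_sketch <- function(obj, clusters, n_dims) {
  if (!requireNamespace("Seurat", quietly = TRUE)) {
    stop("The Seurat package is required for this sketch.")
  }
  # Keep only the cells whose active identity is in `clusters`.
  sub <- subset(obj, idents = clusters)
  # Assumption: recompute variable features, scaling, and PCA on the subset.
  sub <- Seurat::FindVariableFeatures(sub)
  sub <- Seurat::ScaleData(sub)
  sub <- Seurat::RunPCA(sub)
  # Re-cluster and re-embed using the first n_dims principal components.
  sub <- Seurat::FindNeighbors(sub, dims = seq_len(n_dims))
  sub <- Seurat::FindClusters(sub)
  sub <- Seurat::RunUMAP(sub, dims = seq_len(n_dims))
  sub
}
```

Given a clustered Seurat object named obj (a placeholder), a call such as subcluster_sketch(obj, clusters = c(0, 3), n_dims = 10) would return the re-clustered subset, which can then be plotted with DimPlot().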

The FeaturePlotAnalysis() function is a clustering quality analysis tool built on existing Seurat functions. It takes as arguments the data object, the cluster to examine, and, optionally, the number of features to examine. The output is a FeaturePlot of those features alongside the overall clustering plot, which lets researchers quickly identify the top markers associated with each cluster and check whether the clusters are appropriately associated with those markers. It can also help identify clusters that have been grouped around markers common across all clusters. This seems especially common with the initial Seurat cluster chosen by these functions (cluster 0).
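A comparable plot can be approximated with existing Seurat and patchwork calls, as in the sketch below. It is illustrative only: feature_plot_sketch() is a hypothetical name, the default of four features is an assumption, and "top markers" here simply means the first rows returned by FindMarkers(), which may differ from the ranking FeaturePlotAnalysis() uses.

```r
# Hypothetical sketch (not the FeaturePlotAnalysis() source): show the top
# markers of one cluster next to the overall clustering plot.
feature_plot_sketch <- function(obj, cluster, n_features = 4) {
  for (pkg in c("Seurat", "patchwork")) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      stop("Package '", pkg, "' is required for this sketch.")
    }
  }
  # Markers enriched in `cluster` relative to all other cells.
  markers <- Seurat::FindMarkers(obj, ident.1 = cluster, only.pos = TRUE)
  # Assumption: take the first rows of the FindMarkers() result as top markers.
  top <- utils::head(rownames(markers), n_features)
  # Overall clustering plot and per-feature expression plots, side by side.
  cluster_plot <- Seurat::DimPlot(obj, label = TRUE)
  marker_plot <- Seurat::FeaturePlot(obj, features = top)
  patchwork::wrap_plots(cluster_plot, marker_plot)
}
```

If the selected features light up across the whole embedding rather than in the chosen cluster, that suggests the cluster was grouped around markers common to all clusters, as noted above for cluster 0.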