scGIST

scGIST is a deep neural network that designs gene panels for single-cell spatial transcriptomics (sc-ST) through constrained feature selection. It also lets users prioritize genes of interest for panel inclusion while still respecting the panel size limit.
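As a rough illustration of the idea behind constrained feature selection (not scGIST's actual architecture or loss function), the NumPy sketch below scales each gene by a weight, so that a zero weight removes the gene, and adds a hinge penalty when the total weight exceeds the panel size. The names gate_genes and panel_size_penalty are hypothetical.

```python
import numpy as np

def gate_genes(X, w):
    """Scale each gene (column) of X by its weight; a zero weight drops the gene."""
    return X * w[np.newaxis, :]

def panel_size_penalty(w, panel_size):
    """Hinge penalty: zero while the total absolute weight stays within panel_size."""
    return max(0.0, float(np.sum(np.abs(w))) - panel_size)

rng = np.random.default_rng(0)
X = rng.random((5, 4))               # toy data: 5 cells x 4 genes
w = np.array([0.9, 0.0, 0.8, 0.1])   # toy per-gene weights
X_gated = gate_genes(X, w)           # column 1 is all zeros: gene 1 is unused

print(panel_size_penalty(w, panel_size=2))  # 0.0 (within budget)
print(panel_size_penalty(w, panel_size=1))  # positive (over budget)
```

In a trained network such weights would be learned jointly with the classifier, and a penalty of this kind would push the model toward using at most panel_size genes.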

Installation

Install scGIST and its requirements with the following command (recommended Python version: 3.7.11):

python setup.py install

Usage

Initialize the model

from scgist import scGIST
scGIST = scGIST()

Create the model

Gene panel design to distinguish among cell types
- n_features: number of genes (features) in the dataset
- n_classes: number of classes (cell types, clusters or labels)
- panel_size: number of genes to select for the panel

scGIST.create_model(n_features, n_classes, panel_size)
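The three arguments can be derived from the data itself. The sketch below assumes a plain NumPy expression matrix X (cells x genes) and a label array cell_types, both toy stand-ins; with an AnnData object, n_features would typically be adata.n_vars and n_classes the number of unique values in the adata.obs column holding your cell-type annotation (its name depends on your dataset).

```python
import numpy as np

# Toy stand-ins (assumption): X is a cells x genes matrix, one label per cell.
X = np.zeros((6, 100))
cell_types = np.array(["T", "B", "NK", "T", "B", "T"])

n_features = X.shape[1]                 # number of genes (columns)
n_classes = len(np.unique(cell_types))  # number of distinct cell types
panel_size = 20                         # genes to keep in the panel

print(n_features, n_classes, panel_size)  # 100 3 20
```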

Including genes of interest and/or complexes of interest
- weights: priority scores of the genes of interest
- pairs: list of complexes of interest

scGIST.create_model(n_features, n_classes, panel_size, weights, pairs)
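After training, it can be useful to check which complexes of interest made it into the panel in full. The exact format that create_model expects for pairs is not specified here; this sketch simply represents each complex as a tuple of gene names, and complexes_covered is a hypothetical helper, not part of scGIST.

```python
def complexes_covered(panel, pairs):
    """Return the complexes whose genes all appear in the panel."""
    selected = set(panel)
    return [pair for pair in pairs if all(gene in selected for gene in pair)]

panel = ["CD3E", "CD3D", "MS4A1", "NKG7"]          # e.g. selected gene names
pairs = [("CD3E", "CD3D"), ("CD79A", "CD79B")]     # toy complexes of interest
print(complexes_covered(panel, pairs))  # [('CD3E', 'CD3D')]
```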

Enforcing the exact panel size
- strict: when True, the model selects exactly panel_size genes; when False, it selects at most panel_size genes

scGIST.create_model(n_features, n_classes, panel_size, weights, pairs, strict)
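To make the difference between the two modes concrete, the sketch below extracts a panel from a vector of per-gene weights. It only illustrates "exactly panel_size" versus "at most panel_size"; the function select_panel and the 0.5 threshold are assumptions, not scGIST's selection mechanism.

```python
import numpy as np

def select_panel(weights, panel_size, strict, threshold=0.5):
    """Illustrative panel extraction from per-gene weights (hypothetical)."""
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(weights)[::-1]      # gene indices, highest weight first
    top = order[:panel_size]               # at most panel_size candidates
    if strict:
        return sorted(top.tolist())        # exactly panel_size genes
    # Non-strict: keep only candidates whose weight clears the threshold.
    return sorted(int(i) for i in top if weights[i] > threshold)

w = [0.95, 0.10, 0.70, 0.05, 0.40]
print(select_panel(w, panel_size=3, strict=True))   # [0, 2, 4]
print(select_panel(w, panel_size=3, strict=False))  # [0, 2]
```

In non-strict mode a weakly weighted gene (index 4 here) is dropped rather than kept just to fill the panel.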

Compile the model

scGIST.compile_model()

Train the model

Train the scGIST model, which requires the following inputs:
- adata: annotated data matrix
- epochs: number of epochs

scGIST.train_model(adata, epochs)

Get the marker names (gene panel)

Retrieve the names of the genes in the panel and, optionally, plot their weights as a bar chart:
- plot_weights: when True, the weights of the genes in the panel are plotted

scGIST.get_markers_names(plot_weights)

Get accuracy and F1 score with a classifier

Test the performance of the gene panel with a classifier:
- adata: annotated data matrix
- markers: indices of the selected gene panel (from scGIST.get_markers_indices())
- labels: names of the cell types
- clf: a classifier (if None, a k-nearest neighbors (KNN) classifier is used)

from scGIST import test_classifier
markers = scGIST.get_markers_indices()  # indices of the selected gene panel
accuracy, f1_score = test_classifier(adata, markers, labels, clf)
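The evaluation idea can be approximated with scikit-learn: keep only the panel genes, train a classifier (KNN by default) on part of the cells and score it on the rest. This is a sketch on a NumPy matrix; the 70/30 stratified split, macro-averaged F1 and default KNN settings are assumptions, and scGIST's own test_classifier may differ. evaluate_panel is a hypothetical name.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def evaluate_panel(X, y, markers, clf=None, seed=0):
    """Score a gene panel by training a classifier on the marker columns only."""
    clf = clf if clf is not None else KNeighborsClassifier()
    X_panel = X[:, markers]                          # keep only the panel genes
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_panel, y, test_size=0.3, random_state=seed, stratify=y)
    clf.fit(X_tr, y_tr)
    y_pred = clf.predict(X_te)
    return accuracy_score(y_te, y_pred), f1_score(y_te, y_pred, average="macro")

# Toy data: genes 0 and 1 separate the two classes, the other genes are noise.
rng = np.random.default_rng(0)
y = np.repeat(["A", "B"], 50)
X = rng.normal(size=(100, 10))
X[y == "B", :2] += 5.0

acc, f1 = evaluate_panel(X, y, markers=[0, 1])
print(f"accuracy={acc:.2f}, macro F1={f1:.2f}")
```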

Prioritize genes of interest

Prioritize genes of interest to increase their probability of being included in the gene panel:
- Read a CSV file containing gene names and their priorities. The file must have the column headers gene_name and priority.
- Convert the DataFrame to a Python list with the utility function get_priority_score_list, then pass that list to create_model.

import pandas as pd

# The CSV must contain the columns gene_name and priority
gene_priorities = pd.read_csv(path_to_csv_file)
priority_scores = get_priority_score_list(adata, gene_priorities)

scGIST.create_model(n_features, n_classes, panel_size=panel_size, priority_scores=priority_scores, alpha=0.2, beta=0.5)
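Conceptually, the conversion step aligns the gene_name/priority table with the dataset's gene order so that each gene gets one score. The sketch below is a hypothetical stand-in, not the implementation of get_priority_score_list; in particular, giving unlisted genes a default score of 0 is an assumption.

```python
import pandas as pd

def align_priorities(gene_names, gene_priorities, default=0.0):
    """Align a gene_name/priority table to the dataset's gene order (hypothetical)."""
    lookup = dict(zip(gene_priorities["gene_name"], gene_priorities["priority"]))
    # One score per gene, in dataset order; genes not in the table get `default`.
    return [float(lookup.get(gene, default)) for gene in gene_names]

gene_names = ["CD3E", "MS4A1", "NKG7", "GAPDH"]  # e.g. list(adata.var_names)
gene_priorities = pd.DataFrame({"gene_name": ["NKG7", "CD3E"], "priority": [0.9, 0.5]})
print(align_priorities(gene_names, gene_priorities))  # [0.5, 0.0, 0.9, 0.0]
```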

HasiHays/scGIST